Except for trivial Software Under TestSUTs, testing all possible casesexhaustive testing is not practical because such testing often requires a massive/infinite number of test cases.
Consider the test cases for adding a string object to a Java: ArrayList
,
Python: list
collection:
Exhaustive testing of this operation can take many more test cases.
Program testing can be used to show the presence of bugs, but never to show their absence!
--Edsger Dijkstra
Every test case adds to the cost of testing. In some systems, a single test case can cost thousands of dollars e.g. on-field testing of flight-control software. Therefore, test cases need to be designed to make the best use of testing resources. In particular:
Testing should be effective i.e., it finds a high percentage of existing bugs e.g., a set of test cases that finds 60 defects is more effective than a set that finds only 30 defects in the same system.
Testing should be efficient i.e., it has a high rate of success (bugs found/test cases) a set of 20 test cases that finds 8 defects is more efficient than another set of 40 test cases that finds the same 8 defects.
For testing to be Efficient and EffectiveE&E, each new test you add should be targeting a potential fault that is not already targeted by existing test cases. There are test case design techniques that can help us improve the E&E of testing.
Exercises
Test cases for TriangleDetector
Given below is the sample output from a text-based program TriangleDetector
that determines whether the three input numbers make up the three sides of a valid triangle. List test cases you would use to test this software. Two sample test cases are given below.
C:\> java TriangleDetector
Enter side 1: 34
Enter side 2: 34
Enter side 3: 32
Can this be a triangle?: Yes
Enter side 1:
Sample test cases,
34, 34, 34: Yes
0, any valid, any valid: No
In addition to obvious test cases such as
We may also devise some interesting test cases such as the ones depicted below.
Note that their applicability depends on the context in which the software is operating.
0
, numbers formatted differently (e.g. 13F
), very large numbers (e.g. MAX_INT
), numbers with many decimal places, empty strings, ...The main point to note is how difficult it is to test exhaustively, even on a trivial system.
Exhaustive testing in Minesweeper
Explain why exhaustive testing is not practical using the example of testing the newGame()
operation in the Logic
class of a Minesweeper game.
Consider this sequence of test cases:
newGame()
and see if it works.newGame()
. Activate newGame()
again and see if it works.newGame()
three times consecutively and see if it works.newGame()
267 times consecutively and see if it works.Well, you get the idea. Exhaustive testing of newGame()
is not practical.
Statements about the E&E of testing
Improving the efficiency and effectiveness of test case design can,
(a)(b)(c)(d)(e)(f)
A positive test case is when the test is designed to produce an expected/valid behavior. On the other hand, a negative test case is designed to produce a behavior that indicates an invalid/unexpected situation, such as an error message.
Consider the testing of the method print(Integer i)
which prints the value of i
.
i == new Integer(50);
i == null;
Test case design can be of three types, based on how much of the SUT's internal details are considered when designing test cases:
Black-box (aka specification-based or responsibility-based) approach: test cases are designed exclusively based on the SUT’s specified external behavior.
White-box (aka glass-box or structured or implementation-based) approach: test cases are designed based on what is known about the SUT’s implementation, i.e. the code.
Gray-box approach: test case design uses some important information about the implementation. For example, if the implementation of a sort operation uses different algorithms to sort lists shorter than 1000 items and lists longer than 1000 items, more meaningful test cases can then be added to verify the correctness of both algorithms.
Black-box and white-box testing
Note: these videos are from the Udacity course Software Development Process by Georgia Tech
Consider the testing of the following operation.
isValidMonth(m)
: returns true
if m
(and int
) is in the range [1..12]
It is inefficient and impractical to test this method for all integer values [-MIN_INT to MAX_INT]
. Fortunately, there is no need to test all possible input values. For example, if the input value 233
fails to produce the correct result, the input 234
is likely to fail too; there is no need to test both.
In general, most SUTs do not treat each input in a unique way. Instead, they process all possible inputs in a small number of distinct ways. That means a range of inputs is treated the same way inside the SUT. Equivalence partitioning (EP) is a test case design technique that uses the above observation to improve the E&E of testing.
Equivalence partition (aka equivalence class): A group of test inputs that are likely to be processed by the SUT in the same way.
By dividing possible inputs into equivalence partitions you can,
Equivalence partitions (EPs) are usually derived from the specifications of the SUT.
These could be EPs for the
true
(produces false
)true
true
(produces false
)isValidMonth
isValidMonth(m)
: returns true
if m
(and int
) is in the range [1..12]
When the SUT has multiple inputs, you should identify EPs for each input.
Consider the method duplicate(String s, int n): String
which returns a String
that contains s
repeated n
times.
Example EPs for s
:
Example EPs for n
:
0
An EP may not have adjacent values.
Consider the method isPrime(int i): boolean
that returns true if i
is a prime number.
EPs for i
:
Some inputs have only a small number of possible values and a potentially unique behavior for each value. In those cases, you have to consider each value as a partition by itself.
Consider the method showStatusMessage(GameStatus s): String
that returns a unique String
for each of the possible values of s (GameStatus
is an enum
). In this case, each possible value of s
will have to be considered as a partition.
Note that the EP technique is merely a heuristic and not an exact science, especially when applied manually (as opposed to using an automated program analysis tool to derive EPs). The partitions derived depend on how one ‘speculates’ the SUT to behave internally. Applying EP under a glass-box or gray-box approach can yield more precise partitions.
Consider the EPs given above for the method isValidMonth
. A different tester might use these EPs instead:
true
false
Some more examples:
Specification | Equivalence partitions |
---|---|
|
[ |
|
[ |
Exercises
EPs for isValidName
method
Consider this SUT:
isValidName(String s): boolean
Description: returns true if s
is not null
and not longer than 50 characters.
A. Which one of these is least likely to be an equivalence partition for the parameter s
of the isValidName
method given above?
B. If you had to choose 3 test cases from the 4 given below, which one will you leave out based on the EP technique?
A. (d)
Explanation: The description does not mention anything about the content of the string. Therefore, the method is unlikely to behave differently for strings consisting of numbers.
B. (a) or (c)
Explanation: both belong to the same EP.
When deciding EPs of OOP methods, you need to identify the EPs of all data participants that can potentially influence the behaviour of the method, such as,
Consider this method in the DataStack
class:
push(Object o): boolean
o
to the top of the stack if the stack is not full.true
if the push operation was a success.MutabilityException
if the global flag FREEZE==true
.InvalidValueException
if o
is null.EPs:
DataStack
object: [full] [not full]o
: [null] [not null]FREEZE
: [true][false] Consider a simple Minesweeper app. What are the EPs for the newGame()
method of the Logic
component?
As newGame()
does not have any parameters, the only obvious participant is the Logic
object itself.
Note that if the glass-box or the grey-box approach is used, other associated objects that are involved in the method might also be included as participants. For example, the Minefield
object can be considered as another participant of the newGame()
method. Here, the black-box approach is assumed.
Next, let us identify equivalence partitions for each participant. Will the newGame()
method behave differently for different Logic
objects? If yes, how will it differ? In this case, yes, it might behave differently based on the game state. Therefore, the equivalence partitions are:
PRE_GAME
: before the game starts, minefield does not exist yetREADY
: a new minefield has been created and the app is waiting for the player’s first moveIN_PLAY
: the current minefield is already in useWON
, LOST
: let us assume that newGame()
behaves the same way for these two values Consider the Logic
component of the Minesweeper application. What are the EPs for the markCellAt(int x, int y)
method? The partitions in bold represent valid inputs.
Logic
: PRE_GAME, READY, IN_PLAY, WON, LOSTx
: [MIN_INT..-1] [0..(W-1)] [W..MAX_INT] (assuming a minefield size of WxH)y
: [MIN_INT..-1] [0..(H-1)] [H..MAX_INT]Cell
at (x,y)
: HIDDEN, MARKED, CLEAREDBoundary Value Analysis (BVA) is a test case design heuristic that is based on the observation that bugs often result from incorrect handling of boundaries of equivalence partitions. This is not surprising, as the end points of boundaries are often used in branching instructions, etc., where the programmer can make mistakes.
The markCellAt(int x, int y)
operation could contain code such as if (x > 0 && x <= (W-1))
which involves the boundaries of x’s equivalence partitions.
BVA suggests that when picking test inputs from an equivalence partition, values near boundaries (i.e. boundary values) are more likely to find bugs.
Boundary values are sometimes called corner cases.
Exercises
What BVA recommends
Boundary value analysis recommends testing only values that reside on the equivalence class boundary.
False
Explanation: It does not recommend testing only those values on the boundary. It merely suggests that values on and around a boundary are more likely to cause errors.
Typically, you should choose three values around the boundary to test: one value from the boundary, one value just below the boundary, and one value just above the boundary. The number of values to pick depends on other factors, such as the cost of each test case.
Some examples:
Equivalence partition | Some possible test values (boundaries are in bold) |
---|---|
[1-12] |
0,1,2, 11,12,13 |
[MIN_INT, 0] |
MIN_INT, MIN_INT+1, -1, 0 , 1 |
[any non-null String] |
Empty String, a String of maximum possible length |
[prime numbers] |
No specific boundary |
[non-empty Stack] |
Stack with: no elements, one element, two elements, no empty spaces, only one empty space |
An SUT can take multiple inputs. You can select values for each input (using equivalence partitioning, boundary value analysis, or some other technique).
An SUT that takes multiple inputs and some values chosen for each input:
calculateGrade(participation, projectGrade, isAbsent, examScore)
Input | Valid values to test | Invalid values to test |
---|---|---|
participation | 0, 1, 19, 20 | 21, 22 |
projectGrade | A, B, C, D, F | |
isAbsent | true, false | |
examScore | 0, 1, 69, 70, | 71, 72 |
Testing all possible combinations is effective but not efficient. If you test all possible combinations for the above example, you need to test 6x5x2x6=360 cases. Doing so has a higher chance of discovering bugs (i.e. effective) but the number of test cases will be too high (i.e. not efficient). Therefore, you need smarter ways to combine test inputs that are both effective and efficient.
Given below are some basic strategies for generating a set of test cases by combining multiple test inputs.
Let's assume the SUT has the following three inputs and you have selected the given values for testing:
SUT: foo(char p1, int p2, boolean p3)
Values to test:
Input | Values |
---|---|
p1 | a, b, c |
p2 | 1, 2, 3 |
p3 | T, F |
The all combinations strategy generates test cases for each unique combination of test inputs.
This strategy generates 3x3x2=18 test cases.
Test Case | p1 | p2 | p3 |
---|---|---|---|
1 | a | 1 | T |
2 | a | 1 | F |
3 | a | 2 | T |
... | ... | ... | ... |
18 | c | 3 | F |
The at least once strategy includes each test input at least once.
This strategy generates 3 test cases.
Test Case | p1 | p2 | p3 |
---|---|---|---|
1 | a | 1 | T |
2 | b | 2 | F |
3 | c | 3 | VV/IV |
VV/IV = Any Valid Value / Any Invalid Value
The all pairs strategy creates test cases so that for any given pair of inputs, all combinations between them are tested. It is based on the observation that a bug is rarely the result of more than two interacting factors. The resulting number of test cases is lower than the all combinations strategy, but higher than the at least once approach.
This strategy generates 9 test cases:
See steps
Let's first consider inputs p1 and p2:
Input | Values |
---|---|
p1 | a, b, c |
p2 | 1, 2, 3 |
These values can generate (a,1)(a,2)(a,3)(b,1)(b,2),...3x3=9 combinations, and the test cases should cover all of them.
Next, let's consider p1 and p3.
Input | Values |
---|---|
p1 | a, b, c |
p3 | T, F |
These values can generate (a,T)(a,F)(b,T)(b,F),...3x2=6 combinations, and the test cases should cover all of them.
Similarly, inputs p2 and p3 generate another 6 combinations.
The 9 test cases given below cover all the above 9+6+6 combinations.
Test Case | p1 | p2 | p3 |
---|---|---|---|
1 | a | 1 | T |
2 | a | 2 | T |
3 | a | 3 | F |
4 | b | 1 | F |
5 | b | 2 | T |
6 | b | 3 | F |
7 | c | 1 | T |
8 | c | 2 | F |
9 | c | 3 | T |
A variation of this strategy is to test all pairs of inputs but only for inputs that could influence each other.
Testing all pairs between p1 and p3 only while ensuring all p2 values are tested at least once:
Test Case | p1 | p2 | p3 |
---|---|---|---|
1 | a | 1 | T |
2 | a | 2 | F |
3 | b | 3 | T |
4 | b | VV/IV | F |
5 | c | VV/IV | T |
6 | c | VV/IV | F |
The random strategy generates test cases using one of the other strategies and then picks a subset randomly (presumably because the original set of test cases is too big).
There are other strategies that can be used too.
Consider the following scenario.
SUT: printLabel(String fruitName, int unitPrice)
Selected values for fruitName
(invalid values are underlined):
Values | Explanation |
---|---|
Apple | Label format is round |
Banana | Label format is oval |
Cherry | Label format is square |
Dog | Not a valid fruit |
Selected values for unitPrice
:
Values | Explanation |
---|---|
1 | Only one digit |
20 | Two digits |
0 | Invalid because 0 is not a valid price |
-1 | Invalid because negative prices are not allowed |
Suppose these are the test cases being considered.
Case | fruitName | unitPrice | Expected |
---|---|---|---|
1 | Apple | 1 | Print label |
2 | Banana | 20 | Print label |
3 | Cherry | 0 | Error message “invalid price” |
4 | Dog | -1 | Error message “invalid fruit" |
It looks like the test cases were created using the at least once strategy. After running these tests, can you confirm that the square-format label printing is done correctly?
Cherry
-- the only input that can produce a square-format label -- is in a negative test case which produces an error message instead of a label. If there is a bug in the code that prints labels in square-format, these tests cases will not trigger that bug.In this case, a useful heuristic to apply is each valid input must appear at least once in a positive test case. Cherry
is a valid test input and you must ensure that it appears at least once in a positive test case. Here are the updated test cases after applying that heuristic.
Case | fruitName | unitPrice | Expected |
---|---|---|---|
1 | Apple | 1 | Print round label |
2 | Banana | 20 | Print oval label |
2.1 | Cherry | VV | Print square label |
3 | VV | 0 | Error message “invalid price” |
4 | Dog | -1 | Error message “invalid fruit" |
VV/IV = Any Invalid or Valid Value VV = Any Valid Value
Consider the
Case | fruitName | unitPrice | Expected |
---|---|---|---|
1 | Apple | 1 | Print round label |
2 | Banana | 20 | Print oval label |
2.1 | Cherry | VV | Print square label |
3 | VV | 0 | Error message “invalid price” |
4 | Dog | -1 | Error message “invalid fruit" |
VV/IV = Any Invalid or Valid Value VV = Any Valid Value
After running these test cases, can you be sure that the error message “invalid price” is shown for negative prices?
-1
-- the only input that is a negative price -– is in a test case that produces the error message “invalid fruit”.In this case, a useful heuristic to apply is no more than one invalid input in a test case. After applying that, you get the following test cases.
Case | fruitName | unitPrice | Expected |
---|---|---|---|
1 | Apple | 1 | Print round label |
2 | Banana | 20 | Print oval label |
2.1 | Cherry | VV | Print square label |
3 | VV | 0 | Error message “invalid price” |
4 | VV | -1 | Error message “invalid price" |
4.1 | Dog | VV | Error message “invalid fruit" |
VV/IV = Any Invalid or Valid Value VV = Any Valid Value
Exercises
Can define test cases precisely
Applying the heuristics covered so far, we can determine the precise number of test cases required to test any given SUT effectively.
False
Explanation: These heuristics are, well, heuristics only. They will help you to make better decisions about test case design. However, they are speculative in nature (especially, when testing in black-box fashion) and cannot give you the precise number of test cases.
Consider the calculateGrade scenario given below:
calculateGrade(participation, projectGrade, isAbsent, examScore)
To get the first cut of test cases, let’s apply the at least once strategy.
Test cases for calculateGrade V1
Case No. | participation | projectGrade | isAbsent | examScore | Expected |
---|---|---|---|---|---|
1 | 0 | A | true | 0 | ... |
2 | 1 | B | false | 1 | ... |
3 | 19 | C | VV/IV | 69 | ... |
4 | 20 | D | VV/IV | 70 | ... |
5 | 21 | F | VV/IV | 71 | Err Msg |
6 | 22 | VV/IV | VV/IV | 72 | Err Msg |
VV/IV = Any Valid or Invalid Value, Err Msg = Error Message
Next, let’s apply the each valid input at least once in a positive test case heuristic. Test case 5 has a valid value for projectGrade=F
that doesn't appear in any other positive test case. Let's replace test case 5 with 5.1 and 5.2 to rectify that.
Test cases for calculateGrade V2
Case No. | participation | projectGrade | isAbsent | examScore | Expected |
---|---|---|---|---|---|
1 | 0 | A | true | 0 | ... |
2 | 1 | B | false | 1 | ... |
3 | 19 | C | VV | 69 | ... |
4 | 20 | D | VV | 70 | ... |
5.1 | VV | F | VV | VV | ... |
5.2 | 21 | VV/IV | VV/IV | 71 | Err Msg |
6 | 22 | VV/IV | VV/IV | 72 | Err Msg |
VV = Any Valid Value VV/IV = Any Valid or Invalid Value
Next, you have to apply the no more than one invalid input in a test case heuristic. Test cases 5.2 and 6 don't follow that heuristic. Let's rectify the situation as follows:
Test cases for calculateGrade V3
Case No. | participation | projectGrade | isAbsent | examScore | Expected |
---|---|---|---|---|---|
1 | 0 | A | true | 0 | ... |
2 | 1 | B | false | 1 | ... |
3 | 19 | C | VV | 69 | ... |
4 | 20 | D | VV | 70 | ... |
5.1 | VV | F | VV | VV | ... |
5.2 | 21 | VV | VV | VV | Err Msg |
5.3 | 22 | VV | VV | VV | Err Msg |
6.1 | VV | VV | VV | 71 | Err Msg |
6.2 | VV | VV | VV | 72 | Err Msg |
Next, you can assume that there is a dependency between the inputs examScore
and isAbsent
such that an absent student can only have examScore=0
. To cater for the hidden invalid case arising from this, you can add a new test case where isAbsent=true
and examScore!=0
. In addition, test cases 3-6.2 should have isAbsent=false
so that the input remains valid.
Test cases for calculateGrade V4
Case No. | participation | projectGrade | isAbsent | examScore | Expected |
---|---|---|---|---|---|
1 | 0 | A | true | 0 | ... |
2 | 1 | B | false | 1 | ... |
3 | 19 | C | false | 69 | ... |
4 | 20 | D | false | 70 | ... |
5.1 | VV | F | false | VV | ... |
5.2 | 21 | VV | false | VV | Err Msg |
5.3 | 22 | VV | false | VV | Err Msg |
6.1 | VV | VV | false | 71 | Err Msg |
6.2 | VV | VV | false | 72 | Err Msg |
7 | VV | VV | true | !=0 | Err Msg |
Exercises
Statements about test input combinations
Which one of these contradicts the heuristics recommended when creating test cases with multiple inputs?
a
Explanation: If you test all invalid test inputs together, you will not know if each of the invalid inputs is handled correctly by the SUT. This is because most SUTs return an error message upon encountering the first invalid input.
Combine test inputs for the consume
method
Apply heuristics for combining multiple test inputs to improve the E&E of the following test cases, assuming all 6 values in the table need to be tested. Underlines indicate invalid values. Point out where the heuristics are contradicted and how to improve the test cases.
SUT: consume(food, drink)
Test case | food | drink |
---|---|---|
TC1 | bread | water |
TC2 | rice | lava |
TC3 | rock | acid |