Item 12: Use PIT mutation testing

By Ashley Waldron

PIT is a framework that mutates the code under test by modifying its bytecode and then runs your unit tests against the mutated code. If the tests still pass after a mutation, PIT flags that mutation as a failure, and it generates a report once all of the mutations have been run. To see it in action let’s add the plugin to the POM under the build → plugins section:

XML

...........           
            <!-- PITEST mvn clean test-compile org.pitest:pitest-maven:mutationCoverage   -->
            <plugin>
                <groupId>org.pitest</groupId>
                <artifactId>pitest-maven</artifactId>
                <version>${pitest.plugin.version}</version>
                <dependencies>
                    <dependency>
                        <groupId>org.pitest</groupId>
                        <artifactId>pitest-junit5-plugin</artifactId>
                        <version>${pitest.junit5.plugin.version}</version>
                    </dependency>
                </dependencies>
                <configuration>
                    <withHistory>false</withHistory>
                    <mutationThreshold>100</mutationThreshold>
                    <targetClasses>
                        <param>com.effective.unit.tests.service.item8.*</param>
                    </targetClasses>
                    <targetTests>
                        <param>com.effective.unit.tests.service.item8.*</param>
                    </targetTests>
                </configuration>
            </plugin>
        </plugins>
...........

Note: As you can see from the updated POM entry, we’re only going to run PIT against the item8 package because its test class has 100% code coverage and all of its tests pass. Our other test classes contain failing tests (they exist for example purposes), and PIT will not generate mutated classes if any test in your test run fails. So we’ll stick with Item8UserServiceTest for this item.

We can now run the PIT tests using the following Maven command:

mvn clean test-compile org.pitest:pitest-maven:mutationCoverage

This will give the following output:

10:57:14 PIT >> INFO : Created 1 mutation test units in pre scan
10:57:15 PIT >> INFO : Sending 2 test classes to minion
10:57:15 PIT >> INFO : Sent tests to minion
10:57:15 PIT >> INFO : MINION : OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
10:57:16 PIT >> INFO : Calculated coverage in 1 seconds.
10:57:16 PIT >> INFO : Created 1 mutation test units
10:57:19 PIT >> INFO : Completed in 4 seconds
================================================================================
- Mutators
================================================================================
> org.pitest.mutationtest.engine.gregor.mutators.ConditionalsBoundaryMutator
>> Generated 1 Killed 0 (0%)
> KILLED 0 SURVIVED 1 TIMED_OUT 0 NON_VIABLE 0
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0
> NO_COVERAGE 0
--------------------------------------------------------------------------------
> org.pitest.mutationtest.engine.gregor.mutators.VoidMethodCallMutator
>> Generated 11 Killed 11 (100%)
> KILLED 11 SURVIVED 0 TIMED_OUT 0 NON_VIABLE 0
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0
> NO_COVERAGE 0
--------------------------------------------------------------------------------
> org.pitest.mutationtest.engine.gregor.mutators.returns.NullReturnValsMutator
>> Generated 1 Killed 1 (100%)
> KILLED 1 SURVIVED 0 TIMED_OUT 0 NON_VIABLE 0
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0
> NO_COVERAGE 0
--------------------------------------------------------------------------------
> org.pitest.mutationtest.engine.gregor.mutators.NegateConditionalsMutator
>> Generated 3 Killed 3 (100%)
> KILLED 3 SURVIVED 0 TIMED_OUT 0 NON_VIABLE 0
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0
> NO_COVERAGE 0
--------------------------------------------------------------------------------
================================================================================
- Timings
================================================================================
> pre-scan for mutations : < 1 second
> scan classpath : < 1 second
> coverage and dependency analysis : 1 seconds
> build mutation tests : < 1 second
> run mutation analysis : 2 seconds
--------------------------------------------------------------------------------
> Total : 4 seconds
--------------------------------------------------------------------------------
================================================================================
- Statistics
================================================================================
>> Line Coverage (for mutated classes only): 20/20 (100%)
>> Generated 16 mutations Killed 15 (94%)
>> Mutations with no coverage 0. Test strength 94%
>> Ran 18 tests (1.12 tests per mutation)
Enhanced functionality available at https://www.arcmutate.com/
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.104 s
[INFO] Finished at: 2024-09-19T10:57:19+01:00
[INFO] ------------------------------------------------------------------------

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.15.8:mutationCoverage (default-cli) on project examples: Mutation score of 94 is below threshold of 100 -> [Help 1]

[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:

[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

If you look at each section you can see the types of mutation that PIT is making to the Item8UserService class and how many of each type are made, e.g. Generated 3 Killed 3 (100%). This means that PIT made 3 mutations of that type and each one was killed, i.e. at least one unit test failed when run against the mutated code. In PIT terminology ‘Killed’ is good, as your tests should fail when these types of mutations are made to the code. When no test fails after a mutation, the mutation is listed as ‘Survived’, meaning that it went unnoticed by the unit tests (they all still passed), which is bad. So the ‘Generated 1 Killed 0 (0%)’ under the ConditionalsBoundaryMutator section means that PIT made 1 boundary condition change and no test caught it, giving a 0% mutation score for that mutator (which is bad of course). Every mutation in the other sections was killed, giving each of them a 100% mutation score (which is good). Overall, 15 of the 16 mutations were killed (94%), which is below the mutationThreshold of 100 set in the POM, and that is why the build fails.
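The arithmetic behind that summary can be reproduced with a few lines of Java. This is only a sketch for illustration and is not part of PIT: the class and method names are made up, the counts are copied from the console output above, and rounding to the nearest whole percent is an assumption that happens to match the reported 94%.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: reproduces the arithmetic behind the console summary; it is not part of PIT.
class MutationScoreSketch {

    // Mutator name -> {generated, killed}, copied from the console output.
    static Map<String, int[]> resultsFromConsole() {
        Map<String, int[]> results = new LinkedHashMap<>();
        results.put("ConditionalsBoundaryMutator", new int[] {1, 0});
        results.put("VoidMethodCallMutator", new int[] {11, 11});
        results.put("NullReturnValsMutator", new int[] {1, 1});
        results.put("NegateConditionalsMutator", new int[] {3, 3});
        return results;
    }

    // Sums one column: index 0 = generated, index 1 = killed.
    static int total(Map<String, int[]> results, int column) {
        return results.values().stream().mapToInt(row -> row[column]).sum();
    }

    // Assumption: the percentage is rounded to the nearest whole number.
    static long scorePercent(int killed, int generated) {
        return Math.round(100.0 * killed / generated);
    }

    public static void main(String[] args) {
        Map<String, int[]> results = resultsFromConsole();
        int generated = total(results, 0);
        int killed = total(results, 1);
        System.out.println("Generated " + generated + ", killed " + killed
                + ", score " + scorePercent(killed, generated) + "%");
    }
}
```

A single surviving mutation out of 16 is enough to pull the score below a threshold of 100.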

PIT also generates a more readable test report which you can view by opening ……..\target\pit-reports\index.html and that will look like this:

Similar to code coverage tools, you can see that line 46 is marked in red, indicating that there’s a problem with it. If you hover your mouse over the number 3 link on the left you’ll see the following pop-up.

PIT mutation reports are currently a bit cryptic and can be awkward to read, but what the red line (and its pop-up) is telling us here is that PIT made 3 separate mutations to the if statement on line 46. Two of them (where the conditionals were negated) made the tests fail (were killed), but 1 mutation (the boundary condition mutation) left the tests still passing (survived).

Depending on the code, it can sometimes be a bit of a pain to figure out which boundary condition was changed. In this case PIT changed the createUserProfileRequest.getRegion().length() > 30 condition to the equivalent of createUserProfileRequest.getRegion().length() > 29 (the mutator turns > into >=, lowering the boundary of the greater-than condition by 1) and expected that some test would fail. The fact that the tests still passed flags that they might not be as robust as we thought. If we’re saying that a region of more than 30 characters is invalid, then we should be testing with lengths right up to that cutoff: we should be testing with regions exactly 30 characters in length. But if we look inside TestDataCreator we can see that the TEST_REGION constant (that we’re using to set the region field on our test createUserProfileRequest object) is set to “US-EAST”, which is only 7 characters long.

This means that we’re missing out on the range 8 – 30 for positive validation of the region. And remember, if the logic of a method changes then some test should fail. But in this case PIT changed the validation logic of the region from disallowing regions longer than 30 characters to disallowing regions longer than 29 characters, and nothing failed to signal this change to the method behavior.
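To make the surviving mutation concrete, here is a minimal sketch of the two versions of the condition side by side. The real Item8UserService source is not reproduced here, so the class and method names below are hypothetical and the predicates are reconstructed from the description of line 46.

```java
// Sketch only: the predicates mirror the condition described for line 46; all names are hypothetical.
class BoundaryMutantSurvives {

    static final int MAX_REGION_LENGTH = 30;

    // Original rule: reject null regions and regions longer than 30 characters.
    static boolean originalRejects(String region) {
        return region == null || region.length() > MAX_REGION_LENGTH;
    }

    // Boundary mutant: '>' becomes '>=', so 30-character regions are rejected too.
    static boolean mutantRejects(String region) {
        return region == null || region.length() >= MAX_REGION_LENGTH;
    }

    public static void main(String[] args) {
        String testRegion = "US-EAST"; // 7 characters, nowhere near the boundary
        System.out.println("original rejects: " + originalRejects(testRegion));
        System.out.println("mutant rejects:   " + mutantRejects(testRegion));
        // Both lines print false, so a happy path test using this value cannot tell the two apart.
    }
}
```

The two versions only disagree for a region of exactly 30 characters, which is the one input the tests never supply.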

We can fix this testing weakness by setting the TestDataCreator.TEST_REGION constant to a value of length 30 characters. So if we make the following change:

TestDataCreator

public static final String TEST_REGION = "US-EAST - THIS IS 30 CHAR LONG";

And run the PIT tests again, then the build is successful and the report is all green. PIT still mutates the condition to the equivalent of createUserProfileRequest.getRegion().length() > 29, but since our region is now 30 characters long, our happy path tests (at least) fail against the mutated code and PIT marks the mutation as successfully killed.
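One optional way to keep boundary values honest is to build them from the limit rather than counting characters by hand. The sketch below assumes Java 11+ (for String.repeat). isValidRegion and regionOfLength are hypothetical helpers and are not part of the example project; only the TEST_REGION value comes from the change above.

```java
// Sketch only: isValidRegion is a hypothetical stand-in for the region check in validate().
class RegionBoundaryCheck {

    static final int MAX_REGION_LENGTH = 30;

    // The new test constant from TestDataCreator.
    static final String TEST_REGION = "US-EAST - THIS IS 30 CHAR LONG";

    static boolean isValidRegion(String region) {
        return region != null && region.length() <= MAX_REGION_LENGTH;
    }

    // Builds a region of an exact length so nobody has to count characters by hand.
    static String regionOfLength(int length) {
        return "R".repeat(length);
    }

    public static void main(String[] args) {
        System.out.println("TEST_REGION length: " + TEST_REGION.length());
        System.out.println("30 chars valid: " + isValidRegion(regionOfLength(MAX_REGION_LENGTH)));
        System.out.println("31 chars valid: " + isValidRegion(regionOfLength(MAX_REGION_LENGTH + 1)));
    }
}
```

Testing one value on each side of the limit (30 accepted, 31 rejected) pins the boundary from both directions, so a shift of the limit by one in either direction makes a test fail.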

The other mutations that PIT made can be viewed by hovering over the underlined number at the start of each line of code in the report. The following is a list of the mutations that PIT made in the above code:

Line 24: Removed this line (cancelling the call to validate())
Lines 26-29: Removed each of these lines (which removes the setting of the fields in the UserEntity object)
Line 32: Removed this line (cancelling the call to eventBroadcaster.broadcast())
Line 33: Negated the conditional by changing if(createUserProfileRequest.shouldReceiveMarketingEmails()) to if(!createUserProfileRequest.shouldReceiveMarketingEmails())
Line 34: Removed this line (cancelling the call to marketingEmailService.shouldReceiveMarketingEmails())
Lines 38-42: Removed each of these lines (which removes the setting of the fields in the createUserProfileResponse object)
Line 46: Three Changes:
- Negated the first OR conditional by changing createUserProfileRequest.getRegion() == null to createUserProfileRequest.getRegion() != null
- Negated the second OR conditional by changing createUserProfileRequest.getRegion().length() > 30 to createUserProfileRequest.getRegion().length() <= 30
- (As discussed above) Changed the boundary of the second OR conditional by changing createUserProfileRequest.getRegion().length() > 30 to createUserProfileRequest.getRegion().length() >= 30, which is equivalent to createUserProfileRequest.getRegion().length() > 29
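The three line 46 mutations can be modelled as plain predicates to see which ones a happy path test is able to kill. This is a rough sketch and not how PIT works internally (PIT mutates bytecode and reruns the real tests). The class name, the predicates, and the single "this region must be accepted" check are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch only: models the three line 46 mutations as predicates; all names are hypothetical.
class Line46MutantsSketch {

    // Each predicate answers the question "is this region rejected?".
    static final Predicate<String> ORIGINAL = r -> r == null || r.length() > 30;

    static final List<Predicate<String>> MUTANTS = List.of(
            r -> r != null || r.length() > 30,   // negated first conditional
            r -> r == null || r.length() <= 30,  // negated second conditional
            r -> r == null || r.length() >= 30   // changed conditional boundary
    );

    // A mutant is killed when it rejects a region that the happy path test expects to be accepted.
    static long killedBy(String happyPathRegion) {
        if (ORIGINAL.test(happyPathRegion)) {
            throw new IllegalArgumentException("Not a happy path region: " + happyPathRegion);
        }
        return MUTANTS.stream().filter(mutant -> mutant.test(happyPathRegion)).count();
    }

    public static void main(String[] args) {
        System.out.println("Killed with US-EAST: " + killedBy("US-EAST"));
        System.out.println("Killed with 30 chars: " + killedBy("US-EAST - THIS IS 30 CHAR LONG"));
    }
}
```

With the 7-character region only the two negated conditionals are caught, which matches the 2 killed and 1 survived shown in the report. With the 30-character region all three are caught.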

Going forward, all applications should start to use PIT testing. But introducing it all at once would be unrealistic for existing applications. Instead, it can be introduced gradually by initially setting the mutationThreshold to a low number like 5. Then, over time, you can increase the threshold bit by bit, fixing the failures along the way until you get to a threshold of 100 like we have in the POM above.

PIT is a brilliant tool for preventing the problems associated with coding to coverage from item 1. It makes it very difficult for developers to write low-quality unit tests with weak asserts/verifications and should be considered an indispensable tool for keeping the quality of the tests in your codebase (and thus the quality of your codebase) high.