Many software projects have "unit tests", which vary in definition and implementation, but are generally agreed to be focused tests of a small "unit" of software. For example, an "addition" function

function add(x, y) {
    return x + y
}

may have a unit test like this (where assert is some generic function checking that the statement is true).

function test_result_is_positive_when_operands_are_positive() {
    assert(add(13, 42) > 0)
}

There is usually an entire "suite" of tests, testing many of the properties and examples of the functions-under-test.
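
For add, such a suite might, for instance, mix concrete examples with more general properties. This is a hypothetical sketch, reusing the same generic assert; the test names are made up for illustration:

function test_two_plus_three_is_five() {
    // Example-based test: one specific input/output pair
    assert(add(2, 3) === 5)
}

function test_addition_is_commutative() {
    // Property-style test: the order of the operands should not matter
    assert(add(13, 42) === add(42, 13))
}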

The result of any test is pass or fail, depending on whether all of its assertions passed or any failed (although it is recommended to have only a single assertion per test). The result of any suite is likewise pass or fail, depending on whether all of its tests passed or any failed. Test tools tend to report individual test and suite pass/fail results and ratios (e.g. 70 tests passed out of 100).
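
For illustration only, a tool's aggregation might look roughly like this hypothetical sketch. It assumes that assert throws when its condition is false (as many frameworks do); run_suite and the reporting format are invented for this example:

function run_suite(tests) {
    let passed = 0
    let failed = 0
    for (const test of tests) {
        try {
            test()           // a test passes if none of its assertions throw
            passed += 1
        } catch (e) {
            failed += 1      // any thrown assertion fails the test
        }
    }
    // The suite passes only if every test passed; the ratio is reported too
    console.log(passed + " tests passed out of " + (passed + failed))
    return failed === 0
}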

Here comes the problem... I've seen teams take pass/fail data and turn out reports like this (over time or by test run):

 Project | Tests | Pass | Fail |  %
---------|-------|------|------|-----
 A       |   102 |   91 |   11 | 89%
 B       |    27 |   26 |    1 | 96%
 C       |    39 |   25 |   14 | 64%

 Average success: 83%

I'm pretty certain that test data are discrete and should be analyzed differently. I'm also pretty certain that a percentage or a simple average is useless here (the distribution is not normal, and are tests, suites, and test runs even independent?).
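
To make the report's arithmetic concrete, here is a minimal sketch using the figures copied from the table above; the pooled rate is included only to show that an unweighted average of per-project percentages and an overall pass rate are different summaries:

const projects = [
    { name: "A", tests: 102, pass: 91 },   // ~89%
    { name: "B", tests: 27,  pass: 26 },   // ~96%
    { name: "C", tests: 39,  pass: 25 },   // ~64%
]

// Unweighted mean of the per-project percentages, as in the report: ~83%
const rates = projects.map(p => p.pass / p.tests)
const averageOfRates = rates.reduce((a, b) => a + b, 0) / rates.length

// Pooled pass rate over all 168 tests: 142 / 168, about 85%. The two summaries
// disagree because the projects contribute different numbers of tests
const totalPass = projects.reduce((sum, p) => sum + p.pass, 0)
const totalTests = projects.reduce((sum, p) => sum + p.tests, 0)
const pooledRate = totalPass / totalTests

console.log(averageOfRates, pooledRate)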

What kinds of analysis are appropriate on this data and how could I use that to make an impact?


I hope the question provides enough context. We should start with better questions and a coherent hypothesis: what do we want to know or influence, why look at unit test results at all, and how should we look at those results? But in software development and management we have a habit of skipping directly to action ("something is better than nothing").

  • Simply speaking: if anything fails, it has to be fixed as soon as possible. If a software team has 17% of its functionality in bad shape, then something is wrong. The only conclusion would be to inspect the tests to see whether they still make sense and to fix the functionality for those that do, starting with the core functionality and working towards the nice-to-have features. – Karel Macek Jun 12 '17 at 16:51
  • @KarelMacek I agree with the suggested practice and I'm a seasoned developer myself... I'm more looking for info on how to stop bad analysis and provide more appropriate analysis of this kind of data. Once someone starts sharing bad data/analysis (especially with management), it becomes hard to shut down. – Anthony Mastrean Jun 12 '17 at 18:56
  • 1
    I a professional software developer with a background in statistics, and I see nothing statistically wrong with this "analysis" of the data. It might not be answering the right questions from a development best practices perspective, but it's certainly statistically okay to compute means of discrete count variables. – David Wright Jun 12 '17 at 19:08

0 Answers