10

Are there objective metrics for measuring code refactoring?

Would running findbugs, CRAP or checkstyle before and after a refactoring be a useful way of checking whether the code was actually improved rather than just changed?

I'm looking for metrics that can be determined and tested for, to help improve the code review process.

Dave Kanter
sal
    While you're at it, could you define "good design" objectively, also? It would help if there was an objective score for "elegant", "sensible" and "coherent". – S.Lott Apr 28 '09 at 14:27
  • I added subjective to my tag list. – sal Apr 28 '09 at 15:36
  • Just completed my answer and added other criteria to measure the value of refactoring. – VonC Apr 28 '09 at 17:20

9 Answers

5

The number of failed unit tests must be less than or equal to zero :)
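
A minimal illustration of that gate, as a JUnit 4 characterization test: pin the current behavior down before the refactoring, and require the identical suite to pass afterwards. The class and its pricing rule are invented for the example.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class PriceCalculatorTest {

    // Hypothetical class under refactoring, inlined to keep the sketch
    // self-contained; in real code it would live in production sources.
    static class PriceCalculator {
        double total(double subtotal) {
            // Current rule: 10% discount at or above 100.
            return subtotal >= 100 ? subtotal * 0.9 : subtotal;
        }
    }

    @Test
    public void discountsAtOrAboveOneHundred() {
        assertEquals(90.0, new PriceCalculator().total(100.0), 1e-9);
    }

    @Test
    public void chargesFullPriceBelowOneHundred() {
        assertEquals(99.0, new PriceCalculator().total(99.0), 1e-9);
    }
}
```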

Alexander Artemenko
5

Depending on your specific goals, metrics like cyclomatic complexity can provide an indicator of success. In the end, though, every metric can be subverted, since no metric can capture intelligence or common sense.
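
For instance (a made-up example, with the complexity counted by hand), replacing a conditional chain with a table lookup is exactly the kind of change cyclomatic complexity can register:

```java
import java.util.EnumMap;
import java.util.Map;

public class Pricing {

    enum Plan { FREE, BASIC, PRO, ENTERPRISE }  // hypothetical domain type

    // Before: an if/else chain. Four decision points,
    // so cyclomatic complexity is 5.
    static int feeBefore(Plan p) {
        if (p == Plan.FREE) return 0;
        else if (p == Plan.BASIC) return 10;
        else if (p == Plan.PRO) return 30;
        else if (p == Plan.ENTERPRISE) return 100;
        else throw new IllegalArgumentException("unknown plan: " + p);
    }

    // After: a table lookup. No decision points,
    // so cyclomatic complexity is 1.
    private static final Map<Plan, Integer> FEES = new EnumMap<>(Plan.class);
    static {
        FEES.put(Plan.FREE, 0);
        FEES.put(Plan.BASIC, 10);
        FEES.put(Plan.PRO, 30);
        FEES.put(Plan.ENTERPRISE, 100);
    }

    static int feeAfter(Plan p) {
        return FEES.get(p);
    }
}
```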

A healthy code review process might do wonders though.

David Schmitt
  • It's my hope that we could capture trends and use them to improve our code review process. I've seen a lot of loops being re-written and re-re-written without improving readability, speed or anything else we can objectively measure. – sal Apr 28 '09 at 17:22
4

Would running findbugs, CRAP or checkstyle before and after a refactoring be a useful way of checking if the code was actually improved rather than just changed?

Actually, as I have detailed in the question "What is the fascination with code metrics?", the trend of any metric (FindBugs, CRAP, whatever) is the true added value of metrics.
The evolution of a metric allows you to prioritize the main fixing actions you really need to apply to your code (as opposed to blindly trying to respect every metric out there).

A tool like Sonar can be very useful in this domain (the monitoring of metrics).
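
As a rough sketch of what watching such a trend can look like without a full Sonar installation: FindBugs writes one BugInstance element per warning in its XML report, so the before/after counts can be compared in a few lines. The report file names here are assumptions.

```java
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

public class MetricTrend {

    // Count <BugInstance> elements in a FindBugs XML report.
    static int bugCount(String reportPath) throws Exception {
        Document report = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File(reportPath));
        return report.getElementsByTagName("BugInstance").getLength();
    }

    public static void main(String[] args) throws Exception {
        int before = bugCount("findbugs-before.xml");  // assumed file names
        int after = bugCount("findbugs-after.xml");
        System.out.printf("FindBugs warnings: %d -> %d (%+d)%n",
                before, after, after - before);
        if (after > before) {
            System.err.println("Refactoring made the trend worse.");
            System.exit(1);  // e.g. fail a CI step on a worsening trend
        }
    }
}
```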


Sal adds in the comments:

The real issue is checking which code changes add value rather than just adding change

For that, test coverage is very important, because only tests (unit tests, but also larger "functional tests") will give you a valid answer.
But refactoring should not be done without a clear objective anyway. Doing it only because the result would be "more elegant" or even "easier to maintain" may not, in itself, be a good enough reason to change the code.
There should be other measures, like bugs that will be fixed in the process, or new features that will be implemented much faster as a result of the refactored code.
In short, the added value of a refactoring is not solely measured with metrics, but should also be evaluated against objectives and/or milestones.
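
One way to make the coverage half of this testable is a coverage "ratchet" that fails whenever coverage drops below the last recorded value. This is only a sketch: the baseline file and the way the current figure is produced (it would normally come from a tool such as Cobertura or Emma) are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CoverageGate {

    public static void main(String[] args) throws Exception {
        // Current line coverage in percent, passed in by the build,
        // e.g. "83.4" as extracted from the coverage tool's report.
        double current = Double.parseDouble(args[0]);

        Path baselineFile = Paths.get("coverage-baseline.txt");  // assumed location
        double baseline = Double.parseDouble(
                Files.readAllLines(baselineFile, StandardCharsets.UTF_8).get(0).trim());

        if (current < baseline) {
            System.err.printf("Coverage fell: %.1f%% -> %.1f%%%n", baseline, current);
            System.exit(1);
        }

        // Coverage held or improved: move the ratchet forward.
        Files.write(baselineFile,
                String.valueOf(current).getBytes(StandardCharsets.UTF_8));
    }
}
```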

VonC
  • I agree about the trend line being the value. If code test coverage is increasing, the number of broken builds in a month is zero, and the number of warnings or FindBugs hits is decreasing, clearly things are OK. The real issue is checking which code changes add value rather than just adding change. – sal Apr 28 '09 at 15:40
2

Code size. Anything that reduces it without breaking functionality is an improvement in my book (removing comments and shortening identifiers would not count, of course).

Michael Borgwardt
1

Yes, several measures of code quality can tell you if a refactoring improves the quality of your code.

  • Duplication. In general, less duplication is better. However, duplication finders that I've used sometimes identify duplicated blocks that are merely structurally similar but have nothing to do with one another semantically and so should not be deduplicated. Be prepared to suppress or ignore those false positives.

  • Code coverage. This is by far my favorite metric in general, but it's only indirectly related to refactoring. You can and should raise low coverage by writing more tests, but that's not refactoring. However, you should monitor code coverage while refactoring (as with any other change to the code) to be sure it doesn't go down. Refactoring can improve code coverage by removing untested copies of duplicated code.

  • Size metrics such as lines of code, total and per class, method, function, etc. A Jeff Atwood post lists a few more. If a refactoring reduces lines of code while maintaining clarity, quality has increased. Unusually long classes, methods, etc. are likely to be good targets for refactoring. Be prepared to use judgement in deciding when a class, method, etc. really does need to be longer than usual to get its job done.

  • Complexity metrics such as cyclomatic complexity. Refactoring should try to decrease complexity and not increase it without a well thought out reason. Methods/functions with high complexity are good refactoring targets.

  • Robert C. Martin's metrics: Abstractness, Instability and Distance from the abstractness-instability main sequence. He described them in his article on stability in C++ Report and in his book Agile Software Development, Principles, Patterns, and Practices. JDepend is one tool that measures them. Refactoring that improves package design should minimize D; a small sketch of the calculations follows this list.
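
For the last item the arithmetic is simple enough to show directly. A toy calculation of Martin's metrics, with made-up package counts (these are the same figures JDepend reports):

```java
public class MartinMetrics {

    public static void main(String[] args) {
        // Made-up counts for a single package:
        int abstractTypes = 2;  // abstract classes and interfaces in the package
        int totalTypes = 10;    // all classes and interfaces in the package
        int ce = 3;             // efferent couplings: packages this one depends on
        int ca = 9;             // afferent couplings: packages that depend on this one

        double a = (double) abstractTypes / totalTypes;  // Abstractness
        double i = (double) ce / (ce + ca);              // Instability
        double d = Math.abs(a + i - 1);                  // Distance from the main sequence

        // A refactoring that improves package design should drive D toward 0.
        System.out.printf("A=%.2f I=%.2f D=%.2f%n", a, i, d);
    }
}
```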

I have used and continue to use all of these to monitor the quality of my software projects.

Dave Schweisguth
1

No matter what you do, just make sure this metric thing is not used for evaluating programmer performance, deciding promotions or anything like that.

Journeyman Programmer
1

I would stay away from metrics for measuring refactoring success (aside from #unit test failures == 0). Instead, I'd go with code reviews.

It doesn't take much work to find obvious targets for refactoring: "Haven't I seen that exact same code before?" For the rest, you should create guidelines around what not to do, and make sure your developers know about them. Then they'll be able to find places where other developers didn't follow the standards.

For higher-level refactorings, the more senior developers and architects will need to look at code in terms of where they see the code base moving. For instance, it may be perfectly reasonable for the code to have a static structure today; but if they know or suspect that a more dynamic structure will be required, they may suggest using a factory method instead of using new, or extracting an interface from a class because they know there will be another implementation in the next release.
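
A minimal sketch of that second kind of change, with invented names: callers stop saying new directly, so another implementation can be slotted in later without touching them.

```java
// The interface extracted from the original concrete class.
interface Repository {
    void save(String record);
}

// Today's only implementation.
class FileRepository implements Repository {
    public void save(String record) {
        // write to disk ...
    }
}

class Repositories {
    // Factory method: the static choice lives in one place, so a dynamic
    // one (database-backed, in-memory for tests) can be introduced later.
    static Repository create() {
        return new FileRepository();
    }
}

class Client {
    void run() {
        Repository repo = Repositories.create();  // was: new FileRepository()
        repo.save("order-42");
    }
}
```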

None of these things would benefit from metrics.

John Saunders
  • Actually, I'm interested in gathering this data to improve the quality of the code reviews. I suspect that there is too much change of questionable value. – sal Apr 28 '09 at 15:44
  • There may be too much change to be of _measurable_ value, but that will be due to the fact that not everything can be measured. How do you measure how easy it is to understand the code? How do you measure how flexible the code is? There are metrics that purport to measure these things, but I believe they're all wrong, by definition. – John Saunders Apr 28 '09 at 18:44
0

I see the question from the code smell point of view. Smells can be treated as indicators of quality problems, and hence the volume of identified smell instances can reveal the quality of the code.

Smells can be classified by their granularity and their potential impact. For instance, there are implementation smells, design smells, and architectural smells. You need to identify smells at all granularity levels both before and after a refactoring exercise to show the gain from it. In fact, the refactoring itself can be guided by the identified smells.

Examples:

  • Implementation smells: Long method, Complex conditional, Missing default case, Complex method, Long statement, and Magic numbers (the last is illustrated in the sketch after this list).
  • Design smells: Multifaceted abstraction, Missing abstraction, Deficient encapsulation, Unexploited encapsulation, Hub-like modularization, Cyclically-dependent modularization, Wide hierarchy, and Broken hierarchy. More information about design smells can be found in this book.
  • Architecture smells: Missing layer, Cyclical dependency in packages, Violated layer, Ambiguous Interfaces, and Scattered Parasitic Functionality. Find more information about architecture smells here.
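
As a small illustration of the first category, here is the Magic numbers smell before and after refactoring (names and figures are invented):

```java
// Before: unexplained literals buried in the logic.
class ShippingCostBefore {
    double cost(double weightKg) {
        if (weightKg > 20.0) {
            return weightKg * 1.75;  // what do 20.0 and 1.75 mean?
        }
        return weightKg * 1.25;
    }
}

// After: the same behavior, with the numbers named.
class ShippingCostAfter {
    private static final double HEAVY_PARCEL_THRESHOLD_KG = 20.0;
    private static final double HEAVY_RATE_PER_KG = 1.75;
    private static final double STANDARD_RATE_PER_KG = 1.25;

    double cost(double weightKg) {
        double rate = weightKg > HEAVY_PARCEL_THRESHOLD_KG
                ? HEAVY_RATE_PER_KG
                : STANDARD_RATE_PER_KG;
        return weightKg * rate;
    }
}
```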
Tushar
0

There are two outcomes you want from refactoring: you want the team to maintain a sustainable pace, and you want zero defects in production.

Refactoring takes place on the code and the unit tests built during Test-Driven Development (TDD). Refactoring can be small, completed on a piece of code necessary to finish a story card, or it can be large and require a technical story card to address technical debt. That story card can be placed on the product backlog and prioritized with the business partner.

Furthermore, since you write unit tests as you do TDD, you will continue to refactor the tests as the code is developed.

Remember that in agile, the management practices defined in Scrum provide collaboration and ensure that you understand the needs of the business partner and that the code you have developed meets the business need. However, without proper engineering practices (as defined by Extreme Programming), your project will lose its sustainable pace. Many agile projects that did not employ engineering practices have needed rescue. On the other hand, teams that were disciplined and employed both the management and the engineering agile practices were able to sustain delivery indefinitely.

So, if your code is released with many defects, or if your team loses velocity, then refactoring and the other engineering practices (TDD, pairing, automated testing, simple evolutionary design, etc.) are not being properly employed.

Cam Wolff