Let’s get back to the car manufacturing example mentioned in our Measuring code quality article. Controlling the quality of car parts is a very specific process. Any car manufacturer can measure this quality the same way across all factories thanks to a clear scale, a homogeneous method of measurement and legally binding criteria for the industry. In most industries, quality metrics are backed by a legal standard to match (or exceed), which lets consumers make an informed choice by comparison, secure in the knowledge that quality will not fall below the line.
The software industry has a hard time setting such homogeneous standards, and users, just as much as non-technical managers, have no visibility over the quality of what is delivered to them. From there, controlling performance, maintaining standards and making decisions becomes a whole lot harder.
Both the private sector and researchers have looked for a satisfying answer to this issue. Attempts to analyze code and evaluate its quality have produced complex metrics that are either too specific or unable to gather technical and non-technical stakeholders around a comprehensible language (see the section about ISO standards in the How to measure code quality article). We previously introduced ISO 9126 and ISO/IEC 15939, but the fact that they are barely followed in the industry shows how disappointing those standards have been so far. They lack clarity, being either too complex or not representative enough, and consequently they do not bear the qualities that make great consensual metrics: their complexity undermines their ability to convince all team members with a one-size-fits-all standard.
Current analytics tools and their metrics do not support high code quality well: they fail to deliver the objective, synthetic piece of information that industrial manufacturing metrics provide.
Avoid compensation effects at all costs. Ratios and averages hide discrepancies, and code is not a uniform mass that can be treated as such. For example, to measure unit tests we would check your test coverage with a goal of 80%. Unfortunately, even though 80% code coverage looks like a best practice and a high quality standard, it can hide the fact that the untested 20% of your code is the most sensitive and the most complex part, and therefore the most likely to generate regressions and bugs in your development process. High code coverage targets are only effective when coupled with a sense of risk prioritization. Other metrics based on bounded scales can create threshold effects and remove granularity from your monitoring; look out for those.
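To make the compensation effect concrete, here is a minimal sketch (with invented module names and numbers) contrasting raw line coverage with a complexity-weighted variant, in which an untested but critical module drags the figure down instead of hiding behind an average:

```python
# Hypothetical illustration: two modules with the same overall line coverage
# can carry very different risk once complexity is taken into account.

def overall_coverage(modules):
    """Raw line coverage across all modules (covered lines / total lines)."""
    covered = sum(m["covered"] for m in modules)
    total = sum(m["lines"] for m in modules)
    return covered / total

def risk_weighted_coverage(modules):
    """Coverage where each module's ratio is weighted by its complexity score,
    so an untested complex module pulls the aggregate down."""
    weighted = sum(m["complexity"] * m["covered"] / m["lines"] for m in modules)
    total_weight = sum(m["complexity"] for m in modules)
    return weighted / total_weight

modules = [
    {"name": "utils",   "lines": 800, "covered": 800, "complexity": 2},   # trivial, fully tested
    {"name": "billing", "lines": 200, "covered": 0,   "complexity": 20},  # critical, untested
]

print(f"raw coverage:           {overall_coverage(modules):.0%}")        # 80%
print(f"risk-weighted coverage: {risk_weighted_coverage(modules):.0%}")  # 9%
```

The weighting scheme here is only one possible choice; the point is that any aggregate must be checked against the riskiest parts of the codebase.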
Meaning that if A is strictly superior to B, then the metric measuring A needs to be strictly superior to the metric measuring B. The metric should quantify perception while respecting it. This seems obvious at first, but most metric systems for code do not respect this rule.
One existing metric for documentation quality is the comment ratio. It lets you check whether your development team has been thoroughly building documentation to support collaboration around the codebase. But the benefit of documentation lies in its quality: its content and how well maintained it is. Work with your team to find how metrics can support their effort towards documentation quality, and you will not only make your project more sustainable but also raise your team’s motivation around quality efforts.
Those are just three examples illustrating how a green light on your metric dashboard can hide flaws and low quality code, and how to get around it. Such metrics create false positives and hurt the entire set of metrics by discrediting them and forcing you to spend more time and resources analyzing code.
Challenging your code quality monitoring plans
Your metrics are supposed to create opportunities to improve your software development process by pointing out downward trends and weaknesses. Thanks to them, you can make sure your project improves the right way. Don’t wait another minute to start improving your code quality processes, but make sure the metrics you base your analysis on have not been manipulated or misused, voluntarily or involuntarily, by your team. Peer pressure, company politics and quality-phobic constraints are among the many reasons teams find opportunities to avoid or dismiss code quality targets and the metrics attached to them.
There are different ways of categorizing code metrics. If you are talking to your team of developers, they might segment metrics by how they are captured. On one side you have static metrics, which do not rely on software execution, as opposed to dynamic metrics, which can only be collected by running the code. On top of this you can add quality-in-use metrics: dynamic metrics that only apply to the live version of the software (i.e. when the code is running in real conditions).
For the sake of readability, we chose to cover some key dimensions of code quality and the metrics attached to them, in order to give you a foundational understanding of what is available to you to control these dimensions.
We will list some simple and some more complex metrics so you can learn about the range of KPIs available to you and your team and get a discussion started over what fits best for you.
Measuring how well tested your code is goes a long way towards measuring how strong your development and quality control processes are, and consequently how efficiently and how early you are capable of detecting flaws in the code. Beyond catching bugs, tests serve other purposes, such as acting as documentation (i.e. helping readers understand what the code is meant for and what logic it follows), which improves maintainability and collaboration.
Code coverage metrics
Coverage is a volume-based metric: it checks how many of your lines, functions or branches are covered by a test. Line coverage is today one of the most used metrics in the corporate world, and that is why we will spend a minute reviewing it. Most companies have a goal of 80% to 90% code coverage, while their legacy code usually does not reach this level because the standard did not exist before. High code coverage helps to avoid piling up unnecessary technical debt and to catch regressions early in the software development life cycle, in a shift-left approach. Unfortunately, code coverage can be gamed by developers facing tight deadlines. Moreover, code coverage gives no indication of the quality of the written tests or how exhaustive they are, and that is where you might be missing out on rich information. As mentioned earlier, if you monitor code coverage but high-risk code is not well covered, or high-complexity functions are not well tested, you can end up with really low quality without noticing it ahead of trouble.
Statement coverage: number of statements executed during a test divided by the total number of statements
Branch coverage: number of branches executed divided by the total number of branches
Function coverage: number of functions executed divided by the total number of functions
Line coverage: number of lines executed during a test divided by the total number of lines
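All four ratios above share the same shape: items executed divided by total items. A minimal sketch, with illustrative counts (in practice the counts come from a coverage tool):

```python
# Minimal sketch of the four coverage ratios defined above, computed from
# raw counts. The counts themselves would come from a coverage tool.

def coverage_ratio(executed, total):
    """Generic coverage formula: executed items divided by all items."""
    return executed / total if total else 0.0

# Illustrative (executed, total) counts per coverage kind.
report = {
    "statements": (450, 500),
    "branches":   (60, 100),
    "functions":  (38, 40),
    "lines":      (720, 900),
}

for kind, (executed, total) in report.items():
    print(f"{kind:<10} coverage: {coverage_ratio(executed, total):.0%}")
```

Note how branch coverage (60%) can lag far behind statement coverage (90%) on the same code, which is exactly why a single coverage number is not enough.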
Number of new defects detected in each phase of testing
Monthly defects in regression testing, unit tests, integration tests...
Use raw numbers to compare, from one version to another, whether your code gains or loses in complexity. Otherwise, create a ratio of your metrics against the volume of code created, so you can put things in perspective.
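The normalization suggested above can be sketched as a defect density figure (defects per thousand lines of code); the version names and numbers below are purely illustrative:

```python
# Sketch of normalizing defect counts against code volume: raw counts are only
# comparable between versions of similar size, so we also report defects per
# 1,000 lines of code (defect density). All figures are illustrative.

def defect_density(defects, loc):
    """Defects per thousand lines of code."""
    return defects / (loc / 1000)

versions = {"v1.0": (40, 50_000), "v2.0": (55, 110_000)}

for name, (defects, loc) in versions.items():
    print(f"{name}: {defects} defects, {defect_density(defects, loc):.2f} per KLOC")
```

Here the raw defect count rises between versions (40 to 55) while the density actually falls (0.80 to 0.50 per KLOC), which is the perspective the ratio is meant to give.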
Software maintainability is the most researched code quality metric
Weighted Micro Function Points
Breaking down the code into micro functions and valuing the complexity of each of them in order to grade the complexity of the codebase.
Halstead Complexity Measures
Includes a range of metrics (program length, program volume, program level, minimum potential volume, program difficulty, programming effort, language level, intelligence content, programming time) to assess the computational complexity of the code and how maintainable it is.
Cyclomatic complexity
It assesses how difficult code is to test, maintain and troubleshoot by counting the total number of linearly independent paths (above 10, the probability of defects skyrockets). This metric does not have to be applied to a whole project: it can be narrowed down to a method, a class, a namespace or a module. It also gives the minimum number of test cases needed to obtain full test coverage. One problem with cyclomatic complexity, shared with other metrics, is that the definition of its components is sometimes open to debate, which means the metric changes with different interpretations; another is that it gives no clear clue as to where quality is being lost and which actions need to be taken to fix it. On top of this, the result of the cyclomatic complexity calculation does not have a clear scale preserving distances, which we mentioned earlier as a success criterion for code quality metrics.
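The path-counting idea can be sketched for Python code with the standard ast module: start at 1 and add one per decision point. This is a rough approximation that only counts the most common branch nodes; real tools are far more thorough:

```python
# Rough sketch of cyclomatic complexity for Python source: start at 1 and
# add one for every decision point found in the syntax tree. Only the most
# common branch nodes are counted; dedicated tools handle many more cases.
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                  ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

code = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(3):
        if x > 100:
            x -= 10
    return "positive"
"""

# Base 1 + if + elif + for + inner if = 5
print(cyclomatic_complexity(code))  # 5
```

A value of 5 is well under the danger threshold of 10 mentioned above; the same counter run over a gnarly legacy function is a quick way to find refactoring candidates.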
The maintainability index
It measures how easy it is to support and change the source code by combining a few metrics in a simple formula. The static code analysis metrics you need are easy to gather, hence its popularity. It takes into account the cyclomatic complexity mentioned above as well as the percentage of lines of comments. This relationship means that the higher the proportion of comments in the source code, the better, but that the beneficial effect of comments decreases as their proportion grows. This can be widely debated (for example: how is the number of lines of comments relevant if those comments are outdated?), but the MI has been validated multiple times for several procedural programming languages: C, Pascal, FORTRAN and Ada [Ash et al., 1994; Coleman et al., 1994; Oman and Hagemeister, 1994; Coleman et al., 1995; Oman, 1995; Pearse and Oman, 1995]. It can also be used to quantify maintainability in object-oriented code, even if the fit is somewhat less than perfect. The Maintainability Index may provide a starting point to measure maintainability across different versions of the code over time. However, the fact that it is a single value is a disadvantage, as it lacks the detailed information provided by the raw metrics. Moreover, the metrics underlying this model are sometimes criticized: the lack of an unambiguous definition of how operands and operators should be counted for the Halstead complexity is one of the top issues. Unclear and hard to explain, the MI has strong weaknesses when it comes to its usability in a real-world context.
The SIG maintainability model
A strong point of the SMM is that the model uses metrics that are clearly defined and easily calculated. It is sensitive to small changes in the system, but not so sensitive that its indication goes ‘off the scale’ when a large change occurs. Finally, the way the ratings are presented (on a scale ranging from -- to ++) makes them easily interpretable for management. A weak point of the SMM is that it has not yet been validated; SMM users therefore do not know how well it indicates maintainability. It is still a young metric being refined.
Weighted Micro Function Points
WMFP = ∑(i=1..N) Wi × Mi × ∏(q=1..K) Dq
M: source metrics value measured by the Weighted Micro Function Points stage
W: adjusted weight assigned to metrics by the average programmer profile weights model
N: count of metric types
i: current metric type index (iteration)
D: cost drivers factor supplied by the user input
q: current cost driver index (iteration)
K: count of cost drivers
MI3 = 171 – 5.2 * ln(aveV) – 0.23 * aveV(g’) – 16.2 * ln(aveLOC) (1)
MI4 = 171 – 5.2 * ln(aveV) – 0.23 * aveV(g’) – 16.2 * ln(aveLOC) + 50 * sin (sqrt(2.4 * perCM)) (2)
aveV = average Halstead Volume V per module
aveV(g’) = average extended cyclomatic complexity per module
aveLOC = average count of lines of code per module
perCM = average percent of lines of comments per module
The Maintainability Index is easy to compute from these metrics and predicts software maintainability. A higher MI value indicates better maintainability.
The word ‘module’ used here means the smallest unit of functionality. Depending on the programming language, this is a function, procedure, method, subroutine or section.
With this approach, one can identify the modules with the lowest MI value (meaning with the greatest necessity to be improved).
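The MI3 and MI4 formulas above can be evaluated directly. The module averages below are invented for illustration, and note that sources differ on whether perCM is expressed as a fraction or a percentage; this sketch assumes a fraction (0.20 for 20%):

```python
# Direct evaluation of the MI3/MI4 formulas given above. The input averages
# are illustrative; perCM is assumed to be a fraction (0.20 = 20% comments).
import math

def mi3(ave_v, ave_vg, ave_loc):
    """MI3 = 171 - 5.2*ln(aveV) - 0.23*aveV(g') - 16.2*ln(aveLOC)"""
    return 171 - 5.2 * math.log(ave_v) - 0.23 * ave_vg - 16.2 * math.log(ave_loc)

def mi4(ave_v, ave_vg, ave_loc, per_cm):
    """MI4 adds the comment bonus: + 50*sin(sqrt(2.4*perCM))"""
    return mi3(ave_v, ave_vg, ave_loc) + 50 * math.sin(math.sqrt(2.4 * per_cm))

# Illustrative averages: Halstead volume 1000, extended cyclomatic
# complexity 10, 100 lines of code per module, 20% comment lines.
print(f"MI3: {mi3(1000, 10, 100):.1f}")
print(f"MI4: {mi4(1000, 10, 100, 0.20):.1f}")
```

Running the numbers per module rather than project-wide is what lets you single out the lowest-MI modules mentioned above.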
The Lines of Code (LOC) metric hides, behind its simplicity, an open debate on what constitutes one line of code.
Halstead Volume is based on four scalar numbers derived directly from a program’s source code:
n1 = the number of distinct operators
n2 = the number of distinct operands
N1 = the total number of operators
N2 = the total number of operands
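From these four counts, the Halstead Volume follows as V = N × log2(n), where N = N1 + N2 is the program length and n = n1 + n2 is its vocabulary. A minimal sketch with illustrative counts:

```python
# Halstead Volume from the four counts defined above:
# vocabulary n = n1 + n2, program length N = N1 + N2, volume V = N * log2(n).
import math

def halstead_volume(n1, n2, N1, N2):
    vocabulary = n1 + n2   # distinct operators + distinct operands
    length = N1 + N2       # total operators + total operands
    return length * math.log2(vocabulary)

# Illustrative counts for a small function: 5 distinct operators,
# 7 distinct operands, 12 operator occurrences, 15 operand occurrences.
print(round(halstead_volume(5, 7, 12, 15), 1))  # 96.8
```

As noted above, what exactly counts as an operator or operand is not unambiguously defined, so two tools can legitimately report different volumes for the same code.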
How about metrics set by corporations themselves?
It’s worth mentioning the HIS (Hersteller Initiative Software) metrics. They have been defined by several large automotive manufacturers (Audi, BMW Group, Daimler, Chrysler, Porsche and Volkswagen, amongst others) to provide a standard of high quality code for car systems. They are made of two distinct sets:
1. Metrics with limits which generally measure the complexity of the code.
2. Metrics without limits that measure the change in the number of statements in code between versions to give a stability index.
These metrics are solely focused on maintainability and quality at the coding phase, with metrics such as the number of GOTO statements, which indicates how difficult the code is to test (because of the number of paths), or the number of return points within a function, which can help you improve maintainability.
Measuring your code quality monitoring efforts
Last but not least, you should keep in sight your capacity to spot defects and weaknesses and remove them. To do that, you should calculate and consistently monitor how good your code quality processes are with a defect detection ratio (i.e. the ratio of defects found prior to release against those found after release). This will enable you to measure the improvement of your code quality efforts at each release cycle.
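The defect detection ratio described above is straightforward to track per release; the release numbers below are invented for illustration:

```python
# Sketch of the defect detection ratio: defects caught before release
# divided by all defects (pre-release + post-release), tracked per cycle.
# Release figures are illustrative.

def defect_detection_ratio(pre_release, post_release):
    total = pre_release + post_release
    return pre_release / total if total else 0.0

releases = {"1.0": (45, 15), "1.1": (52, 8)}

for version, (pre, post) in releases.items():
    print(f"release {version}: DDR = {defect_detection_ratio(pre, post):.0%}")
```

A ratio rising from 75% to 87% between cycles, as in this made-up example, is the kind of trend that shows your quality processes are catching more defects before they reach users.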
From the simplest to the most complex metrics, it’s time to start building your own dashboard. Fortunately, tools are available to accelerate this step and quickly start collecting data that will support your code quality optimization effort. Let’s review a few of them.