Rethinking software development, inspection and testing
What are the benefits and costs of inspecting code in large scale software development?
By Matthew Heusser | CIO US | Published: 15:45, 02 February 2012
Setting the Scene - Where Formal Inspections Were Born: The early 1970s were a bit like the old Monty Python sketch "The Four Yorkshiremen": having a reel-to-reel tape drive was "luxury," and most people still had physical punched cards. Robert C. Martin, a co-author of the Agile Manifesto, described this era in his book The Clean Coder:
These tapes could only be moved in one direction. So when there was a read error, there was no way for the tape drive to back up and read again. You had to stop what you were doing, send the tape back to the load point, and then start again. This happened two or three times per day. Write errors were also very common, and the drive had no way to detect them. So we always wrote the tapes in pairs and then checked the pairs when we were done. If one of the tapes was bad we immediately made a copy. If both were bad, which was very infrequent, we started the whole operation over ...
We edited the tapes on a screen ... . We would read a "page" from tape, edit the contents, and then write that page out and read the next one. A page was typically 50 lines of code. You could not look ahead on the tape to see the pages that were coming, and you could not look back on the tape to see the pages you had edited. So we used listings. Indeed, we would mark up our listings with all the changes we wanted to make, and then we'd edit the files according to our markups. Nobody wrote or modified code at the terminal! That was suicide. Once the changes were made to all the files we needed to edit, we'd merge those files with the master to create a working tape. This is the tape we'd use to run our compiles and tests.
Think about the consequences of a compile error in this sort of environment. You'd need to find the right tape, set it to the right point, edit the page, resave the tape, then run it into the machine ... only to find the next compile error and start the process all over again. With a dozen compile errors, you might lose a few days of productivity; if your team was using punched cards that were compiled in an overnight print run, that might be a few weeks.
In that sort of environment, it makes perfect sense to print out the code on paper and review it before the compile step, perhaps even hand-writing the code and reviewing it before typing it in.
Today, it is likely that before the code can be reviewed, it not only compiles, but if the team is doing test-driven development, it may pass an entire suite of unit tests.
If that is the case, the cost/benefit ratio for the inspections themselves may be very different.
Time for some research
Adam Porter is not well known in the commercial software community, but in the academic community his work is a tower of evidence-based, peer-reviewed research. A full professor at the University of Maryland, his publication credits include "Sources of Variation in Software Inspections," published in ACM Transactions on Software Engineering, "An Experiment to Assess the Cost/Benefit of Code Inspections In Large Scale Software Development," published in IEEE Transactions on Software Engineering, "Reducing Inspection Intervals in Large-Scale Software Development" in IEEE Transactions, and dozens of other peer-reviewed journal articles, with 11 on the theme of software inspections.
In his cost-benefit experiment, which he co-authored with a research assistant and two engineers at Bell Laboratories, Porter treated the size of the review team as a variable. He measured reviews ranging from a single reviewer, to several, to formal "Fagan-style" inspections with defined author, reader, moderator and reviewer roles. He also separated his data, ignoring findings that were not fixed (as these had no impact on the code released) and "soft" maintenance issues like code style. In the conclusion, Porter found that inspection teams of more than two people, and multiple inspections, found no more significant defects than inspections done by two reviewers and an author.
He also found that adding people requires a meeting to be scheduled, possibly with a meeting room, and that increases the time it takes to hold the meeting, which can make the code old and stale.
In the end, once you ignore false positives and soft corrections, Porter found that only 13 percent of the "issues" found by inspections actually resulted in a material bug fix. He concludes the study with the phrase, "For practitioners this suggests that a good deal of effort is currently being expended on issues that might better be handled by automated tools or standards."
Porter did most of his research on inspections in the 1990s. The next decade brought another wrinkle: Agile Software Development.
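To put the 13 percent figure in working terms, here is a minimal sketch of the arithmetic. Only the 0.13 rate comes from the study as quoted above; the function names and the half-hour-per-issue figure are assumptions made up for illustration.

```python
# Share of inspection "issues" that led to a material bug fix (from the article).
MATERIAL_FIX_RATE = 0.13


def issues_per_material_fix(rate=MATERIAL_FIX_RATE):
    """Average number of raised issues behind each material bug fix."""
    return 1 / rate


def hours_per_material_fix(hours_per_issue, rate=MATERIAL_FIX_RATE):
    """Effort spent per material fix, given a hypothetical cost per raised issue."""
    return hours_per_issue / rate


print(round(issues_per_material_fix(), 1))  # 7.7
# 0.5 hours per issue is an assumed number, not a measurement.
print(round(hours_per_material_fix(0.5), 2))
```

In other words, a team would process roughly seven or eight issues for every one that changes the behavior of the released code.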
Agile and extreme programming
Extreme Programming, or XP, brought many insights into software development, but at least two are important for purposes of code inspections. First of all, the idea of pair programming implies continuous peer code review, while Test Driven Development, or TDD, prevents the sort of compile-time, off-by-one, index and pointer errors that inspections were designed to address.
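As a small illustration of the kind of boundary error a unit test can catch before any reviewer sees the code, consider the sketch below. The helper `last_n` and its tests are hypothetical examples, not taken from any of the studies discussed here.

```python
def last_n(items, n):
    """Return the last n elements of items (hypothetical helper)."""
    # Without this guard, items[-0:] would return the whole list,
    # a classic boundary bug that is easy to miss when reading code.
    if n <= 0:
        return []
    return items[-n:]


def test_last_n():
    data = [1, 2, 3, 4, 5]
    assert last_n(data, 2) == [4, 5]
    assert last_n(data, 0) == []    # the boundary case
    assert last_n(data, 5) == data  # exactly the full length
    assert last_n(data, 9) == data  # asking for more than exists


test_last_n()
```

With a test-first habit, the `n == 0` case would typically be written down as an expectation before the slice is, so the error surfaces in seconds rather than in a review meeting.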
The second insight extreme programming brings involves the "cost of finding a defect late."
The historic interpretation is that the cost of fixing a defect grows exponentially with each phase. That is, a defect found and fixed in production might cost hundreds, if not thousands, of times more than if it were prevented before the requirements were "signed off." One classic image of this is the graph based on the work of Barry Boehm in his book "Software Engineering Economics," often referenced by Steve McConnell and others. *
The graph above illustrates a general principle: that the costs of fixing a defect go up over time. It is important to note that this graph is an illustration. There is no numbered Y axis, and your actual results may vary. The numbers are based on averages and assume a bell-curve distribution, but that distribution may not be accurate. The cost-of-fix curve can look very different for different categories of bugs, something that Alan Page, the lead author of "How We Test Software at Microsoft," pointed out in a conference paper in 2011.
The image also adds an interpretation: that price goes up by phase. An alternative interpretation is that cost goes up over time. Remember: When Boehm did this research, nearly all projects followed a waterfall model, so linear time and phase of the project were the same thing.
Today, an Agile Software team will move from requirements to design to code and test in a matter of days, possibly hours or minutes. That means for two identical projects, one with a two-week iteration and the other organized as a six-month waterfall, the agile project can find the defect in a week or two, before the review would even occur in the waterfall project, keeping the costs low.
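The timing argument can be sketched with a toy model. The compounding form and the 10 percent weekly growth rate below are assumptions chosen purely for illustration; neither comes from Boehm's data, and real cost curves vary by the kind of defect.

```python
def fix_cost(base_cost, weekly_growth, weeks_until_found):
    """Toy model: the cost of a fix compounds each week the defect goes unfound."""
    return base_cost * (1 + weekly_growth) ** weeks_until_found


# Hypothetical inputs: the same defect, found at the end of a two-week
# iteration versus at the end of a 26-week (six-month) waterfall project.
agile_cost = fix_cost(base_cost=1.0, weekly_growth=0.10, weeks_until_found=2)
waterfall_cost = fix_cost(base_cost=1.0, weekly_growth=0.10, weeks_until_found=26)

print(f"agile: {agile_cost:.2f}  waterfall: {waterfall_cost:.2f}")
```

Whatever growth rate is plugged in, the shape of the result is the same: if cost tracks elapsed time rather than phase, shortening the time to discovery is what keeps the fix cheap.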
At the same time, the internet was rapidly reducing the cost of replacing software. Suddenly, instead of having to ship CDs and physical floppy disks in boxes to customers, we could deploy to one server or server farm.
All this comes together to mean that as the value of inspections has decreased, the very risks they were designed to mitigate have also decreased, or may be addressed by other means, such as TDD, Pair Programming, or rapid exploratory testing.
That said, you may wonder how inspections turn out in practice, especially at organizations where TDD and Pair Programming are not commonplace.
I thought it was time to find out for myself.
Peer review in practice
Peter Walen, a technologist and software tester from Grand Rapids, Michigan, has used formal inspections on and off (mostly off) for the past 15 years. In his words, "Inspections can have a place." After all, he adds, "Every time I have implemented an inspection program, it has had good effect - knocking some heads together and getting people to realize that there are bugs, obvious bugs, bugs you can see, right in the code. If your organization is in that shape, inspections might be for you."
Then again, he goes on to say, don't be too hasty. "Within a year or two of implementing these programs, I find the need for the inspections goes away. Team members have found ways to get the obvious sort of code errors out before the inspection, if nothing else than to save themselves embarrassment. The discussions start to move to be around what things mean in the specification and design - what will be built, not what was built. The reviews become forward-looking things, not a sort of parsing of the results after the fact." In other words, he concludes, "Reviews can solve a specific sort of problem, if not in an elegant way. If you don't have that problem, or if the problem goes away, so can the reviews."
What to do next
Formal inspections were born in a different time in software development, a time when a compile might take hours, when applications had no GUI and ran in "batch mode." If the inspection is a formal step that requires people not in the room, it will add time to the project and require a meeting. The research results suggest that this extra time can outweigh the benefit, but your team may get similar benefit from informal inspections conducted immediately after the code is produced, or, in some cases, as it is produced through pair programming.
If you are looking at creating an inspection program, the first question to ask is what problem does the inspection solve, and does your team have that problem. As Walen indicates, if your team has a great many obvious defects in business logic, the kind that can be found and reduced by inspections, your team may benefit from introducing an inspection process. If the defects are in other areas, such as functionality that should exist but does not, or in complex combinations of the user interface one has to see to understand, your team may benefit from a different approach.
Besides "just" finding defects, there are other reasons to perform inspections and walkthroughs: to share knowledge of the codebase, to help testers and analysts understand what was built, and, if you include non-programmers, to keep the code itself from becoming overly complex. These other benefits should influence how you conduct the inspection, which I will discuss next month.
Editor's Notes: Matt Heusser would like to thank Lanette Creamer, Alan Page, Ben Simo and Dorothy Graham for their peer review.
The graph above is used by permission and is a modified form of the cost-of-defect curve used by Steve McConnell in his book "Software Project Survival Guide."