Testing Software is not Expensive - It's Free

A common criticism of (aka excuse for not doing) test-driven development is that it's too expensive in terms of developer time. Critics who take this position usually point to the time developers spend writing test cases, which at first seems like a sensible observation. There are (at least) two problems with this.

First - the same people who label TDD as waste are often the ones who will happily spend - or allow their staff to spend - hours or days at a time in a debugger. Testing to find defects is waste.

Second - and more importantly - writing test cases is not the same as running them to test the software.

At some point, somebody has an idea. They say, I have this problem (for our purposes here, we'll assume they know this exactly), and if I can write a program to do these things, then my problem will be solved. That person has the ultimate test case for the as-yet-unwritten software: if it behaves how they want it to, their problem will go away.

Now a developer takes over and turns this conversation into a set of ideas about what the code should do to implement this behaviour. At the very least, having written something, he should run it and inspect what it does, to verify that it behaves as he expects. (Some don't even do that much...) This is simple manual testing. But note two things:
  • if he doesn't think about what it must do, he has zero chance of designing the right solution
  • if he doesn't test his assumptions about the software's behaviour, he pushes errors downstream, where they become slower and more expensive to correct
Now, given that the developer here must think about what he is doing, the most effective way to think about it is to express it in an unambiguous form. A form that something stupid and mindless can understand - say, a computer. If he can specify the problem in a way a computer can understand, the only source of error is in getting this spec right in the first place. But fortunately, as he's thinking about what he's doing, this is usually not a large source of errors. (If it is, you have a bigger problem on your hands.)

How does our developer know if the computer has understood the spec for this code? The only way is to make the computer able to verify the program against the spec. Otherwise, the spec is about as useful as a stray Word file, such as a signed-off requirements document. We want booleans here. Flashing lights. Possibly red and green.

When this developer runs his spec program against his solution program, the computer is doing what he should do anyway before releasing it to his customer. The only difference is that it can do it many orders of magnitude faster than he can. So fast, in fact, that it is effectively instant. How much does it cost to fire off the test run? A few seconds of developer time. Or, if you're using an automatic test runner, exactly nothing.
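
To make the idea of a "spec program" and a "solution program" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the leap-year problem, the names is_leap_year and run_spec, and the chosen cases are stand-ins for whatever the customer actually asked for.

```python
def is_leap_year(year: int) -> bool:
    """Solution program: the code the developer wrote to solve the problem."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def run_spec() -> bool:
    """Spec program: each pair is (input, behaviour the customer asked for)."""
    cases = [
        (2000, True),   # divisible by 400
        (1900, False),  # divisible by 100 but not by 400
        (2024, True),   # divisible by 4
        (2023, False),  # not divisible by 4
    ]
    return all(is_leap_year(year) == expected for year, expected in cases)


# A boolean, reported as a light: green if the solution meets the spec.
print("GREEN" if run_spec() else "RED")
```

The spec is just data plus a comparison, so the computer can re-check it on every change without anyone having to inspect the output by hand.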

Up to this point, we've established two things:
  • writing test cases is the process of formalising a spec so that a computer can be employed for testing
  • running tests is effectively free
But how free?


The inspiration for this post came from chapter 4 of Don Reinertsen's Managing the Design Factory (It's All About Information). The purpose of this chapter is to explain ways to efficiently generate valuable information. The examples in the chapter are largely from circuit engineering, but even there, there exists a continuum. From page 76:
"[Testing costs] could be twice as high with four iterations instead of two. This means that when testing costs dominate the economics we should concentrate on quality per iteration. We do not want to incur extra, expensive trials when the cost of a trial is high. In contrast, when testing costs are lower, we will get to higher quality faster by using multiple iterations."
So, if testing software is essentially free, how many iterations should we have? The answer is hinted at on page 74: this is an economic order quantity (Wikipedia) problem in disguise[1]. Being too lazy to get an equation editor working, I'll reuse the slightly arcane, CC-licensed Wikipedia equation:

  Q* = sqrt(2CD / H)

  • Q* is the optimal order quantity - how many tests you should batch before you start a test run
  • C is the order cost - the cost of a test run
  • D is the rate at which the product is demanded - arguably requests for features (this is not explained in MtDF, presumably because you can demand features arbitrarily fast) 
  • H is the holding cost - the cost of running tests late in development, when change is more expensive
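
For a feel of how the formula behaves, here is a small Python sketch. It assumes the standard EOQ form, Q* = sqrt(2CD / H). The function name and the values of D and H are made up purely for illustration; only the trend as C shrinks matters.

```python
import math


def optimal_batch_size(run_cost: float, demand_rate: float, holding_cost: float) -> float:
    """Standard EOQ formula: Q* = sqrt(2 * C * D / H)."""
    return math.sqrt(2 * run_cost * demand_rate / holding_cost)


# Illustrative numbers only; they are assumptions, not measurements.
D = 20.0  # rate at which features are demanded
H = 5.0   # holding cost: the penalty for testing late

# Shrink C, the cost of a test run, and watch the optimal batch size follow.
for C in (100.0, 1.0, 0.01, 0.0):
    print(f"C = {C:>6}: Q* = {optimal_batch_size(C, D, H):.2f}")
```

With these made-up numbers, Q* falls from about 28 at C = 100 to about 0.28 at C = 0.01, and to exactly 0 when a test run costs nothing.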

The key, though, is that if C, the cost of running tests, is at or near zero, and H, the cost of making changes late, is high (and every developer's experience is that tracking down bugs in old code is much harder than in freshly written code), the optimal batch size of tests to hold is also at or near zero. Which in reality means:

You should strive to keep the cost of testing software at effectively 0,
and to run all your tests every time you make a change

If you've done TDD for a while, you'll know this intuitively. But expressing it in terms of existing economic models, already in use in other forms of engineering, puts it on solid ground.

I'll leave it open to interpretation exactly what I include in the scope of a "test", but that will be touched on in my next post. And if you doubt just how free software testing can be, take inspiration from IMVU's continuous deployment: Doing the impossible fifty times a day.

[1] You mean you didn't spot it either? :)