The smouldering situation
You’re the lead developer in a team of five. You’re all burnt out. Each of you is in the office from early morning till late into the evening, trying to hack away at the relentlessly growing backlog. In fact, evenings are better for work, because during the day you’re swamped with bug reports and operational issues, and developers rarely get time to work on “new” features. Marketing has just signed up a new client with a whole load of new feature requests, and you barely have time to speak to the clients already in production. What’s more, you’re losing more and more of your day to meetings, trying to get the situation under control. You’re probably pulling your hair out thinking “we just don’t have enough time to do all this!”. You need to do something, so you start thinking.
On the back of an envelope you sketch out the situation. Your five developers are putting in at least 60 hours a week each, which comes to 300 developer-hours a week in total. Out of those 300 hours, you estimate you’re spending 50 hours a week on bug-fixing, 30 hours a week on ops issues and 20 hours a week in meetings. That’s 100 hours a week before you get to the new features – a third of your time! But look at the backlog: since the new client came on board, it’s not going down, it’s going up!
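The envelope sums are simple enough to check in a few lines of Python. All the figures come from the story; the per-activity breakdown at the end just expresses the same numbers as shares of the working week.

```python
# Figures from the back-of-the-envelope sketch in the story.
DEVELOPERS = 5
HOURS_PER_DEVELOPER = 60  # weekly hours actually worked, overtime included

weekly_overheads = {
    "bug-fixing": 50,
    "ops issues": 30,
    "meetings": 20,
}

total_hours = DEVELOPERS * HOURS_PER_DEVELOPER   # 300 developer-hours
overhead_hours = sum(weekly_overheads.values())  # 100 hours
feature_hours = total_hours - overhead_hours     # what is left for the backlog

print(f"total: {total_hours} h/week")
print(f"overheads: {overhead_hours} h/week ({overhead_hours / total_hours:.0%})")
print(f"left for new features: {feature_hours} h/week")
for activity, hours in weekly_overheads.items():
    print(f"  {activity}: {hours / total_hours:.0%} of the week")
```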
The conclusion is obvious: there’s just too much work on. You’re already all working overtime, so you need more people. If you had just two new developers, they could handle the bug fixes and ops issues, and still have time to chip away at the backlog. (Well, they’d only be paid for 40 hours a week each, but they’ll soon pick up the corporate culture of going the extra mile, right?) The solution is now even more obvious: you go to your line manager and ask to start recruiting. Right?
Wrong. The very last thing you want to do in this situation is hire more developers.
The new hire stokes the fire
Your first new developer, Alice, starts on a Monday. She ties up another developer for the whole of her first day, and even then her machine is only partially set up. On Tuesday morning she carries on alone because everyone is in a meeting, but then has to spend the afternoon unpicking what she did, because it turns out you use a custom build of one tool. Bob knew this, but forgot to document it because some time last month he was … called into a meeting at short notice. On Wednesday you set Alice loose on an easy bug-fix task. It takes her all day as she learns to navigate the code, but she commits it and moves on. She spends Thursday and Friday trying to implement one of the easy features in the backlog, but over half of that time is spent with another developer, because an old bug gets in the way first. (It might be in the bug tracker, but since it hit 200 open tickets, nobody really checks it any more.) Anyway, a week goes by, and the work goes out in a Friday evening deploy.
There is a weekend. You’ve all learnt to switch off at weekends; overtime hasn’t crept in that far yet.
Monday is chaos. In fixing the first bug she tackled, Alice changed something she thought was an error, but which was actually an obscure edge case of a business rule. Nobody reviewed her change: they had already lost enough time helping her get set up, and everybody knew it was easy anyway! So the deploy is rolled back. Conversation quickly reveals that the feature Alice committed on Friday was built on top of her misunderstanding of the business rules. Now someone in the team has to do a thorough code review. Even without counting the hours this takes, it’s clear the team is significantly behind, and in a large or complex codebase there is no reason to believe this will improve soon.
Is this all Alice’s fault? Could she have tried harder? Or is the system at fault?
What is throttling the performance of this company? It’s clear it’s not in marketing – they’re bringing in clients quicker than the software can be rolled out. And it’s not in analysis – the requirements are building up faster than the developers can turn them into code. (We’ll assume for now that these requirements are actually effective.) It’s not even operations – a week’s work went out on Friday evening, and even if it broke the business rules, it was operational. That leaves us with the bottleneck squarely in development. So if development is the bottleneck, why was it wrong to start hiring developers, to increase the capacity of this overstretched skill?
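The reasoning here is the Theory of Constraints idea from The Goal (recommended at the end of this post): a pipeline can go no faster than its slowest stage. A minimal sketch, assuming a simple linear pipeline; the stage names follow the story, but every capacity figure is invented for illustration.

```python
# Hypothetical weekly capacities (work items per week) for each stage of the
# pipeline in the story. The numbers are made up; only their ordering matters:
# everything upstream of development produces more than development can absorb.
capacity = {
    "marketing (new clients)": 12,
    "analysis (requirements)": 10,
    "development": 4,
    "operations (deploys)": 8,
}


def find_bottleneck(stage_capacity: dict[str, float]) -> str:
    """Return the stage with the lowest throughput.

    In a linear pipeline, overall throughput can be no higher than that of
    the slowest stage, so that stage is the constraint to focus on.
    """
    return min(stage_capacity, key=stage_capacity.get)


bottleneck = find_bottleneck(capacity)
system_throughput = capacity[bottleneck]
print(f"bottleneck: {bottleneck}, system throughput: {system_throughput} items/week")
```

Identifying the constraint is only the first step. The argument of this post is that the next step is to exploit it, by removing the waste at the bottleneck, before trying to elevate it by adding people.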
The assumptions behind hiring
To explain this situation I’m going to make explicit some of the tacit assumptions that often underlie hiring in an overworked team. This is a crucial point, as much of the shared mindset in software organisations is tacit, and influences decisions without ever being held to account. It is comparable to the difference between invisible work and work visualised on, for example, a kanban board. (Note that even organisations that use kanban boards often have other, unvisualised work.) The following list is not exhaustive, but it will serve the point. Many teams act as if the following are true:
- Developers are fungible
- Productivity is proportional to developer-hours
- Fixing bugs is valuable
- The requirements are all necessary
Developers are fungible
Tom DeMarco calls this The Myth of the Fungible Resource (in his book Slack). Many factory and warehouse jobs are largely fungible, in that the time to bring someone up to full productivity is inconsequential (hours or days). This is not true of development: even if a new hire knows the programming language, the framework and even the generic business domain, it will still take a long time for tacit knowledge of the codebase to flow into their head.
I don’t think developers actually believe they are fungible (at least, none I’ve met would say so), yet I’ve seen teams hire as if this assumption were valid. Any time you act as if a new developer working alone will immediately increase team productivity, you are acting as if it were true. This tacit assumption contradicts what most developers will explicitly say about the nature of their work. In a contradiction, at best one side is right.
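Acting as if developers are fungible amounts to assuming a new hire adds a full developer’s output from day one. A toy model makes the alternative explicit. Assume, purely for illustration, that Alice’s own output ramps up linearly over twelve weeks while the time experienced developers spend helping her falls linearly to zero. The ramp length and mentoring cost are invented parameters, not measurements.

```python
# A toy model of a new hire's first weeks. Every parameter is a
# hypothetical assumption chosen for illustration, not data.
RAMP_UP_WEEKS = 12       # weeks until the new hire reaches full productivity
MENTOR_COST_START = 0.6  # fraction of an experienced developer's week lost to helping in week 0


def new_hire_output(week: int) -> float:
    """New hire's own output as a fraction of an experienced developer's, rising linearly."""
    return min(1.0, week / RAMP_UP_WEEKS)


def mentoring_cost(week: int) -> float:
    """Experienced-developer time lost to helping, falling linearly to zero."""
    return max(0.0, MENTOR_COST_START * (1 - week / RAMP_UP_WEEKS))


def net_contribution(week: int) -> float:
    """Change in team output, in experienced-developer weeks, caused by the hire."""
    return new_hire_output(week) - mentoring_cost(week)


for week in (0, 2, 4, 8, 12):
    print(f"week {week:2d}: net {net_contribution(week):+.2f}")
```

Under these made-up parameters the hire is a net drain on the team for roughly the first month, and the time lost in those weeks is only recovered later still. A larger or more tangled codebase would suggest a longer ramp.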
Productivity is proportional to developer-hours
There are two forms of this assumption: first, the idea that a developer working a 10-hour day will be 25% more productive than a developer working an 8-hour day; second, the idea that a team of 10 developers is 25% more productive than a team of 8.
To address the first, remember that the nature of software development is creating new knowledge, which I explained previously in the post Why Can’t Developers Estimate Time?. One consequence of this is that development is a creative task that involves constantly making logical decisions. (For example, is it time to break up this long block of code? Should we use XML or JSON? Should we replace the application framework?) As explained in the article Do You Suffer From Decision Fatigue?, the human brain has a limited capacity to make these types of choice, and once tired, it will take shortcuts. The feeling of “I just want to go home” may be causing you to introduce bugs. Using overtime as evidence that the team has too little capacity therefore contradicts what scientific studies show. That one side of this contradiction paints a picture of developer heroism does not make it any more true.
The second form of the productivity-time assumption is based on the idea that the productivity of a team scales linearly with its size. This is not true, for the simple reason that the complexity of managing a team lies not in the number of people involved but in the number of communication paths between them and the amount of communication along those paths. Compare, for example, how easy it is to get 50 people to pass a ball down a line, versus getting even 5 people to agree on the menu for a meal in a Chinese restaurant.
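One common way to quantify this is the pairwise-channel count Fred Brooks uses in The Mythical Man-Month: a team of n people has n(n-1)/2 possible person-to-person communication channels. The sketch below is a simplification (not every pair actually needs to talk), but it shows channels growing roughly with the square of team size while headcount grows only linearly.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairwise communication channels in a team: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


# Compare growth in headcount with growth in channels for two hiring scenarios:
# the story's team of five plus two hires, and the 8 -> 10 example above.
for before, after in [(5, 7), (8, 10)]:
    growth_people = (after - before) / before
    growth_paths = communication_paths(after) / communication_paths(before) - 1
    print(
        f"{before} -> {after} people: +{growth_people:.0%} people, "
        f"{communication_paths(before)} -> {communication_paths(after)} paths "
        f"(+{growth_paths:.0%})"
    )
```

Going from 8 to 10 developers is 25% more people, but 45 channels instead of 28, roughly 61% more. Adding Alice and one other hire to the team of five takes it from 10 channels to 21.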
Fixing bugs is valuable
Bugs are, by definition, something the system was not intended to do. There are times when nobody knows if an idea will work (this is the realm of the Lean Startup). But there are many, many defects in the world where the developers had, at the time they wrote the bug into the system, the knowledge to determine the behaviour was wrong, yet for some reason they didn’t. Imagine you’ve taken your otherwise immaculate car in to have the brakes replaced, and when you drive it afterwards it starts pulling to one side. Exactly how much value would you see in having the wheels re-aligned, even if it was done for free?
When these sorts of bugs are fixed, what is actually happening is not work, but rework. The developer must load the knowledge of that bit of code into his head, including the requirements, the way it is implemented, the dependencies it has, and then make the change. Even in the case where the bug fix is purely the addition of code, and not changing existing code, the developer must still repeat the process of understanding the subsystem to make that addition. When a new developer is doing this, they may have to learn from scratch a whole area of code, along with any tacit knowledge required for it, and then cross their fingers they don’t break anything (if a test suite catches a bug here, that knowledge has already been made explicit). If bug-fixing is waste, fixing bugs introduced while bug-fixing is doubly so. I call it whack-a-mole development, a term I’m deeply saddened hasn’t caught on yet.
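The compounding in whack-a-mole development can be made concrete with a simple, assumed model: suppose each fix independently has some probability p of introducing a new bug that must itself be fixed. The expected number of fixes per original defect is then the geometric series 1 + p + p² + ... = 1/(1 - p). The regression probabilities below are hypothetical, and the simulation is only a sanity check of the closed form.

```python
import random


def expected_fixes(p_regression: float) -> float:
    """Expected total fixes for one original bug, assuming every fix independently
    introduces a new bug with probability p_regression (sum of a geometric series)."""
    if not 0 <= p_regression < 1:
        raise ValueError("p_regression must be in [0, 1)")
    return 1 / (1 - p_regression)


def simulate_fixes(p_regression: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate: keep fixing until a fix goes in without a regression."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        fixes = 1
        while rng.random() < p_regression:  # this fix introduced another bug
            fixes += 1
        total += fixes
    return total / trials


for p in (0.1, 0.25, 0.5):
    print(f"p={p}: {expected_fixes(p):.2f} fixes per bug (simulated {simulate_fixes(p):.2f})")
```

Under this model a 50% regression rate doubles the work per defect, and that is before counting the cost, described above, of reloading the context of the code each time.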
If your team is spending any significant amount of time fixing bugs, it has much more capacity than you realise. That’s not to say it’s an easy reserve to tap into, but it is there. The attitude that bugs are inevitable is harmful, as it strengthens the tacit assumption that fixing bugs is valuable.
The requirements are all necessary
I’ve saved this for last as it has a different nature to the other assumptions: it necessarily involves decisions made outside the development team. Unless the team is making the software entirely for itself, someone else will be involved in specifying the development work being done. If it turns out that 30% of the features in your software are unnecessary or never used, then at least 30% of the development time is pure waste. (It may be more, due to the complexity of managing the larger codebase, and the waste due to bugs in the surplus code.) However, as many teams are contractually obliged to deliver a fixed spec without reference to the value of the features in that spec, this may be a difficult source of waste to fix. Because of this, I won’t say much more about it. It is in any case usually easier to get someone to clean out their garden shed if you can show you can keep your bedroom tidy first.
The reality of the busy team
Remember that we came here because Alice was brought in to increase the capacity of an “overworked” team. Yet we’ve seen that the assumptions underlying the need to hire her were false:
- The team is not running at full capacity: even taking overtime into account, it is spending at least 25% of its time on rework and avoidable maintenance
- The team is not even producing the best quality its existing skills allow, because some of the bugs were introduced by developers who were fatigued and over-stressed
- Alice can’t be brought in to give immediate relief, because the communication overhead she adds actually reduces productivity, at least in the short term
A note on team sizes
You may hold a valid reservation about my bold statement that you shouldn’t hire more developers: increasing capacity isn’t the only reason you may want to do so. A very valid one is redundancy, as very small teams are vulnerable to Murphy’s Law. If your only developer is run over by a bus, your project is in immediate jeopardy. (It was in jeopardy before, it just took a bus to show it.) Then again, it is possible to have a team of 10 devastated by a single errant bus incident, if the team has formed knowledge silos.
Christopher Allen’s article The Dunbar Number as a Limit to Group Sizes explains some of the consequences of various team sizes.
Small team sizes may be less of a risk than they appear though. In my personal experience, developers are very rarely run over by buses. And they very rarely leave because of pay. But developers do very frequently leave because of unsatisfactory working conditions. If you’re the manager of a situation like the one in the story, one of them has probably told you so, in as many words.
There’s also another situation in which you might want to increase the size of your team: when the person you’re bringing in has the knowledge and experience to help improve everyone else’s effectiveness. In this case, though, their responsibilities will have to extend far beyond pure development.
What to do
The first thing to do is step back and check whether you’re trying to solve a problem fundamentally caused by systemic waste by throwing more effort at it. This is akin to putting more sailors on water-bailing duty when the ship’s engineer should be welding the hull shut. Fred Brooks stated Brooks’s Law over thirty years ago: “Adding manpower to a late software project makes it later”. Please don’t ignore the past unless you want to turn your office into a historical re-enactment. I’ve had someone personally tell me “We have a perfect graph showing velocity going down as we started adding more people!”.
Improving the productivity of a software team is hard. It involves understanding the business, the team, the history, the obstacles blocking progress. It is a complex, context-sensitive problem. This being a blog post, one already in need of a TL;DR summary, I’ll just point you in the direction of a suitable body of knowledge, and suggest you read The Goal.
We see the world filtered by the metaphors we hold. The Goal (by Eli Goldratt) shows how our common assumptions blind us to the real causes of the problems we face every day. It has sold millions of copies, has been used in thousands of corporations, and is taught in hundreds of colleges and universities. The Goal is the archetypal book on how to focus on what matters. It will take you only a couple of days to read, and will teach you to see the real source of bottlenecks in your organisation. (This is not an affiliate link.)
I’ll end with a rule of thumb though: when faced with a situation like the one described above, try to exploit what you already have before throwing more effort and money at the problem. You’ll often realise you can be more effective with the people and resources you already have, once you discover the real reason things are going wrong.
Thanks for reading
My name is Ash Moran. I’m a software developer and agile coach, and owner of PatchSpace Ltd (Twitter). If you have any feedback, questions, or would like to know more about my services, feel free to contact me at firstname.lastname@example.org, or continue the discussion in the comments.