The danger of complexity: More code, more bugs

The old method of counting lines of code to judge programmer productivity may have helped contribute to the current deplorable state of software security.

Antoine de Saint-Exupery once said, "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." He lived from 1900 to 1944, before the job title of "software engineer" was even a twinkle in someone's eye.
Aside from being the inspiring author of a number of books including The Little Prince, he was also an aviator and an engineer, which may help explain how he produced such a timeless quote that is so very relevant to the world of software development today.
A more obvious, but more specialized, statement in that regard was made by Edsger W. Dijkstra: "My point today is that, if we wish to count lines of code, we should not regard them as 'lines produced' but as 'lines spent': the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger."
Recent reviews and estimates based on source lines of code (SLOC) suggest that, as a very conservative guess, even extremely well-written code with great attention to security detail contains about one bug per 1000 lines. Most software is not written nearly this well, and I am sure my own bug rate is somewhat higher than this conservative estimate.
Writing code for patches intended to fix bugs surely does help reduce the number of bugs in a system, but most software systems get much more code added to them every year to add features than to eliminate bugs. Bug fixes help keep people happy with current versions of the software, but new features actually sell new versions. Worse yet, even bug fixes are certainly not immune to containing bugs.
According to some estimates, between ten and fifteen percent of security patches actually introduce new vulnerabilities. The implications of this are frightening.
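To get a feel for what a ten to fifteen percent regression rate costs, here is a minimal sketch under a deliberately simple assumption: every patch removes exactly one vulnerability and, with a fixed probability, introduces exactly one new one. Both the model and the function name `expected_patches` are hypothetical illustrations, not something taken from the estimates mentioned above.

```python
def expected_patches(vulnerabilities: int, regression_rate: float) -> float:
    """Expected number of patches needed to clear a backlog.

    Assumed model (hypothetical): each patch removes one vulnerability
    and, with probability regression_rate, introduces one new one.
    A patch therefore makes net progress with probability
    (1 - regression_rate), so clearing N vulnerabilities takes
    N / (1 - regression_rate) patches on average.
    """
    if not 0 <= regression_rate < 1:
        raise ValueError("regression_rate must be in [0, 1)")
    return vulnerabilities / (1 - regression_rate)


for rate in (0.10, 0.15):
    extra = expected_patches(1000, rate) - 1000
    print(f"{rate:.0%} regression rate: about {extra:.0f} extra patches per 1000 fixes")
```

Under that assumed model, clearing 1000 vulnerabilities takes on the order of 110 to 180 additional patches, each of which is itself more code.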
If you have ever wondered how so many bugs are found in your software every year, wonder no more. In 2003, something on the order of five thousand new security vulnerabilities were reported to CERT, and that number per year has only grown since then. The reason we find all these bugs every year is simple: some of the most popular pieces of software in the world are freaking huge.
It gets even worse. Software does not only tend to be really, really big--it also tends to get bigger at an alarming rate. Consider the growth rate of Microsoft Windows operating systems that use the NT kernel over the years, for instance[1]:

Year	Operating System	SLOC (Millions)	Delta (Millions)	Delta Per Year (Millions)
1993	Windows NT 3.1	4.5	N/A	N/A
1994	Windows NT 3.5	7.5	+3	+3
1996	Windows NT 4.0	11.5	+4	+2
2000	Windows 2000	30	+18.5	+4.6
2001	Windows XP	40	+10	+10
2003	Windows Server 2003	50	+10	+5
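The two delta columns can be recomputed from the year and SLOC columns alone. The following is a small sketch for doing so; the list `releases` simply restates the table's estimates, and the helper name `growth_rows` is an assumption made for this example.

```python
# (year, release, SLOC in millions), restating the table's estimates
releases = [
    (1993, "Windows NT 3.1", 4.5),
    (1994, "Windows NT 3.5", 7.5),
    (1996, "Windows NT 4.0", 11.5),
    (2000, "Windows 2000", 30.0),
    (2001, "Windows XP", 40.0),
    (2003, "Windows Server 2003", 50.0),
]


def growth_rows(rows):
    """Return (release, delta, delta_per_year) for every release after the first."""
    out = []
    # Pair each release with the one that preceded it.
    for (prev_year, _, prev_sloc), (year, name, sloc) in zip(rows, rows[1:]):
        delta = sloc - prev_sloc
        out.append((name, delta, delta / (year - prev_year)))
    return out


for name, delta, per_year in growth_rows(releases):
    print(f"{name}: +{delta:g} million total, +{per_year:.2f} million per year")
```

Because the releases are unevenly spaced, the per-year column divides each delta by the number of years since the previous release, not by one.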

This tells us that, if we are very kind with the numbers:

MS Windows Server 2003 was released with 50 million lines of code making up the behemoth piece of software. That's 50,000,000 lines of code. If you were to try to count that high, and could actually say the names of the numbers between one and fifty million at a steady rate of one per second (unlikely, given how long it takes to read 47,777,777 out loud), it would still take you more than 1.5 years to count that high without pausing to eat, drink, sleep, or even draw a very deep breath. Even counting at that rate to the roughly 4,500,000 of NT 3.1 would take you more than seven weeks.
Given an extremely conservative estimate of one vulnerability per 1000 lines of code, NT 3.1 had about 4,500 security vulnerabilities, and Server 2003 was released with about 50,000, roughly eleven times as many.
MS Windows OSs using an NT-based kernel grew in size at a staggering rate. Spreading the 45.5 million lines added between 1993 and 2003 evenly across those ten years gives more than 4.5 million, or 4,550,000, lines of code added per year.
The number of vulnerabilities introduced by all this additional code added to MS Windows systems based on the NT kernel, by the very conservative estimate I already provided, is one per 1000 lines. This means that MS Windows was adding new bugs at a rate of about 4,550 per year. That means that MS Windows alone gained almost as many vulnerabilities per year as were actually discovered, for all software reported to CERT, in the year 2003. Given that MS Windows is actually a fairly small part of CERT's total database of bugs, the implications are dismaying. CERT's database shows 65 results for the year 2008 on a search under the term "Windows", which means that--if you take 2008 as representative--vulnerabilities are being added about 70 times as quickly as they are being found.
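The arithmetic behind these estimates is easy to check. The sketch below uses only figures that appear in this article (the table's SLOC estimates, the one-per-1000 rate, and the 65 search results for 2008); the constant and function names are assumptions chosen for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
LINES_PER_VULNERABILITY = 1000  # the "one per 1000 lines" estimate


def days_to_count(n: int) -> float:
    """Days needed to count to n at one number per second, without pausing."""
    return n / SECONDS_PER_DAY


def estimated_vulnerabilities(sloc: float) -> float:
    """Vulnerabilities implied by the one-per-1000-lines estimate."""
    return sloc / LINES_PER_VULNERABILITY


nt31_sloc = 4_500_000          # Windows NT 3.1 (1993), from the table
server2003_sloc = 50_000_000   # Windows Server 2003, from the table
years = 2003 - 1993

# Average growth over the whole period, then the bugs that growth implies.
sloc_per_year = (server2003_sloc - nt31_sloc) / years      # 4,550,000 lines
vulns_per_year = estimated_vulnerabilities(sloc_per_year)  # 4,550

found_2008 = 65  # "Windows" search results for 2008 cited above
added_vs_found = vulns_per_year / found_2008               # 70

print(f"Counting to 50 million: {days_to_count(server2003_sloc) / 365.25:.2f} years")
print(f"Counting to 4.5 million: {days_to_count(nt31_sloc):.0f} days")
print(f"Estimated vulnerabilities at release: "
      f"{estimated_vulnerabilities(nt31_sloc):,.0f} vs "
      f"{estimated_vulnerabilities(server2003_sloc):,.0f}")
print(f"Added per year: {vulns_per_year:,.0f}, "
      f"or {added_vs_found:.0f} times the number found")
```

Changing `LINES_PER_VULNERABILITY` to a less generous figure scales every estimate proportionally, which is why the one-per-1000 rate should be read as a floor.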

It no longer seems surprising that vulnerabilities are discovered in software all the time. What seems surprising is that they are not being found more often.
If you want to produce secure software, you should focus on following the advice of people like Antoine de Saint-Exupery and Edsger W. Dijkstra. All else being equal, if you can find a way to eliminate lines of code without compromising the proper functioning of the software, you will probably improve the security of the software substantially.
Given how much more can be done per line of code when using higher-level languages, an argument might be made to use as high-level a language as you reasonably can for the task at hand, too.
Sometimes, the need to add more code to an application is unavoidable. Try to keep it to a minimum, though. When it comes to application security, complexity kills.
Notes
1: These numbers are estimates gleaned from Wikipedia's "Source lines of code" article. In some cases, Wikipedia's numbers are more vague than these. The numbers used here are actually meant to provide more specific, if not any more accurate, estimates for ease of calculation.
