Programming errors can be some of the most devastating problems in the modern world. Now that our lives are so intertwined with technology, a simple lost line of code can spell disaster and cost companies millions in damage or lost time. These eight mistakes are among the worst in history.
CrowdStrike’s broken update
CrowdStrike’s global outage in July 2024 was the kind of worldwide computer failure people once feared from Y2K. It was caused by faulty content pushed out in an automatic update to CrowdStrike’s Falcon security software. A bug in the company’s content validation system let the broken update pass its checks, and the system simply pushed it to users, crashing Windows machines around the world.
That outage cost billions of dollars. Insurer Parametrix estimated that it cost Fortune 500 companies other than Microsoft about $5.4 billion in direct losses, or roughly $44 million for each company affected. Microsoft estimated that about 8.5 million Windows devices crashed, and airlines, hospitals, and banks around the world were disrupted. The incident also shows how dependent we are on technology in the 21st century.
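CrowdStrike’s later root-cause analysis described a mismatch between the number of input fields a new rule type expected and the number the sensor actually supplied. The Python sketch below shows the kind of check a content validator can run before an update ships. The `Rule` model, the field indices, and `validate_update` are hypothetical illustrations, not CrowdStrike’s actual format or tooling.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical content rule that reads sensor inputs by index."""
    name: str
    field_indices: list[int]


def validate_update(rules: list[Rule], supplied_fields: int) -> list[str]:
    """Return every problem found; an empty list means the update may ship.

    `supplied_fields` is how many input values the deployed sensor code
    actually passes in at runtime (an assumption of this sketch).
    """
    problems = []
    for rule in rules:
        for index in rule.field_indices:
            # Any index outside 0..supplied_fields-1 would read past the
            # end of the inputs on a real machine.
            if not 0 <= index < supplied_fields:
                problems.append(
                    f"{rule.name}: reads field {index}, "
                    f"but only {supplied_fields} fields are supplied"
                )
    return problems


update = [Rule("known_good", [0, 5, 19]), Rule("new_rule", [20])]
for problem in validate_update(update, supplied_fields=20):
    print(problem)
# prints "new_rule: reads field 20, but only 20 fields are supplied"
```

A check like this is only as good as its model of the deployed code. For that reason it is usually paired with a staged rollout that sends an update to a small group of machines first.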
The Heartbleed bug
The Heartbleed bug, a critical programming flaw discovered in 2014, exposed millions of websites and devices to potential data theft. This vulnerability in OpenSSL, a widely used cryptographic software library, allowed attackers to read sensitive information from the memory of affected systems, including passwords and encryption keys. The bug’s impact was devastating, costing companies an estimated $500 million in losses and recovery costs.
The programming error at the heart of Heartbleed was a simple buffer over-read in the TLS heartbeat extension. The code trusted the payload length claimed by the sender instead of checking it against the data actually received. This oversight went unnoticed for about two years, highlighting the catastrophic consequences of even minor coding errors in critical software. It also shows the risk of depending on widely used open source projects without the resources for thorough, independent code review.
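The heartbeat extension (RFC 6520) lets one side send a payload along with its length and receive the same payload echoed back. The Python sketch below models the flaw, using a `bytearray` to stand in for server memory. The vulnerable responder trusts the claimed length. The fixed responder discards any request whose claimed length exceeds what actually arrived. The message layout is simplified and the function names are hypothetical; this is not OpenSSL’s code.

```python
import struct
from typing import Optional

HEARTBEAT_REQUEST = 1


def build_request(payload: bytes, claimed_length: int) -> bytes:
    """Simplified heartbeat message: 1-byte type, 2-byte length, payload."""
    return struct.pack(">BH", HEARTBEAT_REQUEST, claimed_length) + payload


def respond_vulnerable(memory: bytearray, start: int) -> bytes:
    _, claimed = struct.unpack_from(">BH", memory, start)
    payload_start = start + 3
    # Bug: trusts the sender's length field and copies that many bytes,
    # even if they run past the end of the message into other data.
    return bytes(memory[payload_start:payload_start + claimed])


def respond_fixed(memory: bytearray, start: int, received: int) -> Optional[bytes]:
    _, claimed = struct.unpack_from(">BH", memory, start)
    # Fix: silently discard a request whose claimed length exceeds the
    # bytes actually received (real TLS also requires padding, ignored here).
    if 3 + claimed > received:
        return None
    payload_start = start + 3
    return bytes(memory[payload_start:payload_start + claimed])


# The request sits in memory right next to unrelated secret data.
request = build_request(b"PING", claimed_length=20)
memory = bytearray(request + b"secret_password=hunter2")

print(respond_vulnerable(memory, 0))           # b'PINGsecret_password='
print(respond_fixed(memory, 0, len(request)))  # None
```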
Ariane 5 flight 501 fails
The 1996 failure of Ariane 5 Flight 501 is a stark reminder of how a seemingly minor programming error can lead to catastrophic consequences. The root cause was a software exception in the rocket’s inertial reference system, triggered when a 64-bit floating-point number was converted to a 16-bit signed integer. The value was too large to fit, and the unhandled exception shut down both the primary and backup inertial reference units. The on-board computer then interpreted the diagnostic data they sent as genuine flight data and commanded an extreme course correction.
Just 37 seconds after launch, this error caused the rocket to veer off course, break up, and self-destruct. The explosion destroyed the four Cluster scientific satellites on board, which represented about a decade of scientific work. Beyond the immediate loss of $370 million, the European Space Agency suffered significant reputational damage and delays to its commercial space ambitions.
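A 16-bit signed integer can only hold values from -32,768 to 32,767. According to the inquiry report, the code was reused from Ariane 4, whose trajectory kept this value in range; Ariane 5’s higher horizontal velocity pushed it past the limit. The Python sketch below contrasts two conversions. The unprotected one raises an error, roughly as the flight software did. The other clamps the value to the representable range. The values and function names are illustrative assumptions, and clamping is only one of several ways to handle the case.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Stands in for the unhandled conversion error on Flight 501."""


def to_int16_unprotected(value: float) -> int:
    # Truncates toward zero, then fails if the result cannot fit in 16 bits.
    # With no handler anywhere up the call chain, this stops the whole unit.
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_saturating(value: float) -> int:
    # One possible protection: clamp to the nearest representable value.
    return max(INT16_MIN, min(INT16_MAX, int(value)))


print(to_int16_unprotected(12345.6))  # 12345
print(to_int16_saturating(40000.0))   # 32767
try:
    to_int16_unprotected(40000.0)
except OperandError as err:
    print("unit shut down:", err)
```

Clamping is not automatically safe either. It keeps the unit running but feeds it a wrong value, so the right handling depends on how the number is used downstream.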
Therac-25 radiation therapy accidents
Radiation always calls for caution, but this particular incident actually cost lives, and it was all due to a programming error. The Therac-25, a radiation therapy machine designed to treat cancer patients, contained a software bug that allowed it to deliver massive overdoses of radiation under certain conditions. The bug stemmed from a race condition in the control software: if an operator edited the treatment settings quickly enough, the software could miss the change and fire the beam while the machine was in an unsafe configuration.
Between 1985 and 1987, this programming error caused at least six known accidents in which patients received radiation doses roughly a hundred times higher than intended. Three of these incidents were fatal. Earlier models had used hardware interlocks to block unsafe configurations, but the Therac-25 relied on software alone. Programming errors like these lead not only to millions of dollars in lawsuits but also to the loss of lives, and this one could have been prevented with strict testing protocols.
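Race conditions of this kind are often described as time-of-check to time-of-use problems: one part of a program acts on settings that another part has already changed. The Python sketch below is a deliberately simplified, deterministic model of that interleaving, not the Therac-25’s actual software. The two modes, the power numbers, and the version counter used as a fix are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical beam power per mode: X-ray mode needs a far stronger beam
# because a metal target placed in its path absorbs most of it.
BEAM_POWER = {"xray": 100, "electron": 1}


@dataclass
class Settings:
    mode: str
    version: int = 0

    def edit(self, mode: str) -> None:
        # Every operator edit bumps the version so later steps can detect it.
        self.mode = mode
        self.version += 1


@dataclass
class Machine:
    beam_power: int = 0
    target_in_place: bool = False
    configured_version: int = -1

    def set_beam(self, settings: Settings) -> None:
        # A slow step on real hardware; it uses the mode as of *now*.
        self.beam_power = BEAM_POWER[settings.mode]
        self.configured_version = settings.version

    def position_target(self, settings: Settings) -> None:
        self.target_in_place = settings.mode == "xray"

    def fire_unchecked(self) -> int:
        # Dose reaching the patient: the target absorbs 99% of the beam.
        return self.beam_power // 100 if self.target_in_place else self.beam_power

    def fire_checked(self, settings: Settings) -> int:
        # Re-validate at the moment of use instead of trusting earlier setup.
        if self.configured_version != settings.version:
            raise RuntimeError("settings changed after beam setup; refusing to fire")
        return self.fire_unchecked()


settings = Settings("xray")
machine = Machine()
machine.set_beam(settings)         # beam set up for X-ray mode
settings.edit("electron")          # operator quickly corrects the mode
machine.position_target(settings)  # target follows the new mode: removed
print(machine.fire_unchecked())    # 100, versus an intended dose of 1
```

The checked version refuses to fire in the same sequence, because the settings changed after the beam was configured.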
The Mars Climate Orbiter
Sometimes the most expensive programming and software mistakes happen off-planet, as with NASA’s Mars Climate Orbiter in 1999. The core problem was a unit conversion mishap. Software from Lockheed Martin, the spacecraft manufacturer, reported thruster impulse in English units (pound-force seconds), while NASA’s Jet Propulsion Laboratory expected metric units (newton-seconds) for its trajectory calculations.
Because of the mismatch, small trajectory errors accumulated during the cruise to Mars. The orbiter arrived far lower than planned and was destroyed in the Martian atmosphere. Its loss, part of a $327 million program that also included the Mars Polar Lander, was a significant setback for NASA’s Mars exploration program. It delayed critical climate studies and forced a reevaluation of communication and verification processes between contractors and the space agency. It’s an excellent example of why a project’s interface specifications must be checked and enforced, not just written down.
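One pound-force second is about 4.45 newton-seconds, so every impulse value read in the wrong unit was off by that factor. A common defense is to carry the unit in the type rather than in a comment. The Python sketch below uses a hypothetical `Impulse` class that converts at the boundary and refuses to mix in unlabeled numbers. It illustrates the technique; it is not how NASA’s or Lockheed Martin’s software was written.

```python
from dataclasses import dataclass

NEWTON_SECONDS_PER_LBF_SECOND = 4.4482216152605  # 1 lbf·s expressed in N·s


@dataclass(frozen=True)
class Impulse:
    """Impulse stored internally in newton-seconds, whatever unit it came from."""
    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * NEWTON_SECONDS_PER_LBF_SECOND)

    def __add__(self, other: object) -> "Impulse":
        # Refuse to add a bare number: its unit is unknown.
        if not isinstance(other, Impulse):
            return NotImplemented
        return Impulse(self.newton_seconds + other.newton_seconds)


reported = 10.0                    # a thruster firing reported in lbf·s
misread = reported                 # the bug: treated as if it were N·s
correct = Impulse.from_pound_force_seconds(reported).newton_seconds
print(misread, round(correct, 2))  # 10.0 44.48
```

Because `__add__` returns `NotImplemented` for plain floats, an expression like `Impulse(1.0) + 2.0` raises `TypeError` instead of silently mixing units.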
Error in Knight Capital Group’s trading platform
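Knight Capital used to be a respected financial firm that made its money through trading. The company’s proprietary trading software used algorithms to buy and sell automatically. In August 2012, Knight deployed new code to support a New York Stock Exchange program for retail orders, but a technician failed to copy it to one of the eight servers involved. That server still held obsolete code called Power Peg. A flag that had been repurposed for the new feature switched that old code back on, and it flooded the market with unintended orders.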
In the span of 45 minutes, Knight Capital lost $440 million, nearly four times the company’s annual net profit. This programming error decimated the company’s capital and seriously damaged its reputation, sending its stock price plunging and leading to its eventual acquisition by a competitor. It should be a lesson for programmers to find and squash bugs, remove dead code, and verify every deployment before putting code into production.
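Part of the failure was a reused flag that meant one thing to the new code and something else to the old code on the missed server. The Python sketch below shows a hypothetical pre-flight check that refuses to switch on a flag until every server reports the expected code version. The server names and version strings are invented for illustration and do not describe Knight’s systems.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    code_version: str


def check_flag_rollout(servers: list[Server], required_version: str) -> list[str]:
    """Return the servers that would misinterpret the flag.

    A flag may only be switched on once every server runs the version that
    gives it its new meaning; an empty list means it is safe to proceed.
    """
    return [s.name for s in servers if s.code_version != required_version]


# Hypothetical fleet: eight servers, one missed by the manual deployment.
fleet = [Server(f"server-{i}", "v2") for i in range(1, 8)]
fleet.append(Server("server-8", "v1"))

stale = check_flag_rollout(fleet, required_version="v2")
if stale:
    print("refusing to enable flag; stale servers:", stale)
# prints "refusing to enable flag; stale servers: ['server-8']"
```

An even simpler safeguard is to delete dead code and retire old flags rather than reuse them. That way, a stale server has nothing dangerous left to run.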
Pentium FDIV bug
Most software simply trusts the processor to get arithmetic right, but what happens if an entire processor has a problem with floating-point division? In 1994, Intel’s flagship Pentium processor was the talk of the tech world. Unfortunately, the chip suffered from an error in the lookup table its floating-point unit used for division: five of the table’s 1,066 entries were mistakenly left as zero. This led to inaccurate results for certain combinations of numbers.
Initially, Intel downplayed the problem. As public awareness grew, especially among scientists and engineers who relied on accurate calculations, Intel was forced to acknowledge its seriousness. The company ultimately offered to replace all affected chips and took a $475 million charge to recall and replace millions of processors. This incident shows how something as small as a few entries in a lookup table can cost a company goodwill and hundreds of millions of dollars in recall costs.
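The flaw was famously demonstrated with a single pair of operands: divide 4,195,835 by 3,145,727, then multiply the quotient back. On a correct floating-point unit the leftover is zero, while flawed Pentiums reportedly returned 256. The Python sketch below runs the same check. Reproducing the error requires one of the flawed chips, so on a modern processor it should simply confirm correct behavior.

```python
def fdiv_check(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """Divide, multiply back, and return the leftover.

    With correctly rounded IEEE 754 double-precision arithmetic, this pair
    of operands leaves no remainder at all.
    """
    return x - (x / y) * y


quotient = 4195835.0 / 3145727.0
print(f"{quotient:.10f}")  # 1.3338204491
print(fdiv_check())        # 0.0
```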
Mars Polar Lander crash
Space has many hazards for hardware and software. Unlike the Mars Climate Orbiter, this mission was not lost to a unit mix-up. The $125 million Mars Polar Lander, designed to study the Martian climate and search for water ice, was lost during its descent to the planet’s surface in December 1999. The most probable cause was traced to a software error in the landing sequence.
The error involved the spacecraft’s touchdown sensors, which detect when the legs contact the surface so the software can shut down the descent engines. However, the software did not account for the false signals generated when the lander’s legs deployed during descent. This premature indication of a landing shut down the engines while the lander was still about 40 meters above the surface, resulting in a catastrophic crash. NASA lost the lander, all its instruments, and the data they would have collected. This catastrophe underlines why thorough testing, including testing of how software responds to real hardware signals, is such an important part of software design. Better testing could have prevented this.
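The failure review concluded that, most likely, the leg deployment produced a brief spurious touchdown signal that the software remembered and then acted on once touchdown sensing was enabled near 40 meters. The Python sketch below models that logic with hypothetical sensor readings. The consecutive-reading rule in the fixed version is one illustrative safeguard, not NASA’s actual correction.

```python
from typing import Optional

Reading = tuple[float, bool]  # (altitude in meters, touchdown signal seen)

SENSING_ALTITUDE_M = 40.0  # touchdown sensing starts near the surface


def shutdown_altitude_flawed(readings: list[Reading]) -> Optional[float]:
    """Return the altitude at which the engines are cut, or None."""
    latched = False
    for altitude, signal in readings:
        if signal:
            latched = True  # Bug: the leg-deployment jolt is remembered
        if altitude <= SENSING_ALTITUDE_M and latched:
            return altitude
    return None


def shutdown_altitude_fixed(readings: list[Reading], required: int = 2) -> Optional[float]:
    """Ignore signals before sensing starts and demand consecutive readings."""
    consecutive = 0
    for altitude, signal in readings:
        if altitude > SENSING_ALTITUDE_M:
            continue  # nothing seen up here counts toward touchdown
        consecutive = consecutive + 1 if signal else 0
        if consecutive >= required:
            return altitude
    return None


# Hypothetical descent: a spurious signal when the legs deploy high up,
# then real contact at the surface.
descent = [(1500.0, True), (800.0, False), (40.0, False),
           (20.0, False), (0.0, True), (0.0, True)]

print(shutdown_altitude_flawed(descent))  # 40.0: engines cut 40 m up
print(shutdown_altitude_fixed(descent))   # 0.0: engines cut on the ground
```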
Our modern lives are inextricably linked to computer programs. From the work we do to the games we play for fun, it all comes down to how programs work. Most of the time they work flawlessly, but when they don’t, it can cost a company millions. These programming and software errors are a warning to all developers to pay attention to small details. The actual cost can be much higher than just the hours spent fixing the code.