
Number Formats and the Ariane V Disaster

Ariane V heavy-lift rocket - Image credit: ESA-CNES-ARIANESPACE

An Ariane V heavy-lift rocket similar to the prototype that self-destructed seconds after take-off on its first flight in 1996.

Embedded computers control things in ‘real-time’ based on sensor data fed into algorithms that contain simple or complex mathematical functions. The way numbers are handled in the machine is critically important for successful operation, but painful experience has shown that it’s all too easy to become obsessed with the algorithm to the exclusion of boring stuff like number formats.

For Want of a Nail…

…over $350m worth of Ariane V rocket and four Cluster satellites were lost 37 seconds after launch on June 4th 1996. Everything seemed to be going fine until, for no apparent reason, the rocket veered violently off course. The automatic self-destruct system operated as soon as it detected the apparent loss of control. Ariane 501 was intended to be the proving flight of ESA’s new heavy launcher carrying the four satellites designed to study the Earth’s magnetosphere. The cause of the disaster turned out to be a relatively trivial software bug that managed to evade built-in redundant systems and crash the inertial reference system. As is usual with man-made disasters, the ultimate outcome can be traced to a linked chain of errors starting with the re-use of Ariane 4 software modules that turned out to be incompatible with Ariane V’s guidance system. Read the report of the enquiry board for all the details, but in summary, this is what happened:

First link: Re-use of old code

The software for the Inertial Reference System (IRS), ported across from Ariane 4, calculated angles and velocities using accelerometer/gyroscope data, and the results were stored in a 64-bit floating-point format. Nothing wrong with that so far. In fact, the re-use of tried and tested software modules is encouraged in the world of high-reliability, high-risk applications. However, the On-Board Computer (OBC), which controlled the steering by ‘gimballing’ the engines, expected to receive this motion information in the form of 16-bit integers. The non-technical press at the time seized upon this difference in formats and implied strongly that lazy engineers had made a stupid mistake by not realising it. Mistakes had been made, but not that crude and obvious one.

Second link: Making assumptions

The software engineers had not overlooked the discrepancy in format and had added conversion routines to the IRS software to deal with it. The first real error was one of assumption: based on the actual performance data of Ariane 4, they assumed that the new rocket would have angle/velocity/acceleration values of very similar magnitude. Real flight data had shown that the numbers did not exceed the ±32767 range of 16-bit integer format. What they didn’t know – perhaps nobody knew – was that Ariane V’s in-flight velocities (plural because the reference system worked with a 3-dimensional frame of reference) would exceed that figure. So convinced were the designers that no numeric overflow would ever occur, they failed to incorporate a simple piece of code called a Limiter before the format converter, which would have detected the overflow and ‘clamped’ the result to the maximum valid value instead. Performance suffers, but the likelihood of disaster is drastically reduced.
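To make the idea of a Limiter concrete, here is a minimal sketch in Python. It is purely illustrative: the function names and the sample values are my own assumptions, and it is not a reconstruction of the actual Ariane flight code. The first function behaves like an unprotected converter and raises an error on overflow; the second clamps the input to the signed 16-bit range before converting.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_unchecked(value: float) -> int:
    """Convert without a limiter: raise if the value does not fit in 16 bits."""
    result = int(value)  # int() truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


def to_int16_limited(value: float) -> int:
    """Limiter first: clamp to the signed 16-bit range, then convert."""
    limited = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(limited)


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to show in-range and out-of-range cases.
    for sample in (1234.7, 40000.0, -1.0e9):
        print(sample, "->", to_int16_limited(sample))
```

The clamped value is wrong in the sense that it no longer matches the real measurement, which is the performance penalty mentioned above, but the converter keeps running and the downstream code still receives a valid 16-bit number.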

Third link: Communication breakdown

For the first 30 seconds or so after launch, no numerical overflows occurred, but when the horizontal velocity had increased to the point where the conversion code reported a problem, the IRS shut itself down and sent a diagnostic message to the OBC. Even at this stage, recovery should have been possible if the computer had been able to interpret the error message correctly, switch to the redundant backup unit or at least fall back on some default data. It did none of these.

Fourth link: Not-so-redundant hardware

The IRS hardware was duplicated; unfortunately, so was the software, causing the second unit to shut down like the first. In response, the OBC decided for some strange reason to interpret the error message as valid data. As a result, it believed that the rocket was 90° off-course and heading off in a horizontal direction. Given this misconception, the outcome was inevitable: the OBC gimballed the engines right over on ‘opposite lock’ in an attempt to correct a situation that didn’t exist.

Choosing the right format

The engineers who wrote the IRS software for Ariane 4 presumably didn’t know what level of precision would be needed for the angle and velocity calculations, so decided to play safe and go for the IEEE-754-1985 64-bit Double-Precision arithmetic standard. This handles numbers with a precision of about 15 to 16 significant decimal digits, which was certainly ‘overkill’, as I doubt whether the accelerometer/gyroscope sensor outputs had anything like that level of precision. Clearly, the OBC was not expecting to receive 64-bit floating-point numbers: its designers thought, with the benefit of real telemetry data from Ariane 4 flights, that 16-bit integers would be perfectly adequate. If 32-bit integers had been used for the faster Ariane V, the accident would probably have been avoided. Let’s take a look at the particular formats used in Ariane-501.

Fixed-Point numbers

The very first microprocessors, the Intel 4004 and 4040, worked with 4-bit arithmetic. The arrival of the 8080/8085 signalled the start of the 8-bit era which lasted until the late 1970s when National Semiconductor introduced their 16-bit PACE machine and Intel the 8086. Nowadays the market is dominated by 32-bit devices – usually with an ARM Cortex core. Virtually all PCs now use 64-bit silicon with multiple processor cores. All these developments have not yet rendered 8/16-bit technology obsolete because there are many applications that don’t require handling huge numbers with fine precision. The Ariane OBC was expecting its velocity data in the form of 16-bit integers as in Fig.1a. Integers have no fractional component and are just a special case of the Fixed-Point class of numbers where the ‘Binary Point’ lies to the right of the least significant bit. The Binary Point is to binary or Base 2 numbers what the Decimal Point is to decimal or Base 10 numbers: it marks the boundary between the integer component and the fractional (less than 1) part. You can see from Fig.1a that a 16-bit integer-only number can represent the range from 0 to 65,535 in unsigned denary, or -32,768 to +32,767 signed using the 2’s Complement convention.

Fig 1 - Fixed-Point Binary Numbers

Sometimes you need to represent a number with a fractional part but still retain the ability to use simple 2’s complement signed arithmetic. Fixed-point makes that pretty easy to do, see Fig.1b. Just move the binary point along the number until you have the range and fractional resolution you need. Those two are always a trade-off: wide range with coarse resolution, or narrow range with finer resolution. The integer number format in Fig.1a is the extreme case: widest possible range with a resolution of 1. There’s nothing magical here. By moving the binary point to the left all you’re doing is shifting the Bit Values (1, 2, 4, 8, etc.) up and losing the highest value (128, 64, etc.) each time. With the binary point starting on the extreme right, one shift left will halve the maximum number that can be handled but will refine the resolution from 1 to ½. A further shift halves the range again, but refines the resolution to ¼, and so on. The fact is though, the maximum range that can be handled is limited by the number of bits allocated:

  • 8-bit numbers: Max range = 0 to 255 or -128 to +127 signed
  • 16-bit numbers: Max range = 0 to 65,535 or -32,768 to +32,767 signed
  • 32-bit numbers: Max range = 0 to 4,294,967,295 or -2,147,483,648 to +2,147,483,647 signed
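The trade-off between range and resolution can be tabulated with a few lines of Python. This is only a sketch under my own assumptions: the function name `fixed_point_range` and its parameters are hypothetical, and signed values are taken to be 2’s complement as in the text.

```python
def fixed_point_range(total_bits: int, frac_bits: int, signed: bool = True):
    """Return (minimum, maximum, resolution) for a fixed-point format.

    total_bits is the word length; frac_bits is how many of those bits
    lie to the right of the binary point.
    """
    if not 0 <= frac_bits <= total_bits:
        raise ValueError("frac_bits must be between 0 and total_bits")
    resolution = 2.0 ** -frac_bits  # value of the least significant bit
    if signed:
        lowest_raw = -(1 << (total_bits - 1))
        highest_raw = (1 << (total_bits - 1)) - 1
    else:
        lowest_raw = 0
        highest_raw = (1 << total_bits) - 1
    return lowest_raw * resolution, highest_raw * resolution, resolution


if __name__ == "__main__":
    # Move the binary point of a signed 16-bit word one place at a time.
    for frac_bits in range(4):
        print(frac_bits, fixed_point_range(16, frac_bits))
```

With `frac_bits` set to 0 this reproduces the integer ranges in the list above; each extra fractional bit halves the range and halves the step size.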

When selecting a device for a particular application, its ability to meet any ‘number-crunching’ requirements in terms of speed and numeric resolution is a vitally important consideration. That means having an accurate specification for inputs and outputs, and knowing how much processing is required to turn one into the other. For example:

  • A text-processing system that inputs 7-bit ASCII characters, then does some formatting before outputting to a printer is unlikely to need anything more than an 8-bit microcontroller such as a Microchip PIC16.
  • Any application needing serious digital signal processing of high-speed sampled analogue data is going to need at least 16 bits and more likely 32 bits depending on the algorithm. Specialised microcontrollers such as the 16-bit Microchip dsPIC33 are optimised for DSP work and can perform a 16 x 16 to 40-bit multiply-accumulate in a single clock cycle.


When you start performing arithmetic with fixed-point numbers, particularly multiplication, overflow errors can happen very quickly. Most processors can detect overflows – there’s often a special flag bit in the status register – but the response may be limited to an orderly shutdown with an error message. And we know how that worked out on Ariane-501. A possible solution is to ‘scale’ the numbers to keep them in range. In the following example, two 8-bit unsigned integers to be multiplied together are each scaled by a factor of 2 so the result fits well within a 16-bit range.

Before scaling (decimal equivalents in brackets):

FFh x FFh (255 x 255) = FE01h (65,025)

This is close to the maximum allowed value of FFFFh (65,535), so subsequent operations could cause overflow. Divide each input number by 2, then multiply:

After scaling:

7Fh x 7Fh (127 x 127) = 3F01h (16,129)

With both inputs halved, the result is actually scaled by 4, so it now needs to be multiplied by 2 to bring it back to a scale factor of 2:

3F01h x 2 = 7E02h (32,258)

This is the final scaled result of the multiplication. But, as a check, can the unscaled result be recovered by multiplying by 2 again?

7E02h x 2 = FC04h (64,516)

Oops. Something has gone wrong: the answer should have been 65,025. The reason is not hard to find. Dividing the input numbers by 2 did not generate an integer result: 255 / 2 = 127.5. The loss of that 0.5 makes quite a difference. Thankfully there is a better way:

Move the Binary Point (BP):

First of all, I’ll move to binary notation as it’s easier to see what’s going on when the BP is moved.

1111 1111. x 1111 1111. = 1111 1110 0000 0001. (255 x 255 = 65,025) Notice the BP.

Now instead of scaling by dividing by 2 (a right shift by 1 bit), move the BP one bit left:

1111 111.1 (127.5) Now the least significant bit has the value 2⁻¹ or ½.

1111 111.1 x 1111 111.1 = 1111 1110 0000 00.01 (127.5 x 127.5 = 16,256.25)

The multiply function is not changed and produces the same sequence of 1s and 0s as before, but with a BP two steps to the left. As before, the result needs multiplying by 2 to maintain the scale factor. Just move the BP one bit right to achieve this. Move it once more to the right and you’re back to the starting value of 65,025.

1111 1110 0000 00.01 x 2 = 1111 1110 0000 000.1 (32,512.5) The correct answer!

1111 1110 0000 000.1 x 2 = 1111 1110 0000 0001. (65,025) and checked.

It’s easy to see how this works: the 16-bit result FE01h remains unchanged at every stage, only the BP moves changing the value each time – no bits are ‘lost’ and accuracy remains 100%. Fixed-point format suffers from the trade-off between range and precision; word lengths become impractical with very large and very small numbers.
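The two approaches in the worked example can be compared side by side in Python. This is a sketch, not production fixed-point code: the function names are my own, and a fixed-point number is assumed to be held as a pair of (raw integer, number of fractional bits). `Fraction` is used only so the check on the values is exact.

```python
from fractions import Fraction


def truncating_scaled_multiply(a: int, b: int) -> int:
    """Halve both inputs with integer division, multiply, then rescale by 4."""
    return (a // 2) * (b // 2) * 4


def fixed_multiply(raw_a: int, frac_a: int, raw_b: int, frac_b: int):
    """Multiply two fixed-point numbers held as (raw integer, fractional bits).

    The raw integers are multiplied unchanged; the fractional bit counts
    add, which is the same as moving the binary point in the product.
    """
    return raw_a * raw_b, frac_a + frac_b


def fixed_value(raw: int, frac_bits: int) -> Fraction:
    """Exact value of a raw integer with the given number of fractional bits."""
    return Fraction(raw, 1 << frac_bits)


if __name__ == "__main__":
    print(truncating_scaled_multiply(255, 255))   # the lossy integer scaling
    raw, frac = fixed_multiply(0xFF, 1, 0xFF, 1)  # 127.5 x 127.5
    print(hex(raw), frac, float(fixed_value(raw, frac)))
    print(float(fixed_value(raw, frac - 1)))      # binary point one place right
    print(float(fixed_value(raw, 0)))             # back to the integer product
```

The raw product stays at FE01h throughout; only the bookkeeping of where the binary point sits changes, which is why nothing is lost.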

For many years some Digital Signal Processors (DSP) have featured hardware for processing Q format fixed-point numbers with fractions, but newer devices now have on-chip hardware to take advantage of a format that works with very large (or very small), high-precision numbers: Floating-Point.

Floating-Point numbers

A floating-point number is not just a formatted collection of 1’s and 0’s: it is a set of parameters that are manipulated by a mathematical expression. It allows extremely large, high-precision numbers to be encoded with a relatively small number of physical bits. The basic formula for a floating-point number is:

Value = (-1)^S x M x B^e, where:

S is the sign bit, evaluated separately.

M is the Mantissa or Significand: the numeric value of the number normalised to the form 1.x

B is the Base or Radix for the exponent. B = 2 if working in binary, B = 10 for decimal format.

e is the exponent, a signed integer setting the magnitude of the number. Positive for large numbers, negative for small.

A basic storage format for a 16-bit floating-point number is shown in Fig.2. Until the IEEE-754 standard was developed, all sorts of formats based on this principle were produced, none of them compatible with each other.

Fig 2 - Floating-Point Binary Numbers
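To see the sign, significand and exponent in a real number, the following Python sketch pulls apart an IEEE-754 64-bit double, which is the format Python uses for its `float` type. Note that this is not the 16-bit layout of Fig.2: a double has 1 sign bit, 11 exponent bits stored with a bias of 1023, and 52 fraction bits. The function names are my own, and the sketch deliberately handles normalised numbers only.

```python
import struct


def decode_double(value: float):
    """Split an IEEE-754 double into (sign, significand, exponent).

    Handles normalised numbers only; zero, subnormals, infinities and NaN
    use special encodings that this sketch does not cover.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exponent in (0, 0x7FF):
        raise ValueError("not a normalised number")
    significand = 1 + fraction / 2**52  # the 1.x form
    exponent = biased_exponent - 1023   # remove the bias
    return sign, significand, exponent


def encode(sign: int, significand: float, exponent: int) -> float:
    """Rebuild the value from its three parts with B = 2."""
    return (-1) ** sign * significand * 2.0 ** exponent


if __name__ == "__main__":
    parts = decode_double(-6.5)
    print(parts)            # -6.5 is -1.625 x 2^2
    print(encode(*parts))
```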

When to choose Floating-Point

In the days before 32-bit microprocessors with floating-point hardware, any applications manipulating large, complex datasets and needing a wide numerical dynamic range would be running on a mainframe or a minicomputer such as a DEC PDP-11 or VAX-11. A microprocessor would have to rely on a software implementation, often supplied as a library with compilers for high-level languages such as C. This resulted in programs with bloated libraries running incredibly slowly. Nowadays, a 32-bit ARM Cortex-M4 core micro can include an IEEE-754 Single-Precision Floating-Point Unit (FPU) with the extended instruction set to match. Even the new BBC micro:bit computer board (201-2414) has one of those! It’s very tempting to conclude that the days of fixed-point arithmetic are over, but for general-purpose applications not crunching big numbers at high speed, floating-point is not worth it. The trouble is, FP only produces approximations for many numbers thanks to rounding errors. If the results of your arithmetic need to be exact integers for sending to GPIO ports then fixed-point/integer numbers will likely involve far less coding hassle.
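The rounding problem is easy to demonstrate. In this small Python sketch (the function names and the choice of tenths as the unit are my own assumptions), the same quantity is accumulated ten times, once as the floating-point value 0.1 and once as the integer 1 in units of a tenth.

```python
def accumulate_float(step: float, count: int) -> float:
    """Add a floating-point step repeatedly."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def accumulate_scaled(step_units: int, count: int) -> int:
    """Add an integer step repeatedly; the caller decides what one unit means."""
    total = 0
    for _ in range(count):
        total += step_units
    return total


if __name__ == "__main__":
    print(accumulate_float(0.1, 10) == 1.0)  # False: 0.1 has no exact binary form
    print(accumulate_scaled(1, 10) == 10)    # True: ten tenths, held as integers
```

The floating-point total is extremely close to 1.0, but it is not equal to it, so an equality test or a conversion to an integer can give a surprising result. The integer version is exact because nothing is ever rounded.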


A postscript for the Ariane-501 debacle: The faulty code module served no purpose once the rocket was off the launch-pad. Its only function was to align the system before launch, and it should have been turned off. Ironically, it was deliberately left running to allow an easy restart of the system in the event of a brief hold in the countdown. Instead…


Engineer, PhD, lecturer, freelance technical writer, blogger & tweeter interested in robots, AI, planetary explorers and all things electronic. STEM ambassador. Designed, built and programmed my first microcomputer in 1976. Still learning, still building, still coding today.