In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, say 9.2, is actually this fraction:
Where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.
9.2 may be simply 92/10, but 10 cannot be expressed as 2n if n is limited to integer values.
First, a few functions to see the components that make a 32- and 64-bit float. Gloss over these if you only care about the output (example in Python):
def float_to_bin_parts(number, bits=64):
if bits == 32: # single precision
int_pack = 'I'
float_pack = 'f'
exponent_bits = 8
mantissa_bits = 23
exponent_bias = 127
elif bits == 64: # double precision. all python floats are this
int_pack = 'Q'
float_pack = 'd'
exponent_bits = 11
mantissa_bits = 52
exponent_bias = 1023
raise ValueError, 'bits argument must be 32 or 64'
bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number)))[2:].rjust(bits, '0'))
return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]
There's a lot of complexity behind that function, and it'd be quite the tangent to explain, but if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double-precision has a separate type double, which is often implemented as 64 bits.
When we call that function with our example, 9.2, here's what we get:
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
You'll see I've split the return value into three components. These components are:
Mantissa (also called Significand, or Fraction)
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2(# of bits) - 1 - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.
When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 252 to move it 52 places to the right:
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal; for more detail, see: 675539944105574 / 4503599627370496.)
Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
Exponent (middle component): Subtract 2(# of bits) - 1 - 1 to get the true exponent
Mantissa (last component): Divide by 2(# of bits) and add 1 to get the true mantissa
Putting all three parts together, we're given this binary number:
Which we can then convert from binary to decimal:
And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
Now that we've built the number, it's possible to reconstruct it into a simple fraction:
Shift mantissa to a whole number:
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.
Assemble the binary scientific notation:
There is also a nice tutorial that shows how to go the other way - given a decimal representation of a number, how do you construct the floating point equivalent. The "long division" approach shows very clearly how you end up with a "remainder" after trying to represent the number. Should be added if you want to be truly "canonical" with your answer.
If you're talking about Python and floating-point, I'd suggest at least including the Python tutorial in your links: docs.python.org/3.4/tutorial/floatingpoint.html That's supposed to be the one-stop go-to resource for floating-point issues for Python programmers. If it's lacking in some way (and it almost surely is), please do open an issue on the Python bug tracker for updates or changes.
This answer should definitely also link to floating-point-gui.de, as it's probably the best introduction for beginners. IMO, it should even go above "What every computer scientist should know..." -- these days, people who can reasonably comprehend Goldberg's paper usually are already well aware of it.
"This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal". This is not true. All of these 'number over a power of two' ratios are exact in decimal. Any approximation is only to shorten the decimal number -- for convenience.