content.html

<h1>How Computers Use Numbers</h1>
<aside class="subheader">
    This is a submission for
    <a href="https://some.3b1b.co">SoME3</a>.
</aside>

<p>
    With the invention of the vacuum tube we have tricked electrons into doing simple logic.
    However, that's not too useful on its own.
    While it is enough to perform basically any computation, <em>actually having numbers</em> would be very helpful &mdash;
    basically everything that a modern computer does is based on arithmetic in some way, for example:
    <ul>
        <li>A spreadsheet can contain expressions with mathematical operators</li>
        <li>Playing a game might require calculating enemy positions</li>
        <li>
            Drawing this web page requires computing the positions and sizes of all the elements
            <em>and the text characters inside those elements</em>
        </li>
    </ul>
    So, let's invent some!
</p>

<h2>The goal</h2>
<p>
    First, we need to find what traits a good number format has, and thus we should be aiming for.
    These could be anything, but here are some of the ones I think are important
    (and, <em>coincidentally</em>, modern computers optimize for):
</p>
<dl>
    <dt>Usefulness in real programs</dt>
    <dd>
        Perhaps the most important trait, as the whole point of having the capability to do math is for programs to use it.
        Additionally, fitting as many use cases into as few formats as possible would be ideal,
        as having more ways to store numbers than necessary would make processors a lot more expensive and power&hyphen;hungry.
    </dd>
    <dt>Simplicity</dt>
    <dd>
        Similarly to avoiding too many formats, straightforward logic would also save on price and power usage.
    </dd>
    <dt>Efficient memory usage</dt>
    <dd>
        Memory is a limited resource; therefore using it as efficiently as possible would be ideal,
        for example by avoiding duplicate values.
    </dd>
</dl>

<h2>Wait, what even is memory?</h2>
<memory-demo id="memory-demo"></memory-demo>
<div class="note">
    Click on the bits to toggle them!
</div>
<p>
    Memory is a collection of cells holding one of two values (i.e. a bit). These can represent basically anything that has two states,
    though usually they are visualized using ones and zeroes.
</p>
<type-switcher target="#memory-demo" selected="1_0">
    <template data-type="1_0">1/0</template>
    <template data-type="T_F">T/F</template>
    <template data-type="X_O">X/O</template>
    <template data-type="@_*">@/*</template>
</type-switcher>
<p>
    Note that bits are usually grouped in eights (called a byte), so using a multiple of 8 would be easiest to work with.
    We'll use just 8 bits for simplicity, but all of the following number formats come in a variety of sizes.
</p>

<h2>Counting with bits</h2>
<p>
    Similarly to how we use the 10 digits to count in base 10, we can use bits to count in base 2.
    By assigning each bit to a power of 2, we can use our 8 bits to store numbers between 0 and 255
    (<math display="inline">
        <mrow>
            <msup>
                <mn>2</mn>
                <mn>8</mn>
            </msup>
            <mo>&minus;</mo>
            <mn>1</mn>
        </mrow>
    </math>).
    This format is called an <em>8&hyphen;bit unsigned integer</em>.
</p>
<integer-demo type="unsigned" value="53"></integer-demo>

<h2>Negative numbers</h2>
<p>
    An obvious way of defining negative numbers (<em>signed integers</em>) is to just use 1 bit as a sign.
</p>
<integer-demo type="sign-bit" value="0x92"></integer-demo>
<p>
    However, this way of doing it has just one small issue: there are two ways to write 0, as it is neither positive or negative.
</p>
<integer-demo type="sign-bit" value="0" locked>
    <template data-value="0">Zero</template>
    <template data-value="0x80">Also zero</template>
</integer-demo>
<p>
    Instead, we could interpret the number as if it was unsigned, but then offset it down by 128
    (<math display="inline">
        <mrow>
            <msup>
                <mn>2</mn>
                <mn>7</mn>
            </msup>
        </mrow>
    </math>)
    to be able to represent numbers from &minus;128 to 127:
</p>
<integer-demo type="shifted" value="0x80">
    <template data-value="0">&minus;128</template>
    <template data-value="0x7f">&minus;1</template>
    <template data-value="0x80">0</template>
    <template data-value="0x81">1</template>
    <template data-value="0xff">127</template>
</integer-demo>
<p>
    At first glance this way of doing negatives seems perfect!
    After all it's very simple, it covers a useful range and all the bit patterns correspond to unique integers.
</p>
<p>
    But we can do better.
</p>

<h3>Two's complement</h3>
<p>
    While implementing negative numbers via a fixed bias seems relatively optimal,
    we can actually significantly cut down on the amount of circuitry we need via a neat trick!
</p>
<p>
    A hypothetical computer which uses offset signed integers and regular unsigned integers would need to implement
    arithmetic and comparisons for both signed and unsigned integers, as well as conversions between the two.
</p>
<p>
    Since arithmetic and conversions are only defined for where the ranges of the two formats overlap,
    could we have the 0&ndash;127 range use the same bit patterns for both signed and unsigned integers?
    It turns out that the answer is yes! There are several ways to achieve this,
    but by far the most common approach is making the most significant bit subtract 128 from the total instead of adding it.
</p>
<integer-demo type="twos-complement" value="0xf4" id="twoc-switching-demo">
    <template data-value="0x80">&minus;128</template>
    <template data-value="0xfd">&minus;3</template>
    <template data-value="0x3">3</template>
    <template data-value="0x7f">127</template>
</integer-demo>
<p>
    Remember that &ldquo;unsigned integer&rdquo; and &ldquo;two's complement&rdquo; are just ways of interpreting a collection of bits.
    Because all the relevant bit patterns match, we can convert for free by just changing how we're interpreting the data.
</p>
<type-switcher target="#twoc-switching-demo" selected="twos-complement">
    <template data-type="unsigned">Unsigned</template>
    <template data-type="twos-complement">Two's complement</template>
</type-switcher>
<div class="note">
    (this toggle affects the demo above)
</div>
<p>
    Additionally, two's complement allows us to drop the circuitry for signed addition,
    subtraction and most of multiplication, and if you're familiar with the
    <a href="https://youtu.be/3gyHKCDq1YA">p&hyphen;adic numbers</a> you might already have a hunch as to why.
</p>
<addition-demo value1="0xf6" value2="5"></addition-demo>
<p>
    Notice that if the result would need 9 bits to be stored fully, the last bit would be dropped.
    This is called <em>overflow</em> and is what allows us to share the signed and unsigned logic.
    Dropping the last bit means that we're actually working in
    <a href="https://en.wikipedia.org/wiki/Modular_arithmetic">modular arithmetic</a> (specifically mod 256),
    where numbers &ldquo;wrap around&rdquo;, so the number after 255 is 0.
    In this system, every number has an additive inverse even without including negative numbers:
    for example, the inverse of 2 is 254, since
    <math display="inline">
        <mrow>
            <mn>2</mn>
            <mo>+</mo>
            <mn>254</mn>
            <mo>&equiv;</mo>
            <mn>0</mn>
            <mo>(mod</mo>
            <mn>256</mn>
            <mo>)</mo>
        </mrow>
    </math>. Because 254 behaves like &minus;2, it may as well <em>be &minus;2</em>.
</p>
<p>
    If you're familiar with the p-adics, those share the same property of not having an explicit sign
    but arithmetic working the same as for positive numbers,
    which is why negative numbers in two's complement look like 2-adic integers
    with the infinite ones &ldquo;chopped off&rdquo;.
</p>

<hr>

<h2>A binary point in the middle</h2>
<p>
    Since we have mastered integers, we can now go fractional.
    Let's start by taking a regular 32&hyphen;bit unsigned integer and putting a binary point in the middle:
</p>
<fixed-point-demo value="0x00038000">
    <template data-value="0x000000ac">0.00262</template>
    <template data-value="0x028683d8">646.515</template>
    <template data-value="0x23293840">9001.22</template>
</fixed-point-demo>
<p>
    This is called fixed point, and it's <em>not terrible</em>.
    We have drastically cut the range of numbers we can represent, <em>without including negative numbers</em>
    (as an integer, we could represent the range 0&ndash;4,294,967,295 with 32 bits, and this format has only 0 to just under 65536),
    and the precision we get out of that sacrifice isn't even that great.
    We could try putting the binary point somewhere else, but there really isn't a spot that gives both decent range and precision.
</p>
<p>
    However, it is very simple, and the fact that addition and subtraction work the same as for integers
    does make fixed point a good option in some niche cases.
</p>

<h2>Better accomodating real&hyphen;world data</h2>
<p>
    Having uniformly spaced values, despite being great for integers, hasn't worked out so well for fractions.
    So, how should we distribute them? Well, since we want to fit as many use cases as possible into one format,
    both atom&hyphen;sized and universe&hyphen;sized numbers should be supported.
    But in practice, universe&hyphen;sized numbers are less precise (in absolute terms) than atom&hyphen;sized ones.
    We can take advantage of this.
</p>

<h2>Moving the binary point</h2>
<p>
    Since larger numbers are inherently less precise, could we skip defining all those decimals for large values?
    Well, yes! Let's dedicate, say, 5 bits to defining where the binary point is.
    For large values, we can move the point to the right to not store the unnecessary fractional part,
    and for small numbers, we can omit all the integer bits by moving the point to the left.
</p>
<moving-point-demo value="0x68019000" type="simple">
    <template data-value="0xa5b8d800">750,000</template>
    <template data-value="0x00010625">0.0005 (inexact)</template>
</moving-point-demo>
<div class="note">(green corresponds to point position)</div>
<p>
    Also, since 5 bits give us more possible bit positions than there actually are with the remaining 27 value bits,
    we can use the remaining ones for <em>hypervalues</em>, by putting extra <em>virtual zeroes</em> to fill in the space.
</p>
<moving-point-demo value="0xe8000003" type="hypervalues">
    <template data-value="0xe3962028">120,340,560</template>
    <template data-value="0xfd9682f0">1.5 billion</template>
</moving-point-demo>
<p>
    However, even with this range extension, the format is still extremely wasteful &mdash; for example, you can write 0
    in 32 different ways as the point position does not matter.
    <!--
        Which representations are canonical:
        fractional values (27 point patterns): half of them, since ending with a 0 means we can shift the number to the right
        and add 1 to the point position
        integers (5 point patterns): half of them for the first 4 patterns; if it ends with a 0, we can shift the number to the right
        and subtract 1 from the point position
        11111 point pattern is all canonical
    -->
    In fact, only <strong>~52%</strong> of all bit patterns produce unique values in this system. Yikes!
</p>

<h2>Scientific notation</h2>
<p>
    In order to ensure uniqueness, we can take inspiration from a way you might see numbers written down in the real world &mdash;
    scientific notation! We will need to adapt it to better suit our needs, though.
</p>
<p>
    Remember that a number in scientific notation looks like
    <math display="inline">
        <mrow>
            <mn>2.574</mn>
            <mo>&sdot;</mo>
            <msup>
                <mn>10</mn>
                <mn>4</mn>
            </msup>
        </mrow>
    </math>,
    where the mantissa is a number with one digit in front of the decimal point and an arbitrary amount of digits after,
    and the exponent is any integer.
</p>
<p>
    For the exponent, let's start by dedicating 8 of our 32 bits to it.
    Since we need both positive and negative exponents to represent large and small numbers respectively,
    we need to use a signed format.
    We could try doing two's complement again, but we wouldn't actually be able to take advantage of its benefits very well in this case
    (as there is no existing unsigned logic we can reuse).
    Instead, let's just make it an unsigned integer and subtract 127 from it, as that would make the underlying implementation simpler.
</p>
<p>
    For the mantissa, let's just use the remaining 24 bits as a fixed&hyphen;point number with 1 binary digit before the binary point.
    As for its sign, note that two's complement or shifting would be incredibly awkward to use here.
    This is because we need to represent a balanced range,
    and ideally there would be an equal amount of values on either side of 0.
    But that would make the total amount of values odd (<math display="inline">
        <mrow>
            <mn>2</mn>
            <mo>&sdot;</mo>
            <mi>positive values</mi>
            <mo>+</mo>
            <mn>1</mn>
        </mrow>
    </math>), and the amount of values we have to work with is even (<math display="inline">
        <mrow>
            <msup>
                <mn>2</mn>
                <mi>bits</mi>
            </msup>
        </mrow>
    </math>).
    Therefore, since we can't use the extra value effectively, let's just waste it in the simplest way possible:
    dedicate one bit of the mantissa to the sign, negating the value when the bit is 1. (The extra value goes to &minus;0.)
</p>
<table>
    <caption>
        Exponent bit patterns
    </caption>
    <thead>
        <tr>
            <th>Bit pattern</th>
            <th>Meaning</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>11111111</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111110</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111101</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td colspan="2">&vellip;</td>
        </tr>
        <tr>
            <td>00000010</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;125</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000001</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000000</td>
            <td>
                <math display="inline"><mrow><msup><mn>2</mn><mn>&minus;127</mn></msup></mrow></math>
            </td>
        </tr>
    </tbody>
</table>
<floating-point-demo type="naive" value="0x40f00000"></floating-point-demo>
<div class="note">(red &ndash; sign bit, green &ndash; exponent, blue &ndash; mantissa)</div>

<h3>Actually ensuring uniqueness</h3>
<p>
    Our format does not guarantee uniqueness yet. This is because we skipped one requirement of scientific notation:
    the first digit of the mantissa must be non&hyphen;zero (otherwise you can change the exponent and get the same value).
    So, we have to force the first digit of the mantissa to be non&hyphen;zero.
</p>
<p>
    Notice that since we're working in binary, there is only one non&hyphen;zero digit: one!
    That means that we can omit storing it entirely, which allows us to double the mantissa's precision. Nice!
</p>
<h3>Oh wait</h3>
<p>
    Remember when I said that the first digit of the mantissa can't be zero? That's true, except for exactly one number: 0.
    Thus, we can't write it anymore. So, let's just set the all zeroes (except for possibly the sign bit)
    bit pattern to represent &pm;0.
</p>
<table>
    <caption>
        Exponent bit patterns
    </caption>
    <thead>
        <tr>
            <th>Bit pattern</th>
            <th>Meaning</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>11111111</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111110</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111101</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td colspan="2">&vellip;</td>
        </tr>
        <tr>
            <td>00000010</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;125</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000001</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000000</td>
            <td>
                <math display="inline"><mrow><msup><mn>2</mn><mn>&minus;127</mn></msup></mrow></math>
                &amp; special case: 0
            </td>
        </tr>
    </tbody>
</table>
<floating-point-demo type="unique" value="0"></floating-point-demo>

<h2>Handling math with varying precision</h2>
<p>
    Because of the nature of having limited precision, we're going to encounter results that don't have bit representations
    quite often when doing basically any kind of math. (Indeed, the 4 billion values cover exactly 0% of the number line.)
</p>
<p>
    The obvious solution to that is rounding: after each mathematical operation, if the result does not have a bit pattern in our system,
    it snaps to the closest available value.
    (The default rounding mode is supposed to be <a href="https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even">rounding half to even</a>,
    as it minimizes error accumulation &mdash; but most CPUs use round&hyphen;to&hyphen;nearest by default anyway.)
</p>
<p>
    Notice that now, each bit pattern actually represents a range of values &mdash;
    the actual result that rounded to it could have been anywhere within the rounding window.
</p>
<p>
    Also, remember how we had an extra zero? We can now have +0 represent the positive half of the range and vice versa.
</p>

<h2>A giant gap</h2>
<p>
    Our format is currently <em>not great</em> at representing small numbers.
    Sure, we can store 0, but then we have a comparatively huge gap from anything else:
    the smallest non&hyphen;negative numbers in our format are
    0, then
    <math display="inline">
        <mrow>
            <mn>1.000000119</mn>
            <mo>&sdot;</mo>
            <msup>
                <mn>2</mn>
                <mn>&minus;127</mn>
            </msup>
        </mrow>
    </math>, then
    <math display="inline">
        <mrow>
            <mn>1.000000238</mn>
            <mo>&sdot;</mo>
            <msup>
                <mn>2</mn>
                <mn>&minus;127</mn>
            </msup>
        </mrow>
    </math>, and so on.
    That means that the second gap is only 0.000012% of the first one.
</p>
<p>
    To rectify our small number issues, we can make the whole lowest exponent have its mantissa start with a 0
    instead of just one specific value.
    (We also need to increase it by 1 to not leave a hole where the
    <math display="inline">
        <mrow>
            <msup>
                <mn>2</mn>
                <mn>&minus;127</mn>
            </msup>
        </mrow>
    </math>&hyphen;sized numbers originally were).
    Let's call these <em>subnormal numbers</em>.
</p>
<p>
    The smallest representable positive numbers are now
    <math display="inline">
        <mrow>
            <msup>
                <mn>2</mn>
                <mn>&minus;149</mn>
            </msup>
        </mrow>
    </math>, then
    <math display="inline">
        <mrow>
            <mn>2</mn>
            <mo>&sdot;</mo>
            <msup>
                <mn>2</mn>
                <mn>&minus;149</mn>
            </msup>
        </mrow>
    </math>,
    <math display="inline">
        <mrow>
            <mn>3</mn>
            <mo>&sdot;</mo>
            <msup>
                <mn>2</mn>
                <mn>&minus;149</mn>
            </msup>
        </mrow>
    </math> and so on.
    That means that not only do we have much more uniform gaps, but we even extended the range a little
    (though at the cost of some precision).
</p>
<table>
    <caption>
        Exponent bit patterns
    </caption>
    <thead>
        <tr>
            <th>Bit pattern</th>
            <th>Meaning</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>11111111</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111110</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111101</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td colspan="2">&vellip;</td>
        </tr>
        <tr>
            <td>00000010</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;125</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000001</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000000</td>
            <td>
                <math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math>
                (subnormals)
            </td>
        </tr>
    </tbody>
</table>
<floating-point-demo type="subnormals" value="0x3f800000">
    <template data-value="0">0</template>
    <template data-value="0x000116c3">
        <math display="inline">
            <mrow>
                <msup>
                    <mn>10</mn>
                    <mn>&minus;40</mn>
                </msup>
            </mrow>
        </math>
    </template>
    <template data-value="0x0000001">smallest representable value</template>
</floating-point-demo>

<h2>Extending the range further</h2>
<p>
    Having a special value for numbers outside the normal range would be useful.
    For example, since zero could actually be any number close to 0, dividing by it
    <em>would result in an actual value</em>! We can't know it exactly (since we threw away that precision),
    so it would just result in the too&hyphen;big sentinel.
    This means that it effectively covers the part of the number line from the (new) largest regular value all the way to &infin;.
</p>
<p>
    So, let's dedicate the current largest regular value's bit pattern to it!
    Since our format is symmetric we'll also define a negative counterpart to have the same bit pattern as the positive one
    except with the sign bit being 1.
</p>
<floating-point-demo type="large-sentinel" value="0x7fffffff"></floating-point-demo>
<h3>Properties of an indeterminate value</h3>
<p>
    Since the our new values cover infinitely large parts of the number line, the numbers they represent could be pretty much anywhere.
    So, how does math work with a number that we only know is very large?
    Well, in practice, it only really makes sense to give it the same properties that &infin;
    (or, more precisely, a limit that diverges to &infin;) has traditionally,
    namely that it's contagious &mdash; most operations involving infinities will result in another infinity:
</p>
<ul>
    <li>
        <math display="inline">
            <mrow>
                <mi>&infin;</mi><mo>+</mo><mn>1</mn><mo>=</mo><mi>&infin;</mi>
            </mrow>
        </math>
    </li>
    <li>
        <math display="inline">
            <mrow>
                <mi>&infin;</mi><mo>&sdot;</mo><mo>(</mo><mn>&minus;2</mn><mo>)</mo><mo>=</mo><mi>&minus;&infin;</mi>
            </mrow>
        </math>
    </li>
    <li>
        <math display="inline">
            <mrow>
                <mi>&infin;</mi><mo>+</mo><mi>&infin;</mi><mo>=</mo><mi>&infin;</mi>
            </mrow>
        </math>
    </li>
    <li>
        <math display="inline">
            <mrow>
                <mi>&infin;</mi><mo>&gt;</mo><mn>2</mn>
            </mrow>
        </math> is <code>true</code>
    </li>
    <li>
        <math display="inline">
            <mrow>
                <mi>&infin;</mi><mo>&minus;</mo><mi>&infin;</mi><mo>=</mo><mi>?</mi>
            </mrow>
        </math>
    </li>
    <li>
        <math display="inline">
            <mrow>
                <mi>&infin;</mi><mo>&sdot;</mo><mn>0</mn><mo>=</mo><mi>?</mi>
            </mrow>
        </math>
    </li>
</ul>
<p>
    Since our new value behaves like an infinity, it would make sense to call it that as well.
</p>
<table>
    <caption>
        Exponent bit patterns
    </caption>
    <thead>
        <tr>
            <th>Bit pattern</th>
            <th>Meaning</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>11111111</td>
            <td>
                <math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math>
                &amp; special case: infinities
            </td>
        </tr>
        <tr>
            <td>11111110</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111101</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td colspan="2">&vellip;</td>
        </tr>
        <tr>
            <td>00000010</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;125</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000001</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000000</td>
            <td>
                <math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math>
                (subnormals)
            </td>
        </tr>
    </tbody>
</table>
<floating-point-demo type="infinities" value="0xffffffff"></floating-point-demo>

<h2>Dealing with those question marks</h2>
<p>
    Since two infinities could have been a result of rounding numbers of very different magnitudes,
    we can't assign any sensible value to
    <math display="inline">
        <mrow>
            <mi>&infin;</mi><mo>&minus;</mo><mi>&infin;</mi>
        </mrow>
    </math>.
    Similarly,
    <math display="inline">
        <mrow>
            <mi>&infin;</mi><mo>&sdot;</mo><mn>0</mn>
        </mrow>
    </math>
    can't have a definite value assigned to it &mdash;
    a very big number we don't know exactly times a very small number we don't know exactly could be
    <em>literally anything</em>.
</p>
<p>
    So, what should happen when a question&hyphen;mark operation happens?
    Well, first of all, an error should probably be raised somewhere.
    However, some programs might not want to be interrupted when working with these kinds of values,
    so we can't do much more than setting a flag in the processor.
</p>
<p>
    But, since a question&hyphen;mark operation can't suspend execution, it needs to return a value.
    Let's call it <em>Not a Number</em>, or NaN. Now, where do we put it?
</p>
<p>
    Note that since there are several different operations that could produce NaNs,
    it might be useful to tell what produced a NaN from the NaN itself.
    In other words, we need to be able to attach extra data to a NaN.
</p>
<p>
    Since we need to attach extra data to NaNs, let's dedicate an entire exponent to them, say, the highest one.
    We'll need to relocate the infinities, so let's redefine them to be an all 0 mantissa with a maximum exponent
    (and an arbitrary sign bit). All the other mantissa values can now correspond to NaNs.
</p>
<h3>Properties of an <em>even more</em> indeterminate value</h3>
<p>
    A NaN, similarly to an infinity, could theoretically be an actual number where
    <em>all</em> of the information about it has been discarded. It could also be
    something which isn't a real number at all, such as a square root of a negative number.
    Therefore:
</p>
<ul>
    <li>
        NaNs are contagious, like infinities &mdash;
        pretty much all arithmetic operations involving a NaN will produce another NaN,
        since we can't derive a specific value from a lack of information
    </li>
    <li>
        All comparisons with NaNs return <code>false</code>
        <ul>
            <li>
                This means that <math display="inline">
                    <mrow>
                        <mi>x</mi><mo>=</mo><mi>x</mi>
                    </mrow>
                </math> is <code>false</code> if and only if x is NaN,
                which can be used as a test for whether you're dealing with one
            </li>
            <li>
                <code>false</code> isn't explicitly meaningful here, but comparisons have to return a value
                and <code>false</code> is marginally more convenient due to allowing for an easy NaN test
            </li>
        </ul>
    </li>
    <li>
        While generating a NaN will usually set an error flag, using it in operations does not do that
    </li>
    <li>
        However, a NaN that emits an error when used could still be useful,
        so let's add the option for NaNs to be <em>signaling</em>
        <ul>
            <li>
                When used in an arithmetic operation, a signaling NaN will set the invalid operation error flag
                and the result will be a regular (quiet) NaN
            </li>
            <li>
                This could, for example, be used for additional error&hyphen;checking
            </li>
            <li>
                In practice, signaling NaNs aren't very common
            </li>
        </ul>
    </li>
</ul>
<table>
    <caption>
        Exponent bit patterns
    </caption>
    <thead>
        <tr>
            <th>Bit pattern</th>
            <th>Meaning</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>11111111</td>
            <td>
                Infinities and NaNs
            </td>
        </tr>
        <tr>
            <td>11111110</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>11111101</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td colspan="2">&vellip;</td>
        </tr>
        <tr>
            <td>00000010</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;125</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000001</td>
            <td><math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math></td>
        </tr>
        <tr>
            <td>00000000</td>
            <td>
                <math display="inline"><mrow><msup><mn>2</mn><mn>&minus;126</mn></msup></mrow></math>
                (subnormals)
            </td>
        </tr>
    </tbody>
</table>
<floating-point-demo type="ieee754">
    <template data-value="0">0</template>
    <template data-value="0x14c59342">mass of a carbon atom (kg)</template>
    <template data-value="0x3f800000">1</template>
    <template data-value="0x523a43b7">galaxies in the universe</template>
    <template data-value="0x7f800000">Infinity</template>
    <template data-value="0x7f800001">NaN</template>
</floating-point-demo>

<hr>

<h2>Conclusion</h2>
<p>
    We have now reinvented the three most common number formats used by modern computers:
    unsigned integers, two's complement integers and IEEE 754 floating point numbers
    (well, mostly &mdash; some details have been left out, mostly regarding NaNs and some arithmetic edge cases).
</p>
<p>
    Note that these formats come in a variety of sizes &mdash;
    8&hyphen;bit, 16&hyphen;bit, 32&hyphen;bit and 64&hyphen;bit integers are common,
    and floats typically come in 32&hyphen;bit and 64&hyphen;bit (which has 11 bits of exponent and 52 bits of mantissa) varieties.
</p>
<p>
    While the basic functionality of the above&hyphen;mentioned number formats has been explained,
    they also lend themselves to various bit-manipulation exploits. I won't be explaining them here,
    but here are some of my favourites:
</p>
<ul>
    <li><a href="https://stackoverflow.com/questions/73646834#answer-73655977">
        Fast division by a constant
    </a></li>
    <li><a href="https://youtu.be/p8u_k2LIZyo">
        Fast inverse square root
    </a></li>
</ul>
<p>
    Also, while integers and floats are what the processor works with directly, programs can use more complex number formats
    based upon the fundamental ones, for example complex numbers are represented as a pair of floats,
    arbitrary&hyphen;precision integers are a big block of data interpreted as an integer in software
    and decimal numbers have their digits represented with groups of bits
    (which is useful if you're working with something inherently base&hyphen;10 and don't want to
    <a href="https://0.30000000000000004.com/">lose precision</a>, such as currency).
</p>
<p>
    Lastly, if you ever want to play with floating-point numbers,
    <!-- polska gurom -->
    check out <a href="https://float.exposed">float.exposed</a>!
    It provides support for more floating-point formats
    and has additional visuals over the demos in this article.
</p>