-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathcontent.html
867 lines (849 loc) · 33.5 KB
/
content.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
<h1>How Computers Use Numbers</h1>
<aside class="subheader">
This is a submission for
<a href="https://some.3b1b.co">SoME3</a>.
</aside>
<p>
With the invention of the vacuum tube we have tricked electrons into doing simple logic.
However, that's not too useful on its own.
While it is enough to perform basically any computation, <em>actually having numbers</em> would be very helpful —
basically everything that a modern computer does is based on arithmetic in some way, for example:
<ul>
<li>A spreadsheet can contain expressions with mathematical operators</li>
<li>Playing a game might require calculating enemy positions</li>
<li>
Drawing this web page requires computing the positions and sizes of all the elements
<em>and the text characters inside those elements</em>
</li>
</ul>
So, let's invent some!
</p>
<h2>The goal</h2>
<p>
First, we need to find what traits a good number format has, and thus we should be aiming for.
These could be anything, but here are some of the ones I think are important
(and, <em>coincidentally</em>, modern computers optimize for):
</p>
<dl>
<dt>Usefulness in real programs</dt>
<dd>
Perhaps the most important trait, as the whole point of having the capability to do math is for programs to use it.
Additionally, fitting as many use cases into as few formats as possible would be ideal,
as having more ways to store numbers than necessary would make processors a lot more expensive and power‐hungry.
</dd>
<dt>Simplicity</dt>
<dd>
Similarly to avoiding too many formats, straightforward logic would also save on price and power usage.
</dd>
<dt>Efficient memory usage</dt>
<dd>
Memory is a limited resource; therefore using it as efficiently as possible would be ideal,
for example by avoiding duplicate values.
</dd>
</dl>
<h2>Wait, what even is memory?</h2>
<memory-demo id="memory-demo"></memory-demo>
<div class="note">
Click on the bits to toggle them!
</div>
<p>
Memory is a collection of cells holding one of two values (i.e. a bit). These can represent basically anything that has two states,
though usually they are visualized using ones and zeroes.
</p>
<type-switcher target="#memory-demo" selected="1_0">
<template data-type="1_0">1/0</template>
<template data-type="T_F">T/F</template>
<template data-type="X_O">X/O</template>
<template data-type="@_*">@/*</template>
</type-switcher>
<p>
Note that bits are usually grouped in eights (called a byte), so using a multiple of 8 would be easiest to work with.
We'll use just 8 bits for simplicity, but all of the following number formats come in a variety of sizes.
</p>
<h2>Counting with bits</h2>
<p>
Similarly to how we use the 10 digits to count in base 10, we can use bits to count in base 2.
By assigning each bit to a power of 2, we can use our 8 bits to store numbers between 0 and 255
(<math display="inline">
<mrow>
<msup>
<mn>2</mn>
<mn>8</mn>
</msup>
<mo>−</mo>
<mn>1</mn>
</mrow>
</math>).
This format is called an <em>8‐bit unsigned integer</em>.
</p>
<integer-demo type="unsigned" value="53"></integer-demo>
<h2>Negative numbers</h2>
<p>
An obvious way of defining negative numbers (<em>signed integers</em>) is to just use 1 bit as a sign.
</p>
<integer-demo type="sign-bit" value="0x92"></integer-demo>
<p>
However, this way of doing it has just one small issue: there are two ways to write 0, as it is neither positive or negative.
</p>
<integer-demo type="sign-bit" value="0" locked>
<template data-value="0">Zero</template>
<template data-value="0x80">Also zero</template>
</integer-demo>
<p>
Instead, we could interpret the number as if it was unsigned, but then offset it down by 128
(<math display="inline">
<mrow>
<msup>
<mn>2</mn>
<mn>7</mn>
</msup>
</mrow>
</math>)
to be able to represent numbers from −128 to 127:
</p>
<integer-demo type="shifted" value="0x80">
<template data-value="0">−128</template>
<template data-value="0x7f">−1</template>
<template data-value="0x80">0</template>
<template data-value="0x81">1</template>
<template data-value="0xff">127</template>
</integer-demo>
<p>
At first glance this way of doing negatives seems perfect!
After all it's very simple, it covers a useful range and all the bit patterns correspond to unique integers.
</p>
<p>
But we can do better.
</p>
<h3>Two's complement</h3>
<p>
While implementing negative numbers via a fixed bias seems relatively optimal,
we can actually significantly cut down on the amount of circuitry we need via a neat trick!
</p>
<p>
A hypothetical computer which uses offset signed integers and regular unsigned integers would need to implement
arithmetic and comparisons for both signed and unsigned integers, as well as conversions between the two.
</p>
<p>
Since arithmetic and conversions are only defined for where the ranges of the two formats overlap,
could we have the 0–127 range use the same bit patterns for both signed and unsigned integers?
It turns out that the answer is yes! There are several ways to achieve this,
but by far the most common approach is making the most significant bit subtract 128 from the total instead of adding it.
</p>
<integer-demo type="twos-complement" value="0xf4" id="twoc-switching-demo">
<template data-value="0x80">−128</template>
<template data-value="0xfd">−3</template>
<template data-value="0x3">3</template>
<template data-value="0x7f">127</template>
</integer-demo>
<p>
Remember that “unsigned integer” and “two's complement” are just ways of interpreting a collection of bits.
Because all the relevant bit patterns match, we can convert for free by just changing how we're interpreting the data.
</p>
<type-switcher target="#twoc-switching-demo" selected="twos-complement">
<template data-type="unsigned">Unsigned</template>
<template data-type="twos-complement">Two's complement</template>
</type-switcher>
<div class="note">
(this toggle affects the demo above)
</div>
<p>
Additionally, two's complement allows us to drop the circuitry for signed addition,
subtraction and most of multiplication, and if you're familiar with the
<a href="https://youtu.be/3gyHKCDq1YA">p‐adic numbers</a> you might already have a hunch as to why.
</p>
<addition-demo value1="0xf6" value2="5"></addition-demo>
<p>
Notice that if the result would need 9 bits to be stored fully, the last bit would be dropped.
This is called <em>overflow</em> and is what allows us to share the signed and unsigned logic.
Dropping the last bit means that we're actually working in
<a href="https://en.wikipedia.org/wiki/Modular_arithmetic">modular arithmetic</a> (specifically mod 256),
where numbers “wrap around”, so the number after 255 is 0.
In this system, every number has an additive inverse even without including negative numbers:
for example, the inverse of 2 is 254, since
<math display="inline">
<mrow>
<mn>2</mn>
<mo>+</mo>
<mn>254</mn>
<mo>≡</mo>
<mn>0</mn>
<mo>(mod</mo>
<mn>256</mn>
<mo>)</mo>
</mrow>
</math>. Because 254 behaves like −2, it may as well <em>be −2</em>.
</p>
<p>
If you're familiar with the p-adics, those share the same property of not having an explicit sign
but arithmetic working the same as for positive numbers,
which is why negative numbers in two's complement look like 2-adic integers
with the infinite ones “chopped off”.
</p>
<hr>
<h2>A binary point in the middle</h2>
<p>
Since we have mastered integers, we can now go fractional.
Let's start by taking a regular 32‐bit unsigned integer and putting a binary point in the middle:
</p>
<fixed-point-demo value="0x00038000">
<template data-value="0x000000ac">0.00262</template>
<template data-value="0x028683d8">646.515</template>
<template data-value="0x23293840">9001.22</template>
</fixed-point-demo>
<p>
This is called fixed point, and it's <em>not terrible</em>.
We have drastically cut the range of numbers we can represent, <em>without including negative numbers</em>
(as an integer, we could represent the range 0–4,294,967,295 with 32 bits, and this format has only 0 to just under 65536),
and the precision we get out of that sacrifice isn't even that great.
We could try putting the binary point somewhere else, but there really isn't a spot that gives both decent range and precision.
</p>
<p>
However, it is very simple, and the fact that addition and subtraction work the same as for integers
does make fixed point a good option in some niche cases.
</p>
<h2>Better accomodating real‐world data</h2>
<p>
Having uniformly spaced values, despite being great for integers, hasn't worked out so well for fractions.
So, how should we distribute them? Well, since we want to fit as many use cases as possible into one format,
both atom‐sized and universe‐sized numbers should be supported.
But in practice, universe‐sized numbers are less precise (in absolute terms) than atom‐sized ones.
We can take advantage of this.
</p>
<h2>Moving the binary point</h2>
<p>
Since larger numbers are inherently less precise, could we skip defining all those decimals for large values?
Well, yes! Let's dedicate, say, 5 bits to defining where the binary point is.
For large values, we can move the point to the right to not store the unnecessary fractional part,
and for small numbers, we can omit all the integer bits by moving the point to the left.
</p>
<moving-point-demo value="0x68019000" type="simple">
<template data-value="0xa5b8d800">750,000</template>
<template data-value="0x00010625">0.0005 (inexact)</template>
</moving-point-demo>
<div class="note">(green corresponds to point position)</div>
<p>
Also, since 5 bits give us more possible bit positions than there actually are with the remaining 27 value bits,
we can use the remaining ones for <em>hypervalues</em>, by putting extra <em>virtual zeroes</em> to fill in the space.
</p>
<moving-point-demo value="0xe8000003" type="hypervalues">
<template data-value="0xe3962028">120,340,560</template>
<template data-value="0xfd9682f0">1.5 billion</template>
</moving-point-demo>
<p>
However, even with this range extension, the format is still extremely wasteful — for example, you can write 0
in 32 different ways as the point position does not matter.
<!--
Which representations are canonical:
fractional values (27 point patterns): half of them, since ending with a 0 means we can shift the number to the right
and add 1 to the point position
integers (5 point patterns): half of them for the first 4 patterns; if it ends with a 0, we can shift the number to the right
and subtract 1 from the point position
11111 point pattern is all canonical
-->
In fact, only <strong>~52%</strong> of all bit patterns produce unique values in this system. Yikes!
</p>
<h2>Scientific notation</h2>
<p>
In order to ensure uniqueness, we can take inspiration from a way you might see numbers written down in the real world —
scientific notation! We will need to adapt it to better suit our needs, though.
</p>
<p>
Remember that a number in scientific notation looks like
<math display="inline">
<mrow>
<mn>2.574</mn>
<mo>⋅</mo>
<msup>
<mn>10</mn>
<mn>4</mn>
</msup>
</mrow>
</math>,
where the mantissa is a number with one digit in front of the decimal point and an arbitrary amount of digits after,
and the exponent is any integer.
</p>
<p>
For the exponent, let's start by dedicating 8 of our 32 bits to it.
Since we need both positive and negative exponents to represent large and small numbers respectively,
we need to use a signed format.
We could try doing two's complement again, but we wouldn't actually be able to take advantage of its benefits very well in this case
(as there is no existing unsigned logic we can reuse).
Instead, let's just make it an unsigned integer and subtract 127 from it, as that would make the underlying implementation simpler.
</p>
<p>
For the mantissa, let's just use the remaining 24 bits as a fixed‐point number with 1 binary digit before the binary point.
As for its sign, note that two's complement or shifting would be incredibly awkward to use here.
This is because we need to represent a balanced range,
and ideally there would be an equal amount of values on either side of 0.
But that would make the total amount of values odd (<math display="inline">
<mrow>
<mn>2</mn>
<mo>⋅</mo>
<mi>positive values</mi>
<mo>+</mo>
<mn>1</mn>
</mrow>
</math>), and the amount of values we have to work with is even (<math display="inline">
<mrow>
<msup>
<mn>2</mn>
<mi>bits</mi>
</msup>
</mrow>
</math>).
Therefore, since we can't use the extra value effectively, let's just waste it in the simplest way possible:
dedicate one bit of the mantissa to the sign, negating the value when the bit is 1. (The extra value goes to −0.)
</p>
<table>
<caption>
Exponent bit patterns
</caption>
<thead>
<tr>
<th>Bit pattern</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>11111111</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111110</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111101</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
</tr>
<tr>
<td colspan="2">⋮</td>
</tr>
<tr>
<td>00000010</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−125</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000001</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000000</td>
<td>
<math display="inline"><mrow><msup><mn>2</mn><mn>−127</mn></msup></mrow></math>
</td>
</tr>
</tbody>
</table>
<floating-point-demo type="naive" value="0x40f00000"></floating-point-demo>
<div class="note">(red – sign bit, green – exponent, blue – mantissa)</div>
<h3>Actually ensuring uniqueness</h3>
<p>
Our format does not guarantee uniqueness yet. This is because we skipped one requirement of scientific notation:
the first digit of the mantissa must be non‐zero (otherwise you can change the exponent and get the same value).
So, we have to force the first digit of the mantissa to be non‐zero.
</p>
<p>
Notice that since we're working in binary, there is only one non‐zero digit: one!
That means that we can omit storing it entirely, which allows us to double the mantissa's precision. Nice!
</p>
<h3>Oh wait</h3>
<p>
Remember when I said that the first digit of the mantissa can't be zero? That's true, except for exactly one number: 0.
Thus, we can't write it anymore. So, let's just set the all zeroes (except for possibly the sign bit)
bit pattern to represent ±0.
</p>
<table>
<caption>
Exponent bit patterns
</caption>
<thead>
<tr>
<th>Bit pattern</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>11111111</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111110</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111101</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
</tr>
<tr>
<td colspan="2">⋮</td>
</tr>
<tr>
<td>00000010</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−125</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000001</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000000</td>
<td>
<math display="inline"><mrow><msup><mn>2</mn><mn>−127</mn></msup></mrow></math>
& special case: 0
</td>
</tr>
</tbody>
</table>
<floating-point-demo type="unique" value="0"></floating-point-demo>
<h2>Handling math with varying precision</h2>
<p>
Because of the nature of having limited precision, we're going to encounter results that don't have bit representations
quite often when doing basically any kind of math. (Indeed, the 4 billion values cover exactly 0% of the number line.)
</p>
<p>
The obvious solution to that is rounding: after each mathematical operation, if the result does not have a bit pattern in our system,
it snaps to the closest available value.
(The default rounding mode is supposed to be <a href="https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even">rounding half to even</a>,
as it minimizes error accumulation — but most CPUs use round‐to‐nearest by default anyway.)
</p>
<p>
Notice that now, each bit pattern actually represents a range of values —
the actual result that rounded to it could have been anywhere within the rounding window.
</p>
<p>
Also, remember how we had an extra zero? We can now have +0 represent the positive half of the range and vice versa.
</p>
<h2>A giant gap</h2>
<p>
Our format is currently <em>not great</em> at representing small numbers.
Sure, we can store 0, but then we have a comparatively huge gap from anything else:
the smallest non‐negative numbers in our format are
0, then
<math display="inline">
<mrow>
<mn>1.000000119</mn>
<mo>⋅</mo>
<msup>
<mn>2</mn>
<mn>−127</mn>
</msup>
</mrow>
</math>, then
<math display="inline">
<mrow>
<mn>1.000000238</mn>
<mo>⋅</mo>
<msup>
<mn>2</mn>
<mn>−127</mn>
</msup>
</mrow>
</math>, and so on.
That means that the second gap is only 0.000012% of the first one.
</p>
<p>
To rectify our small number issues, we can make the whole lowest exponent have its mantissa start with a 0
instead of just one specific value.
(We also need to increase it by 1 to not leave a hole where the
<math display="inline">
<mrow>
<msup>
<mn>2</mn>
<mn>−127</mn>
</msup>
</mrow>
</math>‐sized numbers originally were).
Let's call these <em>subnormal numbers</em>.
</p>
<p>
The smallest representable positive numbers are now
<math display="inline">
<mrow>
<msup>
<mn>2</mn>
<mn>−149</mn>
</msup>
</mrow>
</math>, then
<math display="inline">
<mrow>
<mn>2</mn>
<mo>⋅</mo>
<msup>
<mn>2</mn>
<mn>−149</mn>
</msup>
</mrow>
</math>,
<math display="inline">
<mrow>
<mn>3</mn>
<mo>⋅</mo>
<msup>
<mn>2</mn>
<mn>−149</mn>
</msup>
</mrow>
</math> and so on.
That means that not only do we have much more uniform gaps, but we even extended the range a little
(though at the cost of some precision).
</p>
<table>
<caption>
Exponent bit patterns
</caption>
<thead>
<tr>
<th>Bit pattern</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>11111111</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111110</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111101</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
</tr>
<tr>
<td colspan="2">⋮</td>
</tr>
<tr>
<td>00000010</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−125</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000001</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000000</td>
<td>
<math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math>
(subnormals)
</td>
</tr>
</tbody>
</table>
<floating-point-demo type="subnormals" value="0x3f800000">
<template data-value="0">0</template>
<template data-value="0x000116c3">
<math display="inline">
<mrow>
<msup>
<mn>10</mn>
<mn>−40</mn>
</msup>
</mrow>
</math>
</template>
<template data-value="0x0000001">smallest representable value</template>
</floating-point-demo>
<h2>Extending the range further</h2>
<p>
Having a special value for numbers outside the normal range would be useful.
For example, since zero could actually be any number close to 0, dividing by it
<em>would result in an actual value</em>! We can't know it exactly (since we threw away that precision),
so it would just result in the too‐big sentinel.
This means that it effectively covers the part of the number line from the (new) largest regular value all the way to ∞.
</p>
<p>
So, let's dedicate the current largest regular value's bit pattern to it!
Since our format is symmetric we'll also define a negative counterpart to have the same bit pattern as the positive one
except with the sign bit being 1.
</p>
<floating-point-demo type="large-sentinel" value="0x7fffffff"></floating-point-demo>
<h3>Properties of an indeterminate value</h3>
<p>
Since the our new values cover infinitely large parts of the number line, the numbers they represent could be pretty much anywhere.
So, how does math work with a number that we only know is very large?
Well, in practice, it only really makes sense to give it the same properties that ∞
(or, more precisely, a limit that diverges to ∞) has traditionally,
namely that it's contagious — most operations involving infinities will result in another infinity:
</p>
<ul>
<li>
<math display="inline">
<mrow>
<mi>∞</mi><mo>+</mo><mn>1</mn><mo>=</mo><mi>∞</mi>
</mrow>
</math>
</li>
<li>
<math display="inline">
<mrow>
<mi>∞</mi><mo>⋅</mo><mo>(</mo><mn>−2</mn><mo>)</mo><mo>=</mo><mi>−∞</mi>
</mrow>
</math>
</li>
<li>
<math display="inline">
<mrow>
<mi>∞</mi><mo>+</mo><mi>∞</mi><mo>=</mo><mi>∞</mi>
</mrow>
</math>
</li>
<li>
<math display="inline">
<mrow>
<mi>∞</mi><mo>></mo><mn>2</mn>
</mrow>
</math> is <code>true</code>
</li>
<li>
<math display="inline">
<mrow>
<mi>∞</mi><mo>−</mo><mi>∞</mi><mo>=</mo><mi>?</mi>
</mrow>
</math>
</li>
<li>
<math display="inline">
<mrow>
<mi>∞</mi><mo>⋅</mo><mn>0</mn><mo>=</mo><mi>?</mi>
</mrow>
</math>
</li>
</ul>
<p>
Since our new value behaves like an infinity, it would make sense to call it that as well.
</p>
<table>
<caption>
Exponent bit patterns
</caption>
<thead>
<tr>
<th>Bit pattern</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>11111111</td>
<td>
<math display="inline"><mrow><msup><mn>2</mn><mn>128</mn></msup></mrow></math>
& special case: infinities
</td>
</tr>
<tr>
<td>11111110</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111101</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
</tr>
<tr>
<td colspan="2">⋮</td>
</tr>
<tr>
<td>00000010</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−125</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000001</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000000</td>
<td>
<math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math>
(subnormals)
</td>
</tr>
</tbody>
</table>
<floating-point-demo type="infinities" value="0xffffffff"></floating-point-demo>
<h2>Dealing with those question marks</h2>
<p>
Since two infinities could have been a result of rounding numbers of very different magnitudes,
we can't assign any sensible value to
<math display="inline">
<mrow>
<mi>∞</mi><mo>−</mo><mi>∞</mi>
</mrow>
</math>.
Similarly,
<math display="inline">
<mrow>
<mi>∞</mi><mo>⋅</mo><mn>0</mn>
</mrow>
</math>
can't have a definite value assigned to it —
a very big number we don't know exactly times a very small number we don't know exactly could be
<em>literally anything</em>.
</p>
<p>
So, what should happen when a question‐mark operation happens?
Well, first of all, an error should probably be raised somewhere.
However, some programs might not want to be interrupted when working with these kinds of values,
so we can't do much more than setting a flag in the processor.
</p>
<p>
But, since a question‐mark operation can't suspend execution, it needs to return a value.
Let's call it <em>Not a Number</em>, or NaN. Now, where do we put it?
</p>
<p>
Note that since there are several different operations that could produce NaNs,
it might be useful to tell what produced a NaN from the NaN itself.
In other words, we need to be able to attach extra data to a NaN.
</p>
<p>
Since we need to attach extra data to NaNs, let's dedicate an entire exponent to them, say, the highest one.
We'll need to relocate the infinities, so let's redefine them to be an all 0 mantissa with a maximum exponent
(and an arbitrary sign bit). All the other mantissa values can now correspond to NaNs.
</p>
<h3>Properties of an <em>even more</em> indeterminate value</h3>
<p>
A NaN, similarly to an infinity, could theoretically be an actual number where
<em>all</em> of the information about it has been discarded. It could also be
something which isn't a real number at all, such as a square root of a negative number.
Therefore:
</p>
<ul>
<li>
NaNs are contagious, like infinities —
pretty much all arithmetic operations involving a NaN will produce another NaN,
since we can't derive a specific value from a lack of information
</li>
<li>
All comparisons with NaNs return <code>false</code>
<ul>
<li>
This means that <math display="inline">
<mrow>
<mi>x</mi><mo>=</mo><mi>x</mi>
</mrow>
</math> is <code>false</code> if and only if x is NaN,
which can be used as a test for whether you're dealing with one
</li>
<li>
<code>false</code> isn't explicitly meaningful here, but comparisons have to return a value
and <code>false</code> is marginally more convenient due to allowing for an easy NaN test
</li>
</ul>
</li>
<li>
While generating a NaN will usually set an error flag, using it in operations does not do that
</li>
<li>
However, a NaN that emits an error when used could still be useful,
so let's add the option for NaNs to be <em>signaling</em>
<ul>
<li>
When used in an arithmetic operation, a signaling NaN will set the invalid operation error flag
and the result will be a regular (quiet) NaN
</li>
<li>
This could, for example, be used for additional error‐checking
</li>
<li>
In practice, signaling NaNs aren't very common
</li>
</ul>
</li>
</ul>
<table>
<caption>
Exponent bit patterns
</caption>
<thead>
<tr>
<th>Bit pattern</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>11111111</td>
<td>
Infinities and NaNs
</td>
</tr>
<tr>
<td>11111110</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow></math></td>
</tr>
<tr>
<td>11111101</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>126</mn></msup></mrow></math></td>
</tr>
<tr>
<td colspan="2">⋮</td>
</tr>
<tr>
<td>00000010</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−125</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000001</td>
<td><math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math></td>
</tr>
<tr>
<td>00000000</td>
<td>
<math display="inline"><mrow><msup><mn>2</mn><mn>−126</mn></msup></mrow></math>
(subnormals)
</td>
</tr>
</tbody>
</table>
<floating-point-demo type="ieee754">
<template data-value="0">0</template>
<template data-value="0x14c59342">mass of a carbon atom (kg)</template>
<template data-value="0x3f800000">1</template>
<template data-value="0x523a43b7">galaxies in the universe</template>
<template data-value="0x7f800000">Infinity</template>
<template data-value="0x7f800001">NaN</template>
</floating-point-demo>
<hr>
<h2>Conclusion</h2>
<p>
We have now reinvented the three most common number formats used by modern computers:
unsigned integers, two's complement integers and IEEE 754 floating point numbers
(well, mostly — some details have been left out, mostly regarding NaNs and some arithmetic edge cases).
</p>
<p>
Note that these formats come in a variety of sizes —
8‐bit, 16‐bit, 32‐bit and 64‐bit integers are common,
and floats typically come in 32‐bit and 64‐bit (which has 11 bits of exponent and 52 bits of mantissa) varieties.
</p>
<p>
While the basic functionality of the above‐mentioned number formats has been explained,
they also lend themselves to various bit-manipulation exploits. I won't be explaining them here,
but here are some of my favourites:
</p>
<ul>
<li><a href="https://stackoverflow.com/questions/73646834#answer-73655977">
Fast division by a constant
</a></li>
<li><a href="https://youtu.be/p8u_k2LIZyo">
Fast inverse square root
</a></li>
</ul>
<p>
Also, while integers and floats are what the processor works with directly, programs can use more complex number formats
based upon the fundamental ones, for example complex numbers are represented as a pair of floats,
arbitrary‐precision integers are a big block of data interpreted as an integer in software
and decimal numbers have their digits represented with groups of bits
(which is useful if you're working with something inherently base‐10 and don't want to
<a href="https://0.30000000000000004.com/">lose precision</a>, such as currency).
</p>
<p>
Lastly, if you ever want to play with floating-point numbers,
<!-- polska gurom -->
check out <a href="https://float.exposed">float.exposed</a>!
It provides support for more floating-point formats
and has additional visuals over the demos in this article.
</p>