**Part 1: Normal numbers in base ten****Part 2: **Numbers in base φ and π**Part 3: **One is normal in base π

(Warning: This is mostly background information for the other posts.)

Usually, we write numbers in base ten. For example:

π ≃ **3.14159**_ten

What is this magic? Let me explain.

## Kindergarten math made difficult

First, pick a “base”, which can be any number greater than one. We’ll use ten.

Then, choose “digits”: symbols for some small integers, like **0** for zero and **3** for three.

A decimal representation in base ten is a sequence of these symbols, with a “decimal point” (looks like: **.**) somewhere.

A symbol just to the left of the decimal point is worth its face value; symbols one position left of that are worth ten times their face value, and symbols one position to the right (just after the decimal point) are worth one tenth of their face value, etc.

Here are some nonobvious but easy-to-prove facts:

- Digits for the numbers zero through nine are sufficient to represent any number this way, and are also minimal (if you remove any digit you can’t represent some number)
- This representation is almost unique
- There is a nice algorithm for finding such a representation
- Decimals in base ten are sorted lexicographically: to figure out which of two decimals is larger, you can just compare the first position where they differ. (With one edge case.)

“Wait, *almost* unique?” Yeah, because **.99999…** (infinite tail of **9**s) is equal to **1.**, and **2.4999…** = **2.5**, etc. As it turns out, this is the only kind of non-unique representation in base ten. It’s also the only case where lexicographic ordering fails (very slightly: the two representations of these numbers are consecutive lexicographically).

(I’m ignoring leading zeros.)

## How to write numbers in base ten: be greedy

The nice algorithm is greedy: you insert the most valuable digit you can in every position.

Let’s try to write pi.

First, divide by whatever power of ten you need to make the result less than one. That’s 10^1.

- Let r
_{0}= pi/10^1, a number between zero and one. - Multiply by ten: 10×r
_{0}is between zero and ten. - The biggest digit less than or equal to 10×r
_{0}is**3**. - So we start with: π =
**3**… - (We divided by 10^1 and have now written 1 digit, so put the decimal point here: π =
**3.**+ …) - We’re left with a remainder r
_{1}= (10×r_{0}– 3), which is between zero and one. - Multiply by ten: 10×r
_{1}is between zero and ten. - The biggest digit less than or equal to 10×r
_{1}is**1**. - π =
**3.1**+ … - Our next remainder is r
_{2}= 10×r_{1}– 1, which is again between zero and one. - Multiply
*that*by ten… etc.

For numbers with two representations (one that is finite, and another ending in a tail of **9**s), this algorithm always gives the finite representation.

There are two ways to obtain the string **0.999…**:

- Pick a sequence of numbers less than one, whose limit is one. Take the positionwise limit of the greedy decimal representations of these numbers.
- Write the second-greediest representation of one: put
**0.**instead of**1.**, but then follow the greedy algorithm from there.

## Greedy algorithm for any base

Here’s a Python snippet for the greedy representation of any number, in any base:

```
def greedy(base, x, n):
offset = mp.floor(mp.log(x)/mp.log(base))+1
x /= mp.power(base, offset)
dig = []
rem = []
r = x
for _ in range(n):
rem += [r]
r *= base
d = int(r)
dig += [d]
r = r - d
return int(offset), dig, rem
```

(I’m using **mpmath** for arbitrary-precision arithmetic.)

You may also find these useful:

```
def tofloat(offset, base, dig):
out = 0.
mult = 1
for d in dig:
mult /= base
out += d * mult
return out * base**offset
def tostring(offset, dig, rem=None):
out = ''
for i, d in enumerate(dig):
if i == offset:
out += '.'
out += (str(d) if d<=9 else f'[{d}]')
return out
```

And here’s the “second-greediest representation” in code:

```
def greedy2(base, x, n):
offset = mp.floor(mp.log(x)/mp.log(base))+1
x /= base**offset
dig = []
rem = []
r = x
rem += [r]
r *= base
d = int(r) - 1
dig += [d]
r = r - d
for _ in range(n):
rem += [r]
r *= base
d = int(r)
dig += [d]
r = r - d
return int(offset), dig, rem
```

## Normal numbers

Most numbers don’t have a second-greediest representation: if you try to write pi starting with **2.**, you’ll be in trouble (the largest thing you can end up with is **2.999…**, which is three, not pi).

In fact, the following is easy to prove (if you majored in math):

- For almost all numbers, their base-ten representation is unique, given by the greedy algorithm, and does not terminate.
- For almost all numbers, for any k, the distribution of length-k substrings of the base-ten representation approaches a uniform distribution.
- For almost all numbers, when following the greedy algorithm, the distribution of remainders approaches a uniform distribution on [0, 1)

The statement about remainders is equivalent to the statement about substrings, and I think it’s more elegant.

As an example, pi is conjectured (not proven!) to be normal in base ten. Here are histograms of its first thousand digits (3, 1, 4, 1, …), its first thousand digit-pairs (31, 14, 41, 15, …), and two histograms of its first thousand remainders (.1415…, .4159…, .1592…, .5926…, …) with different bin sizes:

As we look at more and more digits of pi, these histograms should each converge to uniform — but nobody has been able to prove it.

## Normal representations

We can also define a normal *representation* as a representation of a number with a uniform limiting distribution of remainders. In base ten, this isn’t very interesting: the only numbers with more than one representation aren’t normal, and neither are either of their representations.

In particular, **1. **and **0.999… **are definitely not normal.

We’ll need this definition later, though.