
One is (almost) normal in base π

See previous posts if you’re confused:

Part 1: Normal numbers in base ten
Part 2: Numbers in base φ and π
Part 3: One is normal in base π

This post is also available on LessWrong.

This wasn’t supposed to be a whole series of posts, so we’re going to speed through the punchline:

One is a normal number.

“No it’s not.”

In base π, I mean.

“No, it’s still just 1.”

Don’t be so greedy. Start with 0. and go from there.

“So it’s 0.222… or 0.333… or something?”

Nope, it’s 0.3011021110…

“That’s… bad. Does it repeat?”

I think it’s normal, but that seems hard to prove.

“So each digit occurs equally often?”

No, normal numbers in base π are about a 37%-30%-29%-4% split of 0, 1, 2, 3.

“Where did that distribution come from?”

Integrals of this weird function:

(Specifically, integrals over [0, 1/π], [1/π, 2/π], [2/π, 3/π], and [3/π, 1].)

“And where did that come from?”

It’s the distribution of remainders when computing a random number in base π.

“What are the x coordinates of those discontinuities?”

Sequence of remainders when computing 0.3011021110…

“um”

which are dense in [0, 1], by the way.

“Can you prove that?”

No, and neither can you.

“…And the y coordinates?”

The discontinuities get smaller by a factor of π each time.

“And almost all numbers have this distribution of remainders?”

Yup.

“Including 1, if you use the 0.3011021110… representation.”

Yup.

“So the histogram of x-coordinates of the N largest discontinuities in this function approaches… this function itself, as N goes to infinity.”

Yup.

“Which has derivative zero almost everywhere, but has a dense set of discontinuities.”

Yup!

“Any other neat facts about it?”

It’s a fractal.

“How?”

Stretch out the colored regions by a factor of π horizontally, shrink by π vertically, and add them to each other. You get the original function back:

(Some imagination required.)

“And this is all because π is normal, or transcendental, or something?”

Nope, I think analogous statements are all true for base 3.5.
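If you want to check that digit split yourself, here's a quick sketch: as the dialogue says, write 0. and then run the greedy algorithm (from Part 1, below) starting from a remainder of 1. The precision and digit count are arbitrary choices of mine.

from collections import Counter
from mpmath import mp

mp.dps = 2000   # plenty of precision for ~1000 base-pi digits

r = mp.mpf(1)   # "start with 0." and carry a remainder of 1
dig = []
for _ in range(1000):
    r *= mp.pi
    d = int(r)      # the greedy digit
    dig.append(d)
    r -= d

for d, count in sorted(Counter(dig).items()):
    print(d, count / len(dig))   # should land near the 37/30/29/4 split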



Numbers in base φ, π, and so on

Part 1: Normal numbers in base ten
Part 2: Numbers in base φ and π
Part 3: One is normal in base π

(This post assumes familiarity with the concept of number bases: decimal, binary, hexadecimal, and so on.)

Let’s say we want to write numbers in base φ≃1.618_ten (the golden ratio), or base π≃3.14_ten. So, for example, 1000_π = π^3.

The allowed digits are {0, 1} for base φ, and {0, 1, 2, 3} for base π.

Why not just e.g. {0, 1, 2} for π? That wouldn’t be enough digits to express everything: there’s a gap between 0.222…_π = (2π/(π-1) – 2) ≃ 0.934_ten and 1._π = 1._ten.
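(A quick numerical check of that gap:)

pi = 3.141592653589793
print(2 * pi / (pi - 1) - 2)   # ~0.934, the value of 0.222..._pi, which falls short of 1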

Lots of representations, out of order

In integer bases, decimal representations are almost unique (see previously): you can write 5/2 as 2.5_ten or 2.4999…_ten, but that’s the only kind of ambiguity.

In base φ, 100._φ = φ^2 = φ+1 = 11._φ. Uh oh.

Worse still, 110._φ = 1000._φ, which means that 111._φ > 1000._φ. Numbers aren’t lexicographically ordered anymore! Looking at two numbers, it’s not immediately obvious which is bigger.

(But there’s a limit to how bad this gets: the largest number starting with 0. is 0.111…_φ, which is equal to 10._φ. So if two numbers start off with the same digits, and then the first has a 1 where the second has a 0, the second number has to start “catching up” within the next two digit positions or it will end up being smaller.)
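To see the out-of-order behavior numerically, you can plug these digit strings into the tofloat helper from Part 1 (below):

phi = (1 + 5 ** 0.5) / 2

print(tofloat(3, phi, [1, 1, 1]))      # 111._phi  is about 5.236
print(tofloat(4, phi, [1, 0, 0, 0]))   # 1000._phi is about 4.236, i.e. smaller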

Wikipedia has lots of facts about base φ. It describes the “standard form” of a number as the representation that avoids the digit sequence 11, which is always equivalent to the representation given by the greedy algorithm in the previous post. (In short: keep multiplying by the base and chopping off the integer part of the result.)

Mercifully, the greedy/standard representation in any base is unique and sorts lexicographically.

Second-greediest representation

In any base, the greedy algorithm applied to one gives 1.

The second-greediest representation (see previously) in an integer base is always something like 1 = 0.222…_three or 1 = 0.999…_ten.

In base φ, the second-greediest-rep is 1 = 0.11_φ, which terminates.

In base π, it’s 1 = 0.301102111002022113…_π, nonterminating, with no discernible pattern.

(Note that flipping any of the 0s to a 1, or any of the 1s to a 2, makes the number bigger than one.)

This kind of makes sense — the golden ratio is a much nicer number than pi (algebraic instead of transcendental, etc.).

But in base 5/2, the second-greediest-rep is 1 = 0.2101110000110121…_(5/2), nonterminating, no discernible pattern.

And in base X≃2.732_ten, the positive root of x^2-2x-2, it’s 1 = 0.22_X.
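(Easy to check numerically:)

X = 1 + 3 ** 0.5             # positive root of x^2 - 2x - 2
print(2 / X + 2 / X ** 2)    # the value of 0.22_X: 1.0, up to float rounding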

So which numbers have terminating second-greediest-reps for 1?

We can rewrite these representations as expressions like 11._φ = φ^2 and 22._X = X^2. The left-hand sides are polynomials in the base (φ+1 and 2X+2, in these cases). In general, a terminating rep 1 = 0.d1d2…dn_B says, after multiplying through by B^n, that B^n = d1·B^(n-1) + d2·B^(n-2) + … + dn; in other words, B is a root of a monic polynomial whose other coefficients are negated digits. So we end up with the somewhat weird theorem:

The second-greediest-rep for 1 terminates in base B iff B is a root of a monic polynomial whose other coefficients are integers between -floor(B) and 0, inclusive.

What about repeating second-greediest-reps? Integer bases, sure, but any others?

Well, in base φ+1, we get 1 = 0.2111…_(φ+1).

In general, when do we get something repeating?

…Left as exercise for the reader.

When do we have a choice?

As we write down digits for some number in some base B, we can sometimes deviate from the greedy algorithm by writing down something smaller than necessary, leaving a remainder that isn’t in the half-open interval [0, 1).

For the second-greediest representation of a number, we do this once, at the beginning. If the number we’re representing is 1, we have a remainder of 1, which just barely falls outside [0, 1).

But sometimes we can be bolder. How large can the remainder r_i be? Presumably there’s some cap, r_i <= r_max.

Well, the largest contribution we can get from the remaining digits is 0.ddd…_B, where d=ceil(B-1) is the largest digit. And that’s equal to:

r_max = ceil(B-1) / (B-1)

Sanity check: For integer B, this is just 1, and we are only allowed to have remainders in [0, 1]. This is consistent with the fact that tails of 9’s are the only non-greedy representation possible in base ten.
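Here's a quick tabulation of r_max for a few bases (the function name is mine, not anything standard):

from math import ceil

def r_max(base):
    # 0.ddd..._B with d = ceil(B-1) is a geometric series summing to d/(B-1)
    return ceil(base - 1) / (base - 1)

phi = (1 + 5 ** 0.5) / 2
pi = 3.141592653589793
for b in (10, phi, pi, 2.5):
    print(b, r_max(b))   # 1.0, ~1.618, ~1.401, ~1.333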

Normal numbers in weird bases

Recall our statements about normal numbers in base ten:

  • For almost all numbers, for any k, the distribution of length-k substrings of the base-ten representation approaches a uniform distribution.
  • For almost all numbers, when following the greedy algorithm, the distribution of remainders approaches a uniform distribution on [0, 1)

I think the following is true in any base B, and fairly easy to prove:

  • For almost all numbers, for any k, the distribution of length-k substrings of the base-B greedy representation approaches the same distribution D_k(B).
  • For almost all numbers, when following the greedy algorithm, the distribution of remainders approaches the same distribution D(B).

The sets of numbers for which these two statements are true should be the same; let’s call them “normal numbers”, extending the usual integer-base definition. Furthermore, non-normal numbers may still have “normal representations” (whose remainders follow D(B)) under some non-greedy algorithm.
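Here's a rough way to eyeball D(π), pooling the remainders from the greedy representations of a handful of random numbers (the sample sizes and bins are arbitrary choices of mine; greedy is the function from Part 1, below):

import random
from collections import Counter
from mpmath import mp

mp.dps = 1000

bins = Counter()
for _ in range(20):
    x = mp.mpf(random.random())
    _, _, rem = greedy(mp.pi, x, 500)   # greedy is defined in Part 1, below
    for r in rem:
        bins[int(10 * r)] += 1          # ten equal-width bins on [0, 1)

for b in sorted(bins):
    print(b / 10, bins[b])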

I think the following is true but probably very difficult to prove:

  • In base pi, the second-greediest representation of one is normal.

Note that this is definitely not true in any integer base, or in base phi, where the second-greediest rep of one terminates or repeats.

What does D(π) look like, and what are its properties? See Part 3, above.

Non-unique decimals, 0.999…, and normal numbers (in base ten)

Part 1: Normal numbers in base ten
Part 2: Numbers in base φ and π
Part 3: One is normal in base π

(Warning: This is mostly background information for the other posts.)

Usually, we write numbers in base ten. For example:

π ≃ 3.14159_ten

What is this magic? Let me explain.

Kindergarten math made difficult

First, pick a “base”, which can be any number greater than one. We’ll use ten.

Then, choose “digits”: symbols for some small integers, like 0 for zero and 3 for three.

A decimal representation in base ten is a sequence of these symbols, with a “decimal point” (looks like: .) somewhere.

A symbol just to the left of the decimal point is worth its face value; symbols one position left of that are worth ten times their face value, and symbols one position to the right (just after the decimal point) are worth one tenth of their face value, etc.

Here are some nonobvious but easy-to-prove facts:

  • Digits for the numbers zero through nine are sufficient to represent any number this way, and are also minimal (if you remove any digit you can’t represent some number)
  • This representation is almost unique
  • There is a nice algorithm for finding such a representation
  • Decimals in base ten are sorted lexicographically: to figure out which of two decimals is larger, you can just compare the first position where they differ. (With one edge case.)

“Wait, almost unique?” Yeah, because .99999… (infinite tail of 9s) is equal to 1., and 2.4999… = 2.5, etc. As it turns out, this is the only kind of non-unique representation in base ten. It’s also the only case where lexicographic ordering fails (very slightly: the two representations of these numbers are consecutive lexicographically).

(I’m ignoring leading zeros.)

How to write numbers in base ten: be greedy

The nice algorithm is greedy: you insert the most valuable digit you can in every position.

Let’s try to write pi.

First, divide by whatever power of ten you need to make the result less than one. That’s 10^1.

  • Let r0 = pi/10^1, a number between zero and one.
  • Multiply by ten: 10×r0 is between zero and ten.
  • The biggest digit less than or equal to 10×r0 is 3.
  • So we start with: π = 3
  • (We divided by 10^1 and have now written 1 digit, so put the decimal point here: π = 3. + …)
  • We’re left with a remainder r1 = (10×r0 – 3), which is between zero and one.
  • Multiply by ten: 10×r1 is between zero and ten.
  • The biggest digit less than or equal to 10×r1 is 1.
  • π = 3.1 + …
  • Our next remainder is r2 = 10×r1 – 1, which is again between zero and one.
  • Multiply that by ten… etc.

For numbers with two representations (one that is finite, and another ending in a tail of 9s), this algorithm always gives the finite representation.

There are two ways to obtain the string 0.999…:

  1. Pick a sequence of numbers less than one, whose limit is one. Take the positionwise limit of the greedy decimal representations of these numbers.
  2. Write the second-greediest representation of one: put 0. instead of 1., but then follow the greedy algorithm from there.

Greedy algorithm for any base

Here’s a Python snippet for the greedy representation of any number, in any base:

from mpmath import mp   # mpmath's arbitrary-precision context

def greedy(base, x, n):
    # Shift the point: divide by base**offset so that x lands in [0, 1)
    offset = mp.floor(mp.log(x)/mp.log(base))+1
    x /= mp.power(base, offset)
    dig = []   # digits
    rem = []   # remainder before each digit is extracted
    r = x
    for _ in range(n):
        rem += [r]
        r *= base
        d = int(r)    # the greedy digit: integer part after multiplying by the base
        dig += [d]
        r = r - d
    return int(offset), dig, rem

(I’m using mpmath for arbitrary-precision arithmetic.)

You may also find these useful:

def tofloat(offset, base, dig):
    # Reconstruct an (approximate) float value from greedy()'s output
    out = 0.
    mult = 1
    for d in dig:
        mult /= base
        out += d * mult
    return out * base**offset

def tostring(offset, dig, rem=None):
    # Print the digits, placing the point after `offset` digits;
    # digits above 9 are bracketed. (rem is accepted but unused.)
    out = ''
    for i, d in enumerate(dig):
        if i == offset:
            out += '.'
        out += (str(d) if d<=9 else f'[{d}]')
    return out
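For example, reading off the first few base-ten digits of pi (the precision setting here is an arbitrary choice of mine):

from mpmath import mp

mp.dps = 30

offset, dig, rem = greedy(10, mp.pi, 12)
print(tostring(offset, dig))   # 3.14159265358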

And here’s the “second-greediest representation” in code:

def greedy2(base, x, n):
    offset = mp.floor(mp.log(x)/mp.log(base))+1
    x /= base**offset
    dig = []
    rem = []
    r = x
    # First digit: one less than the greedy choice...
    rem += [r]
    r *= base
    d = int(r) - 1
    dig += [d]
    r = r - d
    # ...then continue greedily, carrying the oversized remainder
    for _ in range(n):
        rem += [r]
        r *= base
        d = int(r)
        dig += [d]
        r = r - d
    return int(offset), dig, rem

Normal numbers

Most numbers don’t have a second-greediest representation: if you try to write pi starting with 2., you’ll be in trouble (the largest thing you can end up with is 2.999…, which is three, not pi).

In fact, the following is easy to prove (if you majored in math):

  • For almost all numbers, their base-ten representation is unique, given by the greedy algorithm, and does not terminate.
  • For almost all numbers, for any k, the distribution of length-k substrings of the base-ten representation approaches a uniform distribution.
  • For almost all numbers, when following the greedy algorithm, the distribution of remainders approaches a uniform distribution on [0, 1)

The statement about remainders is equivalent to the statement about substrings, and I think it’s more elegant.

As an example, pi is conjectured (not proven!) to be normal in base ten. Here are histograms of its first thousand digits (3, 1, 4, 1, …), its first thousand digit-pairs (31, 14, 41, 15, …), and two histograms of its first thousand remainders (.1415…, .4159…, .1592…, .5926…, …) with different bin sizes:

As we look at more and more digits of pi, these histograms should each converge to uniform — but nobody has been able to prove it.
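If you want to regenerate rough versions of those histograms, something along these lines should work (the bin choices are mine):

from collections import Counter
from mpmath import mp

mp.dps = 1100   # enough precision for ~1000 decimal digits of pi

offset, dig, rem = greedy(10, mp.pi, 1000)

print(Counter(dig))                                 # single digits
print(Counter(zip(dig, dig[1:])).most_common(5))    # most common digit pairs
print(Counter(int(10 * r) for r in rem))            # remainders, ten bins on [0, 1)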

Normal representations

We can also define a normal representation as a representation of a number with a uniform limiting distribution of remainders. In base ten, this isn’t very interesting: the only numbers with more than one representation aren’t normal, and neither of their two representations is normal either.

In particular, 1. and 0.999… are definitely not normal.

We’ll need this definition later, though.