"The Law of Benford"
The "significant-digit paradox" was probably first discovered by the astronomer Simon Newcomb in 1881 when he observed that the pages at the beginning of a book with logarithms tended to be wearied much more than the back pages. The physicist Frank Benford rediscovered this phenomenon in 1938 while working at General Electric. He compiled sets of numbers from baseball scores to numbers extracted from Reader's Digest and verified the nonuniform distribution of the most significant digit. Take the most significant digit of all these numbers, and alas, one finds that 1, 2, ..., 9 do not occur with the same frequency 1/9 which one would perhaps expect to be the statistical average. Instead digit D occurs on the average with a frequency of log 1 + D
D
where log is the logarithm with base 10. This formula was first found empirically by Simon Newcomb in 1881. Later, in 1961, the mathematician Roger Pinkham derived Newcomb's formula from the assumption that the numbers exhibit scale-invariance; that is, their statistical properties are invariant under multiplication by a constant factor (as occurs, for example, when sums of money are converted between currencies). In 1995 Theodore Hill presented a general theorem which shows that numbers drawn at random from a mixture of different distributions tend to conform to the Newcomb distribution.
Benford's law has been put to practical use, e.g. for detecting income-tax fraud: faked number series stand out because they generally do not adhere to Benford's law. (M J Nigrini reports that he has applied "digital analysis" (DA) to Bill Clinton's tax returns and found no evidence of fraud.) Here we will illuminate Benford's law by taking a random series of numbers in the interval 1 to 1000 and subjecting it to "scrambling"; i.e. we multiply the numbers by factors picked at random from the interval 1 to 9 and iterate this procedure several times. The idea is that this produces an approximately scale-invariant series, since further multiplication by any factor will not significantly change its statistics. A useful trick in these circumstances is to shift the decimal point of the numbers so that they stay in the interval [1, 10). For these numbers one can then also obtain a continuous probability distribution on [1, 10). The content of Newcomb's formula is then that Prob(Z < z) = log(z).
1. Generate (pseudo-)random numbers

   N := 400      n := 0 .. N − 1
   X := runif(N, 1, 1000)      Uniformly distributed numbers in the interval 1 to 1000.
2. The decimal point shifting function

   Zf(x) :=   nn ← 0
              if x > 1
                  while x > 10^nn
                      nn ← nn + 1
              if x ≤ 1
                  while x < 10^nn
                      nn ← nn − 1
              x ← x / 10^(nn − 1)
              x ← x/10   if x ≥ 10
              x

The function y = Zf(x) displaces the decimal point in x such that the result y stays in the interval [1, 10). E.g.

   Zf(234) = 2.34      Zf(0.00234) = 2.34
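The decimal point shift can be sketched compactly in Python (a sketch, not the worksheet's Mathcad code; the name zf and the use of log10 are our own choices):

```python
import math

def zf(x: float) -> float:
    """Shift the decimal point of a positive number x so that
    the result lies in the half-open interval [1, 10)."""
    if x <= 0:
        raise ValueError("zf expects a positive number")
    # 10**floor(log10(x)) is the power of ten at or just below x
    return x / 10 ** math.floor(math.log10(x))

print(zf(234), zf(0.00234))   # both come out as 2.34 (up to rounding)
```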
3. The "scrambling" function ZXf Y( ) nn← length Y( )
λ← 1 + rnd 8( ) Zi← Zf
( )
λ⋅Yii∈0 nn.. − 1 for
Z
:= This "scrambling"
function operates on an array of numbers by multiplying them with random numbers from the interval 1 to 9 and then applies the decimal point shifting Zf-function.
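One scrambling pass might look as follows in Python (a sketch; the helper zf reproduces the decimal point shift, and random.random stands in for Mathcad's rnd):

```python
import math
import random

def zf(x):
    """Decimal-point shift of x > 0 into [1, 10)."""
    return x / 10 ** math.floor(math.log10(x))

def scramble(ys):
    """One scrambling pass: multiply each element by a random
    factor from [1, 9) and shift it back into [1, 10)."""
    return [zf((1 + 8 * random.random()) * y) for y in ys]

random.seed(0)
out = scramble([3.7, 512.0, 0.004, 1.0])
print(out)   # four values, each in [1, 10)
```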
4. Function which calculates the statistics of the significant digit

   digstat(Y) :=   nn ← length(Y)
                   for i ∈ 0 .. 9
                       Z_i ← 0
                   for i ∈ 0 .. nn − 1
                       if Y_i ≥ 1
                           s ← num2str(Y_i)
                           sd ← substr(s, 0, 1)
                           d ← str2num(sd)
                           Z_d ← Z_d + 1
                   Z

This function computes the statistics of the first digit of the numbers in an array Y: substr extracts the first digit of each number, and if that digit is d we increment Z_d by one.
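The same first-digit bookkeeping can be sketched in Python, mirroring the worksheet's string trick (the variable names are ours):

```python
def digstat(ys):
    """Count the leading digits of the numbers in ys; entries
    below 1 are skipped, as in the worksheet. Returns counts
    indexed 0..9 (index 0 stays unused)."""
    counts = [0] * 10
    for y in ys:
        if y >= 1:
            d = int(str(y)[0])   # first character of the decimal string
            counts[d] += 1
    return counts

counts_demo = digstat([1.5, 2.34, 19, 234, 0.5])
print(counts_demo)   # two leading 1s, two leading 2s, 0.5 skipped
```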
5. Function that iterates the scrambling n times.

   Znf(Y, n) :=   for k ∈ 0 .. n − 1
                      Y ← ZXf(Y)
                  Y

We apply the iterated scrambling to the random series X:

   Itn := 40      Number of iterations.
   Xn := Znf(X, Itn)

This iterates the scrambling and generates an approximately scale-invariant series of numbers in the range [1, 10). The scrambling may be considered as a dynamic process and the Newcomb distribution as its "attractor".
6. The frequencies

   stat := digstat(X)      Calculates the statistics for the random series ...
   statn := digstat(Xn)    ... and for the scrambled series.

A noticeable difference between the statistics of the random series and the scrambled series will be observed. The random series has nearly equal frequencies for the significant digits d = 1, ..., 9, whereas for the scrambled series the frequency tends to decrease with increasing d. (To control the "stability" of the result, mark the runif-function in section 1 above and then press F9, which recalculates the whole thing with a new set of "seeds".)
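The whole experiment of sections 1 to 6 can be sketched end to end in Python; here N = 4000 (larger than the worksheet's 400, to tame fluctuations), and the helper names are our own:

```python
import math
import random

def zf(x):
    """Shift the decimal point of x > 0 into [1, 10)."""
    return x / 10 ** math.floor(math.log10(x))

random.seed(1)
N, iterations = 4000, 40
xs = [random.uniform(1, 1000) for _ in range(N)]
for _ in range(iterations):
    # "scrambling": multiply by a random factor in [1, 9), renormalize
    xs = [zf((1 + 8 * random.random()) * x) for x in xs]

freq = [0.0] * 10
for x in xs:
    d = min(int(zf(x)), 9)   # guard against float round-up at the 10 boundary
    freq[d] += 1 / N

# empirical frequency vs. the Newcomb value log10(1 + 1/d)
for d in range(1, 10):
    print(d, round(freq[d], 3), round(math.log10(1 + 1 / d), 3))
```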
   k := 1 .. 9

   [Figure: "Frequencies of the first digit": stat_k (random series) and statn_k ("scrambled" series) plotted against k.]
7. The distributions
We will next compute histograms using our data in order to obtain the probability distributions.
Again, by marking the runif-function in section 1 and pressing F9 one can repeat the calculations with new series and one should find that the distribution for the scrambled series varies closely around the Newcomb distribution whereas the random series follows an almost straight line in accordance with a uniform distribution of the significant digit.
   NN := 40      Number of intervals to be used for the histogram.
   i := 0 .. NN − 1
   zintv_i := 1 + 9·i/NN

   histzn := hist(zintv, Xn)      Calculates the histogram for the scrambled data.
   dzn_0 := 0      dzn_(i+1) := dzn_i + (i < NN − 1)·histzn_i

   X0 := (Zf(X))→      For the random series X we first have to apply the decimal point shifting function Zf.
   histz0 := hist(zintv, X0)
   dz0_0 := 0      dz0_(i+1) := dz0_i + (i < NN − 1)·histz0_i

The results are compared with the Newcomb distribution:

   fnc(z) := log(z)      NC_i := fnc(zintv_i)      The Newcomb distribution.

   [Figure: "Distributions of Xn, X0 and Newcomb": the empirical distributions dzn/N ("scrambled" series) and dz0/N (random series) together with NC, plotted against zintv.]
8. The Kolmogorov-Smirnov test
A standard check for whether an empirical distribution Fn(z) based on a sample of size n agrees with a given theoretical distribution F is the Kolmogorov-Smirnov test. It is based on the fact that the distribution of the variable Dn = max |Fn(z) − F(z)| is independent of the actual distribution F. For large samples n it can be shown that Dn with probability 0.95 is less than 1.36/√n. Thus, if Dn is less than this bound one can say that the data agree with the theoretical distribution F at the 95% confidence level.
   Fn := submatrix(dzn/N, 0, length(NC) − 1, 0, 0)      Resizes the empirical distribution array to fit that of the NC-array.
   Dn := max(|Fn − NC|→)      The KS-variable.
   KS95 := 1.36/√N      The KS 95% confidence bound.

With our choice of parameters we should typically have Dn less than the KS-limit KS95. We find Dn = 0.02849 and KS95 = 0.068. The Kolmogorov-Smirnov test together with the digit statistics thus supplies a method for testing sets of numbers for scale-invariance.
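A self-contained Python version of this check is sketched below; it draws a genuine Newcomb sample directly via z = 10^u with u uniform on [0, 1), computes Dn exactly from the sorted sample rather than from a histogram, and compares it with the 1.36/√n bound (sample size and seed are arbitrary choices):

```python
import math
import random

def ks_statistic(sample):
    """Kolmogorov-Smirnov statistic Dn = sup_z |Fn(z) - log10(z)|
    for a sample on [1, 10) against the Newcomb CDF F(z) = log10(z)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = math.log10(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

random.seed(2)
n = 400
sample = [10 ** random.random() for _ in range(n)]
dn = ks_statistic(sample)
ks95 = 1.36 / math.sqrt(n)   # the 95% bound used in the worksheet
print(dn, ks95)
```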
9. Scale-invariance: theory
Consider our scrambled series whose elements z lie in the range [1, 10). Scale-invariance implies that the distribution function

   F(a, b) = Prob(a ≤ z ≤ b)

satisfies

   F(λ·a, λ·b) = F(a, b)   for λ > 0

such that the arguments stay in the interval [1, 10). Thus F(a, b) is a function of the ratio b/a only:

   F(a, b) = G(b/a)

From the properties of the probability distribution,

   F(1, 1) = 0
   F(a, b) = F(1, b) − F(1, a)

it follows that the function G must satisfy

   G(1) = 0      G(10) = F(1, 10) = 1
   G(b/a) = G(b) − G(a)

The last relation is the functional equation of the logarithm; its monotone solutions are G(z) = c·log(z), and the normalization G(10) = 1 fixes c = 1. Thus G can be no other than the logarithmic Newcomb distribution:

   F(1, z) = G(z) = log(z)
The above arguments can be repeated mutatis mutandis if we consider the first two digits of numbers (a case also known to Newcomb and Benford). Now we use a decimal point shifting function that keeps the numbers in the interval 10 to 100; for instance, 5924 becomes 59.24. The probability that the first two digits are, e.g., 13 is then given by log((13 + 1)/13). If we sum over all possible combinations of the first two digits we indeed get total probability one:

   Σ_{i = 10}^{99} log((i + 1)/i) = log(100/10) = 1

The above can be generalized to the first three digits, etc.
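The telescoping sum is quick to verify numerically (a plain Python check):

```python
import math

# leading-two-digit probabilities log10(1 + 1/i), i = 10..99
p = [math.log10(1 + 1 / i) for i in range(10, 100)]
total = sum(p)   # telescopes: log10(100) - log10(10) = 1
print(total)
```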
Today it may seem odd that the significant-digit phenomenon defied plausible explanations until recent times. One reason for this may have been the entrenched status of the normal (Gaussian) distribution among probabilists. A related "law" is the one associated with the name of G K Zipf, which figures prominently especially in linguistics and has been "promoted", inter alia, by Benoit Mandelbrot.
What sort of data may approach the Benford law? Typically there are two sorts of Benford data: first, heterogeneous data collected from various sets with different distributions; secondly, homogeneous data whose source has some approximately scale-invariant properties. Thus, Nigrini has a figure which shows the analysis of the population counts of the 3141 US counties (1990 census). The deviation (standard deviation) of the significant-digit frequencies from the Newcomb distribution is less than 0.7%. Here one might think that this reflects an approximate scale-invariance of the sizes of the counties in the data. A similar analysis by Arnér for the 289 Swedish counties (census 1998) also shows quite good agreement with the Newcomb distribution, though it narrowly exceeds the Kolmogorov-Smirnov 95% confidence bound (Dn = 0.10 and KS95 = 0.08). Indeed, in the "digital analysis" pioneered by Nigrini the digit distributions are used as "spectrograms": "spikes/dips" above or below the Newcomb distribution may indicate systematic fraud, fakery, or some threshold phenomenon built into the system.
But, after all, why do the boiling points of the elements accord with the Newcomb distribution?
Nigrini (1999) estimates that some 150 academic papers have been published on Benford's law.
10. Bibliography
Magnus Arnér: På Irrfärder i Slumpens Värld. Studentlitteratur 2001.
["Tabellparadoxen", chapter 3. - In Swedish.]
Frank Benford: "The law of anomalous numbers". Proc. Amer. Phil. Soc. 78 (1938) 551- 572.
James Berger: Statistical Decision Theory and Bayesian Analysis. 2. ed. Springer Verlag 1985. [Page 86.]
M W Browne: "Following Bedford´s law, or looking for no. 1". New York Times (4 August 1998).
Theodore P Hill: "The first-digit phenomenon". American Scientist 86 (1996) 358 - 363.
Theodore P Hill: "The significant-digit phenomenon". Amer. Math. Monthly 102 (1995) 322 - 327.
Benoit Mandelbrot: The Fractal Geometry of Nature. W H Freeman 1983.
Robert Matthews: "The power of ten". New Scientist (10 July 1999).
Simon Newcomb: "Note on the frequency of use of different digits in natural numbers". Amer. J. Math. 4 (1881) 39 - 40.
Mark J Nigrini: "I have got your number. How a mathematical phenomenon can help CPAs uncover fraud and other irregularities".
<www.aicpa.org/pubs/jofa/may1999/niggrini.htm>
Roger S Pinkham: "On the distribution on the first significant digits". Ann. Math. Stat.
32 (1961) 1223 - 1230.
R Raimi: "The first digit problem". Amer. Math. Monthly 83 (1976) 521 - 538.
R Raimi: "The peculiar distribution of first digits". Scientific American (Dec 1969) 109 - 119.
M A Snyder, J H Curry, and A M Dougherty: "Stochastic aspects of one-dimensional discrete dynamical systems: Benford's law". Phys. Rev. E Vol. 64. No.2 (2001).
"For me the law is a prime example of a mathematical idea which is a surprise to everyone - even the experts".
T Hill
"For me, Benford is a great hero. His law is not magic, but sometimes it seems like it".
M J Nigrini.
"Thus all the explanations of [Benford's law] so far given seems to lack something of finality ....
the answer remains obscure".
R Raimi (1969).
Appendix I. Newcomb frequencies for the digits
From the above arguments it follows that the probability, in a Benford series, that the nth digit d_n equals k is

   Prob(d_n = k) = Σ_{i = 10^(n−2)}^{10^(n−1) − 1} log((i·10 + k + 1)/(i·10 + k))

if n > 1 (k = 0, 1, ..., 9), and in the case n = 1 (k = 1, 2, ..., 9)

   Prob(d_1 = k) = log((k + 1)/k)
Next we calculate a table of the Newcomb frequencies up to the 5th digit.
   table :=   p_(0,0) ← "Digit"
              for k ∈ 0 .. 9
                  p_(k+1, 0) ← num2str(k)
              for n ∈ 0 .. 4
                  s ← concat(num2str(n + 1), ". digit")
                  p_(0, n+1) ← s
              for k ∈ 0 .. 9
                  p_(k+1, 1) ← if(k > 0, log(1 + 1/k), "void")
              for n ∈ 1 .. 4
                  for k ∈ 0 .. 9
                      p_(k+1, n+1) ← Σ_{i = 10^(n−1)}^{10^n − 1} log((i·10 + k + 1)/(i·10 + k))
              p
Table of the frequencies of the first to the fifth digit according to the Newcomb distribution:

   table =
      Digit   1. digit   2. digit   3. digit   4. digit   5. digit
      "0"     "void"     0.11968    0.10178    0.10018    0.10002
      "1"     0.30103    0.11389    0.10138    0.10014    0.10001
      "2"     0.17609    0.10882    0.10097    0.10010    0.10001
      "3"     0.12494    0.10433    0.10057    0.10006    0.10001
      "4"     0.09691    0.10031    0.10018    0.10002    0.10000
      "5"     0.07918    0.09668    0.09979    0.09998    0.10000
      "6"     0.06695    0.09337    0.09940    0.09994    0.09999
      "7"     0.05799    0.09035    0.09902    0.09990    0.09999
      "8"     0.05115    0.08757    0.09864    0.09986    0.09999
      "9"     0.04576    0.08500    0.09827    0.09982    0.09998
For increasing n there is a rapid convergence to the uniform distribution of the digits, with frequency 1/10. This can also be proved from the above formulas using the facts that for large S we have

   Σ_{i = 1}^{S} 1/i ≈ ln(S)      and      ln(1 + 1/S) ≈ 1/S
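The table entries can be reproduced from the Appendix I formula with a few lines of Python (the function name is our own):

```python
import math

def newcomb_nth_digit(n, k):
    """Newcomb probability that the nth significant digit is k
    (k = 1..9 for n = 1; k = 0..9 for n > 1)."""
    if n == 1:
        return math.log10(1 + 1 / k)
    return sum(math.log10(1 + 1 / (10 * i + k))
               for i in range(10 ** (n - 2), 10 ** (n - 1)))

print(round(newcomb_nth_digit(2, 0), 5))   # 0.11968, as in the table
print(round(newcomb_nth_digit(1, 1), 5))   # 0.30103
```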
Appendix II. Benford series by mixing Gaussian distributions

Above we demonstrated how a Benford series was generated by "scrambling". Picking numbers randomly from various distributions also generates Benford series if the distributions are in a certain sense "unbiased", as has been pointed out by T Hill. We can demonstrate this effect by picking numbers randomly from Gaussian distributions whose standard deviations are themselves set randomly. Intuitively this procedure results in an approximately scale-invariant series if the standard deviations are varied over a large enough range.
   gmix :=   for i ∈ 0 .. K − 1
                 σ ← rnd(1000)
                 s_i ← (rnorm(1, 0, σ))_0
             s

Picks numbers from Gaussian distributions with randomly assigned standard deviations σ in the range 0 to 1000.
   [Figure: the mixture series gmix_n plotted against n.]
   Zgmix := (Zf(gmix))→      Transforms the data to the interval [1, 10) ...
   sgmix := digstat(Zgmix)   ... and calculates the digit statistics.
   gm := hist(zintv, Zgmix)      Then calculates the histogram and the distribution, which is compared with the Newcomb distribution.
   dgm_0 := 0      dgm_(i+1) := dgm_i + (i < NN − 1)·gm_i
   [Figure: the "Gaussian mixture" distribution dgm/K compared with the Newcomb distribution NC, plotted against zintv.]
It can be pointed out that if we draw numbers from a single given Gaussian distribution we also tend to get digit statistics which differ significantly from the uniform 1/9 distribution:
   σ := 100      gs := rnorm(K, 0, σ)
   zgs := (Zf(gs))→
   hgs := hist(zintv, zgs)
   dgs_0 := 0      dgs_(i+1) := dgs_i + (i < NN − 1)·hgs_i

   [Figure: the "gaussian" distribution dgs/K compared with the Newcomb distribution NC, plotted against zintv.]
   stg := digstat(zgs)      nc_k := log(1 + 1/k)

   [Figure: digit frequencies stg_k/K for the Gaussian series compared with the Newcomb frequencies nc_k, plotted against k.]

With numbers from N(0, 100) we tend to get a digit distribution with "2" in excess when compared with the Newcomb distribution; with numbers from N(0, 30) we get "3" in excess, etc. Given a probability distribution F, the probability that a number drawn from this distribution has first digit 1 is given by
   Prob(d_1 = 1) = Σ_n Prob(10^n ≤ x < 2·10^n) = Σ_n F(10^n, 2·10^n)
Similarly for digits "2", "3", etc. We can implement this formula for calculating the digit distribution for a series with distribution N(0,σ).
   gaussdig(σ) :=   nn ← ceil(log(σ)) + 2
                    for k ∈ 1 .. 9
                        Z_k ← 2·Σ_{n = −nn}^{nn} [pnorm((k + 1)·10^n, 0, σ) − pnorm(k·10^n, 0, σ)]
                    Z
   dstat := gaussdig(σ)      Σ_{j = 1}^{9} dstat_j = 1

   [Figure: digit frequencies: the Gaussian simulation stg_k/K, the Newcomb frequencies nc_k, and the Gaussian theory dstat_k, plotted against k.]
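An equivalent of gaussdig can be sketched in Python, using math.erf for the normal CDF pnorm; we take a wider cutoff in n than the worksheet's ceil(log(σ)) + 2, which is a harmless overestimate:

```python
import math

def phi(x, sigma):
    """CDF of the normal distribution N(0, sigma) at x."""
    return 0.5 * (1 + math.erf(x / (sigma * math.sqrt(2))))

def gauss_first_digit(k, sigma):
    """Probability that |X| has first digit k, X ~ N(0, sigma):
    sum of P(k*10^n <= |X| < (k+1)*10^n) over n; the factor 2
    accounts for taking the absolute value."""
    nn = math.ceil(math.log10(sigma)) + 6   # generous cutoff in n
    return sum(2 * (phi((k + 1) * 10.0 ** n, sigma) - phi(k * 10.0 ** n, sigma))
               for n in range(-nn, nn + 1))

sigma = 100.0
probs = [gauss_first_digit(k, sigma) for k in range(1, 10)]
print([round(p, 4) for p in probs])
print(sum(probs))   # (very nearly) 1
```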
The distribution function FZ of |x|, in terms of the decimal-shifted variable z, is given by

   FZ(z) = Σ_n Prob(10^n ≤ |x| < z·10^n)
Thus in the Gauss case we have the following approximate implementation:

   Fgauss(z, σ) :=   nn ← ceil(log(σ)) + 2
                     Z ← 2·Σ_{n = −nn}^{nn} [pnorm(z·10^n, 0, σ) − pnorm(10^n, 0, σ)]
                     Z

   fg_i := Fgauss(zintv_i, σ)
   [Figure: the Gaussian distribution fg compared with the Newcomb distribution NC, plotted against zintv.]
An aside: comparing the entropies. The Shannon entropy of a discrete probability measure p is defined by

   Entropy(p) := −Σ_{i = 1}^{length(p) − 1} p_i·ln(p_i)
It attains its maximum for the uniform distribution p_1 = ... = p_n = 1/n. We can calculate how much the entropy of the first-digit statistics deviates from this uniform value for the Gaussian and the Newcomb statistics.
maxentropy:= ln 9( ) Maximum entropy.
maxentropy = 2.19722
Entropy dstat( ) = 1.9665 Entropy for Gaussian numbers.
Entropy nc( ) = 1.99343 Entropy for Newcomb statistics.
Appendix III. Benford and dynamics
In a recent paper Snyder, Curry and Dougherty study discrete one-dimensional dynamical systems which produce Benford series. A discrete dynamics is defined by a map

   x_(n+1) = f(x_n)

A classical one-dimensional example is the logistic map

   f(x) = λ·x·(1 − x)

on the interval [0, 1] for 0 ≤ λ ≤ 4. Given an initial value x_0 one may study the statistics of its orbit; i.e. the points x_n = f^n(x_0). In the case λ = 4 the logistic map becomes a chaotic map (also called the Ulam map). For chaotic maps the statistics of the orbits are generally independent of the starting point x_0.
Also the distribution of the points is invariant vis-à-vis the map f, since a further iteration by f on the orbit does not change anything. However, only in a few cases (the Ulam map is one of them) has it been possible to find an analytic expression for the invariant distribution. If we can find such an expression then, as the authors show, it is easy to check whether the points of the orbit satisfy the Newcomb digit distribution. Thus they can show that the Ulam map and the so-called tent map do not produce Benford series. In case we do not have an expression for the distribution, it remains to explore it numerically. For instance, the authors provide numerical evidence that the so-called Renyi map produces Benford series.
The Renyi map is defined on [0, ∞) by

   f(x) :=   x/(1 − x)   if x < 1
             x − 1       otherwise            Renyi map

   N := 1000      n := 0 .. N − 1
   x_0 := rnd(100)      x_(n+1) := f(x_n)

   [Figure: "Renyi series": the orbit x_n plotted against n on a logarithmic scale.]
We determine the digit statistics by computing n2 points for each of n1 randomly chosen orbits:

   SR(n1, n2, f, seed, d) :=   for i ∈ 0 .. 9
                                   Z_i ← 0
                               for i ∈ 0 .. n1 − 1
                                   x_0 ← rnd(seed) − d
                                   for j ∈ 0 .. n2 − 1
                                       x_(j+1) ← f(x_j)
                                   x ← (Zf(x))→
                                   Z ← Z + digstat(x)
                               Z
   ZZ := SR(100, 500, f, 100, 0)      Ztot := Σ_{n = 0}^{9} ZZ_n
   ZZ := ZZ/Ztot      Calculates the digit statistics.

   [Figure: digit frequencies ZZ_k for the Renyi series compared with the Newcomb frequencies nc_k, plotted against k.]
That the Renyi map seems to have the asymptotic Newcomb distribution is perhaps due to its "slingshot" mechanism, which effects a "scrambling" leading to scale invariance.
Heuristic argument:
Apparently this Benford property has to do with the "slingshot effect" of the Renyi map: points of the form x = 1 − r are ejected to x = 1/r − 1. The graphs of the Renyi orbits also resemble fractal curves with triangles of various heights. If r > 0 is uniformly distributed near 0 we may expect the big numbers x = 1/r (the heights of the triangles) to have a pdf p(x) of the form 1/x^2. The pdf q(x) for a big number x appearing somewhere in the series would then be proportional to q(x) = p(x) + p(x + 1) + p(x + 2) + ... ~ 1/x, since a triangle of height x implies that the numbers x − 1, x − 2, ... are in the series too, due to the definition of the Renyi map for x > 1. Now use the decimal shifting trick and calculate the distribution function for the decimal-shifted variable 1 ≤ z < 10:

   F(z) = Σ_n Prob(10^n ≤ x < z·10^n) ∝ ∫_{10^n}^{z·10^n} dx/x = ln(z)

That is, we would obtain a logarithmic function of z in accordance with the Newcomb distribution. The part 0 < x < 1 seems to be a sort of mirror of the case x > 1 (take a look at a graph using a log-scale y-axis): when x is close to zero it takes about 1/x steps to grow back to a size around 1.
It might be that fleshing out the above argument would involve a somewhat circular use of the distribution of points in the series. However, that might be acceptable, since it would show that this distribution is the invariant distribution of the time series in the asymptotic sense.
(Indeed, one should expect it to be easier to theoretically verify whether a dynamical system has a certain invariant measure than to actually produce one in the general case.)
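The slingshot heuristic is easy to probe numerically; the following Python sketch iterates random Renyi orbits and prints the observed first-digit frequencies next to the Newcomb values (the orbit counts and lengths are arbitrary choices, and orbit-to-orbit fluctuations can be sizable):

```python
import math
import random

def renyi(x):
    """Renyi map on [0, inf): the "slingshot" branch x/(1-x)
    below 1, a unit step down x - 1 above."""
    return x / (1 - x) if x < 1 else x - 1

def first_digit(x):
    """Leading significant digit of a positive number."""
    d = int(x / 10 ** math.floor(math.log10(x)))
    return 9 if d > 9 else d   # guard against float round-up

random.seed(3)
counts = [0] * 10
for _ in range(100):            # 100 random orbits
    x = random.uniform(0, 100)
    for _ in range(500):        # 500 iterates each
        x = renyi(x)
        if x > 0:               # skip the fixed point at 0
            counts[first_digit(x)] += 1

total = sum(counts)
for k in range(1, 10):
    print(k, round(counts[k] / total, 3), round(math.log10(1 + 1 / k), 3))
```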
"Scale invariance" of fractal time series, such as random walk, suggests these have also peculiar distributions of the digits. As an example we consider the Brownian random walk:
br x( ) := x + rnd 1( ) − 0.5
Brown process.
z0 := rnd 10( ) − 5 zn 1+ := br z
( )
n0 200 400 600 800 1000 10
5 0 5 10
Brown series
zn
n
   ZB := SR(100, 10000, br, 1, 0.5)      Calculates the digit statistics for a number of Brown series.
   ZB := ZB / Σ_{n = 0}^{9} ZB_n

There seems to be a significant deviation from the Newcomb distribution, but the result fluctuates quite a lot for small series.
   [Figure: digit frequencies ZB_k for the Brownian ("Fractal") series compared with the Newcomb frequencies nc_k, plotted against k.]
General fractal curve with power-spectrum exponent β:

   noise1f(n, ∆t, β) :=   nn ← 2^(n−1) + 1
                          N ← 2^n
                          for k ∈ 0 .. nn − 1
                              fq_k ← k/(N·∆t)
                          amp ← rnorm(nn, 0, 1)
                          θ ← runif(nn, 0, 2·π)
                          for k ∈ 1 .. nn − 1
                              F_k ← amp_k·e^(i·θ_k) / (fq_k)^(β/2)
                          F_0 ← F_1
                          Z ← IFFT(F)
                          Z

   ∆t := 1/100      The time step used below.
Example with 1/f-noise (β = 1):

   xx := noise1f(10, ∆t, 1)      kk := 0 .. length(xx) − 1

   [Figure: "1/f-noise": the series xx_kk plotted against kk.]
The next function calculates the digit statistics for fractal time series.

   n1f(n1, n, ∆t, β) :=   for i ∈ 0 .. 9
                              Z_i ← 0
                          for i ∈ 0 .. n1 − 1
                              z ← (noise1f(n, ∆t, β))→
                              x ← (Zf(z))→
                              Z ← Z + digstat(x)
                          Z
It might also be of interest to compare a "Zipf distribution" with the Newcomb distribution and the digit distribution of the fractal time series.

   SS := Σ_{n = 1}^{9} 1/n      SS = 2.82897
   Zp_k := 1/(SS·k)      The "Zipf" distribution of the digits.
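The Zipf digit weights and their normalization SS are trivial to tabulate (a Python sketch; the list names are ours):

```python
import math

# normalization SS: the 9th harmonic number
ss = sum(1 / n for n in range(1, 10))
zipf = [1 / (ss * k) for k in range(1, 10)]        # Zp_k = 1/(SS*k)
newcomb = [math.log10(1 + 1 / k) for k in range(1, 10)]

print(round(ss, 5))   # 2.82897, as in the worksheet
for k in range(9):
    print(k + 1, round(zipf[k], 4), round(newcomb[k], 4))
```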
Next we generate a Brownian process via the Fourier-transform method.

   β := 2
   Zif := n1f(100, 10, ∆t, β)
   Zif := Zif / Σ_{n = 0}^{9} Zif_n
This exploratory simulation suggests that Brownian noise (β = 2) has digit statistics closer to the Zipf distribution than to the Newcomb distribution:

   [Figure: digit frequencies Zif_k ("Fractal") compared with the Newcomb (nc_k) and "Zipf" (Zp_k) frequencies, plotted against k.]