"The Law of Benford"
The "significant-digit paradox" was probably first discovered by the astronomer Simon Newcomb in 1881 when he observed that the pages at the beginning of a book with logarithms tended to be wearied much more than the back pages. The physicist Frank Benford rediscovered this phenomenon in 1938 while working at General Electric. He compiled sets of numbers from baseball scores to numbers extracted from Reader's Digest and verified the nonuniform distribution of the most significant digit. Take the most significant digit of all these numbers, and alas, one finds that 1, 2, ..., 9 do not occur with the same frequency 1/9 which one would perhaps expect to be the statistical average. Instead digit D occurs on the average with a frequency of log 1 + D
D
where log is the logarithm with base 10. This formula was first found empirically by Simon Newcomb in 1881. Later, in 1961, the mathematician Roger Pinkham derived Newcomb's formula from the assumption that the numbers exhibit scale-invariance; that is, their statistical properties are invariant under multiplication by a constant factor (as occurs, for example, when sums of money are converted between currencies). In 1995 Theodore Hill presented a general theorem which shows that numbers drawn at random from a mixture of different distributions tend to conform to the Newcomb distribution.
Benford's law has been put to practical use, e.g. for detecting income-tax fraud: faked number series stand out because they generally do not adhere to Benford's law. (M J Nigrini reports that he has applied "digital analysis" (DA) to Bill Clinton's tax returns and found no evidence of fraud.) Here we will illuminate Benford's law by taking a random series of numbers in the interval 1 to 1000 and subjecting it to "scrambling"; i.e. we multiply the numbers by factors picked at random from the interval 1 to 9 and iterate this procedure several times. The idea is that this produces an approximately scale-invariant series, since further multiplication by any factor will not significantly change its statistics. A useful trick in these circumstances is to shift the decimal point of the numbers so that they stay in the interval [1, 10). For these numbers one can then also obtain a continuous probability distribution on [1, 10). The content of Newcomb's formula is then that Prob(Z < z) = log(z).
1. Generate (pseudo-)random numbers

   N := 400      n := 0 .. N − 1
   X := runif(N, 1, 1000)      Uniformly distributed numbers in the interval 1 to 1000.
2. The decimal point shifting function

   Zf(x) :=   nn ← 0
              if x > 1
                  while x > 10^nn
                      nn ← nn + 1
              if x ≤ 1
                  while x < 10^nn
                      nn ← nn − 1
              x ← x / 10^(nn − 1)
              x ← x/10   if x ≥ 10
              x

The function y = Zf(x) displaces the decimal point in x such that the result y stays in the interval [1, 10). E.g.

   Zf(234) = 2.34      Zf(0.00234) = 2.34
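The decimal point shift can be sketched compactly in Python (a sketch, not the worksheet's Mathcad code; the name zf and the use of log10 are our own choices):

```python
import math

def zf(x: float) -> float:
    """Shift the decimal point of a positive number x so that
    the result lies in the half-open interval [1, 10)."""
    if x <= 0:
        raise ValueError("zf expects a positive number")
    # 10**floor(log10(x)) is the power of ten at or just below x
    return x / 10 ** math.floor(math.log10(x))

print(zf(234), zf(0.00234))   # both come out as 2.34 (up to rounding)
```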
3. The "scrambling" function ZXf Y( ) nn← length Y( )
λ← 1 + rnd 8( ) Zi← Zf
( )
λ⋅Yii∈0 nn.. − 1 for
Z
:= This "scrambling"
function operates on an array of numbers by multiplying them with random numbers from the interval 1 to 9 and then applies the decimal point shifting Zf-function.
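One scrambling pass might look as follows in Python (a sketch; the helper zf reproduces the decimal point shift, and random.random stands in for Mathcad's rnd):

```python
import math
import random

def zf(x):
    """Decimal-point shift of x > 0 into [1, 10)."""
    return x / 10 ** math.floor(math.log10(x))

def scramble(ys):
    """One scrambling pass: multiply each element by a random
    factor from [1, 9) and shift it back into [1, 10)."""
    return [zf((1 + 8 * random.random()) * y) for y in ys]

random.seed(0)
out = scramble([3.7, 512.0, 0.004, 1.0])
print(out)   # four values, each in [1, 10)
```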
4. Function which calculates the statistics of the significant digit

   digstat(Y) :=   nn ← length(Y)
                   for i ∈ 0 .. 9
                       Z_i ← 0
                   for i ∈ 0 .. nn − 1
                       if Y_i ≥ 1
                           s ← num2str(Y_i)
                           sd ← substr(s, 0, 1)
                           d ← str2num(sd)
                           Z_d ← Z_d + 1
                   Z

This function computes the statistics of the first digit of the numbers in an array Y: substr extracts the first digit of each number, and if that digit is d we increment Z_d by one.
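The same first-digit bookkeeping can be sketched in Python, mirroring the worksheet's string trick (the variable names are ours):

```python
def digstat(ys):
    """Count the leading digits of the numbers in ys; entries
    below 1 are skipped, as in the worksheet. Returns counts
    indexed 0..9 (index 0 stays unused)."""
    counts = [0] * 10
    for y in ys:
        if y >= 1:
            d = int(str(y)[0])   # first character of the decimal string
            counts[d] += 1
    return counts

counts_demo = digstat([1.5, 2.34, 19, 234, 0.5])
print(counts_demo)   # two leading 1s, two leading 2s, 0.5 skipped
```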
5. Function that iterates the scrambling n times.

   Znf(Y, n) :=   for k ∈ 0 .. n − 1
                      Y ← ZXf(Y)
                  Y

We apply the iterated scrambling to the random series X:

   Itn := 40      Number of iterations.
   Xn := Znf(X, Itn)

This iterates the scrambling and generates an approximately scale-invariant series of numbers in the range [1, 10). The scrambling may be considered as a dynamic process and the Newcomb distribution as its "attractor".
6. The frequencies

   stat := digstat(X)      Calculates the statistics for the random series ...
   statn := digstat(Xn)    ... and for the scrambled series.

A noticeable difference between the statistics of the random series and the scrambled series will be observed. The random series has nearly equal frequencies for the significant digits d = 1, ..., 9, whereas for the scrambled series the frequency tends to decrease with increasing d. (To control the "stability" of the result, mark the runif-function in section 1 above and then press F9, which recalculates the whole thing with a new set of "seeds".)
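The whole experiment of sections 1 to 6 can be sketched end to end in Python; here N = 4000 (larger than the worksheet's 400, to tame fluctuations), and the helper names are our own:

```python
import math
import random

def zf(x):
    """Shift the decimal point of x > 0 into [1, 10)."""
    return x / 10 ** math.floor(math.log10(x))

random.seed(1)
N, iterations = 4000, 40
xs = [random.uniform(1, 1000) for _ in range(N)]
for _ in range(iterations):
    # "scrambling": multiply by a random factor in [1, 9), renormalize
    xs = [zf((1 + 8 * random.random()) * x) for x in xs]

freq = [0.0] * 10
for x in xs:
    d = min(int(zf(x)), 9)   # guard against float round-up at the 10 boundary
    freq[d] += 1 / N

# empirical frequency vs. the Newcomb value log10(1 + 1/d)
for d in range(1, 10):
    print(d, round(freq[d], 3), round(math.log10(1 + 1 / d), 3))
```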
   k := 1 .. 9

   [Figure: "Frequencies of the first digit": stat_k (random series) and statn_k ("scrambled" series) plotted against k.]
7. The distributions
We will next compute histograms using our data in order to obtain the probability distributions.
Again, by marking the runif-function in section 1 and pressing F9 one can repeat the calculations with new series and one should find that the distribution for the scrambled series varies closely around the Newcomb distribution whereas the random series follows an almost straight line in accordance with a uniform distribution of the significant digit.
   NN := 40      Number of intervals to be used for the histogram.
   i := 0 .. NN − 1
   zintv_i := 1 + 9·i/NN

   histzn := hist(zintv, Xn)      Calculates the histogram for the scrambled data.
   dzn_0 := 0      dzn_(i+1) := dzn_i + (i < NN − 1)·histzn_i

   X0 := (Zf(X))→      For the random series X we first have to apply the decimal point shifting function Zf.
   histz0 := hist(zintv, X0)
   dz0_0 := 0      dz0_(i+1) := dz0_i + (i < NN − 1)·histz0_i

The results are compared with the Newcomb distribution:

   fnc(z) := log(z)      NC_i := fnc(zintv_i)      The Newcomb distribution.

   [Figure: "Distributions of Xn, X0 and Newcomb": the empirical distributions dzn/N ("scrambled" series) and dz0/N (random series) together with NC, plotted against zintv.]
8. The Kolmogorov-Smirnov test
A standard check for whether an empirical distribution Fn(z) based on a sample of size n agrees with a given theoretical distribution F is the Kolmogorov-Smirnov test. It is based on the fact that the distribution of the variable Dn = max |Fn(z) − F(z)| is independent of the actual distribution F. For large samples n it can be shown that Dn with probability 0.95 is less than 1.36/√n. Thus, if Dn is less than this bound one can say that the data agree with the theoretical distribution F at the 95% confidence level.
   Fn := submatrix(dzn/N, 0, length(NC) − 1, 0, 0)      Resizes the empirical distribution array to fit that of the NC-array.
   Dn := max(|Fn − NC|→)      The KS-variable.
   KS95 := 1.36/√N      The KS 95% confidence bound.

With our choice of parameters we should typically have Dn less than the KS-limit KS95. We find Dn = 0.02849 and KS95 = 0.068. The Kolmogorov-Smirnov test together with the digit statistics thus supplies a method for testing sets of numbers for scale-invariance.
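A self-contained Python version of this check is sketched below; it draws a genuine Newcomb sample directly via z = 10^u with u uniform on [0, 1), computes Dn exactly from the sorted sample rather than from a histogram, and compares it with the 1.36/√n bound (sample size and seed are arbitrary choices):

```python
import math
import random

def ks_statistic(sample):
    """Kolmogorov-Smirnov statistic Dn = sup_z |Fn(z) - log10(z)|
    for a sample on [1, 10) against the Newcomb CDF F(z) = log10(z)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = math.log10(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

random.seed(2)
n = 400
sample = [10 ** random.random() for _ in range(n)]
dn = ks_statistic(sample)
ks95 = 1.36 / math.sqrt(n)   # the 95% bound used in the worksheet
print(dn, ks95)
```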
9. Scale-invariance: theory
Consider our scrambled series whose elements z lie in the range [1, 10). Scale-invariance implies that the distribution function

   F(a, b) = Prob(a ≤ z ≤ b)

satisfies

   F(λ·a, λ·b) = F(a, b)   for λ > 0

such that the arguments stay in the interval [1, 10). Thus F(a, b) is a function of the ratio b/a only:

   F(a, b) = G(b/a)

From the properties of the probability distribution,

   F(1, 1) = 0
   F(a, b) = F(1, b) − F(1, a)

it follows that the function G must satisfy

   G(1) = 0      G(10) = F(1, 10) = 1
   G(b/a) = G(b) − G(a)

The last relation is the functional equation of the logarithm; its monotone solutions are G(z) = c·log(z), and the normalization G(10) = 1 fixes c = 1. Thus G can be no other than the logarithmic Newcomb distribution:

   F(1, z) = G(z) = log(z)
The above arguments can be repeated mutatis mutandis if we consider the first two digits of numbers (a case also known to Newcomb and Benford). Now we use a decimal point shifting function that keeps the numbers in the interval 10 to 100; for instance, 5924 becomes 59.24. The probability that the first two digits are, e.g., 13 is then given by log((13 + 1)/13). If we sum over all possible combinations of the first two digits we indeed get total probability one:

   Σ_{i = 10}^{99} log((i + 1)/i) = log(100/10) = 1

The above can be generalized to the first three digits, etc.
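The telescoping sum is quick to verify numerically (a plain Python check):

```python
import math

# leading-two-digit probabilities log10(1 + 1/i), i = 10..99
p = [math.log10(1 + 1 / i) for i in range(10, 100)]
total = sum(p)   # telescopes: log10(100) - log10(10) = 1
print(total)
```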
Today it may seem odd that the significant-digit phenomenon defied plausible explanations until recent times. One reason for this may have been the entrenched status of the normal (Gaussian) distribution among probabilists. A related "law" is the one associated with the name of G K Zipf, which figures prominently especially in linguistics and has been "promoted", inter alia, by Benoit Mandelbrot.
What sort of data may approach the Benford law? Typically there are two sorts of Benford data: first, heterogeneous data collected from various sets with different distributions; secondly, homogeneous data whose source has some approximately scale-invariant properties. Thus, Nigrini has a figure which shows the analysis of the population counts of the 3141 US counties (1990 census). The deviation (standard deviation) of the significant-digit frequencies from the Newcomb distribution is less than 0.7%. Here one might think that this reflects an approximate scale-invariance of the sizes of the counties in the data. A similar analysis by Arnér for the 289 Swedish counties (census 1998) also shows quite good agreement with the Newcomb distribution, though it narrowly exceeds the Kolmogorov-Smirnov 95% confidence bound (Dn = 0.10 and KS95 = 0.08). Indeed, in the "digital analysis" pioneered by Nigrini the digit distributions are used as "spectrograms": "spikes/dips" above or below the Newcomb distribution may indicate systematic fraud, fakery, or some threshold phenomenon built into the system.
But, after all, why do the boiling points of the elements accord with the Newcomb distribution?
Nigrini (1999) estimates that some 150 academic papers have been published on Benford's law.
10. Bibliography
Magnus Arnér: På Irrfärder i Slumpens Värld. Studentlitteratur 2001.
["Tabellparadoxen", chapter 3. - In Swedish.]
Frank Benford: "The law of anomalous numbers". Proc. Amer. Phil. Soc. 78 (1938) 551- 572.
James Berger: Statistical Decision Theory and Bayesian Analysis. 2. ed. Springer Verlag 1985. [Page 86.]
M W Browne: "Following Bedford´s law, or looking for no. 1". New York Times (4 August 1998).
Theodore P Hill: "The first-digit phenomenon". American Scientist 86 (1996) 358 - 363.
Theodore P Hill: "The significant-digit phenomenon". Amer. Math. Monthly 102 (1995) 322 - 327.
Benoit Mandelbrot: The Fractal Geometry of Nature. W H Freeman 1983.
Robert Matthews: "The power of ten". New Scientist (10 July 1999).
Simon Newcomb: "Note on the frequency of use of different digits in natural numbers". Amer. J. Math. 4 (1881) 39 - 40.
Mark J Nigrini: "I have got your number. How a mathematical phenomenon can help CPAs uncover fraud and other irregularities".
<www.aicpa.org/pubs/jofa/may1999/niggrini.htm>
Roger S Pinkham: "On the distribution on the first significant digits". Ann. Math. Stat.
32 (1961) 1223 - 1230.
R Raimi: "The first digit problem". Amer. Math. Monthly 83 (1976) 521 - 538.
R Raimi: "The peculiar distribution of first digits". Scientific American (Dec 1969) 109 - 119.
M A Snyder, J H Curry, and A M Dougherty: "Stochastic aspects of one-dimensional discrete dynamical systems: Benford's law". Phys. Rev. E Vol. 64. No.2 (2001).
"For me the law is a prime example of a mathematical idea which is a surprise to everyone - even the experts".
T Hill
"For me, Benford is a great hero. His law is not magic, but sometimes it seems like it".
M J Nigrini.
"Thus all the explanations of [Benford's law] so far given seems to lack something of finality ....
the answer remains obscure".
R Raimi (1969).
Appendix I. Newcomb frequencies for the digits
From the above arguments it follows that the probability, in a Benford series, that the nth digit d_n equals k is

   Prob(d_n = k) = Σ_{i = 10^(n−2)}^{10^(n−1) − 1} log((i·10 + k + 1)/(i·10 + k))

if n > 1 (k = 0, 1, ..., 9), and in the case n = 1 (k = 1, 2, ..., 9)

   Prob(d_1 = k) = log((k + 1)/k)
Next we calculate a table of the Newcomb frequencies up to the 5th digit.
   table :=   p_(0,0) ← "Digit"
              for k ∈ 0 .. 9
                  p_(k+1, 0) ← num2str(k)
              for n ∈ 0 .. 4
                  s ← concat(num2str(n + 1), ". digit")
                  p_(0, n+1) ← s
              for k ∈ 0 .. 9
                  p_(k+1, 1) ← if(k > 0, log(1 + 1/k), "void")
              for n ∈ 1 .. 4
                  for k ∈ 0 .. 9
                      p_(k+1, n+1) ← Σ_{i = 10^(n−1)}^{10^n − 1} log((i·10 + k + 1)/(i·10 + k))
              p
Table of the frequencies of the first to the fifth digit according to the Newcomb distribution:

   table =
      Digit   1. digit   2. digit   3. digit   4. digit   5. digit
      "0"     "void"     0.11968    0.10178    0.10018    0.10002
      "1"     0.30103    0.11389    0.10138    0.10014    0.10001
      "2"     0.17609    0.10882    0.10097    0.10010    0.10001
      "3"     0.12494    0.10433    0.10057    0.10006    0.10001
      "4"     0.09691    0.10031    0.10018    0.10002    0.10000
      "5"     0.07918    0.09668    0.09979    0.09998    0.10000
      "6"     0.06695    0.09337    0.09940    0.09994    0.09999
      "7"     0.05799    0.09035    0.09902    0.09990    0.09999
      "8"     0.05115    0.08757    0.09864    0.09986    0.09999
      "9"     0.04576    0.08500    0.09827    0.09982    0.09998
For increasing n there is a rapid convergence to the uniform distribution of the digits, with frequency 1/10. This can also be proved from the above formulas using the facts that for large S we have

   Σ_{i = 1}^{S} 1/i ≈ ln(S)      and      ln(1 + 1/S) ≈ 1/S
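The table entries can be reproduced from the Appendix I formula with a few lines of Python (the function name is our own):

```python
import math

def newcomb_nth_digit(n, k):
    """Newcomb probability that the nth significant digit is k
    (k = 1..9 for n = 1; k = 0..9 for n > 1)."""
    if n == 1:
        return math.log10(1 + 1 / k)
    return sum(math.log10(1 + 1 / (10 * i + k))
               for i in range(10 ** (n - 2), 10 ** (n - 1)))

print(round(newcomb_nth_digit(2, 0), 5))   # 0.11968, as in the table
print(round(newcomb_nth_digit(1, 1), 5))   # 0.30103
```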
Appendix II. Benford series by mixing Gaussian distributions

Above we demonstrated how a Benford series was generated by "scrambling". Picking numbers randomly from various distributions also generates Benford series if the distributions are in a certain sense "unbiased", as has been pointed out by T Hill. We can demonstrate this effect by picking numbers randomly from Gaussian distributions whose standard deviations are themselves set randomly. Intuitively this procedure results in an approximately scale-invariant series if the standard deviations are varied over a large enough range.
   gmix :=   for i ∈ 0 .. K − 1
                 σ ← rnd(1000)
                 s_i ← (rnorm(1, 0, σ))_0
             s

Picks numbers from Gaussian distributions with randomly assigned standard deviations σ in the range 0 to 1000.
   [Figure: the mixture series gmix_n plotted against n.]
   Zgmix := (Zf(gmix))→      Transforms the data to the interval [1, 10) ...
   sgmix := digstat(Zgmix)   ... and calculates the digit statistics.
   gm := hist(zintv, Zgmix)      Then calculates the histogram and the distribution, which is compared with the Newcomb distribution.
   dgm_0 := 0      dgm_(i+1) := dgm_i + (i < NN − 1)·gm_i
   [Figure: the "Gaussian mixture" distribution dgm/K compared with the Newcomb distribution NC, plotted against zintv.]
It can be pointed out that if we draw numbers from a single given Gaussian distribution we also tend to get digit statistics which differ significantly from the uniform 1/9 distribution:
   σ := 100      gs := rnorm(K, 0, σ)
   zgs := (Zf(gs))→
   hgs := hist(zintv, zgs)
   dgs_0 := 0      dgs_(i+1) := dgs_i + (i < NN − 1)·hgs_i

   [Figure: the "gaussian" distribution dgs/K compared with the Newcomb distribution NC, plotted against zintv.]
   stg := digstat(zgs)      nc_k := log(1 + 1/k)

   [Figure: digit frequencies stg_k/K for the Gaussian series compared with the Newcomb frequencies nc_k, plotted against k.]

With numbers from N(0, 100) we tend to get a digit distribution with "2" in excess when compared with the Newcomb distribution; with numbers from N(0, 30) we get "3" in excess, etc. Given a probability distribution F, the probability that a number drawn from this distribution has first digit 1 is given by
   Prob(d_1 = 1) = Σ_n Prob(10^n ≤ x < 2·10^n) = Σ_n F(10^n, 2·10^n)
Similarly for digits "2", "3", etc. We can implement this formula for calculating the digit distribution for a series with distribution N(0,σ).
   gaussdig(σ) :=   nn ← ceil(log(σ)) + 2
                    for k ∈ 1 .. 9
                        Z_k ← 2·Σ_{n = −nn}^{nn} [pnorm((k + 1)·10^n, 0, σ) − pnorm(k·10^n, 0, σ)]
                    Z
   dstat := gaussdig(σ)      Σ_{j = 1}^{9} dstat_j = 1

   [Figure: digit frequencies: the Gaussian simulation stg_k/K, the Newcomb frequencies nc_k, and the Gaussian theory dstat_k, plotted against k.]
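An equivalent of gaussdig can be sketched in Python, using math.erf for the normal CDF pnorm; we take a wider cutoff in n than the worksheet's ceil(log(σ)) + 2, which is a harmless overestimate:

```python
import math

def phi(x, sigma):
    """CDF of the normal distribution N(0, sigma) at x."""
    return 0.5 * (1 + math.erf(x / (sigma * math.sqrt(2))))

def gauss_first_digit(k, sigma):
    """Probability that |X| has first digit k, X ~ N(0, sigma):
    sum of P(k*10^n <= |X| < (k+1)*10^n) over n; the factor 2
    accounts for taking the absolute value."""
    nn = math.ceil(math.log10(sigma)) + 6   # generous cutoff in n
    return sum(2 * (phi((k + 1) * 10.0 ** n, sigma) - phi(k * 10.0 ** n, sigma))
               for n in range(-nn, nn + 1))

sigma = 100.0
probs = [gauss_first_digit(k, sigma) for k in range(1, 10)]
print([round(p, 4) for p in probs])
print(sum(probs))   # (very nearly) 1
```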
The distribution function FZ of |x|, in terms of the decimal-shifted variable z, is given by

   FZ(z) = Σ_n Prob(10^n ≤ |x| < z·10^n)
Thus in the Gauss case we have the following approximate implementation:

   Fgauss(z, σ) :=   nn ← ceil(log(σ)) + 2
                     Z ← 2·Σ_{n = −nn}^{nn} [pnorm(z·10^n, 0, σ) − pnorm(10^n, 0, σ)]
                     Z

   fg_i := Fgauss(zintv_i, σ)
   [Figure: the Gaussian distribution fg compared with the Newcomb distribution NC, plotted against zintv.]
An aside: comparing the entropies. The Shannon entropy of a discrete probability measure p is defined by

   Entropy(p) := −Σ_{i = 1}^{length(p) − 1} p_i·ln(p_i)
It attains its maximum for the uniform distribution p_1 = ... = p_n = 1/n. We can calculate how much the entropy of the first-digit statistics deviates from this uniform value for the Gaussian and the Newcomb statistics.
maxentropy:= ln 9( ) Maximum entropy.
maxentropy = 2.19722
Entropy dstat( ) = 1.9665 Entropy for Gaussian numbers.
Entropy nc( ) = 1.99343 Entropy for Newcomb statistics.
Appendix III. Benford and dynamics
In a recent paper Snyder, Curry and Dougherty study discrete one-dimensional dynamical systems which produce Benford series. A discrete dynamics is defined by a map

   x_(n+1) = f(x_n)

A classical one-dimensional example is the logistic map

   f(x) = λ·x·(1 − x)

on the interval [0, 1] for 0 ≤ λ ≤ 4. Given an initial value x_0 one may study the statistics of its orbit; i.e. the points x_n = f^n(x_0). In the case λ = 4 the logistic map becomes a chaotic map (also called the Ulam map). For chaotic maps the statistics of the orbits are generally independent of the starting point x_0.
Also the distribution of the points is invariant vis-à-vis the map f, since a further iteration by f on the orbit does not change anything. However, only in a few cases (the Ulam map is one of them) has it been possible to find an analytic expression for the invariant distribution. If we can find such an expression then, as the authors show, it is easy to check whether the points of the orbit satisfy the Newcomb digit distribution. Thus they can show that the Ulam map and the so-called tent map do not produce Benford series. In case we do not have an expression for the distribution, it remains to explore it numerically. For instance, the authors provide numerical evidence that the so-called Renyi map produces Benford series.
The Renyi map is defined on [0, ∞) by

   f(x) :=   x/(1 − x)   if x < 1
             x − 1       otherwise            Renyi map

   N := 1000      n := 0 .. N − 1
   x_0 := rnd(100)      x_(n+1) := f(x_n)

   [Figure: "Renyi series": the orbit x_n plotted against n on a logarithmic scale.]
We determine the digit statistics by computing n2 points for each of n1 randomly chosen orbits:

   SR(n1, n2, f, seed, d) :=   for i ∈ 0 .. 9
                                   Z_i ← 0
                               for i ∈ 0 .. n1 − 1
                                   x_0 ← rnd(seed) − d
                                   for j ∈ 0 .. n2 − 1
                                       x_(j+1) ← f(x_j)
                                   x ← (Zf(x))→
                                   Z ← Z + digstat(x)
                               Z
   ZZ := SR(100, 500, f, 100, 0)      Ztot := Σ_{n = 0}^{9} ZZ_n
   ZZ := ZZ/Ztot      Calculates the digit statistics.

   [Figure: digit frequencies ZZ_k for the Renyi series compared with the Newcomb frequencies nc_k, plotted against k.]
That the Renyi map seems to have the asymptotic Newcomb distribution is perhaps due to its "slingshot" mechanism, which effects a "scrambling" leading to scale invariance.
Heuristic argument:
Apparently this Benford property has to do with the "slingshot effect" of the Renyi map: points of the form x = 1 − r are ejected to x = 1/r − 1. The graphs of the Renyi orbits also resemble fractal curves with triangles of various heights. If r > 0 is uniformly distributed near 0 we may expect the big numbers x = 1/r (the heights of the triangles) to have a pdf p(x) of the form 1/x^2. The pdf q(x) for a big number x appearing somewhere in the series would then be proportional to q(x) = p(x) + p(x + 1) + p(x + 2) + ... ~ 1/x, since a triangle of height x implies that the numbers x − 1, x − 2, ... are in the series too, due to the definition of the Renyi map for x > 1. Now use the decimal shifting trick and calculate the distribution function for the decimal-shifted variable 1 ≤ z < 10:

   F(z) = Σ_n Prob(10^n ≤ x < z·10^n) ∝ ∫_{10^n}^{z·10^n} dx/x = ln(z)

That is, we would obtain a logarithmic function of z in accordance with the Newcomb distribution. The part 0 < x < 1 seems to be a sort of mirror of the case x > 1 (take a look at a graph using a log-scale y-axis): when x is close to zero it takes about 1/x steps to grow back to a size around 1.
It might be that fleshing out the above argument would involve a somewhat circular use of the distribution of points in the series. However, that might be acceptable, since it would show that this distribution is the invariant distribution of the time series in the asymptotic sense.
(Indeed, one should expect it to be easier to theoretically verify whether a dynamical system has a certain invariant measure than to actually produce one in the general case.)
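The slingshot heuristic is easy to probe numerically; the following Python sketch iterates random Renyi orbits and prints the observed first-digit frequencies next to the Newcomb values (the orbit counts and lengths are arbitrary choices, and orbit-to-orbit fluctuations can be sizable):

```python
import math
import random

def renyi(x):
    """Renyi map on [0, inf): the "slingshot" branch x/(1-x)
    below 1, a unit step down x - 1 above."""
    return x / (1 - x) if x < 1 else x - 1

def first_digit(x):
    """Leading significant digit of a positive number."""
    d = int(x / 10 ** math.floor(math.log10(x)))
    return 9 if d > 9 else d   # guard against float round-up

random.seed(3)
counts = [0] * 10
for _ in range(100):            # 100 random orbits
    x = random.uniform(0, 100)
    for _ in range(500):        # 500 iterates each
        x = renyi(x)
        if x > 0:               # skip the fixed point at 0
            counts[first_digit(x)] += 1

total = sum(counts)
for k in range(1, 10):
    print(k, round(counts[k] / total, 3), round(math.log10(1 + 1 / k), 3))
```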
"Scale invariance" of fractal time series, such as random walk, suggests these have also peculiar distributions of the digits. As an example we consider the Brownian random walk:
br x( ) := x + rnd 1( ) − 0.5
Brown process.
z0 := rnd 10( ) − 5 zn 1+ := br z
( )
n0 200 400 600 800 1000 10
5 0 5 10
Brown series
zn
n
   ZB := SR(100, 10000, br, 1, 0.5)      Calculates the digit statistics for a number of Brown series.
   ZB := ZB / Σ_{n = 0}^{9} ZB_n

There seems to be a significant deviation from the Newcomb distribution, but the result fluctuates quite a lot for small series.
   [Figure: digit frequencies ZB_k for the Brownian ("Fractal") series compared with the Newcomb frequencies nc_k, plotted against k.]
General fractal curve with power-spectrum exponent β:

   noise1f(n, ∆t, β) :=   nn ← 2^(n−1) + 1
                          N ← 2^n
                          for k ∈ 0 .. nn − 1
                              fq_k ← k/(N·∆t)
                          amp ← rnorm(nn, 0, 1)
                          θ ← runif(nn, 0, 2·π)
                          for k ∈ 1 .. nn − 1
                              F_k ← amp_k·e^(i·θ_k) / (fq_k)^(β/2)
                          F_0 ← F_1
                          Z ← IFFT(F)
                          Z

   ∆t := 1/100      The time step used below.
Example with 1/f-noise (β = 1):

   xx := noise1f(10, ∆t, 1)      kk := 0 .. length(xx) − 1

   [Figure: "1/f-noise": the series xx_kk plotted against kk.]
The next function calculates the digit statistics for fractal time series.

   n1f(n1, n, ∆t, β) :=   for i ∈ 0 .. 9
                              Z_i ← 0
                          for i ∈ 0 .. n1 − 1
                              z ← (noise1f(n, ∆t, β))→
                              x ← (Zf(z))→
                              Z ← Z + digstat(x)
                          Z
It might also be of interest to compare a "Zipf distribution" with the Newcomb distribution and the digit distribution of the fractal time series.

   SS := Σ_{n = 1}^{9} 1/n      SS = 2.82897
   Zp_k := 1/(SS·k)      The "Zipf" distribution of the digits.
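The Zipf digit weights and their normalization SS are trivial to tabulate (a Python sketch; the list names are ours):

```python
import math

# normalization SS: the 9th harmonic number
ss = sum(1 / n for n in range(1, 10))
zipf = [1 / (ss * k) for k in range(1, 10)]        # Zp_k = 1/(SS*k)
newcomb = [math.log10(1 + 1 / k) for k in range(1, 10)]

print(round(ss, 5))   # 2.82897, as in the worksheet
for k in range(9):
    print(k + 1, round(zipf[k], 4), round(newcomb[k], 4))
```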
Next we generate a Brownian process via the Fourier-transform method.

   β := 2
   Zif := n1f(100, 10, ∆t, β)
   Zif := Zif / Σ_{n = 0}^{9} Zif_n
This exploratory simulation suggests that Brownian noise (β = 2) has digit statistics closer to the Zipf distribution than to the Newcomb distribution:

   [Figure: digit frequencies Zif_k ("Fractal") compared with the Newcomb (nc_k) and "Zipf" (Zp_k) frequencies, plotted against k.]