Various statistical tests of pseudorandom number generators


Master Project

Mohammad Shafiqul Haque 2010-06-14

Subject: Mathematics Level: Advanced Course code: 4MA11E

Title: Various statistical tests of pseudorandom number generators


Abstract

This thesis concerns various statistical tests of pseudorandom number generators. I discuss some aspects of selecting and testing pseudorandom number generators, whose outputs may be used in many cryptographic applications, such as the generation of key material. After running the statistical tests, I compare the test values of each generator and discuss which generators produce good sequences.

Key-words: Pseudorandom number generator (PRNG), Statistical test, Test value

Acknowledgments

I would like to thank my supervisor Marcus Nilsson for accepting me, for giving me the chance to do my thesis under his kind supervision, and for his encouragement. I also want to thank my head of department and the teachers who helped us in different ways. I am also thankful to my parents and friends, who supported and encouraged me during my studies. I also thank the university library and the university lab for their excellent support. Finally, I thank the Swedish government for giving me an excellent opportunity to study in Sweden.


1 Introduction

There are two basic types of generators for creating random sequences: random number generators (RNGs) and pseudorandom number generators (PRNGs). Generally, a random number generator uses a non-deterministic, i.e. unpredictable, source together with some processing function to generate a random sequence. The outputs of an RNG may be used directly as random numbers or may be fed into a pseudorandom number generator, which creates a sequence of random-looking bits from an initial value, called a seed, by using a known algorithm. Random numbers play an important role in network security applications, and the need for random and pseudorandom numbers arises in many cryptographic applications. Cryptographic protocols require random or pseudorandom inputs at various points, and also for auxiliary quantities used in generating digital signatures employed for authentication.

We can generate random numbers by spinning wheels, rolling dice or shuffling cards.

Nowadays we can produce a pseudorandom number sequence by using a computer. Pseudorandom number sequences are useful in many kinds of applications, for example simulation, sampling, numerical analysis and computer programming, see [1]. By using a PRNG with one or more inputs we can generate many "pseudorandom" numbers. Truly random numbers can be obtained from natural sources of randomness. One example of such a source is the thermal noise of a semiconductor resistor, which is a good source of randomness. On the other hand, just as flipping coins to generate random bits would not be practical for cryptographic applications, most natural sources are not practical, because sampling them is inherently slow and because it is difficult to ensure that an opponent does not observe the process, see [2]. There are many methods for creating pseudorandom numbers; the linear congruential method and the Blum Blum Shub method are widely used.

1.1 Aim of the Project

The aim of the project is to apply various statistical tests, implemented in Mathematica, to several pseudorandom number generators, and then to use the resulting test values to discuss the randomness of the generated sequences.

1.2 Random and pseudorandom number generators (RNGs and PRNGs)

At the heart of simulations of random models is a method for producing random numbers: a procedure or function that will churn out number after number uniformly distributed in the interval [0,1]. The method explained in this section is the one used by most programming languages that have built-in random number generators. In fact, the random number generator is a specific formula that produces random numbers in a completely deterministic way, which contradicts the very idea of randomness.

Consequently, the numbers produced by the random number generator are often called pseudorandom because, although they follow a very definite pattern, that pattern is not discernible without knowing the exact formula used. The fact that the same sequence of pseudorandom numbers is generated each time the generator is started from the same seed is even useful in helping to debug programs and in understanding the results of a simulation.

The second type of generator is the pseudorandom number generator (PRNG). A PRNG uses one or more inputs and generates multiple pseudorandom numbers. Inputs to PRNGs are called seeds. In contexts in which unpredictability is needed, the seed itself must be random and unpredictable. Hence, by default, a PRNG should obtain its seeds from the outputs of an RNG, so a PRNG requires an RNG as a companion. The outputs of a PRNG are typically deterministic functions of the seed, so all true randomness is confined to seed generation. The deterministic nature of the process leads to the term pseudorandom.

Since each element of a pseudorandom sequence is reproducible from its seed, only the seed needs to be saved if reproduction or validation of the pseudorandom sequence is required.

Ironically, pseudorandom numbers often appear to be more random than random numbers obtained from physical sources. If a pseudorandom sequence is properly constructed, each value in the sequence is produced from the previous value via transformations that appear to introduce additional randomness. A series of such transformations can eliminate statistical autocorrelations between input and output. Thus, the outputs of a PRNG may have better statistical properties and be produced faster than those of an RNG. Randomness means "no pattern"; pseudorandomness means "no apparent pattern".

Start with positive integers MULT (for multiplier), ADDR (for adder), and NORM (for normalizer). SEED is to be a pseudorandom number satisfying

0 ≤ SEED < NORM.

Each time a new random number is needed, it is produced from the previous value of SEED by the formula

SEED := (MULT ∗ SEED + ADDR) mod NORM.

That is, first SEED is multiplied by MULT, then ADDR is added, and finally the remainder upon division by NORM is the new value of SEED.

Example 1.1. Use the values

MULT := 6, ADDR := 5, NORM := 11.

What values of SEED will be produced if the initial value of SEED is 0? If the initial value is 4?

Each time, the new value of SEED is

SEED := (6 ∗ SEED + 5) mod 11.

With SEED initially 0, this sequence is generated:

0, 5, 2, 6, 8, 9, 4, 7, 3, 1, 0, 5, 2, 6, 8, ...

With SEED initially 4, this sequence is generated:

4, 7, 3, 1, 0, 5, 2, 6, 8, 9, 4, 7, 3, 1, 0, 5, 2, ...

From Example 1.1 two facts are apparent. First, since the next number in the sequence is generated from only the previous number, if any number is generated again, the entire list repeats from that point on. Second, the "cycle length", that is, the number of distinct numbers before repetition occurs, can be at most NORM. This is because the mod function produces the remainder upon division by NORM, which is necessarily a number between 0 and NORM − 1; thus there are only NORM possible values of SEED, and as soon as one is repeated, the entire list repeats. A good random number generator uses values of MULT, ADDR, and NORM for which the cycle length is large.
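The recurrence of Example 1.1 can be sketched in a few lines of Python. This is only an illustration of the formula SEED := (MULT ∗ SEED + ADDR) mod NORM; the function name `lcg_sequence` and its `count` parameter are choices made here and do not come from the thesis or its Mathematica file.

```python
def lcg_sequence(mult, addr, norm, seed, count):
    """Return the first `count` values of SEED, starting with the initial seed."""
    values = []
    for _ in range(count):
        values.append(seed)
        # New SEED: multiply by MULT, add ADDR, take the remainder mod NORM.
        seed = (mult * seed + addr) % norm
    return values

# The two starting values used in Example 1.1.
print(lcg_sequence(6, 5, 11, 0, 15))
print(lcg_sequence(6, 5, 11, 4, 17))
```

Both calls reproduce the sequences listed in Example 1.1, and the repetition after ten values is visible directly in the output.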


Example 1.2. Consider the generator

SEED := (6 ∗ SEED + 3) mod 7.

Then with different initial values of SEED, these sequences are generated:

0303030303...

1212121212...

4646464646...

5555555555...

Although the value NORM = 7 suggests that the cycle length might be the maximum value of 7, the actual cycle lengths are small; if SEED is initially 5, the random number generator is quite useless. The theory of which parameter values result in good random number generators is complicated and is more a subject of abstract algebra than of probability, see [3].
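The cycle lengths in Example 1.2 can be checked mechanically. The following Python sketch follows SEED until a value repeats and reports the length of the cycle that was entered; the helper name `cycle_length` is hypothetical and used only for illustration.

```python
def cycle_length(mult, addr, norm, seed):
    """Length of the cycle that SEED eventually enters."""
    first_seen = {}          # value -> step at which it first appeared
    step = 0
    while seed not in first_seen:
        first_seen[seed] = step
        seed = (mult * seed + addr) % norm
        step += 1
    # The repeated value closes a cycle that started at first_seen[seed].
    return step - first_seen[seed]

# Every possible starting value of the generator in Example 1.2.
for seed in range(7):
    print(seed, cycle_length(6, 3, 7, seed))
```

For this generator no starting value gives a cycle longer than 2, whereas the generator of Example 1.1 started at 0 has a cycle of length 10.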

1.3 The need for statistical tests

This thesis is about statistical tests, which I carry out in Mathematica. Without a statistical test we cannot obtain test values for a sequence, and without test values we cannot judge the randomness of the sequence. After carrying out the statistical tests we will check the randomness of the sequences, see [4].

2 Applications of PRNG

2.1 Simulation

Simulation is essentially a technique of controlled statistical sampling, used in combination with a model to obtain approximate answers to questions about complex, multifactorial probabilistic problems. It is most useful when numerical and analytical techniques cannot provide an answer. Simulation is in fact a statistical experiment performed on a digital computer, see [5]. In statistics, simulation is used to estimate characteristics of distributions, such as means of functions, quantiles and other features that cannot be calculated in closed form. When using a simulation estimator, it is good practice to compute a measure of how accurate the estimate is, in addition to the estimate itself, see [6]. Simulation is a technique that can be used to shed light on how a complex system works even when a thorough analysis is not available.

Example 2.1. Engineers can simulate traffic patterns in the area surrounding a planned construction project to see what effects different restrictions might have. A physicist can simulate the behaviour of gas molecules under conditions that are not covered by any known theory. Statistical models can be simulated to estimate probabilities that cannot be derived analytically. Because simulation includes an element of randomness in an analysis, it is sometimes called Monte Carlo analysis, after the famous European gambling centre, see [6].


2.2 Cryptography

Cryptography means hiding information. Cryptography is used in mathematics, computer science and engineering, and in everyday applications such as ATM (automated teller machine) cards, computer passwords and electronic commerce. Pseudorandom number generators are closely related to cryptography: the design of pseudorandom number generators is one of the main issues in stream ciphers, where pseudorandom number sequences are often employed as keystreams, see [7].

Example 2.2. In cryptography, to encrypt a plaintext we can use a pseudorandom bit sequence as a keystream, and to decrypt the ciphertext we use the same keystream again, which gives back the plaintext. So a pseudorandom number sequence is essential in this cryptographic application.

Plaintext:   01101001100
Keystream:   10110010111

Adding bitwise (modulo 2), the ciphertext becomes 11011011011.

If we again add the keystream to the ciphertext bitwise, we get back our plaintext:

Ciphertext:  11011011011
Keystream:   10110010111
Plaintext:   01101001100
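The bitwise addition in Example 2.2 is the exclusive-or (XOR) of the two bit strings. A minimal Python sketch, in which the function name `xor_bits` is an illustrative choice and the bit strings are the ones from the example:

```python
def xor_bits(bits, keystream):
    """Add two equal-length bit strings bitwise modulo 2."""
    if len(bits) != len(keystream):
        raise ValueError("bit strings must have the same length")
    return "".join(str(int(b) ^ int(k)) for b, k in zip(bits, keystream))

plaintext = "01101001100"
keystream = "10110010111"

ciphertext = xor_bits(plaintext, keystream)    # encryption
recovered = xor_bits(ciphertext, keystream)    # decryption uses the same keystream

print(ciphertext)
print(recovered)
```

The same function serves for encryption and decryption, because adding the keystream twice modulo 2 cancels it out.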

3 Some Types of PRNGs

3.1 Linear Congruential Generator

The most widely used technique for pseudorandom number generation is an algorithm called the linear congruential method, first introduced by D. H. Lehmer in 1949. We need to choose four parameters, which are given in the table below.

Parameter name   Range
m                0 < m
a                0 ≤ a < m
c                0 ≤ c < m
X₀               0 ≤ X₀ < m

The sequence of random numbers (Xₙ) is obtained via the following equation:

$$X_{n+1} \equiv (aX_n + c) \pmod{m}, \qquad 0 \le X_{n+1} < m. \tag{3.1}$$

This will produce a sequence of integers with each integer in the range 0 ≤ Xₙ < m, see [4].

Example 3.1. Consider the linear congruential generator with parameters m = 5, a = 2, c = 1 and X₀ = 2. We get

$$\begin{aligned} X_1 &\equiv 2 \cdot X_0 + 1 \pmod{m} \\ &\equiv 2 \cdot 2 + 1 \pmod{5} \\ &\equiv 5 \pmod{5} \\ &\equiv 0 \pmod{5}, \end{aligned}$$

and

$$X_2 \equiv 2 \cdot 0 + 1 \equiv 1 \pmod{5}.$$

In a similar way we get

$$X_3 \equiv 3, \quad X_4 \equiv 2, \quad X_5 \equiv 0, \quad X_6 \equiv 1.$$

The sequence is 0 1 3 2 0 1 3 2 ... We can see that the sequence repeats with period 4.

3.2 How to produce a bit sequence

We now discuss how to produce a bit sequence. Consider a linear congruential equation mod m with starting value X₀ = 2:

$$X_n \equiv 3X_{n-1} + 5 \pmod{m}, \qquad X_0 = 2. \tag{3.2}$$

Using this linear congruential equation we obtain a sequence of numbers, which we denote S_m, and from it a binary sequence. We can produce the binary sequence by using the last bit, the last three bits or the last six bits of every number in S_m.

Example 3.2. Let us take m = 11 in our linear congruential equation and produce binary sequences by using the last bit, the last three bits and the last six bits of every number in S₁₁. The linear congruential equation becomes

$$X_n \equiv 3X_{n-1} + 5 \pmod{11}, \qquad X_0 = 2.$$

For n = 1 we get

$$X_1 \equiv 3X_0 + 5 \equiv 3 \cdot 2 + 5 \equiv 11 \equiv 0 \pmod{11},$$

so X₁ = 0. For n = 2 we get

$$X_2 \equiv 3X_1 + 5 \equiv 3 \cdot 0 + 5 \equiv 5 \pmod{11},$$

so X₂ = 5. In a similar way we get the sequence 2 0 5 9 10 2. We now turn it into a binary sequence. Taking the last bit of every number, the binary sequence becomes 0 0 1 1 0 0.

To use the last three or the last six bits, we first write every number in S₁₁ in binary. For example,

2 = 0 · 2⁰ + 1 · 2¹,

so the binary representation of 2 is 1 0. In a similar way, the binary representation of 0 is 0, of 5 is 1 0 1, of 9 is 1 0 0 1, and of 10 is 1 0 1 0.

Using the last bit of every number in S₁₁ gives 0, 0, 1, 1, 0, 0.

Using the last three bits of every number in S₁₁ gives 010, 000, 101, 001, 010.

Using the last six bits of every number in S₁₁ gives 000010, 000000, 000101, 001001, 001010.
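The bit extraction of Example 3.2 can be sketched in Python as follows. The function names `lcg` and `last_bits` are hypothetical, and the sketch assumes that "last bits" means the least significant bits of each number, padded with leading zeros, which is how the example treats them.

```python
def lcg(a, c, m, x0, count):
    """Return x0 followed by count - 1 further terms of X_n = (a*X_{n-1} + c) mod m."""
    xs = [x0]
    for _ in range(count - 1):
        xs.append((a * xs[-1] + c) % m)
    return xs

def last_bits(x, width):
    """Return the `width` least significant bits of x as a zero-padded string."""
    return format(x % (1 << width), "0{}b".format(width))

s11 = lcg(3, 5, 11, 2, 5)        # the numbers 2, 0, 5, 9, 10 of Example 3.2
for width in (1, 3, 6):
    print(width, [last_bits(x, width) for x in s11])
```

Concatenating the strings printed for one width gives the binary sequence that is later passed to the statistical tests.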

4 Probability Distribution

A probability distribution gives either the probability of each value of a random variable or the probability of the value falling within a particular interval.

The behavior of a random variable is characterized by its probability distribution, that is, by the way probabilities are distributed over the values it assumes. For a continuous random variable, the corresponding functions are the probability distribution function and the probability density function. Probability distributions are used to calculate confidence intervals for parameters and to calculate critical regions for hypothesis tests. It is sometimes useful to identify a reasonable distributional model for the data. Statistical intervals and hypothesis tests are often based on specific distributional assumptions. Before computing an interval based on a distributional assumption, we need to verify that the assumption is justified for the given data. In this case, the distribution does not need to be the best-fitting distribution for the data, but it must be adequate enough that the statistical technique yields valid conclusions. Simulation studies with random numbers generated from a specific probability distribution are also often needed. Note that the term probability functions covers both discrete and continuous distributions. We may use the term probability density functions to mean both discrete and continuous probability functions.

Given a random experiment with its associated random variable X, and given a real number x, let us consider the probability of the event {s : X(s) ≤ x} or, simply, P(X ≤ x). This probability clearly depends on the assigned value x. The function

$$F_X(x) = P(X \le x) \tag{4.1}$$

is defined as the probability distribution function (PDF), or simply the distribution function, of X. In equation (4.1), the subscript X identifies the random variable. This subscript is sometimes omitted when there is no risk of confusion. Let us repeat that F_X(x) is simply P(A), the probability of an event A occurring, the event being X ≤ x. The PDF is thus the probability that X will assume a value lying in a subset of S, the subset being the point x and all points lying to the left of x. As x increases, the subset covers more of the real line, and the value of the PDF increases until it reaches 1. The PDF of a random variable thus accumulates probability as x increases, and the name cumulative distribution function (CDF) is also used for this function. In view of the definition and the discussion above, we give some of the important properties possessed by a PDF.

It exists for discrete and continuous random variables and has values between 0 and 1.

It is a nonnegative, nondecreasing function of the real variable x that is continuous from the right, and

$$F_X(-\infty) = 0 \quad \text{and} \quad F_X(+\infty) = 1. \tag{4.2}$$

If a and b are two real numbers such that a < b, then

$$P(a < X \le b) = F_X(b) - F_X(a). \tag{4.3}$$

This relation is a direct result of the identity

$$P(X \le b) = P(X \le a) + P(a < X \le b).$$

We see from equation (4.3) that the probability of X having a value in an arbitrary interval can be represented by the difference between two values of the PDF. Generalizing, probabilities associated with any sets of intervals are derivable from the PDF, see [8].
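As a small illustration of relation (4.3), the following Python sketch uses a fair six-sided die, chosen here only because its distribution function is easy to write down; the die and the function name `die_cdf` are assumptions of this example and are not taken from the thesis.

```python
from fractions import Fraction
import math

def die_cdf(x):
    """Distribution function F(x) = P(X <= x) for a fair six-sided die."""
    faces_at_most_x = min(max(math.floor(x), 0), 6)
    return Fraction(faces_at_most_x, 6)

a, b = 2, 5
via_cdf = die_cdf(b) - die_cdf(a)    # F(b) - F(a), as in (4.3)
# Direct count of the faces with a < face <= b.
direct = Fraction(sum(1 for face in range(1, 7) if a < face <= b), 6)

print(via_cdf, direct)
```

Both computations give the same probability, and the function also shows the other listed properties: it is nondecreasing, equals 0 to the left of all faces and equals 1 from the largest face onwards.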

4.1 Gamma distribution

A random variable X with density

$$f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta},$$

where x > 0, α > 0, β > 0, is said to have a gamma distribution with parameters α and β, see [10], where

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\, dt.$$

4.2 Chi-square (χ²) distribution

A particular type of gamma distribution is known as the χ² distribution. This distribution is closely related to random samples from a normal distribution and is widely applied in statistics. For a positive integer n, the gamma distribution with α = n/2 and β = 2 (with β acting as a scale parameter, as in the density of Section 4.1) is called the chi-square distribution with n degrees of freedom. The chi-square distribution has one parameter, its degrees of freedom. It has a positive skew, and the skew becomes smaller as the degrees of freedom increase. As the degrees of freedom increase, the chi-square distribution approaches a normal distribution. The mean of a chi-square distribution is its number of degrees of freedom, see [6].

Figure 4.1: chi-square distribution with 6 degrees of freedom
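The relation between the gamma and chi-square densities, and the statement about the mean, can be checked numerically. The Python sketch below assumes the gamma density is written with β as a scale parameter, so that the chi-square case is α = n/2, β = 2; the function names and the simple midpoint rule are illustrative choices.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta, for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom: gamma with alpha = n/2, beta = 2."""
    return gamma_pdf(x, n / 2, 2.0)

def mean_by_midpoint_rule(pdf, upper=100.0, steps=100_000):
    """Approximate the mean, i.e. the integral of x * pdf(x) over (0, upper)."""
    h = upper / steps
    return sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(steps)) * h

# Degrees of freedom used in the figures of this section.
for n in (3, 5, 6):
    print(n, round(mean_by_midpoint_rule(lambda x: chi2_pdf(x, n)), 4))
```

The printed means agree with the degrees of freedom up to the accuracy of the numerical integration.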


Figure 4.2: chi square distribution with 3 degrees of freedom

Figure 5.1: chi square distribution with 5 degrees of freedom

5 Chi-square tests

Any statistical test that uses the chi-square distribution can be called a chi-square test.

The chi-square test is perhaps the best known of all statistical tests, and it is a basic method that is used in connection with many other tests. A typical example is its application to dice throwing. The chi-square distribution is a mathematical distribution that is used directly in many tests of significance. The most common use of the chi-square distribution is to test differences among proportions. Although this test is by no means the only test based on the chi-square distribution, it has come to be known as the chi-square test. The chi-square test is the most commonly used method for comparing frequencies or proportions. It is a statistical test used to determine whether observed data deviate from those expected under a particular hypothesis. The chi-square test is also referred to as a test of "goodness of fit" between data and a hypothesis. The χ² test with k − 1 degrees of freedom is a procedure for testing the null hypothesis that our data form a sample from a specific distribution against the alternative hypothesis that the data have some other distribution. The test is most natural when the specific distribution is discrete.

Suppose that there are k possible values for each observation. We observe Nᵢ observations with value i, for i = 1, ..., k. Suppose that the null hypothesis says that the probability of the ith possible value is pᵢ, for i = 1, ..., k. Then we compute

$$Q = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}, \tag{5.1}$$

where n = N₁ + ... + Nₖ is the sample size. When the null hypothesis says that the data have a continuous distribution, one must first create a corresponding discrete distribution. One does this by dividing the real line into finitely many intervals, calculating the probability of each interval, p₁, ..., pₖ, and then pretending that all we learned from the data was into which interval each observation fell. This converts the original data into discrete data with k possible values. All the χ² test statistics in this text have the form

$$\sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}},$$

where "observed" stands for an observed count, and "expected" stands for the expected value of the observed count under the assumption that the null hypothesis is true.

In 1900, Karl Pearson showed that if the hypothesis H₀ is true, then as the sample size n → ∞, the distribution function of Q converges to the distribution function of the χ² distribution with k − 1 degrees of freedom. Thus, if H₀ is true and the sample size n is large, the distribution of Q will be approximately a χ² distribution with k − 1 degrees of freedom. The discussion that we have presented indicates that H₀ should be rejected when Q ≥ C, where C is an appropriate constant. If it is desired to carry out the test at the level of significance α₀, then C should be chosen to be the 1 − α₀ quantile of the χ² distribution with k − 1 degrees of freedom. This is called the χ² test of goodness of fit, see [6].

The chi-square analysis is used to test the null hypothesis H₀, which states that there is no significant difference between the expected and the observed data. Investigators either accept or reject H₀ after comparing the value of chi-square to a probability distribution. Chi-square values with low probability lead to the rejection of H₀, and it is then assumed that a factor other than chance creates the large deviation between expected and observed results.

Example 5.1. Suppose we flip a coin 200 times. If the coin is fair, the probability of flipping heads is 0.5 and the probability of flipping tails is 0.5. This means that we predict that the coin will come up heads half of the time and tails half of the time. Our hypothesis therefore predicts:

Expected: heads 100, tails 100, total 200.

To test the hypothesis, we flip our penny 200 times.

Observed: heads 108, tails 92, total 200.

Our chi-square formula is

$$Q = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}. \tag{5.2}$$

For heads, (Nᵢ − npᵢ)² = (108 − 100)² = 8² = 64.
For tails, (Nᵢ − npᵢ)² = (92 − 100)² = (−8)² = 64.

The number of trials is very important. A particular deviation from the expected count means a lot more if there are only a few trials than it would if there were many trials. This is taken into account by dividing our (Nᵢ − npᵢ)² values by the expected values:

$$\text{For heads, } \frac{(N_i - np_i)^2}{np_i} = \frac{64}{100} = 0.64. \qquad \text{For tails, } \frac{(N_i - np_i)^2}{np_i} = \frac{64}{100} = 0.64.$$

To calculate the chi-square value for our experiment, we add together all of the (Nᵢ − npᵢ)²/(npᵢ) values:

$$\chi^2 = 0.64 + 0.64 = 1.28.$$

We can summarize all of this information in the following table.

data    observed (O)   expected (E)   O − E   (O − E)²   (O − E)²/E
heads   108            100            8       64         0.64
tails   92             100            −8      64         0.64
total   200            200                               χ² = 1.28


Now we have to find the number of degrees of freedom, v. To calculate v we need to know the number of classes of data. In this example that number is two ("heads" or "tails"), so the number of degrees of freedom is v = 2 − 1 = 1. If we were dealing with dice rather than coins, then v would be 6 − 1 = 5. We now have the value of χ² and the number of degrees of freedom, 1.28 and 1 respectively. According to the chi-square distribution table for one degree of freedom, 1.28 falls between the values 1.07 and 1.64, which correspond to probabilities 0.30 and 0.20 respectively. So the probability associated with our chi-square value lies between 0.20 and 0.30.

A probability of 0.20 corresponds to a "chance" of 20%, and 0.30 to a chance of 30%. This chi-square result means that, if our hypothesis is correct, the probability of getting results at least as far from what we predicted as these is between 0.20 and 0.30. In biological applications, a probability of 5% is usually adopted as the standard. This value means that the chance of an observed deviation arising by chance is only 1 in 20. Because the probability we obtained in the coin example is greater than 0.05, we accept the null hypothesis and conclude that our coin is fair, see [10].
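The coin calculation of Example 5.1 can be reproduced with a short Python sketch. The function names are illustrative. The tail probability uses the fact that, for one degree of freedom only, the chi-square variable is the square of a standard normal variable, so the probability can be written with the complementary error function; this shortcut does not apply to other degrees of freedom.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def upper_tail_probability_df1(q):
    """P(chi-square with 1 degree of freedom >= q); valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(q / 2))

q = chi_square_statistic([108, 92], [100, 100])   # heads, tails
p = upper_tail_probability_df1(q)

print(q, round(p, 4))
```

The computed probability lies between the table values 0.20 and 0.30 quoted above, and it is well above the 5% level.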

6 Statistical tests

We now define a few tests to use in our Mathematica file. Note that we will be dealing with binary sequences. We will use n = 100 as the sample size.

6.1 Monobit test

Here the focus of the test is the proportion of zeros and ones in the entire sequence.

The purpose of this test is to determine whether the numbers of zeros and ones in a sequence are approximately the same as would be expected for a truly random sequence. All subsequent tests depend on the passing of this test. We now derive the statistic for the monobit test from our chi-square formula, see (6.4). In the monobit test the length of the subsequence is 1, the number of different outcomes k is 2 and the number of degrees of freedom is k − 1 = 2 − 1 = 1. The probability of each outcome is pᵢ = 1/2 and n = 100 is the number of samples. Our chi-square formula becomes

$$Q_1 = \sum_{i=1}^{2} \frac{(N_i - np_i)^2}{np_i} = \frac{(N_1 - np_1)^2}{np_1} + \frac{(N_2 - np_2)^2}{np_2} = \frac{(N_1 - 50)^2}{50} + \frac{(N_2 - 50)^2}{50}. \tag{6.1}$$

6.2 Twobit test

Now we derive the statistic for the twobit test from our chi-square formula, see (6.4).

In the twobit test the length of the subsequence is 2, the number of different outcomes k is 4 and the number of degrees of freedom is k − 1 = 4 − 1 = 3. The probability of each outcome is pᵢ = 1/4 and n is the number of samples. Our chi-square formula becomes

$$Q_2 = \sum_{i=1}^{4} \frac{(N_i - np_i)^2}{np_i} = \frac{(N_1 - np_1)^2}{np_1} + \frac{(N_2 - np_2)^2}{np_2} + \frac{(N_3 - np_3)^2}{np_3} + \frac{(N_4 - np_4)^2}{np_4} = \frac{(N_1 - 25)^2}{25} + \frac{(N_2 - 25)^2}{25} + \frac{(N_3 - 25)^2}{25} + \frac{(N_4 - 25)^2}{25}. \tag{6.2}$$

The possible outcomes for two-bit sequences are (00, 10, 01, 11).

6.3 Threebit test

Now we derive the statistic for the threebit test from our chi-square formula, see (6.4).

In the threebit test the length of the subsequence is 3, the number of different outcomes k is 8 and the number of degrees of freedom is k − 1 = 8 − 1 = 7. The probability of each outcome is pᵢ = 1/8 and n is the number of samples. Our chi-square formula becomes

$$Q_3 = \sum_{i=1}^{8} \frac{(N_i - np_i)^2}{np_i}. \tag{6.3}$$

The possible outcomes for three-bit sequences are (000, 100, 010, 001, 110, 101, 011, 111).

6.4 Subsequences of length t

We now discuss subsequences of length t. Let t be a positive integer. The number of different outcomes is 2^t, the probability of each subsequence of length t is 1/2^t, and the chi-square formula becomes

$$Q_t = \sum_{i=1}^{2^t} \frac{(N_i - n/2^t)^2}{n/2^t}. \tag{6.4}$$
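Formula (6.4) can be sketched in Python as follows. This is not the Mathematica code used in the thesis. The sketch assumes that the bit sequence is cut into non-overlapping blocks of length t and that n is the number of such blocks; the function name `q_statistic` and the test input are illustrative.

```python
from collections import Counter
from itertools import product

def q_statistic(bits, t):
    """Chi-square statistic Q_t for non-overlapping t-bit blocks of a bit string."""
    blocks = [bits[i:i + t] for i in range(0, len(bits) - t + 1, t)]
    n = len(blocks)                      # number of samples
    expected = n / 2 ** t                # n * p_i with p_i = 1 / 2^t
    counts = Counter(blocks)             # N_i for every outcome that occurs
    outcomes = ("".join(p) for p in product("01", repeat=t))
    return sum((counts[o] - expected) ** 2 / expected for o in outcomes)

bits = "0011" * 25                       # an illustrative, clearly non-random input
for t in (1, 2, 3):
    print(t, q_statistic(bits, t))
```

For this input the monobit statistic is 0 because zeros and ones are perfectly balanced, while the statistic for t = 2 is large because only the blocks 00 and 11 ever occur. This shows why tests with several values of t are needed.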

6.5 General discussion about test value percentages

From the properties of the probability distribution function, the probability associated with a test value must lie between 0 and 1, see (4.3).

Figure 6.1: chi-square distribution with 6 degrees of freedom

In the figure, the area on the right-hand side corresponds to the significance level α, which is 5% of the whole area under the curve.

If our test value V lies in the region corresponding to α, then we can say that our test value is not good. For the percentages of the test values, see the percentage table in Section 7.


7 Investigation of some pseudorandom number generators

The sequence of random numbers (Xₙ) is obtained via the following equation:

$$X_{n+1} \equiv (aX_n + b) \pmod{m}, \qquad 0 \le X_{n+1} < m, \qquad X_0 = s,$$

which is called a linear congruential equation. Here m is the modulus, a is the multiplier, b is the increment and X₀ = s is the starting value.

Now we will choose values of m, a, b and s, and investigate different generators by means of statistical tests. We take five generators with different parameter values, observe what kind of sequences they produce, and use the statistical tests to judge whether the sequences are random or not. The parameter values of the generators that we investigate are presented in the table below.

generator     m      a    b   s
generator 1   400    25   5   2
generator 2   2509   23   5   2
generator 3   1578   25   5   2
generator 4   3568   25   5   2
generator 5   6784   27   5   2

Given the five generators above and their parameter values, we can carry out the statistical tests on the bit sequences obtained from the last 1 bit, the last 3 bits and the last 6 bits of each number (see Section 6), and for each of these bit sequences we can vary the value of t. For example, for the last 1 bit of generator 1 we carry out the statistical test with t = 1, 2 and 3, one after another. Each statistical test gives a test value for the sequence, and after analyzing these test values we will discuss which sequences are random and which are not. We now discuss the statistical test value V. If V is less than the 1% entry or greater than the 99% entry, we reject the numbers as not sufficiently random, and according to our percentage table we say that the numbers are "very bad". If V lies between the 1% and 5% entries or between the 95% and 99% entries, the numbers are "bad". If V lies between the 5% and 25% entries or between the 75% and 95% entries, the numbers are "not so good". The numbers are "good" when V lies between the 25% and 75% entries. The percentages of the test values are summarized in the table below.

percentage scores

0%-1% very bad

1%-5% bad

5%-25% not so good

25%-75% good

75%-95% not so good

95%-99% bad

99%-100% very bad
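The percentage table can be turned into a small scoring function. The Python sketch below is an illustration and not the thesis's Mathematica code: `chi2_cdf` computes the percentile of a test value V with the standard series for the incomplete gamma function, and `score` maps that percentile to the labels of the table. Both names are hypothetical, and a value lying exactly on a boundary is assigned to the milder label, which the table itself leaves open.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via a series expansion."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:          # stop when further terms are negligible
        term *= z / (a + k)
        total += term
        k += 1
    value = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, value)               # guard against rounding slightly above 1

def score(percentile):
    """Map a percentile in [0, 1] to the labels of the percentage table."""
    distance = min(percentile, 1 - percentile)   # distance to the nearer end
    if distance < 0.01:
        return "very bad"
    if distance < 0.05:
        return "bad"
    if distance < 0.25:
        return "not so good"
    return "good"

# Test values V with their degrees of freedom (2^t - 1 for subsequences of length t).
for v, df in [(0.0, 1), (1.28, 1), (50.0, 3)]:
    print(v, df, score(chi2_cdf(v, df)))
```

Note that a test value of 0, a perfect match with the expected counts, is scored "very bad" just like a very large test value, because the table treats both ends of the distribution as suspicious.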


After the statistical tests we obtain scores for the sequences. The chi-square test is often done at least three times on different sets of data, here called test 1, test 2 and test 3. If at least two of the three results are suspect, the numbers are regarded as not sufficiently random. We now discuss generator 1.

7.1 Generator 1

7.1.1 generator 1 for last 1 bit

test              test 1        test 2        test 3
monobit test      very bad      very bad      very bad
2 sequence test   very bad      very bad      very bad
3 sequence test   very bad      very bad      very bad

7.1.2 generator 1 for last 3 bits

test              test 1        test 2        test 3
monobit test      bad           very bad      very bad
2 sequence test   very bad      very bad      very bad
3 sequence test   very bad      very bad      very bad

7.1.3 generator 1 for last 6 bits

test              test 1        test 2        test 3
monobit test      not so good   good          good
2 sequence test   bad           bad           very bad
3 sequence test   not so good   not so good   good

7.1.4 general discussion

For generator 1 with the last 1 bit, the monobit, two-bit and three-bit test values are all "very bad". When we change from the last 1 bit to the last 3 bits, the test values are very similar to those of the 1-bit case. When we change from the last 3 bits to the last 6 bits, two of the three monobit test values are "good"; in the 2 sequence test two of the three values are "bad" and one is "very bad"; and in the 3 sequence test two values are "not so good" and one is "good". So for generator 1 the sequences from the last 1 bit and the last 3 bits are not good, but those from the last 6 bits are better than the other two.
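One possible explanation for the "very bad" scores of the last bit can be seen by generating the bits directly. The Python sketch below assumes that one bit is taken from each of the numbers X₁, ..., X₁₀₀ of generator 1; the thesis does not state exactly how the Mathematica file selects the bits, so this is an assumption, and the function name is hypothetical.

```python
def last_bit_sequence(m, a, b, s, count):
    """Last bit of X_1, ..., X_count for X_{n+1} = (a*X_n + b) mod m, X_0 = s."""
    bits = []
    x = s
    for _ in range(count):
        x = (a * x + b) % m
        bits.append(x % 2)
    return bits

bits = last_bit_sequence(400, 25, 5, 2, 100)          # parameters of generator 1
zeros, ones = bits.count(0), bits.count(1)
q1 = (zeros - 50) ** 2 / 50 + (ones - 50) ** 2 / 50   # monobit statistic (6.1)

print(bits[:10], zeros, ones, q1)
```

Because a and b are odd and m is even, the parity of Xₙ flips at every step, so the last bits simply alternate. Under the assumption above, the counts of zeros and ones are then exactly equal and the monobit statistic is 0, which lies in the 0%-1% row of the percentage table.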

7.2 Generator 2

7.2.1 generator 2 for last 1 bit

test              test 1        test 2        test 3
monobit test      good          good          good
2 sequence test   very bad      very bad      very bad
3 sequence test   not so good   good          not so good

7.2.2 generator 2 for last 3 bits

test              test 1        test 2        test 3
monobit test      very bad      good          not so good
2 sequence test   good          good          not so good
3 sequence test   not so good   not so good   not so good

7.2.3 generator 2 for last 6 bits

test              test 1        test 2        test 3
monobit test      not so good   good          good
2 sequence test   bad           not so good   not so good
3 sequence test   not so good   not so good   good

7.2.4 general discussion

For generator 2 with the last 1 bit, all three monobit test values are "good", but in the 2 sequence test all test values are "very bad", and in the 3 sequence test two values are "not so good" and one is "good". If we change from the last 1 bit to the last 3 bits, then in the monobit test one value is "good" and the other two are "very bad" and "not so good", while in the 2 sequence test two values are "good" and one is "not so good". Unfortunately, in the 3 sequence test all values are "not so good". For the last 6 bits, two monobit test values are "good" and one is "not so good"; in the 2 sequence test two values are "not so good" and one is "bad"; and in the 3 sequence test two of the three test values are "not so good" and one is "good". For generator 2 we thus get 4 "good" values with the last 1 bit, 3 "good" values with the last 3 bits and 3 "good" values with the last 6 bits. Since we get the most "good" values with the last 1 bit, generator 2 with the last 1 bit is better than the other two.

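The counts of "good" values in the discussion of generator 2 can be checked mechanically. The following is a minimal Mathematica sketch and not part of the original computations: the ratings are typed in by hand from the three generator 2 tables, and the names oneBit, threeBits, sixBits and countGood are chosen here only for illustration.

```mathematica
(* Ratings copied by hand from the generator 2 tables in section 7.2.
   Rows: monobit, 2 sequence, 3 sequence test; columns: test 1 to test 3.
   All variable and function names are illustrative assumptions. *)
oneBit = {{"good", "good", "good"},
   {"very bad", "very bad", "very bad"},
   {"not so good", "good", "not so good"}};
threeBits = {{"very bad", "good", "not so good"},
   {"good", "good", "not so good"},
   {"not so good", "not so good", "not so good"}};
sixBits = {{"not so good", "good", "good"},
   {"bad", "not so good", "not so good"},
   {"not so good", "not so good", "good"}};

(* Count only the entries that are exactly "good" *)
countGood[table_] := Count[Flatten[table], "good"];

countGood /@ {oneBit, threeBits, sixBits}
(* {4, 3, 3} *)
```

The result agrees with the counts 4, 3 and 3 used in the discussion.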
7.3 Generator 3

7.3.1 generator 3 for last 1 bit

test              test 1        test 2        test 3
monobit test      very bad      very bad      very bad
2 sequence test   very bad      very bad      very bad
3 sequence test   very bad      very bad      very bad

7.3.2 generator 3 for last 3 bits

test              test 1        test 2        test 3
monobit test      not so good   not so good   not so good
2 sequence test   very bad      very bad      very bad
3 sequence test   not so good   good          very bad

7.3.3 generator 3 for last 6 bits

test              test 1        test 2        test 3
monobit test      not so good   good          good
2 sequence test   not so good   good          not so good
3 sequence test   not so good   good          bad

7.3.4 general discussion

In generator 3, for the last 1 bit, the monobit, 2 sequence and 3 sequence test values are all "very bad". In the 3 bit test, all three monobit test values are "not so good", all 2 sequence test values are "very bad", and the 3 sequence test values are "not so good", "good" and "very bad". In the 6 bit test, the monobit test gives two "good" values and one "not so good" value, the 2 sequence test gives two "not so good" values and one "good" value, and the 3 sequence test gives "not so good", "good" and "bad". We see that generator 3 produces only "very bad" sequences for the last 1 bit, but the 6 bit test gives some "good" sequences. So we can say that generator 3 is good for the last 6 bits.


7.4 Generator 4

7.4.2 generator 4 for last 3 bits

test              test 1        test 2        test 3
monobit test      very bad      not so good   not so good
2 sequence test   very bad      very bad      very bad
3 sequence test   very bad      very bad      very bad

7.4.3 generator 4 for last 6 bits

test              test 1        test 2        test 3
monobit test      good          not so good   good
2 sequence test   good          not so good   good
3 sequence test   not so good   not so good   bad

7.4.4 general discussion

In generator 4, for the last 1 bit, the monobit, 2 sequence and 3 sequence test values are all "very bad". For the last 3 bits of generator 4, the monobit test gives two "not so good" values and one "very bad" value, and all 2 sequence and 3 sequence test values are "very bad". For the last 6 bits, the monobit test gives two "good" values and one "not so good" value, the 2 sequence test gives one "not so good" and two "good" values, and the 3 sequence test gives two "not so good" values and one "bad" value. So generator 4 does not produce good sequences for the last 1 bit or the last 3 bits; only the last 6 bits produce a few good sequences. Generator 4 is therefore better when more bits are used.

7.5 Generator 5

7.5.3 generator 5 for last 6 bits

test              test 1        test 2        test 3
monobit test      good          bad           not so good
2 sequence test   bad           very bad      very bad
3 sequence test   very bad      very bad      very bad


7.5.4 general discussion

In generator 5, for the last 1 bit, the monobit, 2 sequence and 3 sequence test values are all "very bad". For the last 3 bits, all values are again "very bad". In the 6 bit test, the monobit test gives "good", "bad" and "not so good", the 2 sequence test gives two "very bad" values and one "bad" value, and all 3 sequence test values are "very bad". So we can say that generator 5 does not produce good sequences.

8 Conclusion

After the statistical tests we obtain the test values, and according to our percentage table we mark each test value as "good", "not so good", "bad" or "very bad". We now compare the values from generator to generator. For the last 1 bit, all test values of generators 1, 3, 4 and 5 are "very bad", whereas in generator 2 all monobit test values are "good" but the 2 sequence test values are "very bad". For the last 3 bits, almost all test values of generator 1 are "very bad", while most test values of generator 2 are "not so good". In generator 3 the numbers of "not so good" and "very bad" test values are equal, in generator 4 most values are "very bad", and in generator 5 all values are "very bad". We can observe that in every generator the last 6 bits produce some "good" sequences. The last 3 bits of generators 2 and 3 also produce some "good" sequences, but most are "not so good".

Generators 1 and 5 are very similar, because their 1 bit and 3 bit test results are very similar; in generator 3, however, most values are "good" or "not so good", whereas in generator 5 most values are "very bad". Comparing all five generators, we find many "good" test values for the last 6 bits, except in generator 5, and I can say that generator 2 is the best generator for producing "good" sequences.
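The comparison of the last 6 bits across the five generators can be tabulated in the same spirit. This is only an illustrative Mathematica sketch under stated assumptions: the ratings are copied by hand from the 6 bit tables of sections 7.1 to 7.5, and the name sixBitRatings is hypothetical.

```mathematica
(* Ratings for the last 6 bits of generators 1 to 5, one list per generator,
   copied by hand from the tables in sections 7.1 to 7.5. Each list holds the
   monobit, 2 sequence and 3 sequence rows in order, three tests per row.
   The name sixBitRatings is an illustrative assumption. *)
sixBitRatings = {
   {"not so good", "good", "good", "bad", "bad", "very bad",
    "not so good", "not so good", "good"},
   {"not so good", "good", "good", "bad", "not so good", "not so good",
    "not so good", "not so good", "good"},
   {"not so good", "good", "good", "not so good", "good", "not so good",
    "not so good", "good", "bad"},
   {"good", "not so good", "good", "good", "not so good", "good",
    "not so good", "not so good", "bad"},
   {"good", "bad", "not so good", "bad", "very bad", "very bad",
    "very bad", "very bad", "very bad"}};

(* Number of "good" ratings for generators 1 to 5 *)
Count[#, "good"] & /@ sixBitRatings
(* {3, 3, 4, 4, 1} *)
```

Every generator has at least one "good" value for the last 6 bits, and generator 5 has clearly the fewest, which matches the observation in the conclusion.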



9 appendix

Each entry below lists two numbers: the chi-square statistic and the corresponding test value, in that order.

generator 1, last 1 bit test, t=1:
n=100, test value is 0 , 1
n=200, test value is 0 , 1
n=300, test value is 0 , 1

generator 2, last 1 bit test, t=1:
n=100, test value is 0.36 , 0.548506
n=200, test value is 0.36 , 0.548506
n=300, test value is 0.36 , 0.548506

generator 1, last 1 bit test, t=2:
n=100, test value is 300 , 0
n=200, test value is 300 , 0
n=300, test value is 300 , 0

generator 2, last 1 bit test, t=2:
n=100, test value is 11.44 , 0.00956972
n=200, test value is 11.92 , 0.00766229
n=300, test value is 11.44 , 0.00956972

generator 1, last 1 bit test, t=3:
n=100, test value is 300 , 0
n=200, test value is 77.6 , 4.24105
n=300, test value is 100 , 0

generator 2, last 1 bit test, t=3:
n=100, test value is 4.84 , 0.0278069
n=200, test value is 7.84 , 0.00511026
n=300, test value is 9 , 0.0026998

generator 2, last 3 bit test, t=1:
n=100, test value is 0 , 1
n=200, test value is 0.64 , 0.423711
n=300, test value is 1.44 , 0.230139

generator 1, last 3 bit test, t=2:
n=100, test value is 21.84 , 0.0000704273
n=200, test value is 21.84 , 0.0000704273
n=300, test value is 23.12 , 0.0000381227


generator 2, last 3 bit test, t=2:
n=300, test value is 4.68 , 0.7038

generator 2, last 6 bit test, t=3:
n=300, test value is 8 , 0.332594

If we organize the above information into tables, it becomes:

generator:1 for last 1 bit


test              test 1         test 2         test 3
monobit test      1              1              1
2 sequence test   0              0              0
3 sequence test   0              0              0

generator:2 for last 1 bit

test              test 1         test 2         test 3
monobit test      0.548506       0.548506       0.548506
2 sequence test   0.00956972     0.00766229     0.00956972
3 sequence test   0.184085       0.267336       0.202587

generator:3 for last 1 bit

test              test 1         test 2         test 3
monobit test      1              1              1
2 sequence test   0              0              0
3 sequence test   0              0              0

generator:1 for last 3 bits

test              test 1         test 2         test 3
monobit test      0.0278069      0.00511026     0.0026998
2 sequence test   0.0000704273   0.0000704273   0.0000381227
3 sequence test   0.999988       0.999988       0.999988

generator:2 for last 3 bits

test              test 1         test 2         test 3
monobit test      1              0.423711       0.230139
2 sequence test   0.288573       0.623678       0.905525
3 sequence test   0.849824       0.934437       0.895877

generator:3 for last 3 bits

test              test 1         test 2         test 3
monobit test      0.689157       0.689157       0.689157
2 sequence test   0.0199763      0.0000176682   1.88138–
3 sequence test   0.151307       0.625835       0.995448

generator:4 for last 3 bits


test              test 1         test 2         test 3
monobit test      1              0.841481       0.841481
2 sequence test   0.0000704273   0.0000704273   0.0000381227
3 sequence test   0.999988       0.999988       0.999988

generator:5 for last 3 bits

test              test 1         test 2         test 3
monobit test      0.0026998      0.000318217    0.000673859
2 sequence test   6.47316–       1.21377–       1.70053–
3 sequence test   0              0              0

generator:1 for last 6 bits

test              test 1         test 2         test 3
monobit test      0.841481       0.689157       0.689157
2 sequence test   0.0248683      0.0.494368     0.00231953
3 sequence test   0.881277       0.881277       0.7038

generator:2 for last 6 bits

test              test 1         test 2         test 3
monobit test      0.423711       0.0278069      0.230139
2 sequence test   0.016033       0.0045509      0.00888689
3 sequence test   0.000555888    0.000487356    0.000487356

The Mathematica code is given below.

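(* Length of subsequence to test *)
t = 2;
(* Number of output bits taken from each generated number *)
outputbits = 3;
(* Modulus of the generator and number of generated values *)
m = 400;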


(* The generator: a linear congruential generator x -> a*x + b mod n *)
lincongen[a_Integer, b_Integer, n_Integer, x_Integer] := Mod[a*x + b, n]

(* Generate a sequence. list1 contains the generated numbers (modulo m);
   list2 contains the last outputbits binary digits of each number. *)
f[x_] := lincongen[25, 5, m, x];
list1 = {};
list2 = {};
s = 2;  (* seed *)
x = s;
Do[
 AppendTo[list1, x];
 bits = IntegerDigits[x, 2, outputbits];
 list2 = Join[list2, bits];
 x = f[x];
 , {i, 1, m}]
list1;
list2;
list2

(* Create a table containing the frequencies of the different subsequences *)
totalfreq = Table[0, {i, 0, 2^t - 1}]
(* output: {0, 0, 0, 0} *)
Do[
 pos = t*i + 1;
 test = Take[list2, {pos, pos + t - 1}];
 totalfreq[[FromDigits[test, 2] + 1]]++;
 , {i, 0, Length[list2]/t - 1}]
FromDigits[{1, 1, 1, 1}, 2]
totalfreq
(* output: {150, 250, 50, 150} *)

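(* Test the first n subsequences of length t *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[
 pos = t*i + 1;
 (* extract the subsequence, as in the totalfreq loop *)
 test = Take[list2, {pos, pos + t - 1}];
 freq[[FromDigits[test, 2] + 1]]++;
 , {i, 0, n - 1}]
(* Chi-square statistic and the corresponding test value *)
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]
(* output: 21.84 *)
(* output: 0.0000704273 *)
freq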


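(* output: {50, 50} *)

(* Test the second block of n subsequences *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[
 pos = t*i + 1;
 test = Take[list2, {pos, pos + t - 1}];
 freq[[FromDigits[test, 2] + 1]]++;
 , {i, n, 2*n - 1}]
(* output of stat and testvalue, evaluated as above: 0.04, 0.841481 *)

(* Test the third block of n subsequences *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[
 pos = t*i + 1;
 test = Take[list2, {pos, pos + t - 1}];
 freq[[FromDigits[test, 2] + 1]]++;
 , {i, 2*n, 3*n - 1}]
(* output of stat and testvalue, evaluated as above: 0.04, 0.841481 *)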
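The last two steps of the code, the chi-square statistic and the test value, can be checked on a small example. The following is a minimal sketch and not the code used for the results: chiSquareStat and pValue are hypothetical helper names, the counts {53, 47} are made-up illustrative data, and the sketch assumes that the frequencies sum to n and that there are 2^t equally likely subsequences.

```mathematica
(* chiSquareStat and pValue are hypothetical helper names introduced here
   for illustration; they mirror the stat and testvalue lines of the appendix code. *)

(* Chi-square statistic for observed counts, assuming all Length[freq]
   subsequences are equally likely (expected count n/k each). *)
chiSquareStat[freq_List] := Module[{n = Total[freq], k = Length[freq]},
  N[Total[(freq - n/k)^2/(n/k)]]]

(* Test value: upper tail probability of the chi-square distribution
   with 2^t - 1 degrees of freedom. *)
pValue[stat_, t_Integer] := 1 - CDF[ChiSquareDistribution[2^t - 1], stat]

chiSquareStat[{53, 47}]  (* illustrative counts for t = 1, n = 100: gives 0.36 *)
pValue[0.36, 1]          (* 0.548506 *)
pValue[0.04, 1]          (* 0.841481 *)
pValue[21.84, 2]         (* 0.0000704273 *)
```

The three test values agree with the pairs 0.36 , 0.548506 and 21.84 , 0.0000704273 listed in the appendix and with the output 0.04, 0.841481 of the code.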


