
Deterministic Test Vector

Compression/Decompression Using an

Embedded Processor and Facsimile

Coding

by Jon Persson

LITH-IDA-EX–05/033–SE 2005-03-21


Master’s thesis

Deterministic Test Vector

Compression/Decompression Using an

Embedded Processor and Facsimile Coding

by Jon Persson

LiTH-IDA-EX–05/033–SE

Supervisor and Examiner: Erik Larsson

Department of Computer and Information Science, Linköping University


Abstract

Modern semiconductor design methods make it possible to design increasingly complex system-on-a-chips (SOCs). Testing such SOCs becomes highly expensive due to the rapidly increasing test data volumes, with longer test times as a result. Several approaches exist that compress the test stimuli and add hardware for decompression. This master's thesis presents a test data compression method based on a modified facsimile code. An embedded processor on the SOC is used to decompress the data and apply it to the cores of the SOC. The use of already existing hardware reduces the need for additional hardware.

Test data may be rearranged in ways that affect the compression ratio. Several such modifications are discussed and tested. To be realistic, a decompression algorithm has to be able to run on a system with limited resources. With an assembler implementation it is shown that the proposed method can be realized efficiently in such environments. Experimental results where the proposed method is applied to benchmark circuits show that the method compares well with similar methods.

A method of including the response vector is also presented. This approach makes it possible to abort a test as soon as an error is discovered, while still compressing the data used. To correctly compare the test response with the expected one, the data needs to include don't care bits. The technique uses a mask vector to mark the don't care bits. The test vector, response vector and mask vector are merged in four different ways to find the best arrangement.

Keywords: System-on-a-chip (SOC) testing, test data compression/decompression, processor-based testing, variable-to-variable-length codes, facsimile coding, deterministic testing.


Acknowledgements

A lot of thanks to Erik Larsson, my supervisor and examiner at IDA (the Department of Computer and Information Science, Linköping University), who helped me a lot. Not only with the explanation of how SOCs are tested, but also with all practical issues and, last but not least, a lot of reasoning about upcoming ideas and problems.

I would also like to thank Kedarnath Balakrishnan, University of Texas, for sending me ISCAS'89 test vectors, and Syed Irtiyaz Gilani at IDA for the D695 test and response vectors. Without the ability to test the method with realistic data I would know nothing about its quality. Thanks to all my friends who have discussed the subject with me, and the biggest thanks to Louise who encouraged me all the way during this work, I love you!

Thanks, Jon


Abbreviations

ATE Automatic Test Equipment
ATPG Automatic Test Pattern Generator
BIST Built-In Self-Test
CPU Central Processing Unit
CUT Core Under Test
DSP Digital Signal Processor (or Processing)
FDR Frequency-Directed Run-Length
I/O Input/Output
MISR Multi-Input Signature (or Shift) Register
NOP Dummy Instruction in Assembler
SOC System-on-a-Chip
TAM Test Access Mechanism
X Don't Care Bit


Contents

1 Introduction
  1.1 System-on-a-Chip (SOC)
  1.2 Testing
    1.2.1 Don't Care Bits (X's)
  1.3 Examine the Response
    1.3.1 MISR
2 The Problem
  2.1 High Test Data Volume
    2.1.1 Solution
    2.1.2 Using an Embedded Processor
    2.1.3 What is Given
3 Related Work
  3.1 Decompressing Using on-chip Circuitry
  3.2 Built-In Self-Test (BIST)
  3.3 Decompressing Using Processor
    3.3.1 Decompression Using Linear Operations
4 Design and Implementation
  4.1 Facsimile Standard
    4.1.1 One-Dimensional
    4.1.2 Two-Dimensional
  4.2 Compressing Test Vectors
    4.2.1 Plain Facsimile, No Reorder
    4.2.2 Greedy Sort
    4.2.3 Frequency-Directed Run-Length (FDR)
    4.2.4 Modifying Facsimile Codewords
    4.2.5 Local Search
    4.2.6 The Complete Proposed Method
    4.2.7 Example
  4.3 Decompression
  4.4 Decompression in Assembler
  4.5 Including Response Vectors
    4.5.1 Using Mask
    4.5.2 Two Bits Each
    4.5.3 Merged Test and Response Vector
5 Experimental Results
  5.1 Compressing Test Vectors Only
    5.1.1 Unix gzip utility
    5.1.2 Local Search Heuristic
  5.2 Including Response Vectors
6 Discussion
  6.1 Proposed Method
    6.1.1 Local Search
    6.1.2 Discarded Techniques
  6.2 Storing Previous Vector
    6.2.1 Store in Memory
    6.2.2 Core Feedback
  6.3 Synchronization
  6.4 Comparing the Result
  6.5 Including Response
  6.6 Complex Methods
7 Conclusions and Further Work
  7.1 Conclusions


Chapter 1

Introduction

This chapter gives an introduction to System-on-a-Chip (SOC) and to testing.

1.1 System-on-a-Chip (SOC)

“System-on-a-chip (SoC or SOC) is an idea of integrating all components of a computer system into a single chip. It may contain digital, analogue, mixed-signal, and often radio-frequency functions – all on one chip.”

Wikipedia (http://en.wikipedia.org)

Modern semiconductor design methods and manufacturing technologies enable the creation of a complete system on one single die, the so-called system chip or SOC [4]. Such system chips are typically very large Integrated Circuits (ICs), consisting of millions of transistors and containing a variety of hardware modules [4]. These modules, called cores, are reusable, predesigned silicon circuit blocks. Embedded cores incorporated into system chips cover a very wide range of functions, such as processors, MPEG coding/decoding, memory etc.


Figure 1.1: Example SOC

Throughout this report we will look at a simple example SOC shown in Figure 1.1. It contains a processor, a memory and two small cores, the ones that will be tested.

1.2 Testing

When testing a core (referred to as core-under-test or CUT) the core will be set to a starting state and the system clock will be applied, bringing the core to its next state, the response, which is examined. If the response is the expected one then this test has passed. A core has a number of such tests to pass, each checking for different modelled faults that can arise.

To easily set the starting state the core is equipped with scan chains, shift registers connected to the inner parts of the core. The scan chains are first filled with a test vector by shifting it in, then the system clock is applied and the response is captured into the scan chains. The response is shifted out and compared with the expected response. The data bus used to transfer the test data, called the Test Access Mechanism (TAM), is dedicated to testing only.


Figure 1.2: Example SOC with TAM, wrappers and scan chains

Often the TAM is of a width different from the number of scan chains. To handle the interface between the scan chains and the TAM, every core is surrounded by a wrapper, applying the incoming bits to the right scan chain. As mentioned, a number of test vectors is applied when testing a core; together these test vectors constitute the test data, sometimes referred to as a test cube. The example SOC has n scan chains per core and the TAM is four bits wide (Figure 1.2).

Where do the test vectors come from? From the specification of the core, an automatic test pattern generator (ATPG) can produce the test sets and responses. If the core is constructed as a black box, where the buyers have no information about its internals, the vendor of the core will deliver the test sets and the corresponding responses. The test vectors are then usually stored in an automatic test equipment (ATE) which is connected to the SOC when testing it, sending over the vectors one by one.

1.2.1 Don't Care Bits (X's)

Each test vector is designed to test the SOC for one or more modelled faults. Every such fault deals only with some of the input bits, thus leaving other bits that can be either 0 or 1. These are called don’t care bits and are represented with X’s in the test and response vectors. For the test vectors used in this report the number of don’t care bits can be as much as 95% of the total number of bits [2]. A good compression algorithm should maximize the compression ratio by assigning the don’t care bits to either 0 or 1 carefully.

1.3 Examine the Response

There are mainly two alternatives for examining the response. The first one is to compare every bit of the response with the expected, modelled response. This approach will detect all possible errors and can also be used to terminate a test as soon as the first error is detected, so-called abort-on-fail. This way less time is spent testing faulty SOCs.

The second approach is to compress the response before it is compared with an equally compressed expected response. The response can be compressed without keeping all the information as long as the probability of accepting a faulty SOC is low. One straightforward compression algorithm would be to count the sum of all the 1's in the responses; if the sum differs from the expected one, the SOC is faulty. If several faults occur there is a possibility that the sum still ends up at the correct value and the SOC wrongly passes the test. Today the most commonly used approach is to place a multi-input signature register (MISR) at the outputs of each core.

1.3.1 MISR

A multi-input signature register (MISR) is a small circuit designed to create a signature of the data sent to its inputs. When all the tests are completed the signature is compared with the desired signature; if they are equal the MISR signals that the tests passed, otherwise a fail is signalled. The desired signature is small enough to be stored inside the MISR itself.

(17)

Time      Inputs (D1 D2 D3)   reg1 reg2 reg3
0         0 1 1               0    0    0
1         1 1 0               0    1    1
2         0 0 0               0    0    1
3         1 0 1               1    1    0
4         1 1 0               1    1    0
5 (end)                       1    0    1   (signature)

Figure 1.3: Example MISR and signature calculations

In Figure 1.3 an example MISR is shown together with a signature calculated from some example inputs. The ⊕-symbols represent modulo-2 adders: an odd number of 1's at the inputs sets the output to 1, an even number of 1's sets the output to 0. As seen in the example MISR, the output from register 3 is connected to the modulo-2 adders in front of registers 1 and 2. Which modulo-2 adders are connected to the output of the last register can be changed to give the MISR other characteristics. Differently connected MISRs produce signatures of different quality. [9]

Due to its cyclical behaviour a MISR distributes faults evenly over all its registers. This way multiple faults are less likely to produce the correct signature. It can be shown that the probability for erroneous inputs to generate the correct signature is nearly 2^(-n), where n is the number of registers in the MISR. Figure 1.4 shows where the MISRs are added to the SOC. [9]
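As an illustration, the example MISR of Figure 1.3 can be modelled in a few lines of Java. This is only a sketch: the feedback structure assumed here (register 3 feeding the adders in front of registers 1 and 2, each register otherwise taking its data input XORed with the preceding register) is the one that reproduces the signature 101 for the example inputs; the class and method names are not from the thesis.

public class MisrSketch {
    private int reg1 = 0, reg2 = 0, reg3 = 0;

    // One clock cycle: every addition is modulo 2 (XOR).
    void clock(int d1, int d2, int d3) {
        int n1 = d1 ^ reg3;            // adder in front of register 1: D1 + feedback
        int n2 = d2 ^ reg1 ^ reg3;     // adder in front of register 2: D2 + reg1 + feedback
        int n3 = d3 ^ reg2;            // register 3 input: D3 + reg2
        reg1 = n1; reg2 = n2; reg3 = n3;
    }

    String signature() { return "" + reg1 + reg2 + reg3; }

    public static void main(String[] args) {
        int[][] inputs = { {0,1,1}, {1,1,0}, {0,0,0}, {1,0,1}, {1,1,0} };
        MisrSketch misr = new MisrSketch();
        for (int[] d : inputs) misr.clock(d[0], d[1], d[2]);
        System.out.println(misr.signature());   // prints 101, the signature in Figure 1.3
    }
}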


Chapter 2

The Problem

2.1 High Test Data Volume

With rapidly increasing complexity in the SOCs, the test data volume increases just as fast. This brings two problems: the ATE needs more memory to store the test data, and the tests take longer to perform. The longer test times in particular, a huge bottleneck in the production of SOCs, increase the production cost.

2.1.1 Solution

What can be done to reduce the size of the test data? One popular approach is the use of compression techniques. The test data for a particular SOC is compressed and stored in the ATE. This requires less memory than the original data, giving us a solution to the first problem. When testing a SOC the compressed data is sent to the SOC where a decompressor restores the original data. The decompressor is usually some extra circuitry added to the SOC. The decompressed, original data is then sent to the CUT as if the ATE had sent the original data directly.

There is still the same amount of data to be applied to each core, even if it was compressed when sent to the SOC. How can the second problem, the long test time, be solved? Luckily the technique described above helps in this matter as well.


Figure 2.1: Example SOC with processor connected to TAM

ATEs are usually built with slower electronics than SOCs, and a SOC would have to operate at a very slow speed during test. When an ATE sends compressed data, only the parts of the SOC receiving this data need to operate at the same clock speed as the ATE. The decompressor and the rest of the SOC can operate at a higher clock speed, applying the test vectors in less time.

2.1.2 Using an Embedded Processor

Many of the SOCs of today have embedded processors to solve calculations specific to the operation of the SOC. Is it possible to use the embedded processor to decompress a compact version of the test data? This question was the starting point for this thesis. The idea is illustrated in Figure 2.1. The ATE will send precomputed, compressed test vectors to the SOC. The embedded processor then restores the original test vectors using a decompression algorithm and applies them to the cores.

It turned out this approach had already been tested with good results, but there exist more compression algorithms that have not been tested yet.

2.1.3 What is Given

Figure 2.1 shows the layout for the example SOC that is to be tested. The following requirements are fulfilled for this SOC:

• The ATE is capable of using the I/O module to send data to the right place in memory. Not only can it send the compressed data, but the decompression program can also be transferred and executed.

• The memory is of sufficient size to hold the decompression program, a buffer for the incoming data and one copy of the longest test vector.

• There exists controlling circuitry which synchronizes the data flow from the ATE and sends enable signals to the right parts of the system.

• Test vectors are available and come, one set for each core, in the following format:

2
25
000000XXXXXXX101XXXXXXXX0
0000000011111XXX111100XXX

The first two rows specify how many vectors there are and how long each vector is. Don't care bits are represented with X's.
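For illustration, a small Java reader for this format could look as follows. It is only a sketch, assuming that the two counts and the vectors are whitespace-separated tokens; the class and method names are hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TestCubeReader {

    // Reads: number of vectors, vector length, then one vector per line (X = don't care).
    static List<String> read(Scanner in) {
        int count = in.nextInt();
        int length = in.nextInt();
        List<String> vectors = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String v = in.next();
            if (v.length() != length)
                throw new IllegalArgumentException("vector has wrong length");
            vectors.add(v);
        }
        return vectors;
    }

    public static void main(String[] args) {
        String cube = "2\n25\n000000XXXXXXX101XXXXXXXX0\n0000000011111XXX111100XXX\n";
        System.out.println(read(new Scanner(cube)));
    }
}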

What is left to be done is the compression algorithm and the decompression program. The compressed data will only deal with the vectors. The first two controlling rows may be transferred as they are, telling the processor how many vectors to decompress and how many bits each of them contains. The output from the compression program will be a stream of bits which, when decompressed, will yield the same vectors as in the original data with one exception: each X is replaced with either 0 or 1. Since each vector is a stand-alone test, the vectors produced from the compressed data may be in a different order than the original vectors. What matters is that the response vectors are reordered in the exact same way.

This report presents a technique that compresses the test data above into this:

2
25
1111001011011010
01111011010110

The two vectors are represented by 30 bits instead of 50 bits in the original data.


Chapter 3

Related Work

This chapter discusses some of the different solutions to the problem of reducing test data volume. Decompression techniques using both hardware and software are represented.

3.1 Decompressing Using on-chip Circuitry

As long as the decompression scheme is not too difficult, decompression can be done in hardware using additional circuitry inside the SOC. The main advantage is that these techniques can be used in any SOC, without the requirement of an embedded processor and/or memory. The cost is the area overhead inside the SOC to fit the decompressing circuitry.

Frequency-Directed Run-Length (FDR) code (described in Section 4.2.3) is used by Chandra and Chakrabarty [3]. The report shows that the FDR code outperforms other compression schemes when dealing with a special case such as test vector compression. In the report Chandra also applies the technique to difference vectors, where every vector only represents the difference from the previous one. This way longer runs of 0's are achieved and better compression. The result is also compared with more complex methods like gzip and compress, two Unix utilities for compressing data files.

Gonciari and Al-Hashimi [5] propose a Huffman-coding algorithm using patterns of variable lengths. The method aims to solve three problems in SOC testing: on-chip area overhead, high test data volume and long test application time.

3.2 Built-In Self-Test (BIST)

A BIST technique is only applicable when the interior of a module is known. The idea is to create the test vectors somewhat randomly and see which modelled faults these random vectors cover. It is important that the randomizing algorithm produces exactly the same vectors each time. Such an algorithm is called a pseudo-random generator. Those faults not covered by the random vectors are tested with ordinary, deterministic test vectors.

Hwang and Abraham [6] suggest a BIST technique where each pseudo-random pattern is shifted cyclically to cover more simple faults. To avoid testing the circuit with a high number of unnecessary vectors, the distance to the next good vector is sent for each test. For the deterministic part of the method they encode the difference between each deterministic vector and one of the random ones. The probability that a similar random vector exists is high.

3.3 Decompressing Using Processor

A few other methods where an embedded processor is used for decompression already exist. Compressed data is sent to the memory. A decompression program running on the embedded processor decompresses the vectors and applies them to the CUT.

Jas and Touba [7] present an approach where only the difference from the previous vector is sent. The vectors are divided into blocks of a certain length and only blocks with changed bits are sent. The compressed data consists of a list of blocks. For each block the position must be saved, and also one bit telling whether the block is the last one for a vector. The vectors in the test set are reordered to achieve less difference between the vectors.


For a number n, the first n² bits form an n × n matrix. A set of equations is then solved to find two vectors which, together with a XORing algorithm, can reproduce the original matrix. XORing two or more bits works like this: if an odd number of bits are 1 the result is 1; otherwise the result is 0. If the equations can't be solved the first n bits are sent uncompacted.

3.3.1 Decompression Using Linear Operations

The method proposed by Balakrishnan and Touba [2], where linear operations are used to decompress the test set, is presented in more detail. The scheme for testing a SOC with this method is based on word-based XOR operations. The length of the words is usually chosen to be the word length of the processor; 32 bits is the most common today. The method works basically like this (a small sketch of the loop follows the list):

1. All the words from the compressed data are sent to the embedded memory.

2. A pseudo-random number generator inside the SOC creates a number of integers smaller than or equal to the number of words in the compressed data.

3. The integers point out words in the compressed data which are XORed together bitwise.

4. The resulting word is sent to the CUT.

5. Unless all tests are done, repeat from step 2.
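A minimal Java sketch of this loop is shown below. The 32-bit int word width, the choice of three words per XOR and the use of java.util.Random in place of the on-chip pseudo-random generator are illustrative assumptions only; the sketch merely mirrors steps 2-5 above.

import java.util.Random;

public class LinearDecompressSketch {

    // Produces 'outputWordCount' decompressed words from the compressed words in memory.
    static int[] decompress(int[] compressedWords, int outputWordCount, long seed) {
        Random prng = new Random(seed);          // restartable: the same seed gives the same series
        int[] out = new int[outputWordCount];
        for (int i = 0; i < outputWordCount; i++) {
            int word = 0;
            for (int k = 0; k < 3; k++) {        // steps 2-3: pick three words and XOR them bitwise
                int index = prng.nextInt(compressedWords.length);
                word ^= compressedWords[index];
            }
            out[i] = word;                        // step 4: this word would be sent to the CUT
        }
        return out;
    }
}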

A pseudo-random number generator gives what seems to be a series of random numbers, but the important thing is that each time it is restarted it will produce exactly the same series. This way it is known which words from the compressed data will be XORed together to create a certain word in the decompressed data. The compressed data needs to be created in such a way that when decompressed it will correspond to the original data. This is done by creating linear equations using all that is known from above.

Figure 3.1: The compressed data (words W1-W8), the XOR expressions formed from the pseudo-random series, and the original test set (10 XX 0X XX X1 XX 1X 1X X0 XX 1X XX 11 XX X1 0X 01 XX X0 0X)

The method is illustrated with an example where the situation is as in Figure 3.1. W1-W8 refer to words, usually of length 32. To reduce the size of this example the word length is set to 2.

In this case the pseudo-random generator produces the series 1-5-8-2-6-7-3-4-5-1-2-7-2-5-6-3-6-8... Taken three at a time, the words corresponding to these numbers are XORed together inside the box in the middle of Figure 3.1. Setting the XORed expressions equal to the original data, found at the bottom of Figure 3.1, gives us the following equations.

W1⊕W5⊕W8 = 10        W5⊕W7⊕W2 = 1X
W2⊕W6⊕W7 = XX        W8⊕W4⊕W2 = XX
W3⊕W4⊕W5 = 0X        W5⊕W4⊕W8 = 11
W1⊕W2⊕W7 = XX        W2⊕W1⊕W4 = XX
W2⊕W5⊕W6 = X1        W7⊕W6⊕W3 = X1
W3⊕W6⊕W8 = XX        W4⊕W5⊕W6 = 0X
W4⊕W1⊕W6 = 1X        W8⊕W2⊕W6 = 01
W1⊕W7⊕W8 = 1X        W1⊕W5⊕W7 = XX
W2⊕W3⊕W4 = X0        W2⊕W4⊕W6 = X0
W1⊕W3⊕W6 = XX        W6⊕W7⊕W8 = 0X

All equations are then divided to handle one bit each. Those where the right-hand side is X can be removed; whatever the bits of the left-hand side are, they will always satisfy a don't care bit. W1(1) refers to the first bit of W1 and W1(2) to the second. This gives the following equations.

W1(1)⊕W5(1)⊕W8(1) = 1        W1(2)⊕W5(2)⊕W8(2) = 0
W3(1)⊕W4(1)⊕W5(1) = 0        W2(2)⊕W5(2)⊕W6(2) = 1
W4(1)⊕W1(1)⊕W6(1) = 1        W1(1)⊕W7(1)⊕W8(1) = 1
W2(2)⊕W3(2)⊕W4(2) = 0        W5(1)⊕W7(1)⊕W2(1) = 1
W5(1)⊕W4(1)⊕W8(1) = 1        W5(2)⊕W4(2)⊕W8(2) = 1
W7(2)⊕W6(2)⊕W3(2) = 1        W4(1)⊕W5(1)⊕W6(1) = 0
W8(1)⊕W2(1)⊕W6(1) = 0        W8(2)⊕W2(2)⊕W6(2) = 1
W2(2)⊕W4(2)⊕W6(2) = 0        W6(1)⊕W7(1)⊕W8(1) = 0

Solving this system of equations is the major task in this method. Balakrishnan and Touba show that every such system of equations can be made solvable by increasing the size of the compressed data. This small example has one solution in the following values of the compressed data:

W1 = 00   W2 = 11   W3 = 10   W4 = 01
W5 = 10   W6 = 10   W7 = 11   W8 = 00

With this method we have compressed the test set from 40 bits (the original data at the bottom of Figure 3.1) to 16 bits (8 words of 2 bits each). 16 also happens to be the number of specified bits, s_tot, in the original test set. Most often this method only needs a few more bits than s_tot to get solvable equations [2].

The major disadvantage with this method is the requirement of available memory. For every word it decompresses, the method needs to look up words from different parts of the compressed data; hence all of the compressed data needs to be sent to the system's memory before decompression can take place. There can also be a problem when solving enormous systems of equations, as they can be too large to solve in a reasonable amount of time. If these factors become an issue, the test set can simply be partitioned and each partition processed one at a time [2]. Partitioning the test set will reduce the overall compression slightly (the larger the partitions, the better the overall compression) [2].


Chapter 4

Design and Implementation

This chapter begins with a description of the facsimile standard. The method is then designed through a number of stages, each adding new features. An algorithm for decompressing the vectors, implemented in assembler using an emulator for the 8086 processor, is also presented.

4.1 Facsimile Standard

The facsimile coding standard used in this report is the ITU-T Group 3 standard. The idea behind this facsimile coding is that many lines of a printed page are similar to the line just above. Every dot on the paper is coded to be either white or black, also known as bi-level images. The sender compares the next runs of equally colored dots with the dots right above on the previous line. If they are somewhat similar, special codewords are sent to the receiver. The receiver, who already has the previous line, can calculate the length of the runs. The facsimile standard is described in more detail below, following Sayood [8].

In the recommendations for Group 3 facsimile the code is divided into two schemes. The first is a one-dimensional scheme in which the data is coded independently of any other data. The other is a two-dimensional scheme where special codewords are sent using the line-to-line correlations.

Figure 4.1: Two rows of an image. The transition pixels are marked.

4.1.1 One-Dimensional

The one-dimensional coding scheme is a run-length coding scheme in which the next block of data is represented as a series of alternating white runs and black runs. If this scheme is used at the beginning of a line, the first run is always a white run. If the first pixel is a black pixel, then a white run of length zero is sent first.

The run-length code used is a Huffman code, a way of choosing the best-fitted codeword for each situation based on how frequently that situation occurs. Each line of an A4-size document is represented by 1728 pixels. Creating 1728 different Huffman codes is not very practical; instead the code is divided into two parts, m and t, and a run of length r_i is expressed as

r_i = 64 × m + t,   for t = 0, 1, ..., 63 and m = 1, 2, ..., 27.

The codes for t are called the terminating codes and the codes for m are called the make-up codes. Black and white runs also have separate codes. If r_i < 64, only a terminating code is used; otherwise both a make-up code and a terminating code are used (for example, a run of length 200 is coded with the make-up code for m = 3 followed by the terminating code for t = 8). This coding scheme is generally referred to as a Modified Huffman (MH) scheme.

4.1.2 Two-Dimensional

In the two-dimensional scheme, the key is the transition pixels. A transition pixel is a pixel of a different color than the pixel to the left of it. In Figure 4.1 the transition pixels are marked with dots. Even the leftmost pixel on a row can be a transition pixel. One can think of each row as extended with an imaginary white pixel to the left of the row; if the first pixel is black it is then also a transition pixel. In most documents a row is very similar to its neighbours and the transition pixels will be close to each other. The idea is to encode the position of a transition pixel in relation to a transition pixel on the previous line. This is a modification of a coding scheme called Relative Element Address Designate (READ) code and is often called Modified READ (MR).

Some definitions are needed to explain the coding scheme:

a0: The last pixel of the row currently being encoded. Its position and color are known to both encoder and decoder. At the beginning of each line, a0 refers to the imaginary white pixel to the left of the first actual pixel. Often this pixel is a transition pixel, but not always.

a1: The first transition pixel on the same row and to the right of a0. The location of this pixel is known only to the encoder.

a2: The second transition pixel on the same row and to the right of a0. As with a1, its location is known only to the encoder.

b1: The first transition pixel with the opposite color of a0 on the line above and to the right of a0. As the line above is known to both encoder and decoder, as is the value of a0, the location of b1 is also known to both encoder and decoder.

b2: The second transition pixel on the line above and more than one pixel to the right of a0. Also known to both encoder and decoder.

For the implementation of the facsimile standard used in this report, b1 and b2 may be placed to the right of the entire row. If only b2 is to the right, it is placed one pixel to the right. If both are outside, b1 is placed one pixel and b2 two pixels to the right. This is slightly different from Sayood [8], where an additional codeword is mentioned, representing the situation where all the remaining pixels of a row are equally colored.

In Figure 4.2 the example rows are labelled. In this situation the second row is the one currently being encoded and the encoder has encoded the pixels up to the second pixel (marked with a0). The pixel assignments for a slightly different arrangement of black and white pixels are shown in Figure 4.3.

Figure 4.2: The transition pixels are labelled.

If a1 is to the right of b2, we call the coding mode used the pass mode.

This mode is coded with 0001. When the decoder receives this code it knows that all the pixels from the last one decoded to the pixel straight below b2 have the same color. For the next round this pixel below b2 is the last pixel known to both encoder and decoder. This is the only time where the last known pixel is not a transition pixel.

If a1 is to the left of or straight below b2, one of two things can happen. The vertical mode is used if the number of pixels from a1 to the pixel right under b1 is less than or equal to three. Seven different codes tell the location of a1 in relation to b1. These are:

1: a1 is straight below b1.

011: a1 is to the right of b1 by one pixel.

000011: a1 is to the right of b1 by two pixels.

0000011: a1 is to the right of b1 by three pixels.

010: a1 is to the left of b1 by one pixel.

000010: a1 is to the left of b1 by two pixels.

0000010: a1 is to the left of b1 by three pixels.


Figure 4.3: Two slightly different rows with transition pixels labelled.

After the decoder has received and decoded one of these codes, the pixel at a1 is the last one known to both encoder and decoder and the coding process continues.

In the case where a1 is to the left of or straight below b2 and the distance to b1 is greater than three, the one-dimensional technique described in Section 4.1.1 is used. To inform the decoder about this mode the code 001 is sent, followed by two sets of Modified Huffman codewords. The first run-length is of the same color as the last decoded pixel and the second of the opposite color. These are in fact the runs from a0 to a1 and from a1 to a2. The decoder then adds one pixel with the same color as the first run and this is the last known pixel for the next round.

4.2 Compressing Test Vectors

4.2.1 Plain Facsimile, No Reorder

This first solution uses plain facsimile code to compress the vectors in the order given in the test cube. Later we will see that reordering the vectors improves the compression ratio. The first line to be coded also needs a previous vector. The algorithm uses an imaginary first vector containing only 0's, which forces the first vector to be coded with run-length codes only.


A first look at the test data makes it clear that the X's (don't care bits) need to be assigned 0 or 1 carefully. When the algorithm comes across don't care bits it tries to place a1 after b2 (see Section 4.1). If it fails it will try to place a1 as close to b1 as possible. If a1 cannot be placed within three steps of b1, the horizontal mode is used, sending run-length codes. In the facsimile standard the run-length codes are compressed. This compression technique is based on the length of one row of pixels on a paper copy, which is fixed. This is not applicable to test vectors with different lengths. Instead of creating new compression techniques for each circuit, the run-length case is not compressed at all. In Section 4.2.3 a better solution is presented.

4.2.2 Greedy Sort

Since each test vector is a separate test it does not matter in which order the test vectors are applied, as long as all of them are applied. A reordering of the vectors is done to achieve better compression. A test data set with n vectors can be reordered in n! ways. With conventional computers it is impossible to test all n! combinations unless n is very small; a heuristic is necessary.

Even with a heuristic, reordering the test set is a difficult problem because when one test vector is moved inside the test cube it will affect the facsimile code for many other vectors. To start with, the vector that is moved needs to get all its don't care bits reassigned to achieve better compression. Then it will be coded in relation to its new previous vector. This vector will also force the next vector to be recalculated in the same way, and this will propagate downwards. Only when a vector happens to be assigned the don't care bits in the same way as before can this chain reaction be broken. Otherwise all following vectors need to be recalculated.

The greedy sort heuristic starts with the first imaginary vector of 0’s and compresses every vector in the test cube with this as the previous vector. The vector with the shortest facsimile code is chosen and acts as the previous vector in the next round. This way the algorithm chooses the next vector that extends the compressed data the least until all vectors are included. The biggest disadvantage is that the last vectors are not very well suited to be compressed in relation to each other.


4.2.3 Frequency-Directed Run-Length (FDR)

As mentioned in Section 4.2.1, the run-length code used in the facsimile standard is not very suitable for test vector compression. Chandra and Chakrabarty [3] show that FDR codes are easy to decompress and compress test data very well. Their finest characteristic is the ability to code runs of any length.

The FDR code is constructed to give short codewords for short runs and works like this: a codeword consists of two parts, the group prefix and a tail. The group prefix tells which group of run-lengths the codeword belongs to. The first group, A1, has a single 0 as its group prefix, group A2 has 10 as prefix and A3 has 110. This way every next group gets one more leading 1. Given a complete FDR codeword, the group is determined by seeking the first occurrence of the bit 0. If this is found at the k:th position the group is Ak.

The next part is the tail, which points out one of the run-lengths in the group. It consists of the same number of bits as the group prefix, one for group A1, two for A2 and so on. With k bits available, group Ak will include 2^k different run-lengths: 0 and 1 for group A1, 2-5 for group A2, etc. The first 14 run-lengths are shown in Table 4.1. The rightmost column shows the codeword (the prefix and tail concatenated) used for each run-length. The FDR code has the following properties:

• It is easy to extract the prefix and the tail. The prefix is all bits from the beginning up to and including the first 0. The tail is of equal length as the prefix.

• For any codeword, the sum of the binary representations of the prefix and the tail equals the run-length that is coded.

• Short run-lengths are coded with shorter codewords.

This next modification uses FDR where the original run-length code should have been used.


Group   Run-length   Group prefix   Tail   Codeword
A1      0            0              0      00
        1                           1      01
A2      2            10             00     1000
        3                           01     1001
        4                           10     1010
        5                           11     1011
A3      6            110            000    110000
        7                           001    110001
        8                           010    110010
        9                           011    110011
        10                          100    110100
        11                          101    110101
        12                          110    110110
        13                          111    110111
...     ...          ...            ...    ...

Table 4.1: The first 14 run-lengths and their codewords
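To make the construction concrete, here is a small Java sketch of the FDR code described above. The method names are illustrative; encode() builds the prefix and tail for a given run-length and decode() inverts it, matching the codewords in Table 4.1.

public class FdrSketch {

    static String encode(int runLength) {
        int k = 1;
        while (runLength > (1 << (k + 1)) - 3)     // group Ak holds run-lengths 2^k - 2 .. 2^(k+1) - 3
            k++;
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < k - 1; i++) prefix.append('1');
        prefix.append('0');                         // prefix: k - 1 ones followed by a 0
        String tail = Integer.toBinaryString(runLength - ((1 << k) - 2));
        while (tail.length() < k) tail = "0" + tail;    // pad the tail to k bits
        return prefix.toString() + tail;
    }

    static int decode(String codeword) {
        int k = codeword.indexOf('0') + 1;          // the prefix ends at the first 0
        int tail = Integer.parseInt(codeword.substring(k, 2 * k), 2);
        return ((1 << k) - 2) + tail;               // group base plus tail value
    }

    public static void main(String[] args) {
        System.out.println(encode(5) + " " + encode(3));   // prints 1011 1001, as in Table 4.1
        System.out.println(decode("110111"));              // prints 13
    }
}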

4.2.4 Modifying Facsimile Codewords

The choice of codewords in the facsimile standard is based on characteristics of paper copies. In this next modification to the method, statistics were gathered of how many times each codeword was used in the compressed data. The ordering algorithm described in Section 4.2.2 will benefit from using the short codewords; hence, when gathering the statistics, all the codewords were made equally long, otherwise the shorter ones would be used more often than the longer ones only because they are shorter. The statistics are the sum over all six circuits used in the experiments in Chapter 5.

Statistics show that four of the codewords are rarely used. They correspond to the cases where a1 is placed two or three bits to the left or right of b1. One by one these codewords were removed from the method. For all of them the removal reduced the size of the compressed set. The remaining codewords can be changed further to enhance the compression even more. The new codewords can be found in the last column of Table 4.2. Not only do these changes reduce the size of the compressed set, they also make the decompression algorithm simpler and faster.

Situation              Org. code   Times used   New codeword
run-length             001         7250         11
a1 > b2                0001        6990         10
a1 right under b1      1           12403        01
a1 one right of b1     011         2386         001
a1 two right of b1     000011      546          not used
a1 three right of b1   0000011     253          not used
a1 one left of b1      010         2349         000
a1 two left of b1      000010      1120         not used
a1 three left of b1    0000010     877          not used

Table 4.2: Statistics for codewords

50 bits has become 30!

4.2.5 Local Search

As mentioned in Section 4.2.2, ordering the vectors is difficult. Local search is a heuristic, a looping algorithm that works like this: the algorithm starts with a given starting solution, in this case a test set with a specific order. Given this starting solution the facsimile coding algorithm compresses the data, yielding the size of the compressed data. The size of this compressed data is what the heuristic tries to minimize. In each loop the heuristic tries a set of different orderings and calculates the size of the compressed data. The change that gives the best solution, and is better than the given one, is taken as the starting point for the next run of the loop. The set of orderings that is tested is determined by a rule. In every loop the algorithm will check all the solutions that can be reached with the rule, called the surroundings, to find a better one. Usually it is a good idea to keep the surroundings very small, hence the name 'local search'. Examples of suitable rules defining the surroundings could be:


• Switch places of two adjacent vectors

• Switch places of two arbitrary vectors

The heuristic was added to the modifications described earlier. The surroundings were chosen as the last rule in the list above. As we will see, the result is not much better than greedy sort. Therefore this modification is not part of the proposed method. Because of the long execution time of the heuristic, even for these small example cores, it is not suitable.
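For illustration, the loop can be sketched as below, with the surroundings defined by swapping two arbitrary vectors. The compressedSize function stands in for running the facsimile compression on a given ordering and is not implemented here; all names are hypothetical.

import java.util.Collections;
import java.util.List;
import java.util.function.ToIntFunction;

public class LocalSearchSketch {

    static void localSearch(List<String> vectors, ToIntFunction<List<String>> compressedSize) {
        while (true) {
            int bestSize = compressedSize.applyAsInt(vectors);
            int bestI = -1, bestJ = -1;
            // examine the whole surroundings: every ordering reachable by one swap
            for (int i = 0; i < vectors.size(); i++) {
                for (int j = i + 1; j < vectors.size(); j++) {
                    Collections.swap(vectors, i, j);
                    int size = compressedSize.applyAsInt(vectors);
                    if (size < bestSize) { bestSize = size; bestI = i; bestJ = j; }
                    Collections.swap(vectors, i, j);     // undo, keep the current ordering for now
                }
            }
            if (bestI < 0) return;                        // no neighbour improves: stop
            Collections.swap(vectors, bestI, bestJ);      // take the best improving swap
        }
    }
}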

4.2.6 The Complete Proposed Method

The modifications mentioned above bring us to one complete algorithm for test vector compression. The local search heuristic is not part of the proposed method.

To encode a test vector the algorithm uses the previous vector and sets the don't care bits to get the best position of a1, preferably after b2, otherwise as close to b1 as possible. The different cases are encoded with the following codes:

• 10: a1 is to the right of b2

• 01: a1 is right under b1

• 001: a1 is placed one to the right of b1

• 000: a1 is placed one to the left of b1

If none of the above is applicable, the code 11 is used and thereafter two sets of FDR codes. There is a dependence on the bit at a0. The first FDR codeword gives the run-length for the same value as at a0 and the second gives the run-length for the opposite bit. This code also includes one final bit with the value at a0. For example, 000001110 will be encoded as 1110111001 if the preceding bit is 0: 11 (FDR mode) + 1011 (run-length 5) + 1001 (run-length 3).

As described in Section 4.2.2 the vectors are sorted. Figure 4.4 shows pseudo-code that illustrates the process.


void GreedySort(testCube) {
    int lengthOfVectors;
    string previous = 000...;   // length = lengthOfVectors
    string tempfax;
    int shortest;

    for each vector in testCube {
        shortest = FindShortestCode();
        MarkAsCoded(shortest);
        tempfax = EncodeVector(shortest, previous);
        previous = DecodeVector(tempfax, previous);
        Write(tempfax);
    }
}

Figure 4.4: Pseudo-code for the greedy sort


4.2.7 Example

The small test set from Section 2.1.3 is here encoded with the algorithm described above.

Vector1 - 000000XXXXXXX101XXXXXXXX0
Vector2 - 0000000011111XXX111100XXX

The two vectors are first encoded with the imaginary first vector of 0's as the previous vector, forcing them to be coded with run-lengths only. The vectors get the following compressed codes (the decompressed data is shown to the right of each codeword):

Vector1: 11 110111 01       -> 0000000000000 1 0        (run-lengths 13 and 1, final bit 0)
         11 00 110011       -> 111111111 0              (run-lengths 0 and 9, final bit 0)

Vector2: 11 110010 110110   -> 00000000 111111111111 0  (run-lengths 8 and 12, final bit 0)
         10                 -> 0000                     (a1 > b2, 0's to the end of the row)

Since vector2 is encoded with a shorter code it is included first in the compressed data. Vector1 is then encoded with a decompressed vector2 as previous vector.

0000000011111111111100000   // Vector2 decompressed
000000XXXXXXX101XXXXXXXX0   // Vector1 with don't cares

Vector1: 01           -> 000000001   (a1 = b1)
         11 1011 01   -> 11111 0 1   (run-lengths 5 and 1, final bit 1)
         01           -> 11110       (a1 = b1)
         10           -> 0000        (a1 > b2)

Compressed data:
Vector2 - 1111001011011010
Vector1 - 01111011010110


4.3 Decompression

When decompressing the compressed data, the algorithm needs to know how long the vectors are and it also requires access to the previous vector. It is sufficient to treat the previous vector as an input stream since it will only be read sequentially from the beginning. The decompression algorithm consumes each codeword in the compressed data and outputs the original data with the help of the previous vector. For any codeword to be decompressed, the current bit denotes the value of the last bit that was decompressed. At the beginning of a new vector the current bit is set to 0. With different codewords different actions are taken:

10: a1 is placed after b2. Keep producing bits with the same value as the current bit until b2 is reached, i.e. when the value in the previous vector input stream has changed two times. The current bit keeps the same value.

01: a1 is right under b1. Produce bits with the same value as the current bit until there is a change in the previous vector. Add one bit with the opposite value and change the current bit.

001: a1 is one bit right of b1. Same as with 01, except produce one extra bit before the last opposite bit.

000: a1 is one bit left of b1. Same as with 01, except produce one bit less before the last opposite bit.

11: FDR run-length code. The decompression algorithm should consume two sets of FDR codes. The first set tells how many bits with the value of the current bit will be produced, the second tells how many bits of the opposite value. Finally one bit with the value of the current bit is produced. The current bit keeps the same value.

After each codeword is taken care of, the algorithm should check whether all the bits of a single vector have been produced, otherwise it continues with the next codeword. In each turn the algorithm should consume the same number of bits from the previous vector as it produces itself. Codeword 10 is a bit special since b2 can be placed to the right of all bits in the vector. When decompressing a 10 codeword the algorithm should stop producing bits when the right side is reached.
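The following Java sketch puts the codeword actions above together. It is only an illustration and makes some assumptions not spelled out in the report: the imaginary pixel to the left of a row is 0, b1 and b2 are located with the standard facsimile transition rules, and the pass-mode run extends through the position below b2 (which then becomes the last known pixel). The helper names are not from the thesis. Run on the example from Section 4.2.7 it reproduces a valid expansion of Vector1.

public class FacsimileDecoderSketch {

    /** Decompress one vector of length n from the bit string 'code', using 'prev'. */
    static String decodeVector(String code, String prev, int n) {
        StringBuilder out = new StringBuilder();
        char current = '0';          // value of the last decoded bit (a0), starts as 0
        int i = 0;                   // read position in the compressed bit stream

        while (out.length() < n) {
            int pos = out.length();
            int b1 = findB1(prev, pos, current, n);
            int b2 = findNextTransition(prev, b1 + 1, n);

            if (code.charAt(i) == '1' && code.charAt(i + 1) == '0') {        // 10: a1 > b2
                i += 2;
                int stop = Math.min(b2, n - 1);                               // clamp at the row end
                while (out.length() <= stop) out.append(current);             // current bit keeps its value
            } else if (code.charAt(i) == '1') {                               // 11: FDR run-lengths
                i += 2;
                int[] r1 = decodeFdr(code, i); i = r1[1];
                int[] r2 = decodeFdr(code, i); i = r2[1];
                for (int k = 0; k < r1[0]; k++) out.append(current);
                for (int k = 0; k < r2[0]; k++) out.append(opposite(current));
                out.append(current);                                          // one final bit, current unchanged
            } else {                                                          // vertical modes
                int a1;
                if (code.charAt(i + 1) == '1')      { a1 = b1;     i += 2; }  // 01:  a1 under b1
                else if (code.charAt(i + 2) == '1') { a1 = b1 + 1; i += 3; }  // 001: one right of b1
                else                                { a1 = b1 - 1; i += 3; }  // 000: one left of b1
                while (out.length() < a1) out.append(current);
                out.append(opposite(current));                                // the transition bit
                current = opposite(current);
            }
        }
        return out.toString();
    }

    // First transition in 'prev' at or after 'pos' whose color is opposite to 'current' (b1).
    static int findB1(String prev, int pos, char current, int n) {
        for (int j = pos; j < n; j++) {
            char left = (j == 0) ? '0' : prev.charAt(j - 1);
            if (prev.charAt(j) != current && prev.charAt(j) != left) return j;
        }
        return n;                     // placed off the right end of the row
    }

    // First transition in 'prev' at or after 'from' (used for b2).
    static int findNextTransition(String prev, int from, int n) {
        for (int j = Math.max(from, 1); j < n; j++)
            if (prev.charAt(j) != prev.charAt(j - 1)) return j;
        return n;
    }

    // Decode one FDR codeword starting at 'i'; returns {run-length, next read position}.
    static int[] decodeFdr(String code, int i) {
        int k = 0;
        while (code.charAt(i + k) == '1') k++;        // prefix: k ones followed by a 0
        k++;                                          // prefix length = group number
        int tail = Integer.parseInt(code.substring(i + k, i + 2 * k), 2);
        return new int[] { ((1 << k) - 2) + tail, i + 2 * k };
    }

    static char opposite(char c) { return c == '0' ? '1' : '0'; }

    public static void main(String[] args) {
        String prev = "0000000011111111111100000";    // decompressed Vector2 from Section 4.2.7
        System.out.println(decodeVector("01111011010110", prev, 25));
    }
}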

4.4 Decompression in Assembler

The decompression algorithm is easily implemented using a high-level language such as C or Java. But can it run on a simple processor with a small amount of memory? Since none of the tested compilers together with a disassembler could generate a small program, an implementation of a facsimile decoder was made directly in assembler. The Emu8086 emulator was used to test the code. Without any SOC-specific programming, the size of the assembler code is 88 instructions. This is similar in size to implementations of other methods. The full code can be seen in Appendix A.

Instead of sending the output to the screen a real implementation would send the output to the CUT. Also there would be some I/O instructions to read the input stream and the previous vector.

4.5 Including Response Vectors

In most testing applications the response from a core is inserted into a MISR (multi-input signature register). The MISR only reports a signature of all its inputs at the end of the test. If the signature doesn't match the expected one the chip is faulty and will be destroyed. An alternative is to compare every bit of each response with the expected one. The main advantage is that a test can be stopped as soon as a fault is discovered (abort-on-fail). There is also a risk that a response with multiple faults still generates the correct signature in a MISR. Usually the response is sent back to the ATE where the comparison is made. This transfer is done without any compression technique, and that is why the MISR has become so popular: it decreases the test application time a lot.

A new approach to response examination is presented here. The idea is to send the responses in compressed form and let the embedded processor do the comparison with the actual response. The compression is done using the same facsimile technique as in the previous sections, and the test vectors can simply be extended to include the responses.

The responses are very similar to the test vectors as they consist of many don't care bits, but we need to be careful. A don't care bit in the response cannot simply be chosen to be 0 or 1. It must match the expected response when the corresponding test vector, with its own don't care bits assigned to either 0 or 1, is used. This can only be done with an ATPG, which would need to be incorporated in the proposed method. This approach would probably produce response vectors that are not well suited for compression with the facsimile method.

A better solution is to send the response vectors with the don't care bits left untouched. With a don't care bit in the expected response, the comparison program should then accept any bit in the actual response. Additional data needs to be sent to represent the don't care bits. Four ways of coding the response vector are presented here. For all four methods an example is shown with the test vector X1XXXXX001X and the response vector XXXXX10XX0X. The complete vector is encoded to the representation that is sent to the facsimile coding algorithm.

4.5.1 Using Mask

The don't care bits are freely assigned 0 or 1 in the same way as in the test vector. To determine which ones are don't care bits, a mask is added at the end of each vector. A 0 in the mask indicates that the corresponding bit in the response vector is don't care; a 1 indicates that the bit is specified and should be compared with the bit in the actual response.

Orig. test:      X1XXXXX001X
Orig. response:  XXXXX10XX0X
The mask:        00000110010

4.5.2 Two Bits Each

If each bit in the response vector is coded with two bits, there is no need for a mask. Actually, only two of the four different combinations of two bits are used by the bits 1 and 0, leaving two combinations to code the don't care bit. One solution would be to code 1 as 11, 0 as 10 and X as either 00 or 01. To code an X as either 00 or 01 we simply code it 0X.

Orig. test:  X1XXXXX001X
Response:    0X 0X 0X 0X 0X 11 10 0X 0X 10 0X   (= XXXXX10XX0X)

You may ask why the X is coded as 0X and not simply as a single 0. The reason is that the vectors would then become different in length, and the facsimile coding algorithm requires a previous vector of the same length.
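A tiny Java sketch of these two codings follows, assuming the mappings described above (mask bit 0 for a don't care bit, and 1 -> 11, 0 -> 10, X -> 0X); the method names are illustrative.

public class ResponseCodingSketch {

    // Section 4.5.1: 0 in the mask marks a don't care bit, 1 marks a specified bit.
    static String mask(String response) {
        StringBuilder m = new StringBuilder();
        for (char c : response.toCharArray())
            m.append(c == 'X' ? '0' : '1');
        return m.toString();
    }

    // Section 4.5.2: two bits per response bit.
    static String twoBitsEach(String response) {
        StringBuilder out = new StringBuilder();
        for (char c : response.toCharArray())
            out.append(c == '1' ? "11" : c == '0' ? "10" : "0X");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("XXXXX10XX0X"));        // 00000110010
        System.out.println(twoBitsEach("XXXXX10XX0X")); // 0X0X0X0X0X11100X0X100X
    }
}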

4.5.3 Merged Test and Response Vector

When running the test application there is a matter of timing not previously discussed. In Section 1.2 it is written that a test vector is shifted into the core, the clock is applied and the response is shifted out. However, at the same time as the response is shifted out it is possible to shift in the next test vector. This is called pipelining and saves a great amount of time. In order to use pipelining the application should compare the response vector with the expected one, at the same time as it shifts in the next vector.

The last two approaches merge the response vector with the next test vector. For each bit that is shifted into the core, the decompression program will also decompress one bit from the response, compare it with the actual response shifted out and continue only if they match (or the decompressed bit is don't care). The first approach uses a mask that is placed at the beginning of the vector. The mask has to be decompressed and saved before the comparison can take place. The second uses two bits for each bit in the response vector in the same way as the method above. In both these methods a last, empty test vector needs to be added to include the last response and mask. In the examples below the response vector is the same as before, but here it refers to the response vector of the preceding vector. A t denotes a bit from the test vector and an r denotes a bit from the response.

The mask:       00000110010
Merged vector:  Xt Xr 1t Xr Xt Xr Xt Xr Xt Xr Xt 1r Xt 0r 0t Xr 0t Xr 1t 0r Xt Xr


Merged vector (two bits each):  Xt 0Xr 1t 0Xr Xt 0Xr Xt 0Xr Xt 0Xr Xt 11r Xt 10r 0t 0Xr 0t 0Xr 1t 10r Xt 0Xr


Chapter 5

Experimental Results

With a set of experiments this chapter shows the efficiency of the proposed method. The experiments are made on real test data, since real test data has special properties that will affect the results. Results from the different stages show which modification is the most valuable one, and the results are also compared with results from other methods. For all tests including only test vectors, test vectors for some of the ISCAS'89 circuits were used. For the algorithms that also include the response vectors, test data for the circuit D695 was used. These circuits are small circuits released publicly for development purposes.

5.1 Compressing Test Vectors Only

The compression algorithm was implemented and tested with Java on a SunBlade100 (500 MHz). In Table 5.1 the results for the different stages are shown. The second column shows the size of the uncompressed set, TD. For every stage both the number of compressed bits and the compression percentage are shown. The percentage data compression was computed as

compression (%) = (size of TD - compressed size) / (size of TD) × 100.


Circuit   Size of TD   No Reorder           Greedy Sort          With FDR             Mod. Codew. (Prop. Scheme)
          (bits)       Comp. bits  % Comp.  Comp. bits  % Comp.  Comp. bits  % Comp.  Comp. bits  % Comp.
s13207    178500       22903       87.17    17028       90.46    14648       91.79    14356       91.96
s15850    90428        29040       67.89    22791       74.80    17426       80.73    16816       81.40
s38417    174720       66922       61.70    58542       66.49    42762       75.53    41452       76.28
s38584    191784       115329      39.87    107087      44.16    69388       63.82    63789       66.74
s5378     30602        16211       47.03    11619       62.03    10134       66.88    10000       67.32
s9234     36062        23777       34.07    19037       47.21    14890       58.71    14074       60.97

Table 5.1: Compression obtained for different stages

Circuit   FDR [3]              Matrix [1]           Linear [2]           Prop. Scheme
          Comp. bits  % Comp.  Comp. bits  % Comp.  Comp. bits  % Comp.  Comp. bits  % Comp.
s13207    30880       81.30    33470       79.99    9920        94.44    14356       91.96
s15850    26000       66.22    23552       67.88    11168       87.65    16816       81.40
s38417    93466       43.26    69556       56.00    30432       82.58    41452       76.28
s38584    77812       60.91    66838       65.15    30208       84.25    63789       66.74
s5378     12346       48.02    10390       59.20    5696        81.39    10000       67.32
s9234     22152       43.59    16888       53.49    9280        74.27    14074       60.97

Table 5.2: Comparison with other methods

As seen, every modification gives better compression than the previous one for all circuits. The compression for the last modification is almost the same as without it; the results with Modified Codewords are only 3-4% better than the ones with FDR. The main advantage of this modification is not its compression ratio but its simplified decompression algorithm, due to fewer codewords being in use. This makes the decompression program run faster. In Table 5.2 the proposed method is compared with results taken from reports proposing other techniques. The original data used in the different reports is not the same.


Circuit   gzip, 0-mapped       gzip, Facs. map      Prop. Scheme
          Comp. bits  % Comp.  Comp. bits  % Comp.  Comp. bits  % Comp.
s13207    22600       87.34    23072       87.07    14356       91.96
s15850    22816       74.77    23600       73.90    16816       81.40
s38417    48864       72.03    46792       73.22    41452       76.28
s38584    68040       64.52    66872       65.13    63789       66.74
s5378     14040       54.12    14576       52.37    10000       67.32
s9234     19360       46.31    17920       50.31    14074       60.97

Table 5.3: Comparison with the Unix gzip utility

The proposed scheme achieves better compression than the methods of Chandra and Chakrabarty [3] and of Balakrishnan and Touba [1] for all circuits. The method of Balakrishnan and Touba [2] is still better; for circuit s38584 its compressed size is less than half the size achieved with the proposed method.

5.1.1 Unix gzip utility

To compare the result with the Unix gzip utility, the test data cannot simply be sent to the utility; it needs to be changed for a fair comparison. All the don't cares need to be assigned either 0 or 1, and the vectors may be reordered. Which mapping fits gzip best is not known, so two different test sets were explored for each circuit. The first test set was not reordered and all don't cares were assigned a 0. In Table 5.3, columns two and three present the compressed size and ratio for this mapping. The second test set was the one calculated by the proposed scheme, with reordered vectors and don't cares assigned to fit facsimile coding. Results are found in columns four and five. For three of the circuits the first mapping was the best and for the other three the second was the best. Also note that the difference in size between the two mappings is small.

Compressing an ASCII text file similar to the ones usually used in these experiments would give gzip an advantage: all the 0's are coded with the same eight bits, which can be used to compress the file, and the same holds for all the 1's. Instead the test data files were transformed into a binary format consisting only of the bits in the test sets. We can see that the proposed method outperforms gzip on all circuits.

Circuit   With FDR             Local Search         Reduction
          Comp. bits  % Comp.  Comp. bits  % Comp.  %
s13207    14648       91.79    13898       92.21    5.12
s15850    17426       80.73    16541       81.71    5.07
s38417    42762       75.53    40509       76.81    5.27
s38584    69388       63.82    66423       65.37    4.27
s5378     10134       66.88    9694        68.36    4.34
s9234     14890       58.71    14090       60.93    5.37

Table 5.4: Comparison with Local Search

5.1.2 Local Search Heuristic

Even though all test sets used in this report are smaller than 200,000 bits, the completion of this heuristic took a very long time. Days were needed to finish the calculations for the largest circuits. The main reason is, as mentioned in Section 4.2.2, that for each change in the test set, big parts of the compressed data need to be recalculated.

The local search heuristic was implemented before the modified codewords were applied. In Table 5.4 the heuristic is therefore compared not to the proposed scheme, but to the scheme without the modified codewords, called 'With FDR' in Table 5.1. Since the run time for this heuristic is extremely long I did not redo this experiment with the final scheme; still we can see that the gain in compression is small.

5.2 Including Response Vectors

For this part, test vectors for the D695 SOC were used. It consists of ten circuits, some of them used in the previous section but with different test sets.

Circuit   Size (test only)  Comp. bits  % Comp.   Size (resp. only)  Comp. bits  % Comp.
c6288     448               697         -55.58    448                742         -65.63
c7552     15525             11802       23.98     8100               7521        7.15
s838      5092              2218        56.44     2584               789         69.47
s9234     27417             14209       48.17     27750              12815       53.82
s38584    166896            72103       56.80     166896             72095       56.80
s13207    164500            25204       84.68     185650             28936       84.41
s15850    59267             19832       66.54     66348              23676       64.32
s5378     21400             11005       48.57     22800              10860       52.37
s35932    21156             3502        83.45     24576              3553        85.55
s38417    144768            66428       54.11     151554             71640       52.73

Table 5.5: Compression for D695 test and response vectors separately

First, the test and response vectors were compressed separately in order to have a reference point for the compression ratios obtained when the response is included. The result can be found in Table 5.5. Column two gives the original size of the test set, columns three and four give the compressed size and the percentage. Columns five through seven show the same for the response vectors.

When compressing the test and response vectors separately, the compression is good for all but the first two circuits. The reason is that there are much fewer don't care bits in their vectors. In fact, the first circuit has no don't care bits at all.

The results from compressing test and response vectors together are shown in Table 5.6. Columns two through four give the size of the test data, the size of the response data and the sum of these two. For each of the four different ways of coding the don't care bits, both the size of the compressed data and the compression percentage are shown. The results are not as good as in Table 5.5, mainly because including the response roughly doubles the amount of data in these experiments.


For all circuits the compression is better with the methods that use a mask vector than with those that code each don't care bit with two bits. The difference between the two methods using a mask is not very big. For all but one of the circuits it is better to place the response after the test, but only for three of the circuits is the difference substantial. As described in Section 4.5, the decompression program benefits from having the test and response vectors in mixed mode.
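To make the two encodings concrete, the fragment below sketches them for a single response vector. It is only an illustration; the exact bit conventions of Section 4.5 are not reproduced here, and the function names and the particular two-bit mapping are assumptions.

    def mask_encoding(resp):
        # resp: string over '0', '1' and 'X'. A separate mask marks the don't cares.
        mask = ''.join('1' if b == 'X' else '0' for b in resp)    # 1 = don't care
        values = ''.join('0' if b == 'X' else b for b in resp)    # X arbitrarily set to 0
        return mask + values

    def two_bit_encoding(resp):
        # Every position costs two bits; here 00 = 0, 01 = 1, 10 = don't care.
        table = {'0': '00', '1': '01', 'X': '10'}
        return ''.join(table[b] for b in resp)

    r = '1X0X1'
    print(mask_encoding(r))     # 01010 followed by 10001, ten bits in total
    print(two_bit_encoding(r))  # 0110001001, also ten bits before compression

In both cases the resulting stream is then compressed; Table 5.6 shows that the mask variants compress better for all circuits.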


                Size                          Response after, mask      Response after, two bits    Mixed, mask               Mixed, two bits
Circuit    Test      Resp.     Total          Comp. bits   % Comp.      Comp. bits   % Comp.        Comp. bits   % Comp.      Comp. bits   % Comp.
c6288      448       448       896            1451         -61.94       2062         -130.13        1455         -62.39       1972         -120.09
c7552      15525     8100      23625          24463        -3.55        30975        -31.11         28636        -21.21       39800        -68.47
s838       5092      2584      7676           3979         48.16        10851        -41.36         5346         30.35        12659        -64.92
s9234      27417     27750     55167          40754        26.13        48702        11.72          44163        19.95        58148        -5.40
s38584     166896    166896    333792         255166       23.56        259684       22.20          238922       28.42        266203       20.25
s13207     164500    185650    350150         117236       65.52        125179       64.25          135147       61.40        154607       55.85
s15850     59267     66348     125615         72680        42.14        103476       17.62          83152        33.80        125817       -0.16
s5378      21400     22800     44200          36556        17.29        47278        -6.96          40049        9.39         52423        -18.60
s35932     21156     24576     45732          10434        77.18        56930        -24.49         34405        24.77        85425        -86.79
s38417     144768    151554    296322         265498       10.40        290632       1.92           283693       4.26         340239       -14.82

Table 5.6: Compression for D695 with test and response vectors combined


Chapter 6

Discussion

Here some thoughts about compression of test vectors are discussed. What are the benefits and disadvantages of the various techniques? Is there some lower limit on how small a compressed set of test vectors can be (entropy bounds)? Is it possible to use a more complex method like zip and, if so, what is gained?

6.1 Proposed Method

When the work was initially started, the facsimile approach was chosen for three reasons:

• Facsimile code compresses bi-level (black-and-white) images - the test set is a rectangular grid of 1's and 0's.

• Facsimile code uses the fact that each line of dots is often very similar to the one above - the vectors (lines) in the test set can be reordered to minimize the difference.

• A short facsimile code can produce a very long output if the bits are the right ones - all the don't care bits in the test set can be assigned 1 or 0 in order to maximize this.


The first one does not need much comment; the other two are more interesting. To illustrate how many different correct test sets we can find for a circuit, we can study the smallest of the circuits in this report.

Circuit s5378 has 143 test vectors with 214 bits each, 30602 bits in total. The second reason above is that we can reorder the vectors, which gives us 143! (≈ 3.8 × 10^247) different test sets to evaluate. The test set also contains 25500 don't care bits which, for every ordering, have to be set to either 0 or 1, giving 2^25500 ≈ 10^7676 possible assignments. Hence more than 10^7923 different test sets are correct. Which one of these will have the best compression ratio when sent to the compression algorithm? No conventional computer will ever be able to test all combinations, even for this small example. Imagine what will happen when circuits with billions of bits in their test sets are used.
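These magnitudes are easy to verify; the short check below (added here for illustration, not part of the original experiments) computes the exponents with standard library functions.

    import math

    n_vectors = 143
    n_dont_cares = 25500

    # log10(143!) via the log-gamma function: lgamma(n + 1) = ln(n!)
    log10_orderings = math.lgamma(n_vectors + 1) / math.log(10)
    log10_assignments = n_dont_cares * math.log10(2)

    print(round(log10_orderings, 1))                      # about 247.6
    print(round(log10_assignments, 1))                    # about 7676.3
    print(round(log10_orderings + log10_assignments, 1))  # about 7923.9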

The proposed method in fact uses two different heuristics to find a good test set. The first heuristic tries to make a short facsimile code given a previous vector and the next vector including don't care bits. It is possible to find the very best assignment of the don't care bits and produce the shortest facsimile code for that pair. Still, it is a heuristic because it does not take into account what comes after: a shorter code here may force all following vectors to be encoded with longer codes than before. The second heuristic is Greedy-Sort, explained in Section 4.2.2.
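As a rough illustration of a greedy ordering of this kind (the exact cost function and details of Greedy-Sort are given in Section 4.2.2 and are not reproduced here), the sketch below always picks the unused vector that is cheapest to encode relative to the previously emitted one; code_length is a placeholder for the real facsimile/FDR code length.

    def greedy_order(vectors, code_length, start):
        # vectors:     list of test vectors (with don't cares)
        # code_length: assumed cost function, code length of `nxt` given `prev`
        # start:       the vector emitted first
        remaining = [v for v in vectors if v is not start]
        order = [start]
        while remaining:
            prev = order[-1]
            best = min(remaining, key=lambda nxt: code_length(prev, nxt))
            order.append(best)
            remaining.remove(best)
        return order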

If these two heuristics could be joined into one, or made to cooperate more, a lot would be gained. One idea is to also incorporate an ATPG (automatic test pattern generator) into the heuristic. The ATPG creates the test vectors by analyzing the circuit's specification; different ATPGs create different sets of vectors, which may differ in size. With the ATPG incorporated into the heuristic, a bigger test set could be generated if the compression algorithm would gain much from it. This may not be possible, and it is also not certain that the gain would be high.

6.1.1 Local Search

There were big hopes when the work with the Local Search heuristic was started. Earlier work has shown that this heuristic can solve very hard problems and produce better solutions to other problems. Those problems do not differ very much from the problem of ordering test vectors, but there is one major difference. For all the other problems, a change in the data only forces the algorithm to recalculate a small part of the solution. In this problem a lot of vectors have to be compressed again, because every change is propagated downwards.
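The cost of this propagation can be seen in a sketch of the neighbourhood evaluation (again a simplified illustration with assumed helper functions, not the thesis implementation): after swapping two vectors, every vector from the first changed position onwards has to be re-encoded, because each code depends on its predecessor.

    def length_after_swap(order, code_length, prefix_lengths, i, j):
        # order:          current ordering of the test vectors (i < j)
        # code_length:    assumed cost function, code length of a vector given its predecessor
        # prefix_lengths: prefix_lengths[k] = total code length of order[:k], precomputed
        new_order = order[:]
        new_order[i], new_order[j] = new_order[j], new_order[i]

        total = prefix_lengths[i]                  # everything before i can be reused
        prev = new_order[i - 1] if i > 0 else None
        for vec in new_order[i:]:                  # the propagated part: re-encode the rest
            total += code_length(prev, vec)
            prev = vec
        return total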

If a better heuristic for sorting the vectors could be found, further techniques like hill-climbing or simulated annealing could be evaluated. When the implementation of Local Search had found the first local minimum for all the circuits, it was clear that something had to be changed. The time it took to find the first local minimum was far too long and the improvement in compression was very small.

6.1.2 Discarded Techniques

In the stages presented in Design and Results two techniques were left out. Experiments using the difference vectors, Tdiff, when compressing made almost no difference in compressed size. In a difference vector the bits that have changed relative to the previous vector are marked with a 1 and all other bits get a 0. The idea is that the complete test set will then consist mostly of 0's and that the facsimile code could do a better job. The poor result is probably due to the fact that the facsimile code already exploits the difference between consecutive vectors to compress the data.
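For completeness, a difference vector of this kind could be computed as in the following fragment (a generic illustration, with don't cares already assigned so that every bit is 0 or 1):

    def difference_vector(prev, curr):
        # Mark with 1 every bit position where curr differs from prev.
        return ''.join('1' if a != b else '0' for a, b in zip(prev, curr))

    # Only the changed position becomes 1.
    assert difference_vector('110010', '110110') == '000100'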

The second discarded technique was modifying the codewords inside the FDR run-length code. As seen in Table 4.1, the group prefixes are 10 and 110 for groups A2 and A3 respectively. If the run-lengths in group A3 are more common than the ones in A2, a gain could be achieved by switching the prefixes. Statistics from the experiments showed that a small gain could be made for some of the circuits. However, the gain in size does not justify the more complex decompression algorithm: a lookup table assigning each group a prefix would have to be used.
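A decoder supporting swapped prefixes would, as noted, need such a table instead of the fixed FDR prefixes. A minimal sketch of the idea (illustrative only; the A1 prefix and the matching logic are assumptions, and the tail decoding of FDR is not reproduced):

    # Prefixes for A2 and A3 as given in Table 4.1; the A1 prefix is assumed here.
    DEFAULT_PREFIXES = {'A1': '0', 'A2': '10', 'A3': '110'}
    # Per-circuit table where A2 and A3 have been switched because A3 run-lengths
    # happened to be more common for that circuit.
    SWAPPED_PREFIXES = {'A1': '0', 'A2': '110', 'A3': '10'}

    def group_of(prefix_table, bits):
        # Return the group whose prefix starts the bit string (longest match first).
        for group, prefix in sorted(prefix_table.items(),
                                    key=lambda kv: len(kv[1]), reverse=True):
            if bits.startswith(prefix):
                return group, len(prefix)
        raise ValueError('no matching prefix')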

6.2 Storing Previous Vector

The facsimile coding approach uses the previous test vector when decompressing the next one. The SOC therefore needs to be designed in such a way that it is possible to store a test vector without destroying it. Two solutions are discussed here.
