An On-Chip Memory for Testing of High-Speed Mixed-Signal Circuits

Institutionen för systemteknik

Department of Electrical Engineering

Examensarbete

AN ON-CHIP MEMORY

FOR TESTING OF HIGH-SPEED

MIXED-SIGNAL CIRCUITS

Master thesis performed in Electronic Devices

by

Omar Jaber Omar

LiTH-ISY-EX--13/4738--SE

Linköping December 2013

TEKNISKA HÖGSKOLAN

LINKÖPINGS UNIVERSITET

Department of Electrical Engineering Linköping University

S-581 83 Linköping, Sweden

Linköpings tekniska högskola Institutionen för systemteknik 581 83 Linköping


Supervisor: Professor Atila Alvandpour

Examiner: Professor Atila Alvandpour


Presentation Date: 17-12-2013

Publishing Date (Electronic version): 14-01-2014

Department and Division: Department of Electrical Engineering

Type of Publication: Degree thesis

Language: English

Number of Pages: 101

ISRN: LiTH-ISY-EX--13/4738--SE

URL, Electronic Version: http://www.ep.liu.se

Abstract

Mixed-signal processing systems, especially data converters, can be reliably tested at high frequencies using memory-based on-chip testing schemes. In this thesis, an on-chip testing strategy based on shift registers/memory (2 k bits) is proposed for digital-to-analog converters (DACs) operating at 5 GHz. The proposed design uses a word length of 8 bits in order to test the DAC at its full speed of 5 GHz. The testing strategy has been designed in a standard 65 nm CMOS technology with the additional requirement of a 1 V supply, and it has been implemented in the Cadence IC design environment.

An additional advantage of the proposed testing strategy is that it requires fewer I/O pins and avoids a large number of high-speed I/O pads. It therefore also solves the bandwidth limitation associated with I/O transmission paths. The memory-based on-chip tester contains no analog blocks and is implemented entirely in the digital domain. During the write mode, a low frequency of 1 MHz, supplied from outside the chip, is used to load the data into the memory. During the read mode, a frequency of 625 MHz is used to read the data from the memory. A multiplexing system reuses the stored data during the read mode to test the intended functionality and performance. A serializer at the memory output converts the parallel data into serial data at high frequency. Using the intermediate frequencies of 1.25 GHz and 2.5 GHz, the serializer speeds up the data from 625 MHz to 5 GHz in order to test the DAC at 5 GHz.

Publication Title

An On-Chip Memory for Testing of High-Speed Mixed-Signal Circuits

Author

Omar Jaber Omar


ACKNOWLEDGEMENT

I would like to express my deepest thanks and gratitude to the people who have helped me during my master thesis.

Before all, I would like to thank my supervisor Professor Atila Alvandpour, for providing the opportunity to pursue my studies in this field and for guiding and supporting me during the different phases of the thesis.

I would like to extend my thanks to Ph.D. student Ameya Bhide for all his support and assistance during the thesis work. Special thanks go to Ph.D. student Muhammad Irfan Kazim for his advice, guidance, and support during my studies in Linköping. Furthermore, my thanks are also extended to the Ph.D. students who helped me during this study: Fahad Qazi, Ali Fazli, Amin Ojani, Dai Zhang, Duong Quoc Tai, and Daniel Svärd.

I am also grateful to my best friends, Mahir Al-Taie and Taif Mohsin for their encouragement and assistance during my master study.

My study life would not have been possible to continue without my family: parents, brothers, and sisters. All my best regards, appreciation, and respect with love to all of them for their warm feelings, patience, and for everything.

Lastly, without my wife it would have been impossible to reach my goals in my studies and in life. I would like to thank her deeply for her patience, limitless support, and great encouragement throughout my master's studies and my life.

Omar Jaber Omar Linköping, 2013


ABBREVIATIONS

CMOS complementary metal-oxide-semiconductor

DAC digital-to-analog converter

DFF D flip-flop

FF flip-flop

IC integrated circuit

MSPS mixed-signal processing system

NMOS n-channel metal-oxide-semiconductor

PISO parallel-in/serial-out

PLL phase-locked loop

PMOS p-channel metal-oxide-semiconductor

SIPO serial-in/parallel-out

SoC system-on-chip

TFF T flip-flop

TSPC true-single-phase clock

TABLE OF CONTENTS

Abstract
Acknowledgement
Abbreviations
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
1.1 Motivation
1.2 Overview of the Design
1.3 Organization of the Thesis

Chapter 2 Implementation of the Proposed Design
2.1 Serial to Parallel Conversion Unit
2.2 Memory Unit
2.2.1 Introduction
2.2.2 Proposed Memory Design
2.3 Clock Divider Unit
2.3.1 Introduction
2.3.2 Proposed Clock Divider Unit
2.4 Control Unit
2.4.1 Introduction
2.4.2 Proposed Control Unit Design
2.4.2.1 Interface Unit
2.4.2.2 Clock Multiplexing Unit
2.4.2.3 Non-Inverting Buffered 2:1 Multiplexer Unit (1)
2.4.2.4 Pulse Generation Unit
2.4.2.5 Non-Inverting Buffered 2:1 Multiplexer Unit (2)
2.4.2.6 Enable Element Unit
2.4.3 Summary
2.5 Serializer Unit
2.5.1 Introduction
2.5.2 Previous Works
2.5.2.1 Serializer by using N:1 Multiplexer Unit
2.5.2.2 Serializer by using one N:1 Serializer Unit directly
2.5.2.3 Serializer by using Feed-Forward 8-to-1 CMOS scan-FF based Serializer
2.5.3 Proposed Serializer Unit
2.5.3.1 Operation of 2:1 Serializer Unit
2.5.3.2 Timing of 2:1 Serializer Unit
2.5.3.3 Timing of the Clock and Data paths
2.5.3.4 Data Distribution of the Complete 64:8 Serializer
2.6 8-bit Parallel to Serial Conversion Unit

Chapter 3 Timing and Clocking
3.1 Timing Parameters for Sequential Circuits
3.2 Synchronous Timing Issue
3.2.1 Impact of Skew and Jitter
3.2.2 Clock Distribution
3.2.3 Timing Constraints for Different Synchronous Sequential Circuits

Chapter 4 Testbench and Simulation Results
4.1 Introduction
4.2 Serial to Parallel Conversion Unit and Memory Simulation
4.3 Clock Divider Simulation
4.4 Control Unit Simulation
4.5 Serializer Simulation
4.5.1 Timing Issue of Unit625M
4.5.2 Timing Issue of Unit1.25G
4.5.3 Timing Issue of Unit2.5G
4.5.4 Simulation Result of 4:1 Serializer Unit
4.5.5 Simulation Result of 8:1 Serializer Unit
4.6 Complete System Simulation

Chapter 5 Summary and Conclusion

LIST OF FIGURES

Figure 1.1: Block diagram of the proposed design
Figure 2.1: The whole memory unit with the serial to parallel conversion unit
Figure 2.2: Non-inverting buffered 2:1 multiplexer unit
Figure 2.3: Clock divider unit based on a synchronous 3-bit counter
Figure 2.4: T flip-flop
Figure 2.5: Output signals of the clock divider unit
Figure 2.6: Control unit components
Figure 2.7: Interface unit schematic
Figure 2.8: The glitch free clock switching for unrelated clocks technique
Figure 2.9: Output signals of the clock multiplexing unit
Figure 2.10: Output signal (clk1M) of the non-inverting buffered 2:1 multiplexer circuit (1)
Figure 2.11: Schematic of the pulse generation unit
Figure 2.12: 6-Input NAND gate
Figure 2.13: Output signal (counterpulse) of the pulse generation unit with glitches
Figure 2.14: Last glitch of the counterpulse signal
Figure 2.15: Output signal (clk1M-625M) of the non-inverting buffered 2:1 multiplexer circuit (2)
Figure 2.16: Schematic of the enable element
Figure 2.17: Output signals of the control unit
Figure 2.18: Proposed 8:1 serializer unit
Figure 2.19: Static DFF with symmetrical complementary clock signals
Figure 2.20: Proposed 2:1 serializer unit
Figure 2.21: Output data sequence of EX.1
Figure 2.22: Timing regions of the 2:1 serializer unit
Figure 2.23: Timing features of the serializer unit between clock path and data path
Figure 2.24: Output data sequence of the 64:8 serializer
Figure 2.25: A 8-bit parallel to serial conversion schematic
Figure 3.1: Definition of setup time, hold time, and propagation delay of a synchronous DFF
Figure 3.2: Pipelined datapath of a synchronized circuit
Figure 3.3: H-tree clock distribution network of clk2.5G signal
Figure 3.4: Clock gating circuit
Figure 3.5: Timing issue cases (case A)
Figure 3.6: Timing issue cases (case B)
Figure 4.1: Serial to parallel converter and 2x2 memory with timing regions
Figure 4.2: Simulation results of serial to parallel converter and 2x2 memory
Figure 4.3: Clock divider testbench
Figure 4.4: Simulation results of the clock divider unit
Figure 4.5: Clock multiplexing testbench
Figure 4.6: Simulation results of the clock multiplexing unit
Figure 4.7: Clock multiplexing timing regions
Figure 4.8: Pulse generation testbench
Figure 4.9: Simulation results of the pulse generation unit
Figure 4.10: TFF schematic with a synchronous reset
Figure 4.11: Complete control unit testbench
Figure 4.12.a: Simulation results (1) of the complete control unit (Start of simulation)
Figure 4.12.b: Simulation results (1) of the complete control unit (End of simulation)
Figure 4.13: Simulation results (2) of the complete control unit
Figure 4.14: Simulation results of the 2:1 serializer unit (Unit625M)
Figure 4.15: Schematic of the 4:1 serializer unit
Figure 4.16: Simulation results of the 4:1 serializer unit
Figure 4.17: Simulation results of the 8:1 serializer unit
Figure 4.18: Testbench of the complete system
Figure 4.19: Simulation results of the complete system

LIST OF TABLES

Table 2.1: Counter sequence with the counterpulse signal level
Table 4.1: Different timing characteristics for different DFFs
Table 4.2: Simulation results of serial to parallel converter and 2x2 memory across different process corners
Table 4.3: Serial to parallel converter and memory unit, δ calculations for different process corners
Table 4.4: Simulation results of clock divider across different process corners
Table 4.5: Clock divider unit, δ calculations for different process corners
Table 4.6: Simulation results of clock multiplexing across different process corners
Table 4.7: Clock multiplexing unit, δ calculations for different process corners
Table 4.8: Simulation results of pulse generation across different process corners
Table 4.9: Pulse generation unit, δ calculations for different process corners
Table 4.10: Simulation results of complete control unit across different process corners
Table 4.11: Control unit, δ calculations for different process corners
Table 4.12: Simulation results of Unit625M unit across different process corners
Table 4.13: Unit625M, δ calculations for different process corners
Table 4.14: Simulation results of Unit1.25G across different process corners
Table 4.15: Unit1.25G, δ calculations for different process corners
Table 4.16: Simulation results of Unit2.5G across different process corners
Table 4.17: Unit2.5G, δ calculations for different process corners
Table 4.18: Simulation results of 4:1 serializer across different process corners
Table 4.19: Simulation results of 8:1 serializer across different process corners
Table 4.20: Acceptable range of positive and negative δW for different clocks
Table 4.21: Simulation results of needed nodes
Table 4.22: Results of the constraints between the clock path and data path for different regions

1 INTRODUCTION

1.1 Motivation

Complementary metal-oxide-semiconductor (CMOS) technology has become the basis of modern digital integrated circuits because continuous scaling has steadily increased its speed. Moreover, it combines high speed with low-cost implementation and allows millions of transistors to be integrated on the same chip. CMOS is therefore considered the dominant technology for very-large-scale integration (VLSI) chip design [1],[2].

Because digital circuits are very cheap, or almost free, in ultra-deep-submicron CMOS technologies, interest in on-chip testing is growing as the complexity of VLSI digital circuits increases. On-chip testing is less costly than testing based on external instrumentation because of the increased performance requirements of the chip in terms of high-speed operation. Moreover, it is becoming less practical to manufacture the tester on a separate semiconductor chip when the device and the tester can easily be manufactured on the same chip, especially in ultra-deep-submicron technologies where transistors have been scaled down considerably. Furthermore, system-on-chip (SoC) design allows digital, analog, and mixed-signal integrated circuits to be designed and fabricated on the same chip.

As the complexity of mixed-signal processing systems (MSPSs) increases, new testing challenges emerge. In high-speed testing, it is difficult to connect an external test instrument to the chip without loss or distortion. Testing integrated circuits (ICs) with external instruments has therefore become very complicated because of the high performance requirements.

Measurements at very high frequencies suffer from degradation of the core circuit performance because of bandwidth limitations. These limitations are caused by the physical nature of the I/O pads and the physical length of the transmission path [3].


The on-chip memory tester overcomes these problems. The high-frequency clock is generated by another on-chip circuit. The proposed design includes a clock divider that derives four clock frequencies from this clock, which are used to perform the high-frequency tests. In this way, the bandwidth limitation imposed by the I/O transmission paths is avoided. The on-chip memory (2 k bits) is included in order to avoid a large number of very high-speed I/O pads, so the design can use the lowest possible number of pins [4]. The low-frequency clock used during the write operation can be driven from outside the chip to write the data into the memory. This reduces the cost and complexity of the design. In addition, a serializer is included in order to test the intended device at high frequency.

Many aspects were taken into account when choosing the word length and depth of the memory, such as total area, power consumption, and design complexity. These aspects are affected not only by the memory but also by the serializer and the clock distribution network. There is also a tradeoff between the frequency of the first serializer stage and the word length of the memory, which must be considered when choosing the design parameters. The higher the frequency of the first serializer stage, the fewer bits need to be read out of the memory in parallel (shorter word length). At the same time, increasing the frequency of the first serializer stage increases the overall area, the power consumption, and the design complexity. Therefore, the lower frequency of 625 MHz and the longer word length of 64 bits with a depth of 32 have been chosen.
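The arithmetic behind this tradeoff can be sketched in a few lines of Python. This is only an illustration: the function name `memory_shape` and the constants are introduced here, and the sketch assumes that the memory must sustain 8 bits at 5 GHz and that its capacity is fixed at 2048 bits.

```python
# Illustrative sketch only; names are hypothetical, figures come from the text.
OUTPUT_BITS = 8                  # DAC word length
OUTPUT_RATE_HZ = 5_000_000_000   # DAC sample rate (5 GHz)
MEMORY_BITS = 2048               # total memory capacity (2 k bits)

def memory_shape(first_stage_hz):
    """Return (word_length, depth) for a given first-stage serializer clock.

    The memory must deliver OUTPUT_BITS * OUTPUT_RATE_HZ bits per second,
    so a slower read clock needs a proportionally wider word.
    """
    ratio = OUTPUT_RATE_HZ // first_stage_hz
    word_length = OUTPUT_BITS * ratio
    depth = MEMORY_BITS // word_length
    return word_length, depth

for f_hz in (625_000_000, 1_250_000_000, 2_500_000_000):
    print(f_hz, memory_shape(f_hz))
# 625000000 (64, 32)
# 1250000000 (32, 64)
# 2500000000 (16, 128)
```

Under these assumptions, the chosen 625 MHz read clock corresponds to the 64 x 32 organization described above; the faster alternatives would narrow the word but move more of the design to higher clock rates.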

1.2 Overview of the Design

The proposed design is composed of six main units: serial to parallel conversion, memory, serializer, clock divider, control, and clock distribution network, as shown in Figure 1.1. The serial to parallel conversion unit consists of 64 memory elements, and the memory unit consists of 2048 memory elements. The memory unit also contains the multiplexing system. It stores the data at low frequency and reuses the stored data at high frequency (625 MHz). The serializer unit uses the frequencies 625 MHz, 1.25 GHz, 2.5 GHz, and 5 GHz to speed up the stored data from 625 MHz to 5 GHz for high-frequency testing. The clock divider unit divides the 5 GHz clock into the four clocks used in the different parts of the design. The control unit generates all the control signals that the design needs. The clock distribution network consists of multiple buffer stages that drive each signal from its source to its terminal port.


It is worth mentioning that the 64-bit word of the memory can be regarded as 8 groups of 8 bits. The output of each group is connected to one 8:1 serializer unit, so the entire serializer consists of 8 pages of the 8:1 serializer unit. As explained later, the output signals of the memory unit must be connected to the inputs of the serializer as shown in Figure 2.24 (section 2.5.3.4) in order to obtain the correct data sequence at the output of the serializer unit.
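The reason the wiring order matters can be illustrated with a behavioral Python model of one 8:1 serializer built from three cascaded 2:1 stages. This is a sketch under stated assumptions, not the wiring of Figure 2.24: the function names, the pairing of adjacent ports in each stage, and the convention that the first input of each 2:1 stage is emitted first are all assumptions made here.

```python
# Behavioral sketch only; stage structure and port pairing are assumptions.
def serialize_2to1(a, b):
    """Interleave two equal-length streams: a[0], b[0], a[1], b[1], ..."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out

def serialize_8to1(ports):
    """Combine 8 input streams into one stream through three 2:1 stages."""
    stage1 = [serialize_2to1(ports[i], ports[i + 1]) for i in range(0, 8, 2)]
    stage2 = [serialize_2to1(stage1[0], stage1[1]),
              serialize_2to1(stage1[2], stage1[3])]
    return serialize_2to1(stage2[0], stage2[1])

def bit_reverse3(p):
    """Reverse the three bits of p (for example 1 -> 4, 3 -> 6)."""
    return int(format(p, "03b")[::-1], 2)

# Port p carries sample number p: the output order is scrambled.
print(serialize_8to1([[p] for p in range(8)]))   # [0, 4, 2, 6, 1, 5, 3, 7]

# Port p carries sample number bit_reverse3(p): the output is in order.
PORT_ORDER = [bit_reverse3(p) for p in range(8)]
print(serialize_8to1([[s] for s in PORT_ORDER]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In this model a straight connection produces a permuted sequence, whereas a permuted connection restores the intended order, which is why the memory outputs cannot simply be wired to the serializer inputs in index order.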

1.3 Organization of the Thesis

The thesis is organized as follows. Chapter two discusses the implementation of the entire design by explaining the architecture and functionality of each unit. It describes in detail the operation of the serial to parallel conversion, memory, clock divider, control, serializer, and 8-bit parallel to serial conversion units. Chapter three describes the solutions to the timing issues and discusses the impact of skew and jitter on synchronous circuits. Chapter four presents the testbenches and simulation results for each circuit and for the entire system. Chapter five presents the summary and conclusion of the thesis.

2 IMPLEMENTATION OF THE PROPOSED DESIGN

2.1 Serial to Parallel Conversion Unit

The serial to parallel conversion unit converts the serial input data into parallel output data, which form the input data of the memory unit. It is active during the write mode. The conversion from serial to parallel format is done with serial-in/parallel-out (SIPO) shift registers that operate at 1 MHz (clk1M signal); section 2.4.2.3 describes how the clk1M signal is generated. In the proposed design, the memory unit has a word length of 64 bits and a depth of 32, so 64 registers are needed to convert the data into parallel format. One data bit is shifted in per clock cycle of clk1M, so after 64 clock cycles the 64 data bits are held in these registers. Each new set of 64 bits must first be stored completely in these registers and only afterwards shifted into the memory. The advantage of this scheme is that each set of 64 bits becomes valid at the output of the memory at the same time. After the memory is filled, the clock signal (clk1M) of the serial to parallel conversion unit is turned off, so the unit is inactive and no more data enter the memory. In other words, the write mode ends and the read mode starts.
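The SIPO behavior described above can be sketched as a small Python model. The class name `Sipo` and the shift direction (new bit enters at index 0) are assumptions made for illustration; the thesis implements the registers as DFFs clocked by clk1M.

```python
# Behavioral sketch of a serial-in/parallel-out shift register (hypothetical names).
class Sipo:
    def __init__(self, width=64):
        self.bits = [0] * width

    def clock(self, serial_in):
        """One clock cycle: the new bit enters at index 0, all others move one place."""
        self.bits = [serial_in] + self.bits[:-1]

    def parallel_out(self):
        """Return a copy of the word currently held in the registers."""
        return list(self.bits)

sipo = Sipo(width=64)
word = [i % 2 for i in range(64)]   # arbitrary test pattern
for bit in word:
    sipo.clock(bit)                 # 64 cycles to load one word

# After 64 cycles the first bit shifted in sits in the last register.
assert sipo.parallel_out() == word[::-1]
```

Only after all 64 cycles is the full word available in parallel, which matches the requirement that a complete set of 64 bits is collected before it is transferred into the memory.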

2.2 Memory Unit

2.2.1 Introduction

The memory unit is the data storage unit in computer hardware systems [5]. Semiconductor memory integrated circuits store the digital information that is used to perform a particular operation. Nowadays, on-chip memory is widely used in many VLSI applications (circuits) [6]. The memory unit is one of the most important units in the proposed design. According to the design specification, a 2 k bits memory is needed to store the data at low frequency and read these data at high frequency in order to test the intended device. Memory design, like integrated circuit design in general, involves tradeoffs among many factors, such as speed, power consumption, chip area, and cost. In this work, the main objective is to generate on-chip input data at 5 GHz for testing of high-speed mixed-signal circuits. Therefore, the most important design requirements on the memory are speed and operational robustness. As explained in the following, robust and low-power shift registers are used as memory cells, and the high-speed memory readout is enabled by a 3-step successive multiplexing of 64 bits at 625 MHz to 8 bits at 5 GHz.

2.2.2 Proposed Memory Design

The proposed memory (2 k bits) is a two-dimensional array of shift registers (64x32 cells). Each register is one memory cell (one bit), and its transistors can be small. The operating concept of the memory is very simple: during each clock cycle, the data move from one DFF to the adjacent DFF. Thus, the first data bit written into the memory is the first data bit read out. The memory requires two clocks in order to perform the write and read operations. The clock port of the memory passes the write clock (low frequency) during the write mode. After the memory is filled, the clock port passes the read clock (high frequency) in order to read the stored data during the read mode. A clock multiplexer is therefore needed to switch from the write clock to the read clock. Once the whole memory is filled, a multiplexing system in the memory unit makes it possible to reuse the stored data for testing during the read mode, which requires 64 multiplexers. Moreover, a control unit is needed to manage the multiplexers for the write and read operations. The whole memory unit together with the serial to parallel conversion unit is shown in Figure 2.1.


Figure 2.1: The whole memory unit with the serial to parallel conversion unit.

The memory unit and the serial to parallel conversion unit use traditional static master-slave positive edge-triggered registers based on multiplexers, arranged as shown in Figure 2.1. Note that a buffer stage follows each register in order to fulfill the timing requirements between two adjacent registers. In contrast to the serial to parallel conversion unit, the memory has two modes of operation, write and read. Section 2.4.2.5 describes how the clock of the memory unit (clk1M-625M signal) is generated. In write mode, the clk1M-625M signal stays high for the whole 64 clock cycles of clk1M except the last half clock cycle, when it is low. This means that after each set of 64 bits has been converted from serial to parallel, these data are shifted simultaneously into the memory. Note that the data stored inside the memory are not shifted to the next column until a new set of 64 bits has been collected by the serial to parallel converter. The memory is filled after 32x64 = 2048 clock cycles of clk1M. The write mode then ends and the serial to parallel conversion unit is deactivated, while the read mode starts by activating the multiplexing system. The memory clock (clk1M-625M) is switched from the low-frequency clock to the high-frequency clock, and the memory starts reading and sending the data to the serializer at 625 MHz in order to test the intended device after the serializer unit.

The multiplexing system reuses the stored data in order to test the intended device with periodic signals. It uses one non-inverting buffered 2:1 multiplexer based on transmission gates [7] in each row of the memory. The enable (select) signal of each multiplexer starts at the high level, which deactivates the multiplexing system during the write mode. When all data have been stored in the memory, the enable signal of each multiplexer switches to the low level and the multiplexing system is activated. The memory then starts reading the stored data at high frequency, and the multiplexing system allows the memory to reuse the stored data again. Section 2.4.2.6 describes how the enable signal of the memory unit is generated.
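The write and replay behavior of one memory row can be sketched in Python as follows. The names `RowRegister` and `replay` are hypothetical, and the sketch assumes that in read mode the multiplexer feeds the output of the row's last cell back to its first cell; the text states only that the stored data are reused.

```python
# Behavioral sketch of one memory row with a recirculating 2:1 multiplexer.
# Assumption: in read mode the row output is fed back to the row input.
class RowRegister:
    def __init__(self, depth=32):
        self.cells = [0] * depth

    def clock(self, write_bit, enable):
        """enable high: write mode (take write_bit); enable low: read mode (recirculate)."""
        new_bit = write_bit if enable else self.cells[-1]
        self.cells = [new_bit] + self.cells[:-1]

def replay(row, cycles):
    """Read the row output for a number of read-mode clock cycles."""
    out = []
    for _ in range(cycles):
        out.append(row.cells[-1])
        row.clock(write_bit=0, enable=0)
    return out

row = RowRegister(depth=4)
for bit in [1, 0, 1, 1]:
    row.clock(bit, enable=1)        # write mode

print(replay(row, 8))               # [1, 0, 1, 1, 1, 0, 1, 1]
```

The stored pattern comes out in first-in, first-out order and then repeats, which is the periodic test signal the multiplexing system is meant to provide.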

To perform the multiplexing operation, 64 non-inverting buffered 2:1 multiplexer units are inserted between the serial to parallel conversion unit and the memory. All the multiplexers used in this thesis are based on the non-inverting buffered 2:1 multiplexer unit, except the clock multiplexing unit. The schematic of the non-inverting buffered 2:1 multiplexer unit is shown in Figure 2.2.


The multiplexer consists of transmission gates, each formed by two pass transistors, together with inverters and buffers. Inverters are added before and after the transmission gates in order to overcome charge-sharing problems. They also make the output signal of the multiplexer more robust, because the transmission gates alone produce a degraded signal. Buffers may be needed to satisfy the timing requirements of the circuit. This type of multiplexer consumes very low power and offers high speed [8]. It is faster than a multiplexer built with a gate-level approach because charging and discharging are slower in the latter. The proposed multiplexer is therefore better suited to high-speed circuit design, and it also uses fewer transistors. This schematic is used not only in the memory unit but also in other units of the proposed design, such as the control unit and the serializer.

The operation of the multiplexer is very simple. When the Select signal is low, the PMOS and NMOS transistors of transmission gate TG0 are turned on while those of transmission gate TG1 are turned off, so the input DTG0 passes to the output of the multiplexer (MUXout). Conversely, when the Select signal is high, the PMOS and NMOS transistors of TG1 are turned on while those of TG0 are turned off, so the input DTG1 passes to MUXout. As a result, when the selection signal is high (multiplexing system deactivated), the data transmission corresponds to the write operation. When the selection signal is low (multiplexing system activated), the data transmission corresponds to the read operation.
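At the logic level, the selection rule above reduces to a one-line function. The sketch below is purely behavioral and uses a hypothetical name, `mux2`; it does not model the transmission gates, inverters, or buffers of the actual circuit.

```python
# Logic-level sketch of the non-inverting 2:1 multiplexer (hypothetical name).
def mux2(select, d_tg0, d_tg1):
    """Select low passes DTG0 (via TG0); Select high passes DTG1 (via TG1)."""
    return d_tg1 if select else d_tg0

# Truth table for all input combinations.
for select in (0, 1):
    for d_tg0 in (0, 1):
        for d_tg1 in (0, 1):
            print(select, d_tg0, d_tg1, "->", mux2(select, d_tg0, d_tg1))
```

Since Select is high during the write mode, this reading implies that DTG1 carries the data arriving from the serial to parallel conversion unit and DTG0 carries the reused data; that assignment is inferred from the text rather than stated in it.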

2.3 Clock Divider Unit

2.3.1 Introduction

The clock divider generates the required clocks from the original 5 GHz clock (EXclock signal). Its output signals (clocks) are clk5G, clk2.5G, clk1.25G, and clk625M.

There are two design topologies for the clock divider unit: the asynchronous counter and the synchronous counter. The asynchronous ripple counter presented in [9],[10] reduces the power consumption because of the small capacitance at the high-frequency node. However, jitter accumulates stage by stage. The synchronous counter, on the other hand, increases the power consumption because of the large capacitance at the high-frequency node, but it reduces jitter accumulation [10]. In addition, the synchronous divider topology eliminates cumulative delay because all the DFFs are connected to the same clock. All output clocks therefore change simultaneously at the rising edge of the clock, and the maximum frequency of the synchronous counter is significantly higher than that of the asynchronous ripple counter.


2.3.2 Proposed Clock Divider Unit

The proposed clock divider schematic is based on a synchronous 3-bit counter as shown in Figure 2.3.

Figure 2.3: Clock divider unit based on a synchronous 3-bit counter.

The clock divider schematic consists of three T flip-flops (TFFs), one AND gate, and buffers. The TFF consists of one DFF and one non-inverting buffered 2:1 multiplexer unit, as shown in Figure 2.4.

Figure 2.4: T flip-flop.

The DFFs operate at the high frequency (5 GHz). The buffer before the clk5Ga node drives the gates after this node and makes the signal more robust, which is important because the rise time and fall time of the output signal clk5G should be less than 20 ps across the different process corners. In addition, the rise time and fall time of the output signals clk625M, clk1.25G, and clk2.5G should be less than 100 ps, 80 ps, and 20 ps, respectively, across the different process corners. The rising edges of all signals (clocks) should occur at the same time. To achieve this condition, the Q̄ node of the first TFF is connected to the input of the second TFF, and the Q̄ node of the second TFF is connected to the input of the AND gate. The output signals of the clock divider unit are shown in Figure 2.5.


Figure 2.5: Output signals of the clock divider unit.

To achieve proper synchronization between the clocks, additional buffers are introduced. Buffering the different clocks synchronously at this stage of the design helps to minimize the skew after the clock distribution. In this way, four clock signals (clk5G, clk2.5G, clk1.25G, and clk625M) are generated, and the required performance is achieved. The clk5G, clk2.5G, and clk1.25G signals are connected directly to the serializer unit, while clk625M (the Rclk signal) is connected to the clock multiplexing unit inside the control unit.
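The division ratios of the synchronous 3-bit counter can be checked with a small behavioural model. This is only a sketch: it assumes the textbook up-counter toggle conditions (the first TFF always toggles, the second toggles when Q0 is high, the third when Q0 AND Q1 is high) and ignores the polarity of the nodes used in Figure 2.3 as well as all buffer delays. The function names are hypothetical.

```python
F_IN = 5_000_000_000  # 5 GHz input clock


def synchronous_divider(num_edges):
    """Return (q0, q1, q2) as held after each rising edge of the input clock."""
    q0 = q1 = q2 = 0
    samples = []
    for _ in range(num_edges):
        # Toggle conditions are evaluated from the values held before the
        # edge, so the three flip-flops update at the same time.
        t0 = 1          # first TFF toggles on every edge
        t1 = q0         # second TFF toggles when Q0 is high
        t2 = q0 & q1    # the AND gate drives the third TFF
        q0, q1, q2 = q0 ^ t0, q1 ^ t1, q2 ^ t2
        samples.append((q0, q1, q2))
    return samples


def period_in_edges(bits):
    """Input clock edges between two successive rising edges of a bit stream."""
    rises = [i for i in range(1, len(bits)) if bits[i - 1] == 0 and bits[i] == 1]
    return rises[1] - rises[0]


samples = synchronous_divider(32)
periods = [period_in_edges([s[k] for s in samples]) for k in range(3)]
frequencies = [F_IN // p for p in periods]
print(periods, frequencies)
```

Under these assumptions the three outputs have periods of 2, 4, and 8 input edges, which corresponds to clk2.5G, clk1.25G, and clk625M.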


2.4 Control Unit

2.4.1 Introduction

The control unit is one of the most important units in the design. It fetches the CLOCK, ENABLE, RESET, and DATA signals from off-chip and uses them, together with the clock divider unit, to generate all the control signals needed in the design. All the input signals of the control unit come from off-chip except the Rclk signal (read clock), which is taken from the clock divider unit.

2.4.2 Proposed Control Unit Design

The proposed control unit design consists of the following units: interface, clock multiplexing, non-inverting buffered 2:1 multiplexer (1), pulse generation, non-inverting buffered 2:1 multiplexer (2), and enable element units. The overview of the control unit is shown in Figure 2.6.


2.4.2.1 Interface Unit

The interface unit consists of master-slave negative edge-triggered registers built from multiplexers, together with drivers and buffers. It brings the CLOCK, DATA, ENABLE, and RESET signals from off-chip into the proposed design. The interface unit schematic is shown in Figure 2.7. The Wclk signal is derived from the CLOCK signal through a driver, and it is used to synchronize the other control signals. Thus, the output signals of the interface unit are synchronized to the falling edge of the Wclk signal. The driver before the reset signal increases the fan-out of the reset signal so that it can drive the gates in the pulse generation unit. The timing regions of the interface unit are also shown in the same figure.


The output signals of the interface unit are as follows:

• DATAOUT: It sends the data directly to the serial to parallel converter.

• ENABLEOUT: It is used in the clock multiplexing unit as the selection signal between the write and read clocks. It is taken from this node to compensate for the clock cycle spent waiting during the clock multiplexing operation (see the clock multiplexing unit, Section 2.4.2.2).

• Wclk: It is the write clock (1 MHz).

• enableSER: It shuts down the serial to parallel conversion unit during the read mode. It also enables the serializer unit during the read mode.

• reset: It forces the counter in the pulse generation unit to start from 0. It is taken from this node to match the counting with the corresponding data transfer into the serial to parallel conversion unit.

2.4.2.2 Clock Multiplexing Unit

As mentioned in Section 2.2.2, the memory has two operation modes: write and read. During the write mode, the WRclk signal equals the Wclk signal (1 MHz), while during the read mode it switches to the read clock Rclk (625 MHz). The clock multiplexing unit is therefore needed to generate the write-read clock signal (WRclk), which in turn is used to generate the memory clock signal (clk1M-625M). During the clock multiplexing operation, the multiplexer should switch from one clock to the other without introducing any glitch at its output. The two clocks may be multiples of each other or totally unrelated to each other. Methods of implementing glitch-free clock multiplexing for both cases are presented and discussed in detail in [11]. The write and read clocks may be unrelated because they are generated from different sources, so glitch-free clock switching for unrelated clocks is used, as shown in Figure 2.8. Hence, no data are missed during the clock switching operation.


Figure 2.8: The glitch free clock switching for unrelated clocks technique.

A glitch may occur at the output of the multiplexer if the output switches directly from the current clock to the next clock when the select signal changes. To prevent this when the clocks are multiples of each other, two negative edge-triggered DFFs are added in the selection path, with feedback from the selection of each clock to the selection of the other. This prevents the current clock from propagating directly to the output: the current clock is disabled first, and the multiplexer waits for the next clock before letting it propagate. To extend this approach to completely unrelated clocks, two positive edge-triggered DFFs are added in the selection path. The selection signal or the fed-back selection signal may arrive asynchronously, and the added DFFs avoid the metastability that these signals could otherwise cause [11].
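The switching behaviour described above can be illustrated with a discrete-time model. This is a sketch under stated assumptions: each clock has one positive edge-triggered DFF followed by one negative edge-triggered DFF in its selection path, the enable of one clock is gated by the inverted enable of the other, and the output is (clk0 AND en0) OR (clk1 AND en1). Figure 2.8 is not reproduced here, so the exact gate arrangement, the periods, and all names are hypothetical.

```python
def clock(t, period, phase=0):
    """Square wave with 50 % duty cycle on an integer time grid."""
    return 1 if ((t + phase) % period) < period // 2 else 0


def simulate_mux(p0, p1, switch_time, t_end, phase1=0):
    """Glitch-free mux model; select is 0 (clk0) before switch_time, 1 (clk1) after."""
    s0 = q0 = 1                      # path 0 starts enabled
    s1 = q1 = 0
    prev0, prev1 = clock(-1, p0), clock(-1, p1, phase1)
    out = []
    for t in range(t_end):
        c0, c1 = clock(t, p0), clock(t, p1, phase1)
        sel = 1 if t >= switch_time else 0
        n_s0, n_q0, n_s1, n_q1 = s0, q0, s1, q1
        if prev0 == 0 and c0 == 1:   # rising edge of clk0: synchronizing stage
            n_s0 = (1 - sel) & (1 - q1)
        if prev0 == 1 and c0 == 0:   # falling edge of clk0: enable stage
            n_q0 = s0
        if prev1 == 0 and c1 == 1:   # rising edge of clk1
            n_s1 = sel & (1 - q0)
        if prev1 == 1 and c1 == 0:   # falling edge of clk1
            n_q1 = s1
        s0, q0, s1, q1 = n_s0, n_q0, n_s1, n_q1
        out.append((c0 & q0) | (c1 & q1))
        prev0, prev1 = c0, c1
    return out


def naive_mux(p0, p1, switch_time, t_end, phase1=0):
    """Plain 2:1 mux that follows the select signal immediately."""
    return [clock(t, p1, phase1) if t >= switch_time else clock(t, p0)
            for t in range(t_end)]


def high_pulse_widths(wave):
    """Widths of all high pulses whose rising and falling edges are both recorded."""
    widths, start = [], None
    for t in range(1, len(wave)):
        if wave[t - 1] == 0 and wave[t] == 1:
            start = t
        elif wave[t - 1] == 1 and wave[t] == 0 and start is not None:
            widths.append(t - start)
            start = None
    return widths


safe = set(high_pulse_widths(simulate_mux(16, 10, 37, 200, phase1=3)))
naive = set(high_pulse_widths(naive_mux(16, 10, 37, 200, phase1=3)))
print(safe, naive)
```

In this model every high pulse at the output of the glitch-free multiplexer is a complete half period of one of the two clocks, whereas the plain multiplexer produces a distorted pulse at the switching instant.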

The read mode is activated after the memory is filled. This means that the ENABLEOUT signal goes low after (64 × 32 × T_Wclk). The output clock (WRclk) is then switched to the read clock after a certain time. To avoid missing data because of the waiting state during the clock multiplexing operation, the ENABLEOUT signal should be taken from the node described in Section 2.4.2.1. The output clock (WRclk) is a write-read clock: it equals the Wclk signal during the write mode and the Rclk signal during the read mode. The output clock (WRclk) of the clock multiplexing unit is presented in Figure 2.9.
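As a quick numerical check of the figures quoted so far, the fill time and the data rates can be computed directly. The sketch assumes that the 2 kbit memory is organized as 32 rows of 64 bits and that one row is read per Rclk cycle; the variable names are hypothetical.

```python
from fractions import Fraction

ROWS, ROW_BITS = 32, 64                  # assumed organization of the 2 kbit memory
F_WCLK = 1_000_000                       # write clock, 1 MHz
F_RCLK = 625_000_000                     # read clock, 625 MHz
F_OUT, WORD_BITS = 5_000_000_000, 8      # serializer output: 8-bit words at 5 GHz

fill_time = Fraction(ROWS * ROW_BITS, F_WCLK)   # seconds needed to fill the memory
read_pass = Fraction(ROWS, F_RCLK)              # seconds to read every row once
memory_rate = ROW_BITS * F_RCLK                 # bits per second leaving the memory
output_rate = WORD_BITS * F_OUT                 # bits per second leaving the serializer

print(fill_time, read_pass, memory_rate, output_rate)
```

Under these assumptions the memory is filled after 2.048 ms, one pass through the stored data takes 51.2 ns, and the memory read rate matches the serializer output rate.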


2.4.2.3 Non-Inverting Buffered 2:1 Multiplexer Unit (1)

The non-inverting buffered 2:1 multiplexer circuit generates the clock signal (clk1M) of the serial to parallel conversion unit. During the write mode, the enableSER signal is high, so the WRclk signal passes to the output of the multiplexer, and the clk1M signal follows the external clock at this stage of the design. The serial to parallel conversion unit works at low frequency in order to write the data into the memory unit. During the read mode, in contrast, the enableSER signal is low, so the zero level (vss) passes to the output of the multiplexer, and clk1M is turned off in order to shut down the serial to parallel conversion unit once the memory is filled. The output signal (clk1M) of this circuit is shown in Figure 2.10.


2.4.2.4 Pulse Generation Unit

As mentioned in Section 2.1, the 64 data bits are first stored in the serial to parallel conversion unit over 64 clock cycles of the clk1M signal before they are shifted into the memory. The pulse generation unit creates a single pulse every 64 clock cycles of clk1M. It is based on a 6-bit counter with synchronous reset. In addition, a 6-input NAND gate generates the required signal (counterpulse) from the outputs of the 6-bit counter. The schematic of the pulse generation unit is shown in Figure 2.11.

Figure 2.11: Schematic of the pulse generation unit.

The 6-input NAND gate is based on CMOS logic gates with 6 inputs as shown in Figure 2.12.

Figure 2.12: 6-Input NAND gate.

Before adding the buffers in the pulse generation schematic, some glitches occur at the output signal (counterpulse) as shown in Figure 2.13, and Figure 2.14 shows the zoomed simulation result of the last glitch before adding the buffer.


For a short time, all the outputs of the counter are high, so the counterpulse signal goes low as an unwanted glitch. A time delay is therefore needed, for example on signal Q1, and it is introduced with a buffer. The buffers mentioned above thus remove the glitches that occur at the output of the pulse generation unit. Table 2.1 shows the counter sequence and the pulse level after adding these buffers.

       Binary           Decimal   Pulse
Q5 Q4 Q3 Q2 Q1 Q0
0  0  0  0  0  0        0         1
0  0  0  0  0  1        1         1
.  .  .  .  .  .        .         .
.  .  .  .  .  .        .         .
.  .  .  .  .  .        .         .
1  1  1  1  1  0        62        1
1  1  1  1  1  1        63        0

Table 2.1 Counter sequence with the counterpulse signal level.

After adding the buffers, the simulation results show that the output signal (counterpulse) of the pulse generation unit functions correctly across different process corners. The simulation result for the (TT, 50 °C) process corner is shown in Figure 4.9.

The period of the counterpulse signal is (64 × T_clk1M). The counterpulse signal is high for the first 63 clock cycles and low for the last clock cycle. In other words, during the write mode, the counterpulse signal goes low once every 64 clock cycles of the clk1M signal. When all 64 data bits are valid at the output of the serial to parallel conversion unit, the counterpulse signal goes low to shift these 64 bits into the memory. After 32 cycles of the counterpulse signal, the signal goes low for the last time and allows the last 64 data bits to enter the memory. During this last low level, the read mode is activated, and the 6-bit counter shuts down because clk1M is turned off. Therefore, the counterpulse signal remains low during the read mode.
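The counterpulse behaviour described above, and the glitch mechanism, can be reproduced with a few lines of code. This is a purely logical sketch with no timing; the choice of the 31 to 32 transition as the glitch example is an assumption made for illustration, and the names are hypothetical.

```python
def counterpulse(count):
    """6-input NAND of Q5..Q0: low only when all six counter bits are high."""
    bits = [(count >> i) & 1 for i in range(6)]
    return 0 if all(bits) else 1


# One value per clk1M cycle over the whole write phase (32 groups of 64 cycles).
levels = [counterpulse(n % 64) for n in range(64 * 32)]
low_cycles = [n for n, level in enumerate(levels) if level == 0]

# Glitch mechanism on the 31 -> 32 transition (assumed example): if Q5 rises
# before Q4..Q0 fall, the NAND gate briefly sees 111111.
transient = 0b011111 | 0b100000

print(len(low_cycles), low_cycles[:3], counterpulse(transient))
```

The pulse is low on cycles 63, 127, 191, and so on, 32 times in total, and the transient code 111111 drives the NAND output low even though neither 31 nor 32 does.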


2.4.2.5 Non-Inverting Buffered 2:1 Multiplexer Unit (2)

The non-inverting buffered 2:1 multiplexer circuit is used to generate the clock signal (clk1M-625M) of the memory unit in order to write and read the data into and from the memory, respectively.

During the write mode, when the counterpulse signal is high, the high level (vdd) passes to the output of the multiplexer, and the memory clock (clk1M-625M) is also high. This prevents any data in the serial to parallel conversion unit from being shifted into the memory before the serial to parallel conversion unit is filled. After the serial to parallel conversion unit is filled, the counterpulse signal goes low, so the WRclk signal passes to the output of the multiplexer, and the memory clock equals the write clock for one period in order to shift the valid 64 data bits into the memory. It is worth mentioning that this reduces the power consumption of the memory by minimizing the switching activity as much as possible. To avoid any glitches during the multiplexing operation, the counterpulse signal should be delayed with respect to the WRclk signal.

During the read mode, the counterpulse signal remains low, the WRclk signal passes to the output of the multiplexer, and the clk1M-625M signal of the memory equals the read clock (625 MHz). The output signal of this circuit is shown in Figure 2.15.
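The reduction in switching activity during the write mode can be estimated with a simple model of this multiplexer. The sketch assumes that each WRclk cycle is sampled as a high level followed by a low level and that counterpulse is aligned to whole cycles; delays are ignored and the names are hypothetical.

```python
def memory_clock(wrclk, counterpulse):
    """Non-inverting 2:1 mux: vdd (1) while counterpulse is high, else WRclk."""
    return 1 if counterpulse else wrclk


def falling_edges(wave):
    """Count the 1 -> 0 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 1 and b == 0)


CYCLES = 64 * 32                      # write-mode cycles needed to fill the memory
wrclk_wave, memclk_wave = [], []
for n in range(CYCLES):
    cp = 0 if n % 64 == 63 else 1     # counterpulse is low on every 64th cycle
    for level in (1, 0):              # assumed: each WRclk cycle is high, then low
        wrclk_wave.append(level)
        memclk_wave.append(memory_clock(level, cp))

print(falling_edges(wrclk_wave), falling_edges(memclk_wave))
```

In this model the memory clock toggles 32 times during the write phase instead of 2048 times, a factor of 64 less than an ungated clock.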


2.4.2.6 Enable Element Unit

The enable circuit element is used in order to generate the enable signal of the memory unit. It consists of one DFF and buffer as shown in Figure 2.16. The enable signal is derived from the enableSER signal in each clock cycle of the WRclk signal. The enable signal is responsible for activating the multiplexing system in the memory unit during the read mode.

Figure 2.16: Schematic of the enable element.

During the write mode, the enable signal is high, so the multiplexing system of the memory unit is deactivated until all the data are stored in the memory. During the read mode, the enable signal is low, so the multiplexing system in the memory unit is activated and allows the memory to reuse the stored data to test the intended device after the serializer unit.

The most important issue is that the data must be completely stored in the memory before the multiplexing system is activated. The multiplexing system may lose one data bit in each row of the memory when the enable signal goes from high to low. This happens if the enable signal goes low before the clk1M-625M signal goes high (after the distribution circuits), in which case unwanted data enter the memory from the multiplexing system. The cause is that the propagation delay of the enable signal is much smaller than that of the clk1M-625M signal. To solve this problem, buffers are introduced before the enable distribution to ensure that the enable signal goes low after the rising edge of clk1M-625M (after the distribution circuits).


2.4.3 Summary

The control unit is used to generate the control signals of the proposed design. The design of control unit is successfully implemented, and the functionality of the control unit is achieved.

The input signals of the control unit are as follows:

• CLOCK: It is a low-frequency (write) clock. It synchronizes the external signals and is used to write the data into the memory unit.

• ENABLE: It switches the design from the write mode to the read mode.

• RESET: It forces the counter in the pulse generation unit to start from 0.

• DATA: It brings the data from off-chip into the control unit on every clock cycle of the write clock.

• Rclk: It is the read clock (625 MHz). The control unit uses this signal together with the Wclk signal to generate the write-read clock (WRclk).

The output signals of the control unit are as follows:

• DATAOUT: It delivers the data serially to the serial to parallel conversion unit on each clock cycle of the clk1M signal.

• clk1M: It clocks the serial to parallel conversion unit. The clk1M signal is turned off in the read mode.

• clk1M-625M: It is the memory clock signal. During the write mode, it shifts the data into the memory once every 64 clock cycles of the clk1M signal; during the read mode, it reads the stored data at the read frequency (625 MHz).

• enable: It activates the multiplexing system in the memory unit in order to reuse the stored data during the read mode.


The control signals of the control unit are shown in Figure 2.17.


2.5 Serializer Unit

2.5.1 Introduction

The serializer unit is used to convert the parallel data at low frequency into serial data at high frequency. In other words, it is responsible for speeding up the data from low frequencies to high frequencies. Thereby, the high speed issues and the robustness are considered as the most important keys of designing the serializer unit.

2.5.2 Previous Works

In this section, three different types of serializer from different papers are compared with respect to the specified design.

2.5.2.1 Serializer by using N:1 Multiplexer Unit

The serializer architecture in [4] is used to convert 10 bits parallel data at low frequency to one bit data streams at high frequency. Two 5:1 multiplexers are used with their indicated control signals to divide the input stream into even and odd data. Then, the traditional 2:1 serializer unit is used to get the output serial data at high frequency. This type of architecture needs an additional logic circuit to generate the control signals of the multiplexers.

Discussion

The proposed design, on the other hand, does not need any additional circuit to generate the control signals of the multiplexers, because the output signals of the clock divider unit also serve as those control signals. The clock divider already divides the high frequency (5 GHz) down to the read frequency (625 MHz). The proposed design exploits this by using the divided frequencies (2.5 GHz, 1.25 GHz, and 625 MHz) as control signals of the multiplexers, which is the main advantage of this architecture in the proposed design. Each stage is therefore driven by its corresponding clock from the clock divider unit.


2.5.2.2 Serializer by using one N:1 Serializer Unit directly

The serializer architecture in [12] converts 8 bits of parallel data at low frequency directly to a 1-bit data stream at high frequency. The serializer is designed as eight parallel tri-state buffers with a position encoder circuit that provides the serializer selections. Common dynamic flip-flops based on true single-phase clock (TSPC) logic [13] are used. The main advantage of this architecture is the reduced area, which also reduces the power consumption.

Discussion

However, adding the position encoder circuit increases the area and the power consumption again. In addition, this architecture requires a symmetrical distribution of the different clocks and selection signals to all the DFFs and tri-state buffers. The advantage of the proposed serializer design, on the other hand, is that the multiplexers operate at different frequencies, and only the last stage operates at the high frequency (2.5 GHz). The power consumption can be reduced if dynamic logic is used [9], but traditional static CMOS logic has many advantages, such as very low static power consumption, high noise margins, robustness, and density, among others [14]. In addition, static logic is easier to design than dynamic logic.

2.5.2.3 Serializer by using Feed-Forward 8-to-1 CMOS scan-FF based Serializer

The serializer architecture in [15] converts 9 bits of parallel data at low frequency to a 1-bit data stream at high frequency by using 9-bit channels. The design uses 17 CMOS scan-FFs to perform the serialization. The main advantage of this type of serializer is that it speeds up the data from 625 MHz to 5 GHz directly.

Discussion

On the other hand, the area of the CMOS scan-FF is larger than that of the static CMOS FF. In addition, the CMOS scan-FF has two additional input ports, one of which needs a large driver (a chain of buffers) to drive its signal, while the proposed design does not need such a driver because it uses a normal CMOS FF. In a serializer based on the scan-FF testing methodology, high-performance CMOS circuits dissipate significant dynamic power during testing because of the increased switching activity [16]. In addition, the static CMOS FF is much less complex to design than the CMOS scan-FF.


2.5.3 Proposed Serializer Unit

The proposed architecture of the serializer unit is based on the traditional tree structure and is implemented as an arrangement of 2:1 serializer units [14],[17]-[19]. To convert 64 bits of parallel data at low frequency to 8-bit data streams at high frequency, 8 pages of 8:1 serializers, each with 5 GHz serial data at its output, are used. Each 8:1 serializer consists of seven 2:1 serializer units (four Unit625M, two Unit1.25G, and one Unit2.5G). Each set of 8 data bits is separated into even and odd data, which are serialized separately into 1-bit streams at 2.5 GHz. The data are then resynchronized at 5 GHz at the output, so each 8-bit set is interleaved into a 1-bit stream at 5 GHz. To obtain the correct sequence [b7, b6, b5, b4, b3, b2, b1, b0] at the output, the input data sequence should be b0, b4, b2, b6, b1, b5, b3, and b7, respectively. The proposed 8:1 serializer unit is shown in Figure 2.18.
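The required input ordering follows from the tree structure and can be verified with a short model. The sketch assumes that each 2:1 unit sends its first input before its second and that the bracketed sequence [b7, ..., b0] is written with b0 leaving first in time; the function names are hypothetical.

```python
def serialize_2to1(first, second):
    """Interleave two equal-length streams; `first` leads in time."""
    out = []
    for a, b in zip(first, second):
        out.extend([a, b])
    return out


def serialize_8to1(inputs):
    """Three-stage tree of 2:1 units: four, then two, then one."""
    streams = [[x] for x in inputs]
    while len(streams) > 1:
        streams = [serialize_2to1(streams[i], streams[i + 1])
                   for i in range(0, len(streams), 2)]
    return streams[0]


input_order = ["b0", "b4", "b2", "b6", "b1", "b5", "b3", "b7"]
serial = serialize_8to1(input_order)

# The input ordering is the 3-bit bit-reversal permutation of 0..7.
bit_reversed = [int(format(i, "03b")[::-1], 2) for i in range(8)]

print(serial, bit_reversed)
```

With this input ordering the bits leave the tree as b0, b1, ..., b7, and the two streams entering the last stage are exactly the even bits and the odd bits.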


Basically, the 8:1 serializer has three stages working at different frequencies. The first and second stages use the basic CMOS logic 2:1 serializer unit at 625 MHz and 1.25 GHz, respectively, while the last stage uses a specially designed CMOS logic 2:1 serializer unit at 2.5 GHz. The basic 2:1 serializer unit uses traditional static transmission-gate-based DFFs because its clock is not faster than 2 GHz, while the specially designed 2:1 serializer unit uses static DFFs with symmetrical complementary clock signals. The reason for using this type of DFF at high frequency is that the internal clock signals of the traditional static transmission-gate DFF are not very symmetrical, because they are generated by inverters. Hence, the propagation delay of the transmission-gate switching in the DFF increases. Therefore, at high frequencies, a static DFF with symmetrical complementary clock signals, as shown in Figure 2.19, is used [20]. Note that the symmetry of the complementary clock signals is improved further by adding a transmission gate at the clock port.


2.5.3.1 Operation of 2:1 Serializer Unit

The proposed 2:1 serializer unit schematic is shown in Figure 2.20. The two data bits are resynchronized by the rising edge of the clock signal (ϕ/2). Then, one of them is delayed by half a clock cycle by the falling edge of the clock signal, and at the same time the non-inverting buffered 2:1 multiplexer unit passes the other bit to the output. The next rising edge of the clock signal passes the delayed bit. The output data of the 2:1 serializer unit can be retimed on the clock signal (ϕ) by adding one register after the unit. The retiming operation eliminates glitches and jitter caused by the non-inverting buffered multiplexer circuit and by duty-cycle distortion of the clocks [17].

Figure 2.20: Proposed 2:1 serializer unit.

In order to understand the operation of the 2:1 serializer unit, the following example (EX.1) is considered. If the digital sequences of even and odd data are for example [0 1 0 1 0 1 0 1 ...] and [1 0 1 0 1 0 1 0 ...], respectively, the output data sequence should look like [0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 ...] as shown in Figure 2.21.
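Example EX.1 can be reproduced with a minimal behavioural model of the 2:1 unit. The sketch only captures the ordering of the bits: in each cycle of the clock (ϕ/2) one even bit is followed by one odd bit. The exact clock phase at which each bit appears, as well as all delays, are abstracted away, and the function name is hypothetical.

```python
def serializer_2to1(even, odd):
    """Emit one even bit and then one odd bit per cycle of the half-rate clock."""
    out = []
    for e, o in zip(even, odd):
        out.append(e)   # first half of the cycle: the even bit
        out.append(o)   # second half of the cycle: the odd bit
    return out


even = [0, 1, 0, 1, 0, 1, 0, 1]
odd = [1, 0, 1, 0, 1, 0, 1, 0]
print(serializer_2to1(even, odd))
```

For these inputs the model returns the sequence 0 1 1 0 0 1 1 0 repeated, in agreement with the expected output of EX.1.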


2.5.3.2 Timing of 2:1 Serializer Unit

According to Figure 2.22, the timing issues of regions A, B, C, D, and E are specified.

Figure 2.22: Timing regions of the 2:1 serializer unit.

In order to satisfy the timing requirements and skew (δ) calculations of any 2:1 serializer unit, the following equations are introduced. Note that there is no need to calculate the hold time for regions A, B, and C.

Region -A-

1) $\frac{1}{2} T_{\phi/2} \geq t^{MAX}_{cq-\phi/2p-ff} + t_{su} - \delta$

Region -B-

1) $\frac{1}{2} T_{\phi/2} \geq t^{MAX}_{cq-\phi/2n-beforeMux1} + t_{su} - \delta$

Region -C-

1) $\frac{1}{2} T_{\phi/2} \geq t^{MAX}_{cq-\phi/2p-beforeMux0} + t_{su} - \delta$

Region -D-

1) $T_{\phi} \geq t^{MAX}_{cq-\phi/2p-output} + t_{su} - \delta$

2) $t_{h} \leq t^{MIN}_{cq-\phi/2p-output} - \delta$

Region -E-

1) $T_{\phi} \geq t^{MAX}_{cq-\phi/2n-output} + t_{su} - \delta \pm 0.02 \cdot T_{\phi/2}$

2) $t_{h} \leq t^{MIN}_{cq-\phi/2n-output} - \delta$

Note that the ± (0.02xTϕ/2) is added to the first equation in the most critical Region -E-. This addition means


2.5.3.3 Timing of the Clock and Data paths

The matching between clock path and data path should be taken into account by adding the required time delays in the clock path. According to Figure 2.23, the following equations should be satisfied to avoid the race conditions.

Figure 2.23: Timing features of the serializer unit between clock path and data path.

Region -1-

1) $t_{cq-clk5Ga-Q0} + t_{d2} + t_{cq-clk2.5G0-c0} \leq T_{clk5G0} - t_{su1} + t_{d1}$

Region -2-

2) $t_{cq-clk5Ga-Q1} + t_{d3} + t_{cq-clk1.25G0-even} \leq T_{clk2.5G0} - t_{su2} + t_{d2}$

Region -3-

3) $t_{cq-clk5Ga-Q2} + t_{d4} + t_{cq-clk625M0-a0} \leq T_{clk1.25G0} - t_{su3} + t_{d3}$

2.5.3.4 Data Distribution of the Complete 64:8 Serializer


2.6 8-bit Parallel to Serial Conversion Unit

To test the proposed design at the layout level, the 8-bit parallel to serial conversion circuit is connected to the output of the proposed design so that only one pad is needed for the output data pin. Moreover, a 3-bit counter circuit should be implemented in the control unit, the clock divider unit should be extended, and a clock distribution for the parallel to serial converter circuit should be added. However, these circuits are not included in this thesis work because the proposed design is implemented only at the schematic (transistor) level and not at the layout level. This section therefore gives only a short description of the 8-bit parallel to serial converter circuit.

The parallel to serial conversion technique is needed to convert the parallel input data into serial output data. The parallel to serial conversion schematic is based on parallel-in/serial-out (PISO) shift registers with multiplexers. The circuit consists of buffer, registers, and non-inverting buffered 2:1 multiplexers, as shown in Figure 2.25. Each set of 8 bits data is loaded into the circuit first, then the circuit shifts these data serially to the output of the circuit before the new round.

Figure 2.25: An 8-bit parallel to serial conversion schematic.

The last operation in the proposed design is that the 8 data bits are resynchronized at the clock (ϕ/8). The bits [0-7] are then available to be loaded into the circuit. The SH/LD signal is responsible for loading the data into the circuit and shifting them serially to the output of the circuit. The circuit loads when the SH/LD signal is low and shifts when the SH/LD signal is high. To generate this signal, a 3-bit counter with a 3-input OR gate should be designed. The 3-bit counter should be clocked by the parallel to serial circuit clock (ϕ). So, for every 8 clock cycles of (ϕ), a new set of 8 data bits is loaded and shifted serially to the output. In this way, all the control signals of the circuit are designed to achieve its functionality. Figure 2.26 shows the control signals of the circuit.
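The load and shift sequence can be sketched as a behavioural model of a PISO register controlled by a 3-bit counter. This is an illustration under assumptions: the SH/LD signal is taken as the OR of the three counter bits (low only when the count is 0), bit 0 is assumed to leave first, and the serial output is taken from the first register stage. The function name is hypothetical.

```python
def piso(words):
    """Serialize 8-bit words given as lists [bit0, ..., bit7]; bit0 leaves first."""
    out, reg, count = [], [0] * 8, 0
    for cycle in range(8 * len(words)):
        # 3-input OR of the counter bits: 0 selects load, 1 selects shift.
        sh_ld = (count & 1) | ((count >> 1) & 1) | ((count >> 2) & 1)
        if sh_ld == 0:
            reg = list(words[cycle // 8])   # load a new set of 8 bits
        else:
            reg = reg[1:] + [0]             # shift by one position
        out.append(reg[0])                  # serial output of the register chain
        count = (count + 1) % 8             # 3-bit counter clocked by the fast clock
    return out


word_a = [1, 0, 1, 1, 0, 0, 1, 0]
word_b = [0, 1, 1, 1, 0, 1, 0, 0]
print(piso([word_a, word_b]))
```

One load cycle followed by seven shift cycles delivers each 8-bit set within 8 cycles of the fast clock, so consecutive words appear back to back at the serial output.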


Figure 2.26: Control signals of the 8-bit parallel to serial conversion unit.

Note that the clock (ϕ) should be equal to the highest frequency in the circuit (5 GHz), which means that the clock (ϕ/8) is equal to 625 MHz. Therefore, the clock divider should be extended to generate the new clocks needed. The clocks of the proposed design then become clk625M, clk312.5M, clk156.25M, and clk78.125M instead of clk5G, clk2.5G, clk1.25G, and clk625M, respectively.

The main advantage of using this type of technique is to reduce the chip size and the complexity of the design during the functionality verification of the design.


3 TIMING AND CLOCKING

3.1 Timing Parameters for Sequential Circuits

There are three important timing parameters associated with sequential circuits: the setup time, the hold time, and the propagation delay. For a positive edge-triggered DFF, as shown in Figure 3.1, the setup time t_su is the amount of time that the D input must be stable before the rising edge of the clock, the hold time t_h is the amount of time that the D input must remain stable after the rising edge of the clock, and the propagation delay t_cq is the time after the rising edge of the clock at which the output changes. The setup, hold, and propagation times should be measured independently of each other. More details about the characterization of the setup and hold times can be found in [7].


In synchronous sequential circuits, the clock frequency is limited by these timing parameters together with the propagation delay of the combinational logic circuit. The clock period (T) of a particular sequential circuit must be longer than the longest delay of all paths in the network. In a high-frequency design, these parameters should be minimized as much as possible in order to satisfy the timing requirements, so the transistors of the register should be sized up slightly.

3.2

Synchronous Timing Issue

Nowadays, most digital system designs, including the proposed design, use synchronization to coordinate their operation. The clock distribution network delivers the clock signal to all units in the system. The clock speed increases as VLSI designs are scaled down, and it keeps increasing in relation to the propagation delay in the digital system design [21]. Figure 3.2 shows a simple synchronous pipelined datapath.

Figure 3.2: Pipelined datapath of a synchronized circuit.

Under the ideal condition, the timing constraints of this synchronous sequential circuit are given by two equations:

T  t

cq

MAX

t

p−logic

t

su and,

t

h

 

t

cq

MIN

t

cd −logic

Where tp-logic and tcd-logic are the maximum and minimum delay (contamination delay) of the combinational

logic circuit, respectively. A good example of the propagation and contamination delays estimation for a combinational logic circuit is discussed in detail in [7]. Unfortunately, in reality, there is no ideal condition since the clock is never ideal. There are many factors that influence the timing performance of the digital design, such as on-chip variation [22] and manufacturing device variation [7]. Thereby, the clock is affected by either skew or/and jitter impacts. These impacts are very critical issues in the digital integrated circuit design.
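The two ideal-clock constraints can be turned into a small checker. The delay values used below are purely illustrative and are not taken from the thesis; only the 200 ps period corresponds to the 5 GHz clock mentioned in the text. All times are integers in picoseconds, and the function names are hypothetical.

```python
def min_period_ps(t_cq_max, t_p_logic, t_su):
    """Smallest clock period that satisfies T >= t_cq(max) + t_p-logic + t_su."""
    return t_cq_max + t_p_logic + t_su


def setup_ok(period, t_cq_max, t_p_logic, t_su):
    """Setup constraint under an ideal clock (no skew, no jitter)."""
    return period >= min_period_ps(t_cq_max, t_p_logic, t_su)


def hold_ok(t_h, t_cq_min, t_cd_logic):
    """Hold constraint under an ideal clock: t_h <= t_cq(min) + t_cd-logic."""
    return t_h <= t_cq_min + t_cd_logic


# Illustrative numbers only (assumed, in ps).
T_5GHZ = 200
print(min_period_ps(60, 100, 30), setup_ok(T_5GHZ, 60, 100, 30), hold_ok(20, 40, 10))
```

With these assumed values the minimum period is 190 ps, so the setup constraint holds at 5 GHz with a 10 ps margin; skew and jitter, which are not modelled here, would reduce that margin.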
