
Power-Efficient Design of an Embedded Flash Memory Management System

Master thesis

JONAS BRUNLÖF

December 2009

Supervisors:
Magnus Persson, KTH
Barbro Claesson, ENEA
Detlef Scholle, ENEA

Examiner:
Martin Törngren, KTH


Master thesis MMK2009:99 MDA356
Power-Efficient Design of an Embedded Flash Memory Management System
Jonas Brunlöf

Approved: 2009-12-17
Examiner: Martin Törngren
Supervisor: Magnus Persson
Employer: ENEA AB
Contact person: Detlef Scholle


Abstract

This report is the result of a master thesis carried out at ENEA AB during the fall of 2009. It aims to create a specification of a flash memory management system for embedded systems, focusing on power efficiency and low RAM usage, and to design and implement a prototype of such a system to facilitate further development toward the created specification. The system used by ENEA today is a Flash Translation Layer (FTL). It has a complex structure which prohibits modifications and customization; therefore a new flash memory management system needs to be developed.

The suggested solution uses a translation layer called Metadata FTL (MFTL), where file system metadata and userdata are separated from each other in order to improve performance. The partition holding userdata uses a block-level mapped translation layer called Fully-Associative Sector Translation (FAST) FTL. The other partition, holding metadata, instead uses a page-level mapped translation layer which also separates frequently modified data from rarely modified data. The separation of data with different update frequencies is carried out by a page allocation scheme called MODification Aware (MODA).

The result of this report is a specification of the system described above and an implemented prototype which has all the basic features of an FTL.

The implemented design can successfully be used instead of the old FTL, with a few restrictions. It can handle normal file system commands and can manage reboots without loss of information. However, the main goal of the implemented design is still to act as a prototype facilitating further development toward the design explained in the specification.


Master thesis MMK2009:99 MDA356
Energy-Efficient Design of an Embedded Flash Memory Management System
Jonas Brunlöf

Approved: 2009-12-17
Examiner: Martin Törngren
Supervisor: Magnus Persson
Employer: ENEA AB
Contact person: Detlef Scholle


Summary

This report is the result of a master thesis at ENEA AB during the fall of 2009. The goal of the work is to create a specification of a flash memory management system focusing on energy efficiency and low RAM usage for embedded systems, and to design and implement a prototype that can serve as a basis for further development of the system toward the produced specification. The system used by ENEA today is a flash translation layer (FTL). It has a complex structure which prevents modifications and adaptations; therefore a new flash memory management system is to be developed.

The suggested solution uses a translation layer called Metadata FTL (MFTL), where metadata and userdata are separated from each other in order to achieve better performance. The partition holding userdata uses a block-level mapped translation layer called Fully-Associative Sector Translation (FAST) FTL, which is designed to minimize energy consumption by limiting costly write and erase operations on the flash memory while consuming little RAM. The other partition, which holds metadata, instead uses a page-level mapped translation layer which also separates frequently modified data from rarely modified data in order to save even more operations. The separation of data with different update frequencies is performed by an allocation scheme called MODA.

The result of this report is a specification of the system described above, together with an implemented prototype which has all the basic functions of an FTL. The implemented design can successfully be used instead of the old FTL, with a few restrictions. It can handle normal file system commands and can manage restarts without losing information. Nevertheless, the implemented design should first and foremost be seen as a prototype that can be used for further development of the system.


Contents

List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Background
  1.2 Problem statement
    1.2.1 Flash memory management system
  1.3 Method
  1.4 Delimitations

2 Challenge: Merge energy efficiency and low RAM usage
  2.1 Power management
    2.1.1 Flash memory characteristics
    2.1.2 RAM usage in embedded systems
  2.2 Requirements

3 Flash memory management
  3.1 An introduction to flash memory
    3.1.1 Functionality of a flash memory
    3.1.2 Flash memory types
    3.1.3 Wear leveling
    3.1.4 Garbage collection
    3.1.5 Requirements
  3.2 Overview of flash memory management systems
  3.3 Current setup
    3.3.1 JEFF - Journaling Extensible File system Format
    3.3.2 Current FTL
  3.4 Flash file systems
    3.4.1 JFFS - Journaling Flash File System
    3.4.2 YAFFS - Yet Another Flash File System
    3.4.3 CFFS - Core Flash File System
    3.4.4 MODA - MODification Aware
    3.4.5 Summary and discussion
  3.5 Flash Translation Layers
    3.5.1 BAST FTL - Block-Associative Sector Translation FTL
    3.5.2 AFTL - Adaptive FTL
    3.5.3 FAST FTL - Fully-Associative Sector Translation FTL
    3.5.4 FTL/FC - FTL/Fast Cleaning
    3.5.5 MFTL - Metadata FTL
    3.5.6 Summary and discussion
  3.6 Other flash memory related functionalities
    3.6.1 Bad block management
    3.6.2 ECC - Error correction code
    3.6.3 Cleaning policies
    3.6.4 Buffering
  3.7 Discussion
    3.7.1 Optimal flash management system
    3.7.2 Related questions and thoughts
    3.7.3 Requirements

4 OSE 5.4 and the Soft Kernel Board Support Package (SFK-BSP)
  4.1 OSE 5.4
    4.1.1 Processes
    4.1.2 Signals
    4.1.3 Flash Access Manager
  4.2 Soft Kernel Board Support Package (SFK-BSP)
    4.2.1 Modules in the soft kernel

5 Design and implementation
  5.1 Translation layer design
    5.1.1 Initiation of translation layer
    5.1.2 Storage on flash volume
    5.1.3 Translation layer list
    5.1.4 Translation layer metadata
    5.1.5 Scan function
    5.1.6 Garbage collection
  5.2 Implementation in OSE
    5.2.1 Supported signals
    5.2.2 Supported I/O commands
    5.2.3 Mount recommendations

6 Test suite
  6.1 Introduction to test
  6.2 Test case 1
    6.2.1 Results from test case 1
  6.3 Test case 2
    6.3.1 Results from test case 2
  6.4 Test case 3
    6.4.1 Results from test case 3
  6.5 Test case 4
    6.5.1 Results from test case 4
  6.6 Test case 5
    6.6.1 Results from test case 5
  6.7 Summary of test results

7 Discussion
  7.1 Problem statement
    7.1.1 Flash memory management design
    7.1.2 QoS and power awareness
    7.1.3 File system and FTL interface
    7.1.4 Flash memory management implementation
    7.1.5 Evaluation of implementation
  7.2 Conclusions
    7.2.1 Evaluation of requirements
    7.2.2 Summary of requirement evaluation

8 Future work
  8.1 Build according to specification
  8.2 Dynamic support for different flash sizes
  8.3 Flash translation layer metadata
  8.4 Test system on hardware

Bibliography

A Complete test cases
  A.1 Test case 1
  A.2 Test case 2
  A.3 Test case 3
  A.4 Test case 4a
  A.5 Test case 4b
  A.6 Test case 5

B Requirements

List of Figures

3.1 NAND flash architecture
3.2 Flash memory management system architecture
3.3 JFFS garbage collection
3.4 The MODA scheme classification
3.5 Page- and block-address translation
3.6 Merge and switch operations
3.7 DAC regions translation
4.1 Overview of process states
4.2 Structure of signal
5.1 Object in linked list
5.2 The structure of a metadata chunk
5.3 Structure of delete-metadata chunk
5.4 Overview of implementation in OSE
6.1 Example of a result printout
6.2 Result from test case 1:a
6.3 Result from test case 1:b
6.4 Result from test case 1:c
6.5 Result from test case 2:a
6.6 Result from test case 2:b
6.7 Result from test case 3:a
6.8 Result from test case 3:b
6.9 Result from test case 3:c
6.10 Result from test case 3:d
6.11 Result from test case 3:e
6.12 Result from test case 3:f
6.13 Result from test case 3:g
6.14 Result from test case 4:a
6.15 Result from test case 4:b
6.16 Result from test case 4:c
6.17 Result from test case 4:d
6.18 Result from test case 5:a
6.19 Result from test case 5:b
A.1 Complete result from test case 1
A.2 Complete result from test case 2
A.3 Complete result from test case 3
A.4 Complete result from test case 4a
A.5 Complete result from test case 4b
A.6 Complete result from test case 5

List of Tables

2.1 NAND flash characteristics
3.1 Overview of flash file system properties
3.2 Optimal flash file system
3.3 Evaluation summary of discussed translation layers
7.1 Evaluation of the requirements
B.1 Descriptions of the requirements

Abbreviations

JEFF     Journaling Extensible File system Format
JFFS     Journaling Flash File System
YAFFS    Yet Another Flash File System
CFFS     Core Flash File System
MODA     MODification-Aware
FTL      Flash Translation Layer
AFTL     Adaptive Flash Translation Layer
BAST     Block-Associative Sector Translation
FAST     Fully-Associative Sector Translation
FTL/FC   FTL/Fast Cleaning
MTD      Memory Technology Device
DAC      Dynamic dAta Clustering
XIP      eXecute In Place
EEPROM   Electrically Erasable Programmable Read-Only Memory
LRU      Least Recently Used
FAM      Flash Access Manager
SFK-BSP  SoFt Kernel Board Support Package


Chapter 1

Introduction

1.1 Background

This master thesis was carried out at ENEA AB in collaboration with the Royal Institute of Technology, KTH. It is part of GEODES, a project with the aim to provide power awareness and innovative management capabilities for operating systems, protocols and applications, and also to apply the notion of quality of service (QoS) [10]. Power awareness can be considered a concept within QoS, and to implement it in QoS, feedback from the affected devices is required. This master thesis will focus on creating a specification of an optimized flash memory management system where QoS can be implemented on a flash medium, and on designing and implementing a prototype of such a system.

1.2 Problem statement

In an embedded system it is important to minimize power consumption at every level in the design. However, this needs to be achieved without reducing the performance of the system in an unsatisfactory manner or disrupting any crash safety features.

Minimizing power consumption while still maintaining performance is an important part of QoS, and this can be helped by making modules power-aware.

1.2.1 Flash memory management system

A flash memory management system involves many functions besides the obvious read and write. Due to the construction and properties of flash media, functions such as erase, garbage collection, error correction and wear leveling are also needed.

ENEA's own operating system OSE (Operating System Embedded) uses the JEFF (Journaling Extensible File system Format) file system and a flash translation layer (FTL) to handle flash media. This complete flash management system needs to be optimized, and previous studies at ENEA show that the FTL is the bottleneck. The FTL is a complete flash media manager involving all the functions stated above. However, it is a general purpose system and unnecessary features are included in the software.

Flash memory management system design

Instead of analyzing and modifying the old FTL, a new flash media manager is requested. Hence other, preferably open source, solutions for flash memory management require evaluation. What flash management systems are available, and is there a solution fitting the requirements? As a result, a specification of how an optimized and power-aware flash memory management system should be constructed will be acquired. How can power awareness and QoS be implemented in the flash memory management system?

In case an interface between a novel file system and the flash memory management system is needed, optimization could be required. How can a potential interface between a file system and an optimized flash memory management system be improved?

Flash memory management system implementation

A prototype of a flash memory management system is to be implemented in OSE, but due to time constraints it is not necessary that the implementation and the optimized design are identical. What will a suitable implementation design look like in order to be constructible within the limited time frame?

When the design is implemented, verification and validation of the design need to be performed to confirm that the system can manage the basic requirements of a flash memory management system. Can the implemented design manage the basic requirements of a flash memory management system?

1.3 Method

The master thesis starts with an academic study involving research papers, manuals, technical reports and books in the area of flash memory management systems. It also involves an evaluation of the different flash media management systems found during the study. It ends in a design specification describing the requirements obtained during the academic study.

The master thesis also includes an implementation, which is supposed to illustrate the ideas suggested in the design phase. A demonstration of the implementation and a report covering all the stages of this master thesis are also required.


1.4 Delimitations

The time limit for this master thesis is 20 weeks. The academic study and the evaluation of the results need to be completed within the first ten weeks. The design and implementation phases need to be completed within the remaining ten weeks, together with the report and the presentation.

The foundation for the specification is only the data gathered during the academic study.

The implementation needs to result in a prototype with the basic features of a flash memory management system, but it does not need to provide all the features required by the specification.

During development the OSE 5.4 operating system will be used together with a software kernel from ENEA.


Chapter 2

Challenge: Merge energy efficiency and low RAM usage

In an embedded system resources are limited and it is important to be restrictive when using them. The limited resources are even more important to consider during the design of a system. For example, a goal to minimize power consumption usually comes at the expense of something else, e.g. RAM usage. Therefore, a system needs to find a good compromise between the declared criteria in order to be considered optimized.

2.1 Power management

Power management involves monitoring the energy consumption and also changing it to meet performance and energy demands. Power management is usually split into two separate areas: static and dynamic power management.

Static methods of power management are predictions of how a system will function, followed by design modifications to find the best compromise between performance and energy consumption. However, static power management is not always a sufficient solution. In a system with a very dynamic workload, the static power management method sometimes needs to be adjusted to the worst case scenario and is therefore only optimal when the system is fully utilized.

Dynamic methods analyze the system during run-time and adjust the performance to the present demand for resources. A branch of QoS focusing solely on the energy aspect and on dynamic power management was developed by Pillai et al. [17] and is called Energy-aware QoS (EQoS). The EQoS method includes making the best use of limited stored energy and also varying the service level of each process to ensure that system runtime goals are met.

First of all it is preferred that a system saves as much energy as possible without affecting performance; this applies to both static and dynamic power management, and to both hardware and software implementations.

Dynamic voltage scaling (DVS) of the CPU or GPU is, for example, a well known hardware approach for dynamic power management which can save energy without affecting performance [17]. In software, effective algorithms can be optimized to minimize energy consumption.

At a certain level it is no longer possible to save energy without affecting the performance of the system. The question then is what is most important: performance or energy consumption. Sometimes it is necessary to degrade performance because the energy source is being depleted. However, the opposite is just as plausible: if a very important task needs to be executed, maximum performance can be requested by the system.

2.1.1 Flash memory characteristics

Because of more energy-efficient processors, CPUs are no longer alone at the top of the list of energy consumers in embedded systems. Storage devices are becoming bigger and faster and thus require more energy. Flash memories in particular have undergone a huge expansion in the last decade. Power-aware flash memories with dynamic power management properties are therefore requested.

Flash memories have, however, special characteristics: read, write and erase operations have different latencies and require different amounts of energy [14].

Table 2.1 shows the latency and energy consumption of a typical NAND flash memory. Write operations cost far more than read operations, and erase operations cost even more in both latency and energy. This affects how power management on flash memories is carried out and prioritized. Little energy can be saved by minimizing the number of read operations, but a reduction of write and erase operations has a great impact on energy consumption.

It is important to note that minimizing the number of read, write and ultimately erase operations in this report does not mean that the number of read and write requests to the flash volume is to be affected. The decrease of these operations instead targets the internal read, write and erase operations caused by copying overhead during flash memory management operations; a more extensive explanation can be found in Chapter 3.

Operation     Latency   Energy consumption
Page read     47.2 µs   679 nJ
Page write    533 µs    7.66 µJ
Block erase   3 ms      43.2 µJ

Table 2.1. NAND flash characteristics [14]
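As a rough worked example of what these figures imply for garbage collection, consider reclaiming one 32-page block in which k pages still hold valid data that must be copied elsewhere before the erase (per-page energies taken from Table 2.1; any controller overhead is ignored):

E_gc = k * (E_read + E_write) + E_erase

For k = 16 this gives 16 * (0.679 µJ + 7.66 µJ) + 43.2 µJ ≈ 177 µJ, roughly four times the 43.2 µJ needed to erase a block containing no valid data. This is why the copying overhead, rather than the erase itself, tends to dominate the energy cost when data is poorly clustered.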


2.1.2 RAM usage in embedded systems

The amount of RAM used as primary storage in devices has been increasing steadily since the day it was introduced and will probably continue to increase, but there is still a need to be restrictive with RAM usage. In embedded systems low RAM usage is of extra importance due to low cost demands and space restrictions. The flash memory management system therefore needs to consider the amount of RAM used.

2.2 Requirements

The following requirements for the flash memory management system can be derived from this chapter.

REQ1: The design of the system shall be power-efficient.

REQ2: The system shall strive to be power-aware.

REQ3: The number of erase and write operations shall be minimized.

REQ4: The system shall use as little RAM as possible.

Power efficiency has the highest priority of the four requirements stated above. However, consideration still needs to be given to the other requirements, to make sure that none of them reaches excessive proportions.


Chapter 3

Flash memory management

The first of the two main goals of the master thesis is to form a specification of how an optimized flash memory management scheme should be constructed. This chapter includes a brief introduction to flash memory for readers with no prior knowledge in the area, followed by an explanation of the different flash memory management schemes available, and ends with a discussion of the presented schemes and a design description of the best solution.

3.1 An introduction to flash memory

The flash memory industry has exploded over the last decade. Flash memory is now one of the top choices for storage media in embedded systems. The technology continues to improve and expand into new areas. The capacity nearly doubles every year, and solid state disks (SSD) are becoming serious competitors to regular hard drives [20].

3.1.1 Functionality of a flash memory

A flash memory is a non-volatile memory and a specific group within the EEPROM family. Non-volatile memory has the advantage over volatile memory, such as DRAM and SRAM, of being able to hold stored data without a supply of power. Because of a flash memory's low power consumption, shock resistance and small size, it has many advantages compared to other non-volatile memory such as regular hard drives. Flash memory can also access data randomly, unlike hard drives which suffer from seek time because of their sequential data access properties [19, 4, 16].

Handling a flash drive requires three main operations, read, write and erase, in contrast to hard drives which only require read and write. The erase operation is needed because overwrites are not possible on flash. Instead of overwriting, updated data is written to a new location in the flash memory and the old data is marked as invalid or dead. This is also known as out-of-place updates. Over time the amount of invalid data in memory increases and will, if not handled correctly, fill up the whole flash memory. To reclaim an area occupied by invalid data the erase operation is used. Once a memory area has been erased, it is free to be used again by the system [4].

Flash memory also suffers from another disadvantage: a flash cell can only handle a limited number of erase cycles before it becomes unreliable or faulty. This hardware problem can be addressed in software. The solution is to use the whole memory area as evenly as possible and make sure that the different sections of the flash memory area are erased approximately the same number of times. This avoids a situation where one flash area is worn out before all the others. The whole process of evening out the load on the flash memory is called wear leveling [19].

3.1.2 Flash memory types

There are two main types of flash memory: NAND and NOR flash. Both are built with floating gate transistors, but the two types differ in the layout of these transistors. A brief explanation of the two will follow, but the rest of the report focuses on NAND flash because it is the most widely used in embedded systems today [14, 4]. Some of the content can, however, be applied to both.

NOR flash

The NOR flash memory gets its name from its resemblance to a logical NOR gate. The floating gates are connected in parallel, just as in a NOR gate, which makes it possible to access them individually. This gives NOR flash fast random access speed, but it is more costly and less dense in its architecture compared to NAND. The possibility to access each cell individually makes NOR flash ideal for eXecute In Place (XIP), where programs can be executed directly on the flash without first being copied to RAM.

Devices today usually have a small NOR flash to boot from because of its short read latency and its XIP abilities but use a NAND flash memory for storing data [4].

NAND flash in general

NAND flash has the advantages of being smaller and cheaper to manufacture than NOR; it also has faster write and erase cycles. The drawback with NAND flash is that it can only be accessed in units of a page, where a page typically contains 512 bytes. This makes NAND flash unsuitable for XIP and more suited for storage [4].

Figure 3.1 shows the architecture of a NAND flash memory. The memory is arranged in blocks with a typical size of 16 KB, where each block consists of 32 pages. The read and write operations act on a page basis, but an erase operation can only be performed on a whole block [4].

This creates another difficulty when working with NAND flash. If the block that is to be erased still contains pages with valid data, these have to be copied to another block before the erase can take place. The copying of valid data followed by the erase procedure is called garbage collection.

Also seen in Figure 3.1 is the layout of a page and its dedicated spare area. The spare area is reserved for flash management metadata, such as the logical block address and erase count, although it is up to the designer of the flash management system to use it as they see fit. For pages with a size of 512 bytes the spare area is 16 bytes. The page and the spare area can be written and read independently, but as they exist in the same block they are erased together [14].

Figure 3.1. NAND flash architecture: flash blocks in memory, each block holding 32 pages, and each page consisting of a user data area and a spare data area.

NAND flash development

NAND flash is the type of flash memory which has been subject to the most research and development in recent years. It is also the type mostly used in embedded systems. A few years ago a new type of NAND flash memory was developed which changed the conditions for flash management systems completely. In older NAND flash memories it was possible to write to a page two or three times before an erase needed to take place. It was also possible to write to arbitrary pages within a block [15].

With the new NAND flash hardware only one write is allowed before an erase needs to take place, and arbitrary writes are no longer possible. Instead, the pages in a block need to be written sequentially from the first page to the last. Around the same time as the release of the new NAND flash hardware, a bigger page size was also introduced [15]: 2048 bytes per page became the regular size for the new NAND flash, while older NAND flash usually has a page size of 512 bytes, although both sizes can appear in either NAND flash hardware version.

3.1.3 Wear leveling

Due to the fact that flash blocks can only sustain a limited number of erase cycles, typically 10^5 for NOR and 10^6 for NAND [20], the problem with potentially worn out blocks needs to be addressed. This is where wear leveling comes in. Wear leveling is the technique used to distribute block erases evenly across the whole flash volume [4].

First of all the flash memory utilization needs to be even, but that is not enough. Some data might be static and never updated or deleted. The blocks containing this type of information will always hold valid data and will therefore never be erased. The wear leveling mechanism can instead move the static data to a block which has had many erase cycles, in order to even out the wear. The wear leveling mechanism can be included in the software of the flash memory management system, and there are many different ways to deal with the problem [20].

3.1.4 Garbage collection

Garbage collection is the process of reclaiming invalidated memory. When data is modified and updated, the file system allocates new pages for the updated file and the old pages are invalidated or marked as dead. Without garbage collection the flash memory would fill up with invalid data and no more free space would be available.

The garbage collector reclaims the invalidated pages and makes them free again by erasing blocks containing dead pages. After an erase the block is free to be used for new data. However, it is not certain that all pages in a block are invalid when an erase needs to take place. The valid data then first needs to be copied to another block with free space before the erase can be executed [19].

Garbage collection can be triggered when the amount of free space reaches a certain threshold value, or it can run in the background when the system is idle. It is important to note that garbage collection needs to be requested before the flash memory is fully utilized. A full memory has no free blocks for copies of potentially valid data, which means that the system would deadlock itself. Therefore, a few free blocks are always left for garbage collection purposes.
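A minimal sketch of such a trigger condition, with the reserve expressed as a block count. The constants and the function are hypothetical illustrations, not taken from any particular system:

```c
/* Hypothetical garbage collection trigger. A small reserve of free
 * blocks is always kept, so the collector has somewhere to copy the
 * valid pages of the block it is about to erase. */
#define GC_RESERVED_BLOCKS 4   /* assumed reserve size             */
#define GC_IDLE_THRESHOLD  16  /* assumed background-GC threshold  */

int gc_should_run(unsigned free_blocks, int system_is_idle)
{
    if (free_blocks <= GC_RESERVED_BLOCKS + 1)
        return 1;  /* urgent: close to the deadlock point */
    return system_is_idle && free_blocks < GC_IDLE_THRESHOLD;
}
```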

The effectiveness of a garbage collector depends very much on how data is allocated in the flash memory. Different methods of data allocation are discussed further in Sections 3.4 and 3.5.

3.1.5 Requirements

Because NAND flash memory is the type mostly used in embedded systems today, the design of the flash memory management system shall focus on NAND flash. It is also important that the flash memory management system can handle the requirements that the new type of flash memory introduces. The following requirements are therefore stated.

REQ5: The design of the system shall focus on the NAND flash memory type.

REQ6: The system shall write pages in a block sequentially starting from the first page.

REQ7: A page shall only be written once before it is erased.

3.2 Overview of flash memory management systems

Basically there are two different ways to access a flash memory, see Figure 3.2. The first is to use a traditional file system like ext3 or FAT on top of a flash translation layer. The translation layer maps the logical addresses used by the file system to the actual physical addresses on the flash. Besides supplying the translation from logical to physical addresses, the flash translation layer also provides garbage collection and wear leveling. The translation layer can also emulate the flash memory as a block device, so that traditional file systems such as FAT and ext3 can work against flash just as if it were a normal hard drive [4].

The other option is to use a file system specifically developed for flash; two such examples are JFFS and YAFFS. With a flash-dedicated file system there is no need for a logical-to-physical address translation table; instead the file system keeps track of the locations of the pages belonging to each file. Garbage collection and wear leveling are in this case part of the flash file system [4].

To be able to control a flash memory, a Memory Technology Device (MTD) driver is needed. This layer supports the primitive flash memory operations, such as read, write and erase [20]. The MTD is located on top of the flash drive and interfaces with either the flash translation layer or the flash file system.

Figure 3.2. Flash memory management system architecture: a virtual file system on top of either a traditional file system (ext3, FAT, etc.) over a Flash Translation Layer (FTL), or a dedicated flash file system (JFFS, YAFFS), both running on Memory Technology Device (MTD) drivers over the flash memory.
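To make the layering concrete, the MTD layer can be pictured as a small table of the primitive operations that both an FTL and a dedicated flash file system build on. The struct below is a hypothetical sketch, not the interface of any real MTD implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MTD-style driver interface: reads and writes act on
 * pages, erases act on whole blocks, matching NAND granularity. */
struct mtd_ops {
    size_t page_size;        /* e.g. 512 or 2048 bytes */
    size_t pages_per_block;  /* e.g. 32                */
    int (*read_page)(uint32_t page_no, void *buf);
    int (*write_page)(uint32_t page_no, const void *buf);
    int (*erase_block)(uint32_t block_no);
};
```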


3.3 Current setup

As of today ENEA uses the JEFF file system on top of a flash translation layer when working with flash memory.

3.3.1 JEFF - Journaling Extensible File system Format

JEFF is developed by ENEA and is the file system currently used in ENEA's operating system OSE. JEFF is a journaling file system, i.e. changes to the file system are first written to a journal before changes are made to the actual data. This is done to ensure the consistency of the file system in the event of a crash. If a crash occurs during a transaction and data is corrupted, the system can restore itself to the previous state by reading the journal. This means that transactions are either performed completely or not at all, i.e. transactions are atomic in JEFF [7].

JEFF is designed to run on block devices but has the ability to adjust the block size to suit the layer it is operating above; common block sizes are 512, 1024 and 2048 bytes.

3.3.2 Current FTL

The current FTL emulates a block device and allows JEFF to access it as if it were a hard drive. The FTL includes all the required features of a flash management system, such as garbage collection and wear leveling [7]. However, it is a general purpose system and unnecessary features are included in the software.

3.4 Flash file systems

This section will take a closer look at a few available flash file systems and try to pinpoint the important elements in them. It seems reasonable to start from the very beginning.

3.4.1 JFFS - Journaling Flash File System

JFFS was the first file system designed specifically for flash media. It was designed for NOR flash and is a log-structured file system, which means that it writes data sequentially on the flash chip [18]. It was developed in 1999 by the Swedish company Axis Communications and can be seen as the predecessor of all dedicated flash file systems of today.

The basic idea of JFFS is that it uses a circular log. It starts to write data at the head of the log, which at the first entry will be at the beginning of the flash area. JFFS then continues to write data sequentially at the tail of the log, invalidating old data along the way. This works fine until the flash runs out of free space and the tail of the log is about to reach the end of the flash.

Figure 3.3 shows what happens when free space in JFFS is running low and the garbage collector starts to act. When the amount of free space reaches a certain level, Figure 3.3(a), JFFS garbage collection is initiated. It simply starts at the head of the log and checks whether the data is valid or invalid. Valid data is copied to the tail and invalid data is ignored, Figure 3.3(b). This process continues until all valid data in an erase block has been copied to the tail. The erase block is then erased and is clean and available for new data, Figure 3.3(c).

Figure 3.3. JFFS garbage collection: (a) memory nearly full, (b) copying valid data, (c) after garbage collection. Pages are marked valid, invalid or free.

The problem with this method is that the garbage collector works sequentially and will clean blocks even if they contain only valid data. On the other hand, this method provides perfect wear leveling, because every block is cleaned exactly the same number of times; but blocks are cleaned when cleaning is unnecessary.

At mount time the whole flash memory is scanned, and the locations of all nodes (the building blocks of the file system; both metadata and data are stored in nodes) are stored in RAM. File reads can then be performed immediately, without extra computation, by looking at the data structures held in RAM and then reading the corresponding location on the medium. The drawback of this method is that it is very memory consuming.

The problem with garbage collection and some other issues made it obvious that JFFS needed to be improved. The second version of the flash file system is called JFFS2. It was developed by David Woodhouse and is in many ways very similar to JFFS. Two of the differences are that JFFS2 has limited support for NAND flash alongside the NOR support, and an improved garbage collector.

JFFS2 separates blocks into three different lists. The clean list holds blocks with only valid data, the dirty list contains blocks with at least one obsolete page, and the free list contains erased blocks. Instead of selecting all blocks sequentially as in JFFS, the garbage collector in JFFS2 takes a block from the dirty list when garbage collection is requested. This method leads to a wear leveling problem.

Blocks containing static data are never updated and will never be erased, whilst other blocks will be erased constantly. This problem is addressed by letting the garbage collector select a block from the clean list once every hundred times.

JFFS2 still scans the whole flash memory at mount to index all the valid nodes.

Thus the RAM footprint and the mount time increase linearly with the amount of data stored on the flash memory [16]. JFFS2 was first designed for small flash devices, and this issue becomes obvious with flash sizes over 128 Mbytes [3].

Both JFFS and JFFS2 are released under the General Public License (GPL).

3.4.2 YAFFS - Yet Another Flash File System

YAFFS was written by Aleph One specifically for NAND flash. It was developed because it was concluded that JFFS and its successor were not suitable for NAND devices [9].

YAFFS was the first flash file system to fully utilize the spare area in each page.

Just as JFFS, it is a log-structured file system. At the time of its development, the flash devices available supported writes to arbitrary pages within a block. It was also possible to write to a page two or three times before an erase was needed. This was used when a file had been updated and old pages needed to be invalidated: a bit in the spare area of the affected pages was rewritten and set to 0 to show that they were invalid.

YAFFS uses a tree structure which provides the mechanism to find all the pages belonging to a particular file [15]. The tree holds nodes containing 2-byte pointers to physical addresses. The 2-byte data is, however, not enough to individually map each page on a larger flash memory. For that reason, YAFFS uses approximate pointers which instead point to a group of pages. The pages themselves are self-describing, which makes it possible to search each page in the group individually to find the right one. In this way the RAM footprint is smaller compared to JFFS2 [9].

At mount only the spare area of each page, containing the file ID and page number, needs to be scanned. The result is a faster mount compared to JFFS2, but the mount time will still increase linearly with the size of the flash memory [16].

No particular wear leveling function is used in YAFFS. When blocks are allocated for storage they are chosen sequentially, so no block will repeatedly be left unused. On the other hand, no consideration is taken of blocks which are allocated with static data [1]. The author argues that wear leveling is not as important for NAND flash because the system needs to address bad blocks (see Subsection 3.6.1) anyway. So even if uneven wear leads to a few more bad blocks, the file system will still continue to work properly [9].

Garbage collection in YAFFS has two modes: passive and aggressive. Passive garbage collection only cleans blocks which have a big majority of invalidated data, and is active when there are plenty of free blocks available. Aggressive garbage collection is activated when the amount of free space starts to run out. It will then clean more blocks, even if there are many valid pages in them [15].

A few years after YAFFS was introduced the flash memory hardware changed: it was no longer possible to write to a page more than once, and writes to pages within a block had to be made sequentially. This meant that the method of invalidating pages was no longer allowed. This was why YAFFS2 was developed. Instead of invalidating pages, a sequence number was added in the spare area to make it possible to determine which copy of a page is still valid when a file is updated. Each time a new block is allocated the sequence number is incremented, and each page in that block is marked with that number. The sequence numbers show the chronological order of events, thus making it possible to restore the file system [15].
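A minimal sketch of how the mount scan can use such sequence numbers to decide which copy of a page is current. The tag layout and field names are assumptions for illustration, not the actual YAFFS2 on-flash format:

```c
#include <stdint.h>

/* Hypothetical spare-area tags: each page is self-describing. */
struct spare_tags {
    uint32_t file_id;   /* file the page belongs to                 */
    uint32_t chunk_no;  /* logical position of the page in the file */
    uint32_t seq_no;    /* sequence number of the page's block,
                           incremented once per allocated block     */
};

/* During the mount scan, a copy of a page supersedes another copy
 * of the same (file_id, chunk_no) if its block was allocated later. */
int supersedes(const struct spare_tags *a, const struct spare_tags *b)
{
    return a->file_id == b->file_id &&
           a->chunk_no == b->chunk_no &&
           a->seq_no > b->seq_no;
}
```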

Just as with JFFS, both YAFFS and YAFFS2 are released under the GPL and have their source code available on the internet. YAFFS has been a very popular file system among computer scientists, and many new file systems are based on YAFFS. One of them is the Core Flash File System (CFFS), explained in the next subsection.

3.4.3 CFFS - Core Flash File System

CFFS is based on YAFFS and the fundamental structure is the same but some improvements have been made.

The blocks in CFFS have three classifications: inode-stored blocks, data blocks and free blocks. The inode-stored blocks contain the locations of all data in the memory. This means that only the inode-stored blocks need to be scanned at mount time. To be able to locate the inode-stored blocks at mount, their locations are written to an Inodemapblock at unmount, and the Inodemapblock is always the first physical block in the flash memory. The method of saving a snapshot of the data structure on the flash memory at unmount is usually called checkpointing [16].

The separation of inode-stored blocks and data blocks in CFFS has another advantage besides the faster mount; it also improves the effectiveness of the garbage collector. Metadata is updated more often than regular data: e.g. renaming, moving and changing the attributes of a file will only change the metadata and not the regular data. By separating metadata and data into different blocks, the probability that all the pages in a block will be invalidated around the same time increases. This decreases the copying overhead in the garbage collector and in that way saves both energy and time. The separation of data according to update frequency is called hot-cold separation or data clustering [16].

The separation between metadata blocks and data blocks, however, creates a wear leveling issue. Because metadata is updated more often, the metadata blocks will be erased more frequently than the data blocks. This is solved by using a weight value in each block: if the block was an inode-stored block last time, it will be allocated as a data block the next time it is erased, thus solving that wear leveling problem.


3.4.4 MODA - MODification Aware

The MODA scheme is not a complete file system; it is only a modification of the page allocation scheme in YAFFS. The MODA page allocator is a further development of the one used in CFFS. It not only separates metadata and userdata, it also distinguishes between different update frequencies of userdata [2].

The MODA scheme uses a queue to classify how often a file is modified. The file stays in the queue for a specific amount of time and its classification depends on how many times the file is modified during this period. Figure 3.4 shows an overview of the separation.

When a page is allocated to a specific area it stays there during its lifetime, even if its update frequency changes. The garbage collector in MODA operates in each area independently, to avoid mixing pages between blocks with different classifications [2].

Figure 3.4. The MODA scheme classification: data is first separated into metadata and user data (levels 0-1), and user data is further classified as hot-modified, cold-modified or unclassified (level 2).
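The classification rule sketched in Figure 3.4 can be summarized in a few lines of C. The threshold and the function name are assumptions; the MODA paper's exact parameters are not reproduced here:

```c
/* Hypothetical MODA-style classifier: a file sits in a queue for a
 * fixed window and is classified by how often it was modified. */
enum data_class { METADATA, HOT_USERDATA, COLD_USERDATA,
                  UNCLASSIFIED_USERDATA };

#define HOT_THRESHOLD 4  /* assumed: modifications per queue window */

enum data_class classify(int is_metadata, int still_in_queue,
                         unsigned mods_in_window)
{
    if (is_metadata)
        return METADATA;               /* level 0-1 split         */
    if (still_in_queue)
        return UNCLASSIFIED_USERDATA;  /* verdict not yet reached */
    return mods_in_window >= HOT_THRESHOLD ? HOT_USERDATA
                                           : COLD_USERDATA;
}
```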

3.4.5 Summary and discussion

This subsection serves as a summary of this section and also includes a discussion of the six flash management schemes presented above. Table 3.1 shows an overview of the most significant differences between the presented flash management schemes.

As part of the discussion, this subsection will also refer to the requirements established earlier in the report. A complete table of these requirements can be found in Appendix B.

Individual flash management scheme evaluation

According to REQ5, JFFS can be ruled out as an optimized solution because it was developed for NOR flash and not NAND flash. JFFS2, on the other hand, has limited support for NAND flash, but its functionality is surpassed by YAFFS. However, JFFS2 is the only file system with a wear leveling function that handles uneven wear caused by static data.

Although YAFFS/YAFFS2 have no wear leveling function, they are preferred over JFFS2 because they have a better garbage collector, also used in CFFS and MODA, which considers the amount of valid data left in a block and not only whether the block has at least one obsolete page.


FFS           MT      GC policy                            Wear leveling                          Data clustering
JFFS          slower  Collects each block sequentially,    Not needed, because of the garbage     No clustering
                      regardless of the block's contents   collector's behavior
JFFS2         slower  Selects a random block with at       A clean block is chosen for garbage    No clustering
                      least one obsolete page              collection once every hundred times
YAFFS/YAFFS2  slow    Random selection within the          No wear leveling                       No clustering
                      boundaries of the passive and
                      aggressive modes
CFFS          fast    Same as YAFFS                        Weight value: metadata block last      Separates metadata
                                                           time -> data block next time           and userdata
MODA          slow    Same as YAFFS                        No wear leveling                       Separates metadata and
                                                                                                  userdata, with hot-cold
                                                                                                  separation of userdata

Table 3.1. Overview of the different flash file system properties.
FFS = Flash File System, MT = Mount time, GC = Garbage Collector

Nevertheless, YAFFS needs to be ruled out in favor of YAFFS2, because the older version fails to meet REQ6 and REQ7.

When the data clustering policy is considered there are only two options available: separation of metadata and userdata as in CFFS, or as in MODA, where the userdata is also separated into hot and cold areas.

When considering REQ1 and REQ3, the MODA allocation scheme seems to be the better solution because it is an improvement of the one used in CFFS. However, CFFS has the benefit of using checkpointing and thus has the best mount time of all the schemes presented.

REQ2, the requirement concerning the introduction of power awareness, has not been applied by any of the flash management schemes and can therefore not be used to facilitate the choice of the best scheme.


Combination of the best flash management scheme features

Although an optimized solution cannot be found by looking at these schemes individually, a combination of them can lead to a good result.

The foundation of the optimized solution will therefore be the CFFS scheme, because of its fast mount properties. The data clustering scheme is, however, changed to the MODA variant. The wear leveling scheme can also be combined with the one used in JFFS2, in order to handle wear caused by static data.

An optimized solution will, according to the arguments stated above, look like Table 3.2.

FFS        MT    GC policy                          Wear leveling                         Data clustering
Optimized  fast  Random selection within the        Metadata block last time -> data      Separates metadata and
                 boundaries of the passive and      block next time, and a clean block    userdata, with hot-cold
                 aggressive modes                   is chosen for garbage collection      separation of userdata
                                                    once every hundred times

Table 3.2. Optimal flash file system.
FFS = Flash File System, MT = Mount time, GC = Garbage Collector

3.5 Flash Translation Layers

Just as there are many different flash file systems available, there are also many different flash translation layers (FTL). This report will explain and discuss a few of the most significant ones. As explained in Section 3.2, the main function of the FTL is to translate logical block addresses to physical block addresses.

There are two major alternatives for the translation table: page-level mapping, seen in Figure 3.5(a), and block-level mapping, depicted in Figure 3.5(b) [11, 19]. The page-level mapping scheme maps each logical sector number to a physical page number. The mapping table will therefore have one entry for each page on the flash memory. The block-level mapping scheme instead splits the logical sector number into a logical block number and a page offset. The data stored in the mapping table for the block-mapping technique is only the logical-to-physical block numbers. This means that the block-level mapping approach needs extra operations to translate the logical block number and page offset to a physical address, but consequently it consumes far less RAM [4, 11].

Being restrictive with RAM usage is very important, especially in products developed for mass production, and choosing the right mapping table can make a huge difference. For example, a 4 Gbyte NAND flash with large 2048-byte pages requires 8 Mbytes of RAM for maintaining the page-level mapping table, while the block-level mapping table only requires 128 Kbytes [11]. For this reason, varieties of the block-level mapping scheme are mostly used today.

Figure 3.5. Page- and block-address translation: (a) in a page-mapped FTL the logical sector number indexes a page-level mapping table, which yields the physical page number; (b) in a block-mapped FTL the logical sector number is split into a logical block number, which indexes a block-level mapping table yielding the physical block number, and a logical page number used as an offset within that block.
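The block-mapped address arithmetic in Figure 3.5(b) is only a few lines of C. The table layout below is a hypothetical sketch, assuming one 32-bit entry per logical block:

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 32u

/* Hypothetical block-level mapping table: one entry per logical
 * block, instead of one per page as in page-level mapping. */
extern uint32_t block_map[];  /* logical block -> physical block */

uint32_t logical_to_physical(uint32_t logical_sector)
{
    uint32_t logical_block = logical_sector / PAGES_PER_BLOCK;
    uint32_t page_offset   = logical_sector % PAGES_PER_BLOCK;

    /* one extra arithmetic step compared to page-level mapping,
     * but the table is PAGES_PER_BLOCK times smaller */
    return block_map[logical_block] * PAGES_PER_BLOCK + page_offset;
}
```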

Most of the recent flash translation layers are variations of a scheme using log blocks. They usually have an overall block-mapping scheme but introduce page-level management for a few blocks [12]. A few schemes using log blocks, and one using another method, are explained further in the following subsections.

3.5.1 BAST FTL - Block-Associative Sector Translation FTL

The Block-Associative Sector Translation (BAST) scheme is a translation layer developed by Kim et al. It manages the majority of the blocks at block level, but a number of blocks are managed at the finer page level. The former blocks are referred to as data blocks and hold ordinary data; the latter are called log blocks and are used as temporary storage for small writes to data blocks [12].

When a page in a data block is updated, a log block is allocated from a pool of available free blocks. Because of the out-of-place update characteristics of a flash medium, the update is written to the log block instead, where the writes are performed incrementally from the first page onward. A log block is dedicated to a specific data block, and if a page in another block needs to be updated a new log block is allocated. Updates can be carried out until the log block is full; when this happens a merge operation takes place.

In the merge operation, see Figure 3.6(a), a new free block is allocated and the valid data is copied from the data block and the log block to the free block. Note that the internal page locations are kept intact, so that the page offset in the block-level mapping does not need to change. The free block becomes the new data block and the other two blocks can be erased [12].

Under special circumstances the merge operation can be replaced with a switch operation, see Figure 3.6(b). This happens when all the pages are updated sequentially. No new free block is then required; the log block can instead be turned directly into a data block, and the old data block can be erased. This is an ideal situation and saves a lot of energy, because no copying overhead is needed and one erase operation is saved [12].

Figure 3.6. Merge and switch operations: (a) merge: the valid pages of the data block and the log block are copied, in page order, to a newly allocated free block, which becomes the new data block; (b) switch: a sequentially written log block is turned directly into the new data block and the old data block is erased.
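A condensed sketch of the reclaim decision for a full BAST log block. The helper functions are hypothetical, declared for illustration only; the merge/switch logic follows the description above:

```c
struct block;  /* opaque flash block handle (hypothetical) */

/* assumed helpers, declared for illustration only: */
int written_sequentially(const struct block *log);
struct block *alloc_free_block(void);
void copy_valid_pages(const struct block *data,
                      const struct block *log, struct block *dst);
void make_data_block(struct block *b);
void erase_block(struct block *b);

/* Reclaim a full log block: switch if it already mirrors its data
 * block page for page, otherwise merge into a fresh block. */
void reclaim_log_block(struct block *data, struct block *log)
{
    if (written_sequentially(log)) {
        make_data_block(log);   /* switch: no copying, one erase */
        erase_block(data);
    } else {
        struct block *fresh = alloc_free_block();
        copy_valid_pages(data, log, fresh);  /* keep page offsets */
        make_data_block(fresh);
        erase_block(data);      /* merge: copying plus two erases */
        erase_block(log);
    }
}
```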

During both the merge and the switch operation the data is moved to another block; thus the mapping information needs to be updated. According to Kim et al., previous schemes have had reverse physical-to-logical mapping information stored in the spare area of each page. This requires scanning the whole flash at mount time to locate all mapping information. Instead, Kim et al. propose a mapping table stored in dedicated blocks called map blocks, to enable a faster mount.

At mount only the map blocks are scanned, and a map over the map blocks, the map directory, is stored in RAM. When a page is updated, both the corresponding map block and the map directory need to be updated. However, this ensures the consistency of the mapping table even at an unexpected power failure, thus simplifying recovery.

3.5.2 AFTL - Adaptive FTL

Another variant of the log block scheme, called Adaptive FTL (AFTL), is proposed by Wu and Kuo. AFTL uses a combination of a block-level mapping scheme and a page-level mapping scheme. Two hash tables are held in RAM, one for each mapping table, but the page-level mapped table has a limited number of slots available. Wu and Kuo use a log block when pages are updated, but instead of applying the merge operation described in Figure 3.6(a) when the log block is full, AFTL leaves the log block intact and stores its valid data in the page-level hash table. The argument is that this data can be considered hot data and is likely to be accessed frequently, thus requiring page-level mapping [19].

There is only a finite number of page-level mapping slots available, and which pages stay in the page-level table is handled by a linked list using the Least Recently Used (LRU) policy [19]. When the list is full, a new entry will force old pages to be put back in the block-level mapped hash table. The pages are then returned to a block and inserted at their original positions, to make sure that the page offset still points to the right data.
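A sketch of the bookkeeping around the finite page-level table. The list handling is simplified and all names are hypothetical; the point is the eviction write-back that restores block-level addressing:

```c
#include <stdint.h>

#define PAGE_SLOTS 256  /* assumed: fixed number of page-level slots */

struct page_entry {
    uint32_t logical_sector;
    uint32_t physical_page;
    struct page_entry *prev, *next;  /* LRU list links */
};

/* assumed helpers, declared for illustration only: */
struct page_entry *lru_tail(void);  /* least recently used entry */
void lru_remove(struct page_entry *e);
void lru_add_front(uint32_t lsec, uint32_t ppage);
unsigned lru_count(void);
void write_back_in_place(struct page_entry *e);

void insert_page_mapping(uint32_t lsec, uint32_t ppage)
{
    if (lru_count() == PAGE_SLOTS) {
        struct page_entry *victim = lru_tail();
        /* Copy the evicted page back to its original offset in its
         * block-mapped block, so (block, page offset) addressing
         * stays correct. This in-place write is what breaks the
         * sequential-write rule (REQ6), as noted in Subsection 3.5.6. */
        write_back_in_place(victim);
        lru_remove(victim);
    }
    lru_add_front(lsec, ppage);
}
```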


3.5.3 FAST FTL - Fully-Associative Sector Translation FTL

The Fully-Associative Sector Translation (FAST) scheme was developed by Lee et al. It is built on the BAST scheme but introduces two important differences. The first is that FAST adopts fully-associative address mapping. The second is that FAST uses two different kinds of log blocks: one kind for sequential writes and another for random writes [13].

Fully-associative address mapping means that a log block is no longer associated with a particular data block. Instead, updates from many data blocks can be stored in the same log block. This method actually introduces the need for the second difference. As shown in Figure 3.6, a switch operation is superior to a merge operation because it needs no copying overhead and only one erase operation. But with the fully-associative method it is highly unlikely that a switch operation ever occurs. This is where the sequential log block makes its contribution.

When a write takes place, the system first checks whether the page being updated is the first page in a data block, i.e. whether (logical sector number) mod (number of pages in a block) = 0. If it is the first page, it is put first in the sequential write (SW) log block. The data (if any) already in the SW log block is merged with its data block before the insertion. If the following updates come sequentially they will continue to fill the SW log block, and when it is full a switch operation can take place. However, if the data is not written sequentially, the data is added to the SW log block anyway, but only a merge operation can be applied when it is full or when another first page needs to be inserted.

The random write (RW) log blocks are used when a sequence of writes does not start with a page from the first position in a block. A switch operation can never occur for an RW log block, only merge operations. The merge for RW log blocks is, however, a bit different from the merge shown in Figure 3.6(a): it still involves one log block, but it can involve many data blocks due to the fully-associative address mapping. A comparison between BAST and FAST has been done by Lee et al., showing that FAST can reduce the erase count by 10-50% depending on the test case, and is in the worst case at the same level as BAST [13].
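The dispatch rule between the two kinds of log blocks can be compressed into a short function. The helper names and the block geometry are assumed for illustration; the branching follows the description above:

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 32u

/* assumed helpers, declared for illustration only: */
void merge_sw_log_if_nonempty(void);
int  sw_log_holds_block_of(uint32_t lsn);
void append_to_sw_log(uint32_t lsn, const void *buf);
void append_to_rw_log(uint32_t lsn, const void *buf);

/* Hypothetical FAST-style write dispatch. */
void fast_write(uint32_t lsn, const void *buf)
{
    if (lsn % PAGES_PER_BLOCK == 0) {
        /* first page of a data block: fold away whatever is in the
         * sequential-write (SW) log block, then restart it */
        merge_sw_log_if_nonempty();
        append_to_sw_log(lsn, buf);
    } else if (sw_log_holds_block_of(lsn)) {
        /* continuation of the block currently in the SW log; if all
         * pages arrive in order, the SW log ends in a cheap switch */
        append_to_sw_log(lsn, buf);
    } else {
        append_to_rw_log(lsn, buf);  /* merge-only reclamation */
    }
}
```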

3.5.4 FTL/FC - FTL/Fast Cleaning

FTL/Fast Cleaning (FTL/FC) is a translation layer developed to speed up cleaning for larger flash memories. FTL/FC is not a log-block-based translation layer; instead it uses a data placement policy called Dynamic dAta Clustering (DAC).

DAC is a flash memory management scheme for logical partitioning of the storage space. The idea is to cluster data with similar update frequencies together. Figure 3.7 shows how DAC operates. Data which is updated frequently will be moved upwards towards the top region and be considered hot data, while less frequently updated data will instead end up in the bottom regions as cold data [4].

A new page of data is first written to the bottom region. A promotion to an upper region can only happen if the page is updated and it is not older than a predefined “young-time”. If the page is updated after the “young-time” deadline it will stay in the same region. Demotions to a lower region happen when blocks are selected for cleaning. Pages in the selected block that are still valid and older than a predefined “old-time” are demoted to the previous region; younger pages are written back to the current region [5].

Figure 3.7. DAC region transitions: regions are ordered from Bottom (Region 1) to Top (Region n); pages that are “updated & young” move one region up, pages that are “too old” move one region down.
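A sketch of the promotion and demotion rules described above, with hypothetical names, an assumed time source and purely illustrative thresholds, could look like this:

#include <stdint.h>

#define NUM_REGIONS 3   /* FTL/FC uses hot, neutral and cold */

extern uint32_t now(void);   /* assumed monotonic time source */

static const uint32_t YOUNG_TIME = 100;   /* illustrative thresholds */
static const uint32_t OLD_TIME   = 1000;

typedef struct {
    uint32_t region;       /* 0 = bottom (cold) ... NUM_REGIONS-1 = top */
    uint32_t written_at;   /* time of last write */
} page_meta_t;

/* On update: promote one region, but only while the page is young. */
uint32_t region_on_update(const page_meta_t *p)
{
    if (now() - p->written_at < YOUNG_TIME && p->region + 1 < NUM_REGIONS)
        return p->region + 1;
    return p->region;
}

/* On cleaning: demote valid pages that have grown old; younger pages
 * are written back to the region they already occupy. */
uint32_t region_on_clean(const page_meta_t *p)
{
    if (now() - p->written_at > OLD_TIME && p->region > 0)
        return p->region - 1;
    return p->region;
}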

Each region includes multiple LRU lists; there is one list for every number of invalid pages a block can have, i.e., if the block layout is 32 pages there will be 32 LRU lists in each region. There is also a separate cleaning list, shared by all regions, holding the blocks with no valid data [4].
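The per-region bookkeeping could be laid out roughly as follows; the types are hypothetical, and intrusive singly linked lists are used for brevity.

#include <stdint.h>

#define PAGES_PER_BLOCK 32

/* Assumed per-block bookkeeping with an intrusive LRU link. */
typedef struct block block_t;
struct block {
    block_t *next;            /* next (younger) block in the LRU list */
    uint8_t  invalid_pages;   /* current number of invalid pages      */
};

/* One LRU list per possible invalid-page count; lru[i] holds the
 * blocks with i+1 invalid pages, oldest block first. */
typedef struct {
    block_t *lru[PAGES_PER_BLOCK];
} region_t;

/* Shared by all regions: blocks with no valid data left. */
extern block_t *cleaning_list;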

In FTL/FC, DAC is set to partition the memory into three different regions: hot, neutral and cold.

The cleaning policy used in FTL/FC makes use of the multiple LRU lists in each region. First, the blocks in the cleaning list are selected for cleaning. If the cleaning list is empty, the cost-benefit policy (see Subsection 3.6.3 for details) is used. Instead of having to search through all blocks to find the optimal one for cleaning, only the first block in each LRU list needs to be examined. The other blocks can be ignored because the cost-benefit policy wants the oldest block, and that will always be the first block in the list [4].
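Victim selection could then be sketched as below, reusing the types from the sketch above; pop_cleaning_list() and cost_benefit_score() are assumed helpers, not part of any published interface.

/* Assumed helpers. */
extern block_t *pop_cleaning_list(void);
extern double   cost_benefit_score(const block_t *b);

block_t *select_victim(region_t regions[], int nregions)
{
    if (cleaning_list != 0)        /* fully invalid blocks are free wins */
        return pop_cleaning_list();

    block_t *best = 0;
    double best_score = -1.0;
    for (int r = 0; r < nregions; r++) {
        for (int i = 0; i < PAGES_PER_BLOCK; i++) {
            block_t *b = regions[r].lru[i];   /* oldest block in the list */
            if (b == 0)
                continue;
            double score = cost_benefit_score(b);
            if (score > best_score) {
                best_score = score;
                best = b;
            }
        }
    }
    return best;
}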

3.5.5 MFTL - Metadata FTL

Wu et al. propose a file-system-aware FTL, named MFTL, with the ability to separate metadata and userdata. Wu et al. argue that metadata is accessed more often than userdata and that metadata usually consists of very small files. Small files do not use a considerable amount of memory, and a page-level mapping scheme can therefore be used for the metadata partition without too much RAM overhead. The userdata, however, can still be handled on block level [20].

Writes in the page-level mapped area are performed sequentially, in a logging fashion.
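As an illustration, the top-level write path of such a split design could be sketched as below. The data_kind_t tag and both partition writers are hypothetical; in MFTL the tag is derived by filtering, while a file system such as JEFF could supply it directly.

#include <stdint.h>

/* Assumed request tag supplied by the file system or a filter. */
typedef enum { DATA_METADATA, DATA_USERDATA } data_kind_t;

extern void meta_log_append(uint32_t lsn, const void *buf); /* page-level log */
extern void userdata_write(uint32_t lsn, const void *buf);  /* block-level map */

/* Top-level MFTL write: route each request to its partition. */
void mftl_write(uint32_t lsn, const void *buf, data_kind_t kind)
{
    if (kind == DATA_METADATA)
        meta_log_append(lsn, buf);   /* sequential, logging-fashion writes */
    else
        userdata_write(lsn, buf);
}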

3.5.6 Summary and discussion

This subsection summarizes the current section and also discusses the benefits and drawbacks of the five presented translation layers, with the goal of choosing an optimized solution. References to the requirements will be made during the discussion; a table with all requirements can be found in Appendix B.


Individual evaluation

In the AFTL scheme the switching operations between the hash tables create copying overhead. Performance gains such as faster access to the most frequently accessed data, however, make up for this overhead. The problem with AFTL is that when pages are switched from the page-level to the block-level mapped hash table, writes are performed to a specific page in the designated block. This is required to make sure that the page offset still points to the right data, but it also means that AFTL does not meet the sequential write demand of REQ6.

Another consideration that needs to be taken into account is what is most important: energy saving or memory consumption. The only FTL of the ones discussed here able to handle hot and cold separation is FTL/FC, using the DAC technique. However, the DAC technique does not keep the in-block locations of pages intact. Therefore, FTL/FC cannot use a block-level mapping scheme and has to use the much more memory-consuming page-level mapping scheme instead, thus going against REQ4. This might be acceptable in systems with a lot of physical memory, but for embedded systems where RAM usage is crucial it is not a suitable solution.

When considering the two translation schemes BAST and FAST, the FAST translation layer is, as mentioned above, an improvement of the BAST scheme and is therefore the more suitable choice.

The last translation layer, MFTL, is not really a competitor to the others; it is more of a complement. Separation between metadata and userdata can be implemented in any of the other FTLs. Here ENEA has an advantage compared to the developers of MFTL, since ENEA also controls the file system JEFF. The filtering technique used in MFTL is therefore not even necessary: it is already possible for JEFF to inform the translation layer whether the sent data is metadata or userdata.

When REQ2, with its striving towards power-awareness, is considered, it can be concluded that none of the described translation layers applies any functionality to support power-awareness.

Optimized solution

The optimized solution would, because of the arguments stated above, be to use the FAST FTL on the block-level mapped userdata partition of MFTL, and to make use of JEFF so that the translation layer is aware of whether incoming data is userdata or metadata.

This solution meets requirements REQ1, REQ3, REQ4, REQ5, REQ6 and REQ7. REQ2 can also be considered to be met, because an effort towards power awareness has been made even though it could not be applied in any of the presented translation layers.

The discussion in Subsection 3.5.6 is summarized in Table 3.3.

FTL      Opinion derived from discussion
AFTL     Cannot handle new flash memories with the sequential write requirement
FTL/FC   Uses too much RAM for use in embedded systems
BAST     Surpassed by FAST
FAST     Optimal flash translation layer for block-level mapping
MFTL     Preferable solution in collaboration with JEFF and FAST

Table 3.3. Evaluation summary of discussed translation layers

3.6 Other flash memory related functionalities

This section describes functionalities which are part of a flash memory management scheme but are not fully elaborated in this study.

3.6.1 Bad block management

NAND flash is designed for high density at a low cost, and a perfect flash memory is not guaranteed in production. A new flash memory usually has a few bad blocks which are unusable, and a few more are expected to go bad during its lifetime [15]. Bad block management is a feature which makes bad blocks invisible to the rest of the system. It can be done either in hardware or in software, but every good flash memory management system needs a bad block manager.
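A software bad block manager can be as simple as a remapping table built once at mount, as sketched below; all names are assumed, and block_is_bad() is a hypothetical driver call, e.g. checking the factory bad-block marker.

#include <stdbool.h>
#include <stdint.h>

#define TOTAL_BLOCKS 1024   /* assumed device size in blocks */

/* Assumed driver call, e.g. checking the factory bad-block marker. */
extern bool block_is_bad(uint32_t phys_block);

static uint32_t remap[TOTAL_BLOCKS];
static uint32_t usable_blocks;

/* Built once at mount; afterwards bad blocks are invisible because
 * logical block numbers only ever map to good physical blocks. */
void bbm_init(void)
{
    usable_blocks = 0;
    for (uint32_t phys = 0; phys < TOTAL_BLOCKS; phys++)
        if (!block_is_bad(phys))
            remap[usable_blocks++] = phys;
}

uint32_t bbm_translate(uint32_t logical_block)
{
    return remap[logical_block];   /* caller must stay < usable_blocks */
}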

3.6.2 ECC - Error correction code

Just as NAND flash requires bad block management, it also needs error correction code (ECC) to handle frequent bit errors. ECC does not need to ask a sender whether the received message was correct in order to detect an error; it is capable of detecting, and correcting, a certain number of errors within a certain quantity of data by itself. The ECC feature can be implemented in hardware, in a separate software application, or in the file systems and translation layers themselves.
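NAND controllers typically use a Hamming code over 256-byte sectors or a stronger BCH code, but the principle can be shown with a toy Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword without any help from the writer of the data. The sketch below is purely illustrative.

#include <stdint.h>

/* Encode 4 data bits into a Hamming(7,4) codeword; bit i of the
 * return value is code position i+1. */
uint8_t hamming74_encode(uint8_t data)
{
    uint8_t d1 = (data >> 0) & 1, d2 = (data >> 1) & 1;
    uint8_t d3 = (data >> 2) & 1, d4 = (data >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;   /* parity over positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* parity over positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* parity over positions 4,5,6,7 */
    return (uint8_t)((p1 << 0) | (p2 << 1) | (d1 << 2) |
                     (p3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6));
}

/* Decode a codeword, correcting a single flipped bit; the syndrome
 * value is the 1-based position of the erroneous bit, or 0. */
uint8_t hamming74_decode(uint8_t cw)
{
    uint8_t s1 = (uint8_t)(((cw >> 0) ^ (cw >> 2) ^ (cw >> 4) ^ (cw >> 6)) & 1);
    uint8_t s2 = (uint8_t)(((cw >> 1) ^ (cw >> 2) ^ (cw >> 5) ^ (cw >> 6)) & 1);
    uint8_t s3 = (uint8_t)(((cw >> 3) ^ (cw >> 4) ^ (cw >> 5) ^ (cw >> 6)) & 1);
    uint8_t pos = (uint8_t)(s1 | (s2 << 1) | (s3 << 2));
    if (pos)
        cw ^= (uint8_t)(1u << (pos - 1));   /* correct the flipped bit */
    return (uint8_t)((((cw >> 2) & 1) << 0) |
                     (((cw >> 4) & 1) << 1) |
                     (((cw >> 5) & 1) << 2) |
                     (((cw >> 6) & 1) << 3));
}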

3.6.3 Cleaning policies

This subsection describes different cleaning policies used by a garbage collector in either a flash file system or a flash translation layer.

One of the simplest cleaning policies is the greedy policy; it always selects the block with the largest amount of invalid data. The goal is to reclaim as much free space as possible in each garbage collection [4]. The greedy policy has been proven to be efficient when data is accessed uniformly. However, if some data is updated more frequently than other data, also known as hot data, it would be preferable that this data is not copied, because it will soon be invalidated anyway. The greedy policy does not consider this and can therefore not avoid copying hot data [14].
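A sketch of the greedy selection, with assumed bookkeeping structures, could look as follows; its simplicity is exactly why it cannot distinguish hot data from cold.

#include <stdbool.h>
#include <stdint.h>

/* Assumed per-block bookkeeping kept by the garbage collector. */
typedef struct {
    uint16_t invalid_pages;   /* pages holding invalidated data */
    bool     is_free;         /* already erased and reusable    */
} block_info_t;

/* Greedy victim selection: the block with the most invalid data. */
int greedy_select(const block_info_t blocks[], int nblocks)
{
    int victim = -1;
    int most_invalid = -1;
    for (int i = 0; i < nblocks; i++) {
        if (!blocks[i].is_free && blocks[i].invalid_pages > most_invalid) {
            most_invalid = blocks[i].invalid_pages;
            victim = i;
        }
    }
    return victim;   /* -1 if no candidate block exists */
}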
