
Developing Random Compaction Strategy for Apache Cassandra database and Evaluating performance of the Strategy

Roop Sai Surampudi

Faculty of Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


Contact Information:

Author(s):

Roop Sai Surampudi

E-mail: surp17@student.bth.se

External advisor:

Per Otterström, Ericsson

Karlskrona, Sweden

University advisor:

Siamak Khatibi

Department of Aesthetics and Technology

Faculty of Engineering
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


monitor and manage this data generation efficiently. Apache Cassandra is a NoSQL database which efficiently manages data of any format and massive data flows.

Aim: This project is focused on developing a new random compaction strategy and evaluating its performance. In this study, the limitations of the generic compaction strategies, Size Tiered Compaction Strategy and Leveled Compaction Strategy, are investigated. A new random compaction strategy is developed to address these limitations, and the important performance metrics required for evaluating the strategy are studied.

Method: In this study, a grey literature review is done to understand the working of Apache Cassandra and the APIs of different compaction strategies. A random compaction strategy is developed in two phases. A testing environment consisting of a 4-node cluster and a simulator is created, and the performance is evaluated by stress-testing the cluster using different workloads.

Results: A stable RCS artefact is developed. This artefact also includes support for generating the random threshold from a user-defined distribution; currently, only Uniform, Geometric, and Poisson distributions are supported. RCS-Uniform's performance is found to be better than both STCS and LCS. RCS-Poisson's performance is found to be not better than either STCS or LCS. RCS-Geometric's performance is found to be better than STCS.

Keywords: Apache Cassandra, Compaction, Random Probability Distributions, IBM Cloud, NoSQL databases


I express my sincere gratitude to Per Otterström for supervising me from the industry side.

I would like to thank all my friends, parents and my beloved ones for their continuous support.


Acknowledgments ii

1 Introduction 1

1.1 Problem Statement and Motivation . . . 1

1.2 Aim . . . 2

1.3 Objectives . . . 2

1.4 Research Questions . . . 2

1.5 Document Outline . . . 3

2 Background 5
2.1 Apache Cassandra . . . 5

2.2 Architecture of Cassandra . . . 6

2.2.1 Peer-to-Peer Architecture . . . 6

2.3 Generic Compaction Strategies . . . 7

2.4 Key Components of Apache Cassandra Database . . . 9

2.5 Write Path . . . 10

2.6 Read Path . . . 11

2.7 Different Tools Used in this project . . . 12

2.7.1 CQL & CQLSH . . . 12

2.7.2 Default Cassandra-Stress Tool . . . 13

2.7.3 Node Exporter . . . 13

2.7.4 Cassandra Metrics Exporter . . . 13

2.7.5 Prometheus & Grafana . . . 13

2.7.6 IBM Cloud . . . 13

3 Related Work 14
4 Method 16
4.1 Literature Review . . . 16

4.1.1 Search Strategy . . . 17

4.1.2 Digital Libraries . . . 17

4.1.3 Inclusion-Exclusion Criteria . . . 17

4.2 Development of new Compaction Strategy . . . 18

4.2.1 Phase 1 of Development . . . 18

4.2.2 Phase 2 of Development . . . 18

4.3 Testing the Developed Strategy . . . 19

4.3.1 Test Environment Setup . . . 19

4.3.2 Operating the Cassandra Cluster . . . 21


5.1.2 Phase 2 of development . . . 27

5.2 RQ2 - Evaluating the performance of Compaction Strategies under different workloads . . . 29

5.2.1 Cassandra Metrics - Live SSTables & SSTables Per Read . . . 29

5.2.2 Operating System Metrics - CPU Utilization & Disk Space Utilization . . . 40

5.2.3 Cassandra Performance Metric - Operation Latency . . . 48

6 Discussion 50
6.1 Answers to Research Questions . . . 50

6.1.1 RQ1: How a compaction strategy can be developed in such a way the compaction strategy compacts a random number of SSTables . . . 50

6.1.2 RQ2: How the performance of developed random compaction strategy can be evaluated . . . 50

6.2 Threats to Validity . . . 51

6.2.1 Internal Validity . . . 51

6.2.2 Conclusion Validity . . . 51

6.3 Limitations . . . 51

7 Conclusions and Future Work 53
7.1 Conclusions . . . 53

7.2 Future Work . . . 54

A Supplemental Information 58


2.1 Architecture of Apache Cassandra . . . 6

2.2 Compaction in Apache Cassandra . . . 7

2.3 Compaction Process Using STCS . . . 8

2.4 SSTables residing on the disk after several Inserts using STCS . . . . 8

2.5 Compaction process using LCS . . . 9

2.6 SSTables after several Write requests using LCS . . . 9

2.7 Write Path of Apache Cassandra . . . 11

2.8 Read Path of Apache Cassandra . . . 12

4.1 Research Methodology . . . 16

4.2 Network Architecture . . . 20

4.3 Software Installed in virtual server instances . . . 21

5.1 Random Compaction Strategy’s Working Algorithm . . . 28

5.2 Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Size Tiered Compaction Strategy) . . . 30

5.3 Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Leveled Compaction Strategy) . . . 31

5.4 Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Uniform Distribution) . . 32

5.5 Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Poisson Distribution) . . 34

5.6 Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Geometric Distribution) . . . 35
5.7 Average number of Live SSTables . . . 37

5.8 SSTables Per READ . . . 39

5.9 CPU Utilization and Disk Space Utilization under WRITE, READ and MIXED Workloads (Size Tiered Compaction Strategy) . . . 41

5.10 CPU usage and Disk Space Usage under WRITE, READ and MIXED Workloads (Leveled Compaction Strategy) . . . 42

5.11 CPU usage and Disk Space Usage under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Uniform Distribution) . . 43

5.12 CPU usage and Disk Space Usage under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Poisson Distribution) . . 44

5.13 CPU usage and Disk Space Usage under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Geometric Distribution) . . . 45
5.14 Comparison of Average CPU Utilisation under different heavy workloads . . . 48
5.15 Operation Latency . . . 49


5.4 One Way ANOVA Test for Live SSTables . . . 38

5.5 SSTables Per READ - READ HEAVY Workload . . . 38

5.6 SSTables Per READ - MIXED HEAVY Workload . . . 39

5.7 One Way ANOVA Test for SSTables Per READ . . . 40

5.8 Summary Statistics - CPU Usage under WRITE heavy workload on application of different compaction strategies . . . 46

5.9 Summary Statistics - CPU Usage under READ heavy workload on application of different compaction strategies . . . 47

5.10 Summary Statistics - CPU Usage under MIXED heavy workload on application of different compaction strategies . . . 47

5.11 One Way ANOVA Test for CPU Utilization . . . 48

5.12 Summary Statistics - Operation Latency under WRITE heavy workload on application of different compaction strategies . . . 49

5.13 Summary Statistics - Operation Latency under READ heavy workload on application of different compaction strategies . . . 49


STCS Size Tiered Compaction Strategy
LCS Leveled Compaction Strategy

RCS-Uniform Random Compaction Strategy, in which the random threshold is generated from a discrete Uniform Distribution. This threshold decides the number of SSTables to be compacted.

RCS-Poisson Random Compaction Strategy, in which the random threshold is generated from a Poisson Distribution.

RCS-Geometric Random Compaction Strategy, in which the random threshold is generated from a Geometric Distribution.

SSD Solid-state drive
HDD Hard disk drive

IT Information Technology
SSTable Sorted String Table

GB Gigabyte

MB Megabyte


of any format, which includes unstructured, semi-structured and structured data. Telecommunication and IT industries need to store, manage, monitor and retrieve this data efficiently and effectively, and to do so they need database management systems. Database management systems play a crucial role in the telecommunication industry [14].

In the recent past, different database management systems have been emerging and evolving fast to systematically tackle the growing varieties and vagaries of data structures, schemas, sizes, speeds, and scopes [26]. Database management systems fall under two categories:

• Relational Database Management Systems (RDBMS)

• NoSQL Databases (non-relational DBMS)

NoSQL databases are being used by almost every large organization to manage Big Data efficiently. Big Data is the general term used for massive amounts of data that are not stored in relational form in traditional enterprise-scale SQL databases [14, 26]. Apache Cassandra is one of the NoSQL databases, developed at Facebook to power their inbox search feature [15]. Apache Cassandra is an open-source distributed database management system designed to handle vast amounts of data in any format, including unstructured, structured and semi-structured data.

Apache Cassandra stores the persistent data in the form of immutable files called SSTables. As these SSTables are immutable, they can’t be altered directly. Apache Cassandra uses a method called Compaction to reclaim the disk space and to improve the performance of the database [5].

This thesis is part of the project Creation of Random Compaction Strategies for Apache Cassandra database at Ericsson. Another random compaction strategy for the Apache Cassandra database is described in [13].

1.1 Problem Statement and Motivation

In earlier versions of Apache Cassandra, two generic compaction strategies were developed: the Size Tiered Compaction Strategy and the Leveled Compaction Strategy. These strategies were developed to improve the performance of the database, but they have significant limitations which can degrade performance in the long run. To overcome these limitations, a new compaction strategy is needed. This thesis addresses the limitations of these strategies and alternative solutions to overcome them, and focuses on the development of a new compaction strategy, named the Random Compaction Strategy.

1.2 Aim

The main aim of this project is to address the limitations of generic compaction strategies and to develop a new random compaction strategy to overcome those limitations.

1.3 Objectives

The main objectives/goals to achieve the aim of this project are discussed as follows:

• Investigate the working of Compaction and compaction strategies.

• Investigate the limitations of the Size Tiered Compaction Strategy and the Leveled Compaction Strategy.

• Investigate which parameters affect compaction.

• Based on knowledge of the generic compaction strategies' APIs and the different parameters involved in compaction, develop a new random compaction strategy that does not break Apache Cassandra's functionality.

• Investigate which metrics are useful for the evaluation of the performance of the new compaction strategy.

• Evaluate the developed compaction strategy's performance by comparing its performance metrics with those of the generic compaction strategies.

1.4 Research Questions

1. How can a new compaction strategy be developed so a random number of SSTables will be compacted whenever a compaction process is triggered?

Method used: To answer this question, a detailed literature review is conducted to gain knowledge of the Apache Cassandra database APIs and the generic compaction strategies' APIs. Based on these APIs, a new compaction strategy API is developed from scratch. Meetings with Ericsson's CIL team and the advisor at BTH are conducted to gain more knowledge about compaction strategies and how randomness can be applied in creating a compaction strategy.


1.5 Document Outline

This document is organized as described below:

Chapter 1 Introduction

In this chapter, an overview of Apache Cassandra is given. This chapter exposes the aim of the thesis project. The research questions framed for this project are also provided in this chapter.

Chapter 2 Background

In this chapter, an overview of databases is provided, and the trend of NoSQL databases in the telecommunication and IT industries is exposed. The data model and architecture of Apache Cassandra are described. The compaction mechanism, the different compaction strategies implemented for Apache Cassandra to date, and their significant limitations are also depicted in this chapter.

Chapter 3 Related Work

This chapter exposes the related work done on the creation of compaction strategies for Apache Cassandra. The related work on evaluating performance metrics for compaction strategies is also mentioned.

Chapter 4 Method

This chapter exposes the methodologies administered to achieve the aim of the project.

Chapter 5 Results and Analysis

This chapter depicts the results and analysis obtained from this project. The results and analysis are described in Graphical and tabular representations. This chapter illustrates the results obtained under different workloads for different compaction strategies.


Chapter 6 Conclusions and Future Work

This chapter depicts the conclusions for the thesis work done. This chapter exposes the future work that can be implemented.


but the data being generated is also of different forms (varieties) [25]. In the recent past, telecommunication and IT industries used relational databases to manage their information. Relational databases use a structured schema to store data and cannot support the management of unstructured data; NoSQL databases have been proposed as a solution for storing unstructured data [26]. A NoSQL (Not Only SQL) database is a non-relational database designed to provide a mechanism for the storage, management and retrieval of unstructured data. Because the principles used by NoSQL databases differ from those of relational SQL databases, data operations such as READ and WRITE can be faster than in relational databases.

NoSQL databases have a simple design and follow the CAP theorem instead of the ACID principles [1]. There are more than 225 different NoSQL databases, widely used by large enterprises such as Google, Yahoo, Amazon, Twitter, Facebook and Ericsson [22].

The most popular NoSQL database families are Wide Column Store, Key-Value Store, Document Store, and Graph Databases. Apache Cassandra belongs to the family of Key-Value Store. As the name suggests, the data in this kind of database is stored in the format "Key – Value", where the key is a string assigned a unique value for identification and the value is an object which can be a string, a numeric value or even a complex binary large object [7].

2.1 Apache Cassandra

Cassandra is a horizontally scalable and highly available NoSQL database which follows a peer-to-peer networking architecture and the CAP theorem. Cassandra can be used to manage and store large volumes of unstructured, semi-structured and structured data across various data centres while providing scalability, fault tolerance and high performance with no single point of failure [12].


2.2 Architecture of Cassandra

Cassandra's architecture follows a peer-to-peer (P2P) architecture, where data can be replicated over multiple nodes within a cluster. Nodes use the Gossip communication protocol to exchange state and location information among themselves. Due to data replication, the risk of a single point of failure drops to zero while providing high availability and scalability. In this architecture, commit logs are used to ensure the durability of the data, and data is immediately stored in in-memory data structures called memtables; when a memtable reaches a configurable size threshold (by default 123MB), it is flushed to persistent disk storage as an immutable SSTable data file. Cassandra uses a method called compaction to periodically erase obsolete data from persistent disk storage and improve the performance of READ operations [1].

Figure 2.1: Architecture of Apache Cassandra

2.2.1 Peer-to-Peer Architecture

Peer-to-peer architecture (P2P architecture) is a widely used computer networking architecture in which every node has similar capabilities and responsibilities. Peer-to-peer architectures are often contrasted with the client/server model, where particular systems (nodes) are confined to serving others [3].


performance of READ queries [27]. The compaction process is depicted in figure 2.2.

Figure 2.2: Compaction in Apache Cassandra

Size Tiered Compaction Strategy

The Size Tiered Compaction Strategy is the most widely used strategy in the telecommunication industry. It merges SSTables of approximately similar size: similar-sized SSTables are placed in buckets, and based upon the hotness of a bucket, the SSTables in a hot bucket get compacted [28]. The compaction process using STCS is depicted in figure 2.3. After several writes, a small number of variable-sized SSTables are generated using STCS, as depicted in figure 2.4.
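To make the bucketing idea concrete, the following is a minimal, illustrative Java sketch of grouping SSTables of similar size into buckets. It is not Cassandra's actual STCS code; the 0.5/1.5 bounds mirror the default bucket_low/bucket_high options of STCS, and all class and method names are hypothetical.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SizeTieredBucketing {

    // Groups SSTable sizes (in bytes) into buckets of roughly similar size.
    static List<List<Long>> bucketBySize(List<Long> sstableSizes,
                                         double bucketLow, double bucketHigh) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);

        List<List<Long>> buckets = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        double currentAvg = 0;

        for (long size : sorted) {
            // An SSTable joins the current bucket if its size is within
            // [bucketLow * avg, bucketHigh * avg]; otherwise a new bucket starts.
            if (current.isEmpty()
                    || (size >= bucketLow * currentAvg && size <= bucketHigh * currentAvg)) {
                current.add(size);
            } else {
                buckets.add(current);
                current = new ArrayList<>();
                current.add(size);
            }
            currentAvg = current.stream().mapToLong(Long::longValue).average().orElse(0);
        }
        buckets.add(current);
        return buckets;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(10L, 12L, 11L, 100L, 110L, 950L);
        // A bucket containing at least min_threshold SSTables would be a
        // candidate for compaction; the "hottest" such bucket is picked first.
        System.out.println(bucketBySize(sizes, 0.5, 1.5));
    }
}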


Figure 2.3: Compaction Process Using STCS

Figure 2.4: SSTables residing on the disk after several Inserts using STCS

Limitations of Size Tiered Compaction Strategy

Compaction requires a lot of temporary space, as the new, larger SSTable is written before the duplicates are purged. In the worst case, up to half the disk space needs to be empty to allow this to happen [2, 28]. This problem is referred to as space amplification. In the past, to overcome the space amplification problem, IT administrators in the telecommunication industry used to increase the number of disks attached to a node. The disks used then were HDDs, which are cheaper than the SSDs used nowadays, so increasing the number of disks (horizontal scaling) [23] is no longer a viable solution.

Leveled Compaction Strategy

Leveled Compaction creates small, fixed-size (by default 160 MB) SSTables divided into different levels, where each level represents a run of a number of SSTables. To overcome the space amplification limitation, the Leveled Compaction Strategy uses levels and "runs" for compaction; a "run" refers to a group of SSTables in which each SSTable has a default size of 160MB [2, 28]. The compaction process using the Leveled Compaction Strategy is depicted in figure 2.5. The SSTables created after compaction using LCS reside in different runs, and most of the data resides on the last level.

This is the main reason why READ performance is higher in the case of LCS: most of the time, it is enough to query the SSTables residing in the last level, which ultimately makes READ performance better, as depicted in figure 2.6. The SSTables residing on disk after several write requests using LCS are shown in figure 2.6.


Figure 2.5: Compaction process using LCS

Figure 2.6: SSTables after several Write requests using LCS

Limitations of Leveled Compaction Strategy

LCS consists of various levels, and as the levels go up, the number of SSTables in each level increases by ten times the previous level, so it does not consume much space during the compaction process. However, due to the small size of the SSTables and the use of levels, more compactions are performed on the same data. The repeated writing of the same rows is called write amplification: data written in one level is compacted into multiple levels, so a greater number of writes are done in the end, which results in higher write amplification. Since this is correlated with read performance, the efficiency as a whole is affected. Due to this write amplification, more CPU cycles are utilized; unnecessary usage of CPU cycles is not recommended, and it ultimately consumes more power [13].
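For clarity, write amplification is commonly quantified (a standard definition, not one taken from this thesis) as the ratio

Write Amplification = (total bytes written to storage, including compaction rewrites) / (bytes of new user data written)

so a strategy that rewrites the same rows across many levels drives this ratio, and the CPU cost, upward.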

2.4 Key Components of Apache Cassandra Database

The key components of Cassandra are described as follows[8]:

1. Node: Node is the most fundamental component of Cassandra where the data gets stored.

2. Datacenter: A datacenter is a group of related nodes. A datacenter can be either physical or virtual, and depending on the replication factor the data can be replicated to multiple data centres.

3. Cluster: A cluster is a group of one or more data centres. A cluster can span multiple physical locations, whereas a single data centre cannot.

4. Commit Log: During a write query, the data are initially written to the com- mit log for durability. Later, when the data is flushed into SSTables (persistent disk storage), the data saved at the commit log gets deleted.


5. SSTables: Cassandra periodically flushes stored memtables to immutable data files, which are called SSTables.

6. Keyspaces: At the top level, the data in Cassandra is stored in keyspaces, which act as containers for tables; the data in the tables is stored in the form of key-value pairs.

7. Gossip Communication Protocol: The gossip communication protocol is used in Cassandra to share information regarding state and location with other nodes inside the Cluster. Information within the nodes can be retrieved when nodes are restarted, as state information is persisted locally within the cluster nodes.

8. Partitioner: The partitioner is a hash function used by Cassandra to determine which node gets the first replica of data. It also administers the distribution of data replicas across the other cluster nodes. Murmur3Partitioner is the default partitioner and the most used strategy in all use cases.

9. Replication factor: The replication factor is the total number of replicas of row data across the cluster and is set per keyspace. For example, if RF=1, there is only one copy of the data and no fault tolerance. If RF=3, three copies of the data are replicated across three nodes, providing higher availability and fault tolerance. For a single data centre, a replication factor of 3 is highly recommended.

10. Replica Placement Strategy: Apache Cassandra uses a replica placement strategy to determine which nodes the data replicas should be placed on.

11. Configuration Files: cassandra.yaml is the main configuration file that describes the entire configuration of the Cassandra cluster.

2.5 Write Path

Cassandra implements the writing of data in several vital stages, starting with the immediate logging of a request to ensure durability and ending with the writing of persistent data in immutable files called SSTables to the SSD or HDD. Figure 2.7 depicts the write path of Apache Cassandra. The actual steps involved in the writing process are as follows [8]:

1. Logging requests and Memtable storage: When a coordinator node receives a WRITE request, the request is immediately logged in the commit log. Simultaneously, the data is written to an in-memory data structure called a memtable. Every incoming request is logged in the commit log to ensure durability: if a node fails due to a disaster, the data stored in the memtable is lost, but when the node comes back to life, the commit log is replayed.

2. Flushing data from the memtable: When the size of the memtable reaches a configurable threshold limit, the memtable is flushed as an SSTable to disk storage.


Figure 2.7: Write Path of Apache Cassandra

3. Storing data on disk in the form of SSTables: SSTables and memtables are maintained per table, while the commit log is shared between tables. SSTables are immutable: once flushed from a memtable, they cannot be updated in place. When a key-value pair is updated or created, a new SSTable is created. Old time-stamped SSTables and unused SSTables are purged using Apache Cassandra's compaction method.

2.6 Read Path

Apache Cassandra uses complex data structures in both memory and disk to optimize read requests and reduce disk I/O operations [31]. The read path of Apache Cassandra is depicted in figure 2.8. The different data structures used by Cassandra to process a read request are as follows (a schematic sketch of the lookup order is given after the list):

1. Memtable: Cassandra first searches the requested data in memtable. If it exists within memtable, the data within memtable combined with data within SSTable is returned.

2. Row Cache: It is a data structure which stores a subset of the data within SSTables. As seeking data from memory is faster than seeking it from disk, the row cache boosts READ performance.

3. Bloom Filter: If the partition key is not present in both Memtable and Row Cache, Cassandra queries Bloom Filter to filter all SSTables where the key might exist. Bloom Filter is a probabilistic data structure; it may sometimes return false positives.

4. Key Cache: It is an in-memory data structure which stores partition indices. If the requested partition key exists in the key cache, the READ process jumps directly to the compression offset map, reducing disk seeks for the READ request.

5. Partition Summary: It is an in-memory data structure which stores a sample of the partition indices. If the key is not found in the key cache, the process hits the partition summary, and Cassandra searches for the range of possible partition indices.

Figure 2.8: Read Path of Apache Cassandra

6. Partition Index: It is a data structure which resides on disk and stores the location of the partition key on disk.

7. Compression Offset Map: The compression offset map stores pointers to the location of the requested data on disk. Once the compression offset map identifies the location of the data on disk, the data is returned from the correct SSTables.
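The sketch below condenses the lookup order described above into a single Java method. The data structures are reduced to boolean stubs purely to make the sequence explicit; it is an illustration, not Cassandra's actual read code, and every name in it is hypothetical.

public class ReadPathSketch {

    // Stub lookups; in Cassandra these are real in-memory and on-disk structures.
    boolean inMemtable(String key)           { return false; }
    boolean inRowCache(String key)           { return false; }
    boolean bloomFilterSaysMaybe(String key) { return true; }  // may return false positives
    boolean inKeyCache(String key)           { return false; }

    String readFromSSTable(String key)       { return "value-for-" + key; }

    String read(String key) {
        // 1. Memtable: freshest data, merged with SSTable data when present.
        if (inMemtable(key)) return "memtable+sstable data";
        // 2. Row cache: subset of hot rows kept in memory.
        if (inRowCache(key)) return "row-cache data";
        // 3. Bloom filter: skip SSTables that definitely do not hold the key.
        if (!bloomFilterSaysMaybe(key)) return null;
        // 4. Key cache: a cached partition index entry jumps straight to the
        //    compression offset map, avoiding extra disk seeks.
        if (inKeyCache(key)) return readFromSSTable(key);
        // 5-7. Otherwise: partition summary -> partition index -> compression
        //      offset map, then read the row from the matching SSTable(s).
        return readFromSSTable(key);
    }

    public static void main(String[] args) {
        System.out.println(new ReadPathSketch().read("some-key"));
    }
}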

2.7 Different Tools Used in this project

2.7.1 CQL & CQLSH

Cassandra uses the Cassandra Query Language (CQL) as the primary interface to interact with the database, and CQLSH is the shell used to interact with Cassandra. CQLSH is the primary interaction tool and a powerful one, with which we can run any operation ranging from the creation of a keyspace to data operations [8].


Node exporter is an exporter provided by Prometheus to collect operating system and CPU metrics as well as hardware-related metrics [24].

2.7.4 Cassandra Metrics Exporter

Cassandra metrics exporter is a Java agent which exports the metrics of Cassandra to the Prometheus server; its performance is better than that of the JMX exporter [20].

2.7.5 Prometheus & Grafana

Prometheus is an open-source monitoring and alerting software which also provides multiple modes of graphing support. Time-series metrics can be pulled easily using Prometheus, and its setup is easy. The main component of Prometheus is the Prometheus server, whose primary function is to pull and store time-series metrics in the specified data storage directory [24].

Grafana is a visualization tool and analytics software, which allows us to query and visualize the metrics depending upon the data source provided[10].

2.7.6 IBM Cloud

The Virtual Private Cloud (VPC) is an Infrastructure-as-a-Service offering from IBM Cloud; we can establish our own cloud environment using VPC [11]. The main VPC cloud resources used for this project are as follows [29]:

• Compute: Virtual server instances are virtual machines with predefined vir- tual cores, computing power and memory provided by the IBM cloud.

• Storage: Different types of storage, such as File Storage, Object Storage and Block Storage, are provided by IBM Cloud; Block Storage is used for this project.

• Networking: By default, virtual server instances in the VPC are private; a floating IP address must be attached to an instance to make it public, and IBM provides a public internet gateway to connect the public virtual instance to the internet.


Related Work

Initially, the implementation of compaction for NoSQL databases was proposed in Bigtable [6]. Compaction is a method used by Apache Cassandra to merge SSTables and remove obsolete data [6]. In earlier versions of Apache Cassandra, two generic compaction strategies were proposed.

According to the Size Tiered Compaction Strategy, the merging of SSTables depends on the size of the SSTables and the hotness of the bucket. Under this strategy, compaction merges similar-sized SSTables, removing tombstones and replacing old data with updated data. This strategy was developed based on Google's BigTable [2].

In [9], the compaction of SSTables is formulated as an optimization problem, and the problem is proved to be NP-hard. Three heuristics are implemented and their performance is evaluated under different workloads; all of these heuristics are proved to be O(log n) approximations.

In [4], different garbage collection algorithms targeted at different Big Data platforms are studied, and the scalability of the latest compaction algorithms is characterized by throughput and latency. A taxonomy of the described works and of open problems related to garbage collection algorithms for Big Data platforms is also presented [4].

In [19], a comparison of the performance of Cassandra and ScyllaDB is discussed. In the case of a read-heavy workload, there is a performance improvement of 41% for Cassandra and 9% for ScyllaDB using default configurations, and the throughput can be predicted within 7.5% for Cassandra and 6.9%–7.8% for ScyllaDB for unseen workload configurations.

This paper [17] discusses the effects of time-series data storage in Apache Cassandra, the related Date Tiered Compaction Strategy, and other storage strategies: the Size Tiered Compaction Strategy and the Leveled Compaction Strategy. In this paper [27], the methods used to extract performance metrics and the essential metrics required for comparing these strategies are discussed.

This paper [14] discusses the comparison of different compaction strategies of Apache Cassandra for different use cases and concludes which strategy is best suited for each use case. This paper [14] also exposes the general setup of a testing envi-


a compaction strategy be developed. Some of the studies also describe compaction as an NP-hard problem. Recently, ScyllaDB Enterprise [28] announced that a new compaction strategy named Hybrid Compaction Strategy would be released this year, but it has not yet been released. In the studies related to performance evaluation, the tools described by the authors to evaluate performance are not reputable ones, and using such tools can lead to false evaluations. Most of the studies considered a single-node cluster for simulating the workloads. These studies also motivate further research on Apache Cassandra.

As no new compaction strategy for Apache Cassandra is discussed in previous research works, this motivated me to do further research on how a compaction strategy can be developed. In previous studies on Apache Cassandra, only a single-node cluster testing environment was created to evaluate the performance of a strategy. To further refine the results and to get a better understanding of how a compaction strategy works and how its performance changes, I created a multi-node cluster (mimicking a real-world scenario) and evaluated the performance of the strategy.


Method

This chapter discusses the method used to develop the new random compaction strategy and to analyze the developed strategy's performance by comparing the performance metrics of the different strategies. The research is divided into three phases, as shown in figure 4.1.

Figure 4.1: Research Methodology

4.1 Literature Review

During this research, a grey literature review is conducted to gain an understanding of Apache Cassandra, compaction, and the APIs of the generic compaction strategies. Grey literature includes data not found in commercially published research works, such as technical papers, academic papers, theses and technical reports [21].


Strategies" AND "Leveled Compaction Strategy") Keywords such as Apache Cassandra, Compaction Strategies, Performance Evaluation, Discrete Probability Distributions are searched and gathered to get a good understanding of Apache Cas- sandra working and Compaction Strategies. The following electronic databases are used to perform the search strategy

4.1.2 Digital Libraries

1. Google Scholar
2. ACM Digital Library
3. IEEE

4. Scopus

4.1.3 Inclusion-Exclusion Criteria

This step is carried out to refine the search further to gather the papers that address the research questions.

Inclusion Criteria

1. Papers which are written in English.

2. Papers describing research works on Performance Evaluation of Compaction Strategies.

3. Papers describing research works on compaction strategies.

4. Papers describing research works on algorithms of compactions.

5. Papers describing research works on different garbage collection algorithms.

Exclusion Criteria

1. Papers which are not written in English.

2. Papers that don’t focus on any of the keywords or search string.

3. Papers which include works on NoSQL databases other than Apache Cassandra.

4. Papers focusing on the comparison of SQL and NoSQL databases.


4.2 Development of new Compaction Strategy

4.2.1 Phase 1 of Development

Initially, after studying the generic compaction strategies Size Tiered Compaction Strategy (size-based) and Leveled Compaction Strategy (hierarchy-based), it is found that these strategies are very systematic but have some major limitations. A compaction algorithm is therefore developed from scratch by extending the Abstract Compaction Strategy and overriding its methods. This strategy compacts a random number of SSTables, instead of a fixed number, every time compaction is triggered automatically or manually.

4.2.2 Phase 2 of Development

For the Size Tiered Compaction Strategy, the number of SSTables that should be compacted can be provided via the table-level properties max_threshold and min_threshold. By default, min_threshold is 4 and max_threshold is 32. Using these settings, STCS always compacts a number of SSTables that lies between min_threshold and max_threshold. During the previous phase of development, the random number that specifies how many SSTables to compact is generated using a uniform distribution, with the maximum and minimum bounds taken from max_threshold and min_threshold, respectively. During this phase of development, the following table-level sub-properties are added to the random compaction strategy (an illustrative sketch combining them is given after the list):

1. compaction_chance: A sub-property which specifies the chance for a compaction to occur. If the value randomly generated by the system is less than compaction_chance, then the compaction occurs. By default, compaction_chance is equal to 0.5. This value was chosen after considering values from 0.2 to 0.8: when values less than 0.5 are selected, fewer compactions occur, leading to the creation of large SSTables and ultimately increasing space amplification (the Size Tiered Compaction Strategy limitation), while values greater than 0.5 cause more CPU cycles to be utilized.

2. distribution: A sub-property which specifies the distribution that should be used to generate the random value. The generated random value determines the number of SSTables that should move into the compaction phase. The random value is generated using a uniform distribution during the previous phase, but it can also be generated using a Poisson distribution, a Geometric distribution, or any other probability distribution. By default, the distribution is set to uniform.

• lambda: A sub-property which is mandatory when using the Poisson distribution to generate the random threshold. By default, it is set to 4.

• probability: A sub-property which is mandatory when using the Geometric distribution to generate the random threshold. By default, it is set to 0.2.
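The following minimal Java sketch shows how these sub-properties could drive the decision to compact and the choice of threshold, under the convention (taken from the description above) that compaction runs when a uniform draw falls below compaction_chance. The class and method names are hypothetical; this is an illustration, not the thesis artefact or Cassandra's API.

import java.util.Random;

public class RandomThresholdChooser {

    private final Random rng = new Random();

    private final int minThreshold;        // default 4, as in STCS
    private final int maxThreshold;        // default 32, as in STCS
    private final double compactionChance; // default 0.5 in the developed strategy

    public RandomThresholdChooser(int minThreshold, int maxThreshold, double compactionChance) {
        this.minThreshold = minThreshold;
        this.maxThreshold = maxThreshold;
        this.compactionChance = compactionChance;
    }

    // Decides whether a compaction round should run at all: a uniform draw is
    // compared against compaction_chance so that some rounds are skipped.
    public boolean shouldCompact() {
        return rng.nextDouble() < compactionChance;
    }

    // Draws the random threshold (number of SSTables to compact) uniformly
    // from [min_threshold, max_threshold], as in the uniform variant.
    public int uniformThreshold() {
        return minThreshold + rng.nextInt(maxThreshold - minThreshold + 1);
    }

    public static void main(String[] args) {
        RandomThresholdChooser chooser = new RandomThresholdChooser(4, 32, 0.5);
        if (chooser.shouldCompact()) {
            System.out.println("Compact " + chooser.uniformThreshold() + " SSTables");
        } else {
            System.out.println("Skip this compaction round");
        }
    }
}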


4.3.1 Test Environment Setup

To measure the performance metrics of the Size Tiered Compaction Strategy, the Leveled Compaction Strategy, and the Random Compaction Strategy, a testing environment is created. This test environment consists of a four-node cluster and a simulator.

4-Node Cluster Platform

A 4-node cluster is created using virtual server instances provided by IBM Cloud's Infrastructure-as-a-Service [29]. The cluster is private by default, and no floating IP addresses are assigned to the virtual machines of the cluster. These virtual machines are labelled server-a, server-b, server-c and server-d. Another virtual machine, labelled simulator, is created to stress the cluster. The hardware specifications of a virtual server in the 4-node cluster and of the simulator are shown in table 4.1.

Node                              No. of vCPU cores   RAM    Machine Type
Virtual Server (4-Node Cluster)   8                   64GB   mx2-8X64 (Memory Intensive)
Simulator                         8                   32GB   bx2-8X32 (Balanced)

Table 4.1: Hardware Specifications of Virtual Server Instances

Network

Figure 4.2 depicts the network deployed for the testing environment. A private cluster is built on top of the Virtual Private Cloud infrastructure provided by IBM Cloud. The private cluster consists of 4 virtual instances. A public instance is deployed, which acts as both simulator and bastion for the private cluster. The private cluster instances act as the nodes of the Cassandra cluster, and the simulator acts as the client to the Cassandra cluster. A public (floating) IP address from the pool of floating IP addresses provided by IBM Cloud is assigned to the client.

Software

1. Apache Cassandra: A tarball of Apache Cassandra v3.11.5 is installed on all 4 virtual machines and unarchived into the directory /apache-cassandra-3.11.5/.


Figure 4.2: Network Architecture

2. Random Compaction Strategy jar file: The Random Compaction Strategy jar file is integrated into Apache Cassandra 3.11.5's library, located at apache-cassandra-3.11.5/lib/.

3. cassandra.yaml file: The cassandra.yaml file is located at apache-cassandra-3.11.5/conf/cassandra.yaml. The following settings are required to run and test the cluster:

• Cluster Name: The name of the cluster. This setting prevents nodes from one logical cluster joining another one. All the nodes in the cluster must have the same value. In this project, the cluster name is set to 'Testing Cluster'.

• Listen Address: This setting binds the Cassandra to IP address or hostname of the node in the cluster.

• Seed Nodes: A list of nodes which a new node contacts when joining the cluster. Note that these nodes are not bootstrapping nodes. This setting is set to value [’server-a’,’server-c’].

• rpc address: The listen address for client connections (Thrift RPC service and native transport). This setting is set to the respective node's hostname.

The file should be the same on all the VMs of the cluster.

4. Prometheus: A Prometheus tarball (Prometheus-2.0.0.linux-amd64.tar.gz) is installed on the simulator virtual machine; later, the tarball is unarchived and copied the

private cluster. Later, the tarball is unarchived and the binaries are copied to /usr/local/bin. To run the node exporter as a system process, a systemd service file is created, and the machine is rebooted so that the CPU metrics can be scraped.

Node exporter exposes the metrics through port 9100.

6. cassandra-exporter: The Cassandra exporter is a Java agent that exports metrics related to Apache Cassandra; it is installed on all virtual machines and integrated into Apache Cassandra's library. It exposes the metrics via port 9500.

7. Grafana Server: The Grafana server (v7.3.4) is installed on the simulator virtual machine, with Prometheus added as a data source. Grafana pulls the metrics from Prometheus's REST API (port 9090) and exposes them in graphical format.

Figure 4.3: Software Installed in virtual server instances

4.3.2 Operating the Cassandra Cluster

Starting the Cassandra Cluster

To start the cluster, the VMs are accessed over SSH via the command line interface, and the following command is used to start Apache Cassandra on a single virtual server:

~/apache-cassandra-3.11.5/bin/cassandra.sh

This command may change depending upon factors such as the package's location, the operating system, and the type of installation used. After starting Apache Cassandra on all virtual machines, the cluster is formed. To check the status of the cluster, this command is used:

~/apache-cassandra-3.11.5/bin/nodetool status


Running System Load Tests on the Cassandra Cluster

The client/simulator is accessed over SSH. The general stress command used to run a system load test on the Cassandra cluster is as follows:

~/apache-cassandra-3.11.5/tools/bin/cassandra-stress <operation> <no. of operations>
    -mode <mode> cql3 -rate <no. threads>
    -schema <schema-options> <target-keyspace>
    -pop <sequence> -graph <file-to-write-the-graph>

Description of Command Line Arguments of the Cassandra Stress Command

1. <operation>: It specifies the type of workload to be performed. The operations include write, read and mixed. If the operation is write, insertion operations are performed; if the operation is mixed, both write and read operations are performed, and a ratio of write to read operations can be given as an input argument.

2. <mode>: It specifies in which way the stress command should be performed. For this project, the mode is set to native cql3.

3. rate: It specifies the number of parallel threads that should be used during the execution of stress test.

4. schema-options: Schema Options include the target keyspace, replication factor.

• keyspace: It specifies the target keyspace name

• replication(factor): It specifies the replication factor for the cluster.

5. pop: It is used to specify the key ranges instead of distributing them randomly.

6. graph: It specifies the target file location to save the performance metrics of stress test in the form of graphs.

Cassandra Stress Commands for different workloads

1. Write heavy workload

~/apache-cassandra-3.11.5/tools/bin/cassandra-stress write n=150m cl=one
    -mode native cql3 -rate threads=96
    -schema "replication(factor=3)" keyspace="TestingKeyspace150M"
    -pop seq=1..30000000 -node server-a

2. Read heavy workload


3. Mixed Heavy Workload

~/apache-cassandra-3.11.5/tools/bin/cassandra-stress mixed \(write=1,read=1\) n=150m cl=one
    -mode native cql3 -rate threads=96
    -schema "replication(factor=3)" keyspace="TestingKeyspace150M"
    -pop seq=1..30000000 -node server-a

The Size Tiered Compaction Strategy is set as the default compaction strategy. A compaction option is given to the schema flag to specify the compaction strategy.

4.4 Data Collection

The developed compaction strategy is stressed with heavy mock data generated by the default Apache Cassandra stress tool and the CCM stress tool.

Some custom scripts are created to run the stress tests, and the open-source tools Prometheus and Grafana are used to monitor and visualize the appropriate metrics.

System-level metrics and Cassandra metrics are collected for analysis; all the collected metrics are of a quantitative type. CPU utilization and disk space utilization are system-level metrics, while operation latency, number of live SSTables, SSTables per READ, and rate of compaction are Cassandra metrics. More than 100 system-level and Cassandra application metrics are exposed by the node exporter and the Cassandra exporter. The collected metrics are fed as raw data into Microsoft Excel for data analysis purposes.

4.5 Performance Metrics

There are more than 100 performance metrics scraped by the Node Exporter and the Cassandra Exporter. These metrics are exported to the Prometheus server. Following suggestions from the supervisor at Ericsson and the advisor at BTH, only a few essential metrics are considered for evaluating the Random Compaction Strategy's performance. These metrics include operating system metrics and Cassandra application metrics. Prometheus scrapes these metrics at an interval of 15s.


4.5.1 Cassandra Metrics

1. cassandra_table_sstables_per_read: This metric describes the number of SSTables queried per READ request.

2. cassandra_table_operation_latency_seconds: This metric describes the latency incurred in completing any operation. This metric is measured in microseconds.

(a) Write Latency: The response time for completing a write request.

(b) Read Latency: The response time for completing a read request.

3. cassandra_table_live_sstables: This metric describes the number of SSTables at a given point in time.

4.5.2 Operating System Metrics

1. CPU Utilization: This metric describes the number of CPU cycles utilized by the system. From this metric, only the CPU utilization by the user application is considered.

2. Disk Space Utilization: This metric describes the disk space occupied by generated SSTables. This metric helps to determine the size of SSTables being generated.

4.6 Data Analysis

After gaining knowledge from the meetings conducted and taking the opinions of the advisors at BTH and Ericsson, the relevant performance metrics are considered for evaluation purposes. The metrics collected are of a quantitative type and are visualized using graphs provided by the open-source visualization tool Grafana.

4.6.1 Statistical Test

The One-Way ANOVA statistical test is used in this project to determine statistical significance among the different strategies. The one-way ANOVA test is performed to determine whether any statistically significant difference exists between two or more independent groups: it compares the means of the independent groups and determines whether there is a statistically significant difference between them [16]. In this research, the null hypothesis is specified as follows:

H0: μ_STCS = μ_LCS = μ_RCS-Uniform = μ_RCS-Poisson = μ_RCS-Geometric

The null hypothesis is that there is no significant difference between the means of the strategies.

The alternate hypothesis is specified as follows:

H1: μ_i ≠ μ_j for at least one pair of strategies i, j

The alternate hypothesis is that there is at least one group whose mean differs from the means of the other strategies.
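As a reminder of the standard machinery behind the test (standard statistics, not specific to this thesis), the one-way ANOVA F statistic compares between-group to within-group variability:

F = MS_between / MS_within, where
MS_between = Σ_i n_i (x̄_i − x̄)² / (k − 1) and MS_within = Σ_i Σ_j (x_ij − x̄_i)² / (N − k),

with k the number of strategies (groups), n_i the number of observations per strategy, N the total number of observations, x̄_i the group means and x̄ the overall mean. The null hypothesis is rejected when F exceeds the critical value of the F(k − 1, N − k) distribution at the chosen significance level.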

In this thesis, Compaction Strategy is considered an Independent variable for


Results

5.1 RQ1 - Development of Random Compaction Strategy

5.1.1 Phase 1 of development

1. During this phase of development, as per the advisor's suggestions, the related APIs of Cassandra and of the Cassandra compaction strategies are studied, and a dummy compaction strategy is developed. Compaction can be triggered either manually or automatically: automatically whenever the compaction manager is called or whenever an SSTable is flushed from a memtable, and manually through an API or the CLI. Whenever compaction is triggered using the dummy compaction strategy, only 4 SSTables are compacted. After compaction is completed, the merged output is always a single SSTable, whose size depends on different factors such as tombstones and the size of the input SSTables (those eligible for compaction).

2. Later, instead of compacting a constant number of SSTables, I added a feature that generates a random threshold prior to the compaction process whenever compaction is triggered. This random threshold serves as the number of SSTables that should be compacted. Initially, this number is generated uniformly from the range [4, 32]; this range is inspired by the Size Tiered Compaction Strategy's range of thresholds. The dummy compaction strategy now uses randomness to compact a random number of SSTables and can be named the Random Compaction Strategy.

After developing the strategy's API, the API is converted into a jar file using the package manager Maven and integrated into Apache Cassandra's library. The developed strategy is tested on the local system by creating a small cluster and stressing it using the ccm-stress tool. The system.log file is traced to check whether the strategy is working properly and whether compaction is happening, and the data directories are traced to check whether compaction is producing new SSTables after compactions. Nodetool is used to check the status of the cluster.


taking place, which leads to the production of a high number of new SSTables. Ultimately, this high number of new SSTables leads to a lack of temporary space, a problem called space amplification. Another major defect of the strategy developed in this phase is that, as a high number of compactions occur, the CPU utilization required by Apache Cassandra also increases.

3. The supervisor at Ericsson suggested reducing the occurrence of compactions, which would ultimately eliminate the space amplification problem and the wastage of CPU cycles. One major complication to note is that the triggering of compaction by the compaction manager cannot be altered; it is built into Apache Cassandra. To reduce the occurrence of compactions, a decision on whether a compaction should take place is integrated into the strategy. A new user-defined parameter called random_compaction_chance is added to the strategy, which decides whether the compaction should take place. A random threshold is generated from a Uniform distribution, and whenever this threshold is less than the user-defined random_compaction_chance, the compaction occurs.

4. Later, the strategy is redeveloped to accept the user-defined and inbuilt parameters min_threshold and max_threshold.

5. In the previous implementation, the random number required for compaction is generated from a Uniform distribution. To investigate whether a random number generated from other discrete distributions could change the developed strategy's performance, a random number generator module is developed. This module can generate a pseudo-random number based on an input probability distribution; Uniform, Poisson, Geometric, and general discrete distributions can be given as input. This module is integrated into the strategy's API. The random number generating functions are developed based on algorithms designed by Donald Knuth [18] (an illustrative sketch of such samplers is given after this list). An extra user-defined parameter, distribution, is added to the strategy, together with an extra parameter lambda for generating the random threshold from a Poisson distribution and an extra parameter probability for generating it from a Geometric distribution. After adding all the required user-defined parameters, the strategy is tested on the local system by building the cluster and stressing it with mock data using the ccm-stress tool.

6. Another major problem which can lead to space amplification is that, after compaction, the new SSTable is a single file. In many compaction runs, this file is found to be large, and compacting large SSTables in the next compaction can be time-consuming. During such a compaction, the next upcoming compaction will be waiting in a buffer queue, or in some worst cases it is killed, which can lead to degradation of READ performance. To eliminate this problem, the SSTable is split into small SSTables. The splitting of SSTables is determined by the user-defined parameter file_splitting_ratio. For example, if file_splitting_ratio = 0.75 and 20 SSTables participated in the compaction, then the resultant SSTable will be split into 0.75 * 20 = 15 SSTables.

7. The algorithm of the completely developed random compaction strategy is shown in figure 5.1.

Figure 5.1: Random Compaction Strategy’s Working Algorithm
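The following self-contained Java sketch illustrates what a random-threshold sampler module of the kind described in item 5 could look like. The Poisson sampler follows Knuth's classic multiplication method and the Geometric sampler counts Bernoulli trials until the first success; the class and method names are hypothetical and this is not the thesis artefact itself.

import java.util.Random;

public class ThresholdSampler {

    private final Random rng = new Random();

    // Uniform integer in [min, max].
    public int uniform(int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    // Poisson-distributed integer with mean lambda (Knuth's algorithm).
    public int poisson(double lambda) {
        double limit = Math.exp(-lambda);
        double product = 1.0;
        int k = 0;
        do {
            k++;
            product *= rng.nextDouble();
        } while (product > limit);
        return k - 1;
    }

    // Geometric-distributed integer: number of trials until the first success.
    public int geometric(double probability) {
        int trials = 1;
        while (rng.nextDouble() >= probability) {
            trials++;
        }
        return trials;
    }

    public static void main(String[] args) {
        ThresholdSampler sampler = new ThresholdSampler();
        // Defaults from the strategy's sub-properties: range [4, 32],
        // lambda = 4 for Poisson, probability = 0.2 for Geometric.
        System.out.println("uniform(4,32)  = " + sampler.uniform(4, 32));
        System.out.println("poisson(4)     = " + sampler.poisson(4));
        System.out.println("geometric(0.2) = " + sampler.geometric(0.2));
    }
}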


tion is deployed on the virtual simulator machine. The collected metrics are exported to Prometheus through communication ports 9500 and 9100, respectively. Prometheus exposes these metrics through a REST API via port 9090. A Grafana server is deployed on the simulator machine, which collects the metrics from Prometheus and displays them in the form of graphs; Grafana exposes the visualizations via its REST API on port 3000. The Node Exporter exports more than 100 operating-system-related metrics, and the Cassandra Exporter exports more than 100 Cassandra-application-related metrics. After considering suggestions from the supervisor at Ericsson and the advisor at BTH, only a few interesting and important metrics are considered. These metrics are discussed in chapter 4.

The graphs described below are results extracted directly from Grafana. Each graph depicts time-series data for a performance metric, and each graph includes the metric for the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads over a chronological time period. The number of requests for each stress test is 150 million; to sustain such a high number of requests, high-computing virtual machines, each with 64GB RAM and 8 vCores, are used. Each stress test runs for at least 1 hour. Each colour in the graphs corresponds to the performance metric on the respective node.

5.2.1 Cassandra Metrics - Live SSTables & SSTables Per Read

Live SSTables describes the number of SSTables residing on the disk at a given point in time. As the number of live SSTables is a direct consequence of compaction, this metric is one of the most important metrics. There are no units for this metric. Nearly 240 data points per node are collected for each stress test.

SSTables Per READ describes the number of SSTables queried per READ request. As the occurrence of compactions directly affects the number of SSTables per READ request, this metric is of particular interest for comparison among the different strategies. There are no units for this metric.

Note that if the size of an SSTable is large, READ performance will be degraded. Disk space utilization is used to determine the average size of the SSTables. Based on SSTables Per READ, the performance of a strategy can be determined: the lower the SSTables Per READ, the higher the strategy's performance.

Average SSTable Size = Total Disk Used / Live SSTables


Figure 5.2: Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Size Tiered Compaction Strategy)

Size Tiered Compaction Strategy

1. Live SSTables: As shown in figure 5.2, the first graph depicts the number of live SSTables during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads on the application of STCS. On a node-by-node basis, during the WRITE stress test, the number of live SSTables initially increases gradually over a period of 35 minutes and then rises faster over a shorter period of 25 minutes. During the READ stress test, the number of live SSTables initially falls sharply within a short period of 10 minutes; this decrease is due to compactions. For the remaining period, the number of live SSTables does not change, indicating that no compactions are taking place. During the MIXED stress test, a pattern of dips and peaks is observed: the peaks are due to INSERTs and the dips are due to compactions. We can clearly observe from figure 5.2 that the number of SSTables is around 17 during the READ workload, and the average SSTable size is nearly 0.5GB (large).

2. SSTables Per READ: As shown in figure 5.2, the second graph depicts the number of SSTables per READ request during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads. On a node-by-node basis, during the WRITE stress test, the number of SSTables per READ request is always equal to zero, as no READ requests are made during the WRITE stress test. During the READ stress test, the number of SSTables per READ request jumps to 3; for server-a, server-b and server-d it fluctuates between 4 and 3, but for server-c, the SSTables


Leveled Compaction Strategy

Figure 5.3: Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Leveled Compaction Strategy)

1. Live SSTables: As shown in figure 5.3, the first graph depicts the Live SSTables during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads. On a node-by-node basis, during the WRITE stress test the number of Live SSTables initially increases gradually. During the READ stress test, the number of SSTables gradually falls over a shorter period of 20 minutes and stabilizes around 50. From figure 5.3 we can clearly observe that the Live SSTables during the READ heavy workload are higher than under STCS; the average SSTable size is nearly 200 MB (small). During the MIXED stress test, the Live SSTables peak and dip, where the peaks are due to WRITE requests and the dips are due to compactions.

2. SSTables Per READ: As shown in figure 5.3, the second graph depicts the SSTables per READ during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads. On a node-by-node basis, during the WRITE stress test the number of SSTables per READ request is always zero, as no READ requests are made during the WRITE stress test. During the READ stress test, the SSTables per READ initially jumps to 3, falls to 2 within a short period of 10 minutes, falls to 1 within a further 15 minutes, and then stabilizes at 1. The point in time at which the Live SSTables stabilize at 50 is the same point at which the SSTables per READ falls to 1 and stabilizes; it can be concluded that the Live SSTables are directly related to the SSTables per READ. During the MIXED stress test, the SSTables per READ jumps to 1, rises to 2 within a short period of 20 minutes, fluctuates between 1 and 2, and after 30 minutes stays in the neighbourhood of 2. The SSTables per READ stabilizes at 1 during the READ heavy workload because a large amount of the data resides in the last level and the average SSTable size is small (which reduces key-search time); the SSTables are organized into levels (a hierarchy).

Random Compaction Strategy - Uniform Distribution

Figure 5.4: Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Uniform Distribution)

1. Live SSTables: As shown in figure 5.4, the first graph depicts the Live SSTables during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads under RCS-Uniform. On a node-by-node basis, during the WRITE stress test the Live SSTables initially increase gradually. On a node-by-node


at 50. On a node-by-node basis, during the MIXED stress test, the number of Live SSTables gradually rises and falls, ultimately settling in the neighbourhood between 25 and 50. The average SSTable size during the READ heavy workload is nearly 300 MB (very close to LCS). In the case of RCS-Uniform, the SSTables are not organized into levels and are not compacted based on size. After every compaction, the resulting SSTable is split into smaller SSTables, which gradually decreases the size of the SSTables and increases their number. This is the main reason why the Live SSTables are lower than with LCS and higher than with STCS. Moreover, the size of the random bucket varies for each compaction; it is chosen uniformly (a minimal sampling sketch is given after this list).

2. SSTables Per READ: As shown in figure 5.4, the second graph depicts the SSTables per READ during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads under RCS-Uniform. On a node-by-node basis, during the WRITE stress test the number of SSTables per READ request is always zero, as no READ requests are made during the WRITE stress test. During the READ stress test, the SSTables per READ initially jumps to 3 and immediately starts fluctuating between 2 and 3, falls to 2 within a short period of 10 minutes, falls to 1 within a further 15 minutes, and stabilizes at 1. STCS lags behind RCS-Uniform: with RCS-Uniform the SSTables per READ stabilizes at 1, whereas with STCS it fluctuates between 3 and 4 after stabilization. The SSTables per READ in both LCS and RCS-Uniform falls to 1 within the same period of 30 minutes, so the performance of LCS and RCS-Uniform is comparable. During the MIXED stress test, the SSTables per READ jumps to 1, rises to 2 within a short period of 20 minutes, and then fluctuates between 1 and 2. We can observe that even though the SSTables are scattered without any order under RCS-Uniform, the SSTables per READ is almost equal to 1 during the READ heavy workload. Notice that, as the SSTables are not organized into levels, write amplification is also reduced in the case of RCS-Uniform.
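As referenced in item 1 above, the size of the random bucket under RCS-Uniform is chosen uniformly for each compaction. The following is a minimal sketch of how such a uniform threshold could be drawn; the bounds are illustrative assumptions, not the exact values used in the implemented strategy.

import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch: draw a uniform random compaction threshold (bucket size).
// MIN_THRESHOLD and MAX_THRESHOLD are assumed bounds for illustration only.
public final class UniformThreshold {
    private static final int MIN_THRESHOLD = 2;
    private static final int MAX_THRESHOLD = 10;

    // Every value in [MIN_THRESHOLD, MAX_THRESHOLD] is equally likely.
    public static int nextThreshold() {
        return ThreadLocalRandom.current().nextInt(MIN_THRESHOLD, MAX_THRESHOLD + 1);
    }
}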

Random Compaction Strategy - Poisson Distribution

Figure 5.5: Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Poisson Distribution)

1. Live SSTables: As shown in figure 5.5, the first graph depicts the Live SSTables during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads under RCS-Poisson. On a node-by-node basis, during the WRITE stress test the Live SSTables initially increase gradually, reaching a maximum of 100. During the READ stress test, the Live SSTables fall gradually, at a slower rate, to 75. In the case of RCS-Poisson, the number of Live SSTables is not constant;

it keeps changing due to compactions. Compared to LCS, the Live SSTables in the case of RCS-Poisson settle above 75, whereas with LCS the Live SSTables stabilize at 50. During the MIXED stress test, the number of Live SSTables gradually rises and falls, ultimately settling in the neighbourhood between 75 and 100. The average SSTable size during the READ heavy workload is nearly 400 MB (very close to STCS). In the case of RCS-Poisson, lower random thresholds occur with higher frequency; as a consequence, after every compaction the resulting SSTable is split into a small number of large SSTables, which leads to the accumulation of a large number of large-sized SSTables. This is the main reason why the Live SSTables are higher than with LCS. The size of the random bucket is chosen according to the pseudo-random threshold generated from the Poisson distribution; the mean of the Poisson distribution is set to 4, so the frequency of generating a random threshold equal to 4 is high (a minimal sampling sketch is given after this list).

2. SSTables Per READ: As shown in figure 5.5, the second graph depicts the SSTables per READ during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads under RCS-Poisson. On a node-by-node basis, during the WRITE stress test the number of SSTables per READ request is always zero, as no READ requests are made during the WRITE stress test. During the READ stress test, the SSTables per READ initially jumps to 4 and starts
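As referenced in item 1 above, the following is a minimal sketch of how the Poisson-distributed threshold with mean 4 could be drawn, using Knuth's multiplication method; the clamp to a minimum of 2 is an assumption so that a compaction bucket always contains at least two SSTables.

import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch: draw a compaction threshold from a Poisson distribution
// with mean 4 using Knuth's multiplication method. The clamp to 2 is an
// illustrative assumption, not necessarily what the implemented strategy does.
public final class PoissonThreshold {
    private static final double MEAN = 4.0;

    public static int nextThreshold() {
        double limit = Math.exp(-MEAN);
        double product = 1.0;
        int k = 0;
        do {
            k++;
            product *= ThreadLocalRandom.current().nextDouble();
        } while (product > limit);
        return Math.max(2, k - 1); // k - 1 is the Poisson(4) sample
    }
}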


Random Compaction Strategy - Geometric Distribution

Figure 5.6: Live SSTables & SSTables Per Read under WRITE, READ and MIXED Workloads (Random Compaction Strategy - Geometric Distribution)

1. Live SSTables: As shown in figure 5.6, the first graph depicts the Live SSTables during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads under the RCS-Geometric strategy. On a node-by-node basis, during the WRITE stress test the Live SSTables initially increase gradually, reaching a maximum of 100. During the READ stress test they gradually fall below 50. Compared to LCS, the Live SSTables in the RCS-Geometric case also fall to near 50, but with LCS the Live SSTables stabilize at 50. During the MIXED stress test, the number of Live SSTables gradually rises and falls, ultimately settling in the neighbourhood of 50. The average SSTable size during the READ heavy workload is nearly 350 MB (roughly midway between LCS and STCS). In the case of RCS-Geometric, the SSTables are not organized into levels and are not compacted based on size. After every compaction, the resulting SSTable is split into smaller SSTables, which gradually decreases the size of the SSTables and increases their number. Lower pseudo-random thresholds are generated more frequently than higher ones. This is the main reason why the Live SSTables are close to those of LCS and higher than those of STCS. Moreover, the size of the random bucket varies for each compaction; it is chosen according to the random threshold generated from the Geometric distribution (a minimal sampling sketch is given after this list).

2. SSTables Per READ: As shown in figure 5.6, the second graph depicts the SSTables per READ during the WRITE HEAVY, READ HEAVY and MIXED HEAVY workloads under the RCS-Geometric strategy. On a node-by-node basis, during the WRITE stress test the number of SSTables per READ request is always zero, as no READ requests are made during the WRITE stress test. During the READ stress test, the SSTables per READ initially jumps to 4 and immediately starts fluctuating between 3 and 4, falls to 3 within a short period of 20 minutes, then fluctuates between 1 and 2 within a further 15 minutes and stabilizes at 2. With RCS-Geometric the SSTables per READ stabilizes at 2, whereas with STCS it fluctuates between 3 and 4 after stabilization. During the MIXED stress test, the SSTables per READ fluctuates between 1 and 2. We can observe that even though the SSTables are scattered without any order under RCS-Geometric, the average SSTables per READ is almost equal to 3 during the READ heavy workload and 2 during the MIXED heavy workload. Compared to STCS, RCS-Geometric performs better in terms of READs. Notice that, as the SSTables are not organized into levels, write amplification is also reduced in the RCS-Geometric case.
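As referenced in item 1 above, the following is a minimal sketch of how the Geometric-distributed threshold could be drawn by counting Bernoulli trials until the first success; the success probability and the clamp to a minimum of 2 are illustrative assumptions. Lower thresholds occur more often than higher ones, which is the behaviour the results above rely on.

import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch: draw a compaction threshold from a Geometric distribution.
// P is an assumed success probability; smaller thresholds are more frequent.
public final class GeometricThreshold {
    private static final double P = 0.5;

    public static int nextThreshold() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int trials = 1;
        while (rnd.nextDouble() >= P) { // count trials until the first success
            trials++;
        }
        return Math.max(2, trials); // clamp is an illustrative assumption
    }
}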

Descriptive Statistics - Live SSTables

The values tabulated below are summary statistics computed over the range of values recorded during each stress test. These values represent the number of Live SSTables; note that a smaller number of Live SSTables indicates that the generated SSTables are large, whereas a larger number of Live SSTables indicates that the generated SSTables are small.
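As a side note, the following is a minimal sketch of how the averages and standard deviations tabulated below can be computed from the exported data points; using the sample (n - 1) standard deviation is an assumption about how the reported values were obtained.

// Minimal sketch: compute mean and sample standard deviation of the
// exported data points (assumes the n - 1 variant of the standard deviation).
public final class SummaryStats {
    public static double mean(double[] points) {
        double sum = 0.0;
        for (double p : points) sum += p;
        return sum / points.length;
    }

    public static double stdDev(double[] points) {
        double m = mean(points);
        double squared = 0.0;
        for (double p : points) squared += (p - m) * (p - m);
        return Math.sqrt(squared / (points.length - 1));
    }
}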

Strategy         Total Data Points   Average     Std. dev.
STCS             245                 17.34694    10.13263
LCS              245                 69.661221   31.34282
RCS-Uniform      245                 51.61224    18.62003
RCS-Poisson      245                 59.98776    29.51555
RCS-Geometric    245                 60.17959    26.35773

Table 5.1: Live SSTables - WRITE HEAVY Workload

As per figure 5.7, during the WRITE heavy workload STCS has the least average number of Live SSTables among all strategies, which indicates that STCS generates large-sized SSTables compared to the other strategies, while LCS has the highest average number of Live SSTables, which indicates that LCS generates small-sized SSTables. RCS-Poisson and RCS-Geometric have a similar average number of Live SSTables, slightly less than LCS, and RCS-Uniform lies between STCS and LCS.


Figure 5.7: Average number of Live SSTables

Strategy         Total Data Points   Average     Std. dev.
STCS             177                 10.13559    0.372011
LCS              177                 61.76705    0.547048
RCS-Uniform      177                 29.9209     0.367941
RCS-Poisson      177                 92.08475    0.453463
RCS-Geometric    177                 69.14689    0.442254

Table 5.2: Live SSTables - READ HEAVY Workload

Strategy         Total Data Points   Average     Std. dev.
STCS             249                 13.85141    3.487016
LCS              249                 68.13253    9.136648
RCS-Uniform      249                 26.37751    7.572902
RCS-Poisson      249                 71.34137    6.710002
RCS-Geometric    249                 44.89558    5.48432

Table 5.3: Live SSTables - MIXED HEAVY Workload

During the READ heavy workload, STCS has the least average number of Live SSTables among all strategies; after STCS, RCS-Uniform has the least. And during the MIXED heavy workload, STCS again has the least average number of Live SSTables, and RCS-Uniform has the second least
