Pure Java interface to a DSMS


IT 19 090

Degree project 30 credits, December 2019

Pure Java interface to a DSMS

Shahpar Shabani

Department of Information Technology


Faculty of Science and Technology, UTH unit

Visiting address:

Ångströmlaboratoriet Lägerhyddsvägen 1, Building 4, Floor 0

Postal address:

Box 536 751 21 Uppsala

Telephone:

018 – 471 30 03

Fax:

018 – 471 30 00

Website:

http://www.teknat.uu.se/student

Abstract

Pure Java interface to a DSMS

Shahpar Shabani

SCSQ (Scalable Stream Query processor) is a data stream management system (DSMS) that allows different kinds of distributed, high-volume, infinite streams to be queried. The current Java interface to SCSQ uses C libraries to communicate between Java and a SCSQ server; therefore, a pure Java client-server interface to SCSQ is needed. Unlike regular databases, a DSMS can process queries over infinite streams. Such continuous queries (CQs) run until they are explicitly terminated, so the interface must be able to process infinite scans of continuous query results. This master thesis implements a pure Java client-server interface to SCSQ that can handle CQs.

Printed by: Reprocentralen ITC IT 19 090

Examiner: Mats Daniels; Subject reviewer: Kjell Orsborn; Supervisor: Tore Risch


Acknowledgement

My sincere thanks to Prof. Tore Risch who gave me an opportunity to carry out my thesis at Uppsala University, UDBL group. His help and technical guidance while working on my thesis were invaluable.

My earnest thanks to Dr. Kjell Orsborn for reviewing my Master thesis and providing his support and valuable comments, which led to the improvement of this report.

It is with immense gratitude that I acknowledge the help and support of Dr. Justin Pearson, Dr. Jarmo Rantakokko, and Prof. Mats Daniels.

Finally, I must express my very profound gratitude to my parents and my husband for their love, support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them.


List of Figures

2.1 Abstract reference architecture for a DSMS. [GO03]

3.1 Original Java interface use case.

3.2 Pure Java interface use case.

3.3 Connections in original Java interface.

3.4 Connections in the PJI.

4.1 CPU and memory usage of two Amos II servers, each communicating with one Java client that uses the PJI.

4.2 CPU and memory usage of two Amos II servers, each communicating with one Java client that uses the original Amos II Java interface.


List of Tables

4.1 Elapsed times, in milliseconds, of four Java client applications, each using the original Amos II Java interface to communicate with an Amos II server.

4.2 Elapsed times, in milliseconds, of four Java client applications, each using the PJI to communicate with an Amos II server.

4.3 Elapsed times, in milliseconds, of four Java clients, each using the PJI to communicate with the same Amos II server.

4.4 Elapsed times, in milliseconds, of four Java clients, each using the original Amos II Java interface to communicate with the same Amos II server.


Chapter 1

Introduction

Traditional databases are used where applications need persistent data storage and complex querying. A database represents a set of objects that is rather static, since queries are more common than insertions, updates, or deletions. The results of queries reflect the current state of the database.

In the past few years, the need for a different kind of data model and queries has emerged, in which information is represented in the form of streams, e.g. sensor data, Internet traffic, financial tickers, online auctions, and transaction logs. Unlike traditional data, a data stream is not static but changes continuously. Consequently, queries over such streams run over long time periods and return new results repeatedly [GO03]. Such queries are therefore called continuous queries.

A DBMS¹ such as Microsoft SQL Server, MySQL, or Oracle is used for storing and analyzing data in traditional databases, while a DSMS² plays the same role for data streams. A data management system is accessed from software applications via a client-server API. For instance, Java applications can use the JDBC³ API⁴ to update and query relational databases.

SCSQ [ZR06] is a DSMS that allows different kinds of distributed, high-volume, infinite streams to be queried. It extends the Amos II DBMS [FR08] with representations of data streams. There are external client-server APIs to SCSQ for the programming languages C/C++, Java, Python, and Lisp.

The APIs allow queries to return infinite results as infinite scans. A scan [DR00] is a reference to the result of a query, over which the application can iterate to obtain new results as tuples.

¹ Database Management System

² Data Stream Management System

³ Java Database Connectivity

⁴ Application Programming Interface


The current Java API uses the JNI⁵ to call functions written in C that communicate with the SCSQ server. Since JNI requires the SCSQ kernel to be linked into the application, pure Java applications cannot be developed with it. In this thesis, a pure Java interface is developed that has the same functionality as the previous JNI-based API, but has no dependency on C libraries for communicating with SCSQ and is also capable of handling CQs.

⁵ The Java Native Interface (JNI) is a framework that enables Java code to call, and be called by, native applications and libraries written in other languages, e.g. C and C++.


Chapter 2

Background

This chapter gives a short description of the concepts that are required to have a better understanding of the thesis project.

2.1 Database Management System (DBMS)

In [EN99], a DBMS is defined as follows: “a database management system (DBMS) is a collection of programs that enables users to create and maintain a database”.

This general-purpose software system allows users and other software to store, maintain, and retrieve data in a structured way using a data model, such as the relational data model, in which data is represented as tables. The database and the DBMS software together are called a “database system” [EN99].

DBMSs are used as a medium between the database application and the database.

The relational model was first introduced by Ted Codd of IBM Research in 1970 [COD]. The model is based on the mathematical concept of a relation.

In this model, a database is represented by a set of relations [ER99]. In a relational database, data is represented by tables of uniform rows, and SQL queries are used to retrieve the data.

The Pure Java Interface implemented in this thesis is primarily designed to handle DSMS requests. Its API lets the user communicate with the SCSQ DSMS by sending SCSQL statements and receiving the results. Although the API is not primarily intended for DBMS requests, it can handle them fully.

2.1.1 Amos II

Amos II is a main-memory DBMS developed by the UDBL group of Uppsala University in collaboration with Linköping University. It uses a functional data model with a functional query language, AmosQL.

The distributed multi-database facilities of Amos II allow many autonomous and distributed Amos II peers to interoperate. Amos II runs on PCs under Windows and Linux, in a single-user or an optional client-server configuration.

The functional query language of Amos II is called AMOSQL¹ [FR08].

AMOSQL is a combination of a DDL² and a DML³, and provides all the syntax required to create an Amos II functional data model.

2.1.2 Scan

The scan is a key concept in Amos II. In Amos II Release 16 and earlier, a scan is a regular Lisp list containing one or more tuples, each of which holds a piece of data. Scans are used to return query results.

Since the earlier versions of Amos II did not support CQs over streams, scans as LISP lists were sufficient for regular passive queries.

Unlike regular queries, which result in a table, the result of a CQ is a stream that is updated as new data arrives. Data streams often have such a high rate that storing them on disk is neither viable nor desirable. Moreover, in most cases time is critical, and the results of CQs must be delivered as soon as possible [E.Z11a].

DSMS functionality and continuous querying were introduced in Amos II Release 17. In these releases, regular scans were not able to handle CQ results. Therefore, the remote scan was introduced: an address pointer to a memory buffer that is filled as new data arrives.

In this thesis, Amos II Release 17 (SCSQ) is used as the DSMS for its continuous query support, and the Pure Java Interface implements the remote-scan concept.

2.2 Data Stream Management System (DSMS)

Although regular DBMSs are capable of handling all kinds of persistent data and queries of varying complexity, they do not support data streams⁴.

Nowadays, many applications, such as network monitoring, telecommunications, data management, and click-stream monitoring, need to handle streaming data. This limitation of DBMSs therefore led researchers to extend existing DBMS technologies and develop Data Stream Management Systems (DSMSs), which are capable of handling data streams [Bo10].

¹ AMOS Query Language

² Data Definition Language

³ Data Manipulation Language

⁴ A data stream is an infinite set of items ordered by time [GO03].

2.2.1 DSMS Architecture

The architecture of a DSMS is demonstrated in Figure 2.1. The input monitor component controls the rate of the input streams. When the speed of the streams exceeds the defined rate, it sheds data in order to regulate the input rate [GO03].
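As a rough illustration of the input monitor's role, the sketch below admits at most a fixed number of items per time window and sheds the rest. It assumes a simple fixed-window counter with a drop-the-excess policy; the class `RateLimitingMonitor` and its parameters are hypothetical and not taken from [GO03] or SCSQ, whose shedding policies may be considerably more elaborate.

```java
/**
 * Hypothetical input monitor: admits at most maxPerWindow items per fixed time window
 * and sheds (drops) the rest, so that the query processor sees a bounded input rate.
 */
final class RateLimitingMonitor {
    private final int maxPerWindow;
    private final long windowMillis;
    private long windowStart = -1;   // start of the current window; -1 before the first item
    private int admittedInWindow = 0;
    private long shed = 0;           // total number of dropped items

    RateLimitingMonitor(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if an item arriving at timestampMillis is admitted, false if it is shed. */
    boolean offer(long timestampMillis) {
        if (windowStart < 0 || timestampMillis - windowStart >= windowMillis) {
            windowStart = timestampMillis;   // start a new window
            admittedInWindow = 0;
        }
        if (admittedInWindow < maxPerWindow) {
            admittedInWindow++;
            return true;
        }
        shed++;                              // rate exceeded: drop this item
        return false;
    }

    long shedCount() {
        return shed;
    }
}
```

Dropping the newest items is the simplest policy; random or priority-based dropping, as discussed in Section 2.2.2, would only change the decision inside `offer`.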

There are three partitions for storing the input data: working storage (e.g. for window queries), summary storage for stream summaries, and static storage for meta-data (e.g. the physical location of each source) [GO03]. Users' continuous queries are registered in the query repository and grouped for shared processing. One-time queries over the current state of the input streams can also be registered. As soon as a continuous query is registered, a query plan is compiled from it. The query processor communicates with the input monitor and may re-optimize query plans in response to changing input rates. Finally, results are streamed to users or temporarily buffered [GO03].

Figure 2.1: Abstract reference architecture for a DSMS. [GO03]

The main effort in this thesis is to provide a pure Java interface to SCSQ, a DSMS based on Amos II and developed by the UDBL group of Uppsala University. SCSQ is described in Section 2.3.

2.2.2 Continuous query

In traditional databases, regular passive queries run over statically stored data and deliver results on demand. Querying data streams requires CQs (continuous queries), which run over streams and continuously deliver results as new data arrives. For instance, if an instrument continuously delivers a stream of patients' heart rate readings, a continuous query could be:

“Continuously show me the heart rate readings for patient X in ICU exceeding what is recommended.”
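To make the difference from a passive query concrete, the sketch below models this CQ as a lazy filter over an unbounded Java `Stream`. The `Reading` and `HeartRateAlerts` classes, the cyclic sample input, and the threshold parameter are illustrative assumptions, not part of SCSQ or the PJI.

```java
import java.util.stream.Stream;

/** One heart-rate reading; a stand-in for a tuple in the input stream. */
final class Reading {
    final String patient;
    final int bpm;

    Reading(String patient, int bpm) {
        this.patient = patient;
        this.bpm = bpm;
    }
}

final class HeartRateAlerts {
    /** The "continuous query": lazily keeps readings of one patient that exceed maxBpm. */
    static Stream<Reading> alerts(Stream<Reading> readings, String patient, int maxBpm) {
        return readings.filter(r -> r.patient.equals(patient) && r.bpm > maxBpm);
    }

    /** An unbounded input stream that cycles through the given samples forever. */
    static Stream<Reading> cyclic(Reading... samples) {
        return Stream.iterate(0, k -> (k + 1) % samples.length).map(k -> samples[k]);
    }
}
```

Because stream operations are lazy, `alerts(...).limit(3)` returns after three matches even though the input never ends; a real CQ instead keeps delivering results until it is explicitly terminated.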

If a CQ is expensive, its results may be delayed by the time needed for the computations. If processing an input item takes longer than the interval between arriving items, the delays accumulate and the system can no longer keep up with the input stream rate. One solution is load shedding [ea03], i.e. dropping data from the input stream that cannot be processed in time. In networks, data is typically dropped based on timestamps, priority bits, or at random. Although load shedding mechanisms are similar in concept, they can differ in their details. For instance, Aurora [ea03] can make more intelligent decisions than other network load shedding mechanisms, as it keeps track of the state of the entire system.

However, load shedding is not an option when data loss is unacceptable; in that case, the execution of expensive queries causes a scalability problem. A possible way to make CQs with expensive operations over high-volume streams scalable is to parallelize their execution: input streams are divided into parallel sub-streams, over which the expensive query operations are executed continuously [E.Z11a].

SCSQ uses this parallelization approach: SCSQL, the continuous query language of SCSQ, supports parallel execution of CQs [ea11].
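The idea of dividing an input stream into parallel sub-streams can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it processes a finite window of a stream, uses round-robin partitioning, and runs one worker thread per sub-stream. The class `StreamPartitioner` is hypothetical; SCSQ itself expresses parallelism through SCSQL and stream processes rather than Java threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class StreamPartitioner {
    /** Round-robin split of a window of stream items into n sub-streams. */
    static <T> List<List<T>> split(List<T> window, int n) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < window.size(); i++) {
            parts.get(i % n).add(window.get(i));
        }
        return parts;
    }

    /** Runs op over every sub-stream in its own worker; results are concatenated per partition. */
    static <T, R> List<R> processInParallel(List<T> window, int n, Function<T, R> op) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> part : split(window, n)) {
                futures.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>();
                    for (T item : part) {
                        out.add(op.apply(item));   // the "expensive" query operation
                    }
                    return out;
                }));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                merged.addAll(f.get());            // wait for each sub-stream's results
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Concatenating results partition by partition does not preserve the original arrival order; a real system must either tolerate this or merge the sub-stream results, for example by timestamp.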

2.3 SCSQ

Processing high-volume data streams, e.g. the streams generated by radio telescopes and sensor networks, requires scalable stream processing.

SCSQ (Scalable Stream Query Processor) is a highly parallel DSMS built on top of Amos II [Keg09] to address this issue. It extends Amos II with support for CQs over data streams.

SCSQL is a continuous query language that extends SQL with streams and SPs⁵ as first-class objects, which lets the user specify how a query is parallelized. Using SCSQL, the user can link subqueries to SPs. Large parallel computations are defined as sets of subqueries executed on sets of SPs. In SCSQL, a stream is an object that represents a sequence of objects of any kind, e.g. the result of a continuous subquery. Continuous subqueries are assigned to SPs [E.Z11a].

⁵ Stream Process


2.3.1 Scan

The scan concept in a DSMS is comparable to a DBMS scan. The main difference is that DSMS scans can be infinite, since CQs may run forever.
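The practical consequence is that a client cannot assume that iteration over a scan terminates. The hypothetical `ScanCursor` below mimics the `eos()`/`getRow()`/`nextRow()` style of scan access that the PJI exposes (Section 3.2.4.1), here over an arbitrary Java `Iterator`; over an infinite source, `eos()` simply never becomes true. It is an illustration only, not the PJI's `Scan` class.

```java
import java.util.Iterator;

/** Hypothetical cursor with the same shape as a scan: eos(), getRow() and nextRow(). */
final class ScanCursor<T> {
    private final Iterator<T> source;
    private T current;
    private boolean eos;

    ScanCursor(Iterator<T> source) {
        this.source = source;
        nextRow(); // position on the first row, as a freshly opened scan does
    }

    /** End of scan: true only once the source is exhausted, never for an infinite CQ result. */
    boolean eos() {
        return eos;
    }

    T getRow() {
        if (eos) {
            throw new IllegalStateException("end of scan");
        }
        return current;
    }

    void nextRow() {
        if (source.hasNext()) {
            current = source.next();
        } else {
            current = null;
            eos = true;
        }
    }
}
```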

2.4 Java API

As mentioned in Chapter 1, Amos II has external interfaces to several programming languages. For developing Amos II applications, the Amos II Java interface is the most convenient choice, as it provides better error handling and memory management. For more advanced system extensions and time-critical use cases, C and Lisp interfaces are also available [DR00].

Compared with the C and Lisp interfaces, the Java interface has the following advantages:

• The Java interface has the best error handling: errors in Java code cannot cause Amos II to crash. Although the Lisp interface is much safer than the C interface, it is not as well protected as the Java interface.

• The Java programming language includes a wide range of libraries, e.g. for web access, which can be interfaced from Amos II using the Java interface.

Disadvantages of the Java interface:

• It is slower and more limited.

• In the Java interface, the connection to Amos II is made through the name server. With multiple servers, the name server can become a bottleneck (this problem has been addressed in this thesis with a new solution, which is explained in the next chapter).

• The Java interface uses C libraries (DLLs) to communicate with Amos II. Therefore, it cannot be used where native C libraries are not available, e.g. in Java applets.

There are two main kinds of external interfaces [DR00]:

1. callin: a Java program calls Amos II, similarly to the call-level interfaces of relational databases. It offers two ways to call Amos II:

(a) The embedded query interface: a Java method is provided for passing AMOSQL statements to Amos II, where they are evaluated dynamically. This is the only callin interface available in the Pure Java Interface. It includes methods for collecting the results of the dynamically evaluated AMOSQL queries. This interface is rather slow, due to the time required for parsing and compiling the AMOSQL statements at run time.


(b) The fast-path interface: predefined Amos II functions are implemented and executed as Java methods. This interface does not have the overhead of parsing and executing AMOSQL statements. Due to the client-server design of the PJI⁶, this interface has not been implemented in the PJI.

Unlike the original Java API, the Pure Java Interface implements only the embedded query interface.

Additionally, there are two options for handling the connection between Java programs and Amos II [DR00]:

(a) The tight connection: Amos II is linked directly with a Java program, so that Amos II becomes an embedded database in the Java program. Due to the client-server design of the Pure Java Interface, this connection type has not been implemented in the Pure Java Interface.

(b) The client-server connection: a Java application connects as a client to an Amos II server over TCP/IP. Unlike the tight connection, this connection type is not limited to a single application; multiple applications can connect to the same Amos II server in parallel. The Amos II server and the Java client applications run as separate programs. On the other hand, the overhead of accessing Amos II from a different program can make the communication significantly slower than with the tight connection. Moreover, the current implementation of the client-server connection needs Amos II kernel code (in C), which is called from Java. Therefore, Amos II Java client applications cannot be applets.

2. callout: Amos II functions call Java methods. Such foreign Amos II functions are implemented by one or more Java methods.

Due to the design of the Pure Java Interface, this interface has not been implemented in the Pure Java Interface.

A requirement of the current project was that the PJI must be fully compatible with the current Java interface.

⁶ Pure Java Interface


Chapter 3

Pure Java interface to SCSQ

The main goal of this thesis is to develop a new TCP/IP client-server pure Java interface to SCSQ that can handle indefinite result streams from CQs. From the user's point of view, the functionality of the interface conforms to the callin library of the Amos II Java interface. This chapter describes the implementation of the PJI¹ to SCSQ.

Figure 3.1: Original Java interface use case.

The existing Java interface embeds the SCSQ kernel C libraries, which forces applications to use JNI, as illustrated in Figure 3.1. In contrast, as shown in Figure 3.2, the PJI lets applications interact with SCSQ over a standard TCP/IP connection without JNI. Using the PJI, applications can send continuous queries to the servers and receive the results. Like the other SCSQ APIs, the PJI provides developers with several wrapper functions for sending queries and receiving results.

¹ Pure Java interface


Figure 3.2: Pure Java interface use case.

3.1 Overview

The PJI provides users with an API for connecting to running SCSQ servers that are registered in the SCSQ name server, and for communicating with them via SCSQL CQs.

3.1.1 Connection

To connect to a SCSQ server via the API provided by the PJI, only the name of the SCSQ server needs to be known. A stable connection between the client and the server plays a major role in a DSMS, since a CQ may result in an indefinite stream coming from the server. Therefore, the PJI ensures that, once the connection is made, the whole communication is performed through the same connection for as long as it is open.

3.1.2 Communication

As mentioned in earlier sections, the client and the SCSQ server communicate by sending SCSQL statements to the server and receiving Scans as results. The PJI accepts an SCSQL statement as a Java string, sends it to the SCSQ server, and returns the results in the form of so-called Scan objects. A Scan is a Java object with methods and properties that make it easier for the user to iterate over the results and to analyze or store them.

The following sections provide more details about the implementation of the PJI.


3.2 PJI Implementation

The following sections give an overview of the implementation of the PJI to SCSQ.

3.2.1 Parsing CQ results

The Pure Java Interface and SCSQ communicate by exchanging text over TCP/IP sockets. CQs are sent as strings, whereas results are received as Scans, which are represented as S-expressions [web16].

A Lisp interpreter implemented in Java plays a vital role here: it parses the scan elements received from the SCSQ server. The elements may contain complex nested object structures. For instance:

("*hierarchy*",{#[OID 0 "OBJECT"],{#[OID 1 "TYPE"], {#[OID 2 "STOREDTYPE"]},{#[OID 3 "DERIVEDTYPE"], {#[OID 6 "IUT"]}},{#[OID 39 "MAPPEDTYPE"],

{#[OID 40 "PROXYTYPE"]}}},{#[OID 4 "FUNCTION"], {#[OID 5 "RELATION"]}},{#[OID 7 "LITERAL"], {#[OID 8 "NUMBER"],{#[OID 9 "INTEGER"]},

{#[OID 10 "REAL"]}},{#[OID 11 "CHARSTRING"]}, {#[OID 12 "BOOLEAN"]},{#[OID 13 "BIT"]}, {#[OID 19 "BINARY"]},{#[OID 25 "PORT"]},

{#[OID 560 "TIMEVAL"]},{#[OID 585 "TIMEINTERVAL"], {#[OID 586 "BOTH_CLOSED_TIMEINTERVAL"]},

{#[OID 587 "LEFT_CLOSED_TIMEINTERVAL"]}, {#[OID 588 "RIGHT_CLOSED_TIMEINTERVAL"]},

{#[OID 589 "BOTH_OPEN_TIMEINTERVAL"]}},{#[OID 607 "TIME"]}, {#[OID 615 "DATE"]},{#[OID 1073 "COMPLEX"]},

{#[OID 1222 "EXPRESSION"]}},{#[OID 14 "DATASOURCE"], {#[OID 15 "AMOS"]},{#[OID 1221 "WRAPPER"]},

{#[OID 1294 "RELATIONAL"],{#[OID 1377 "JDBC"]}}}, {#[OID 16 "COLLECTION"],{#[OID 17 "BAG"],

{#[OID 89 "BAG-VECTOR-NUMBER"]},

{#[OID 97 "BAG-(VECTOR,OBJECT)"]},{#[OID 105 "BAG-VECTOR"]}, {#[OID 227 "BAG-NUMBER"]},{#[OID 509 "BAG-FUNCTION"]},

{#[OID 513 "BAG-TYPE"]},{#[OID 756 "BAG-DATE"]}, {#[OID 825 "BAG-(INTEGER,NUMBER)"]},

{#[OID 848 "BAG-INTEGER"]},

{#[OID 864 "BAG-(NUMBER,VECTOR-NUMBER)"]}, {#[OID 870 "BAG-(INTEGER,OBJECT)"]},

{#[OID 872 "BAG-(VECTOR-NUMBER,OBJECT)"]}, {#[OID 878 "BAG-(INTEGER,VECTOR-NUMBER)"]}, {#[OID 882 "BAG-BOOLEAN"]},


{#[OID 999 "BAG-(OBJECT,VECTOR-NUMBER)"]}, {#[OID 1005 "BAG-(NUMBER,OBJECT)"]},

{#[OID 1140 "BAG-(OBJECT,OBJECT)"]}, {#[OID 1189 "BAG-CHARSTRING"]},

{#[OID 1412 "BAG-(CHARSTRING,CHARSTRING)"]}, {#[OID 1510 "BAG-(CHARSTRING,VECTOR)"]}, {#[OID 1515 "BAG-INDEXREWRITERULE"]}, {#[OID 1557 "BAG-SSFTRANSRULE"]}, {#[OID 1576 "BAG-INDEXKEYGENERATOR"]},

{#[OID 1595 "BAG-SSFPARAGENERATOR"]}},{#[OID 18 "VECTOR"], {#[OID 87 "VECTOR-NUMBER"]},

{#[OID 91 "VECTOR-VECTOR-NUMBER"]},

{#[OID 123 "VECTOR-TYPE"]},{#[OID 579 "VECTOR-INTEGER"]}, {#[OID 660 "VECTOR-VECTOR"]},

{#[OID 699 "VECTOR-CHARSTRING"]}, {#[OID 854 "VECTOR-REAL"]},

{#[OID 866 "VECTOR-(NUMBER,VECTOR-NUMBER)"]}, {#[OID 884 "VECTOR-FUNCTION"]},

{#[OID 1161 "VECTOR-RECORD"]}, {#[OID 1277 "VECTOR-EXPRESSION"]},

{#[OID 1508 "VECTOR-LITERAL"]}},{#[OID 47 "SCAN"]}, {#[OID 67 "TUPLE"]},{#[OID 1137 "RECORD"]},

{#[OID 1691 "NUMARRAY"],{#[OID 1692 "DARRAY"]}, {#[OID 1693 "CARRAY"]},{#[OID 1694 "IARRAY"]}, {#[OID 1695 "FARRAY"]}}},{#[OID 38 "OPAQUE_PROXY"]},

{#[OID 41 "USEROBJECT"],{#[OID 1493 "INDEXREWRITERULE"]}, {#[OID 1530 "SSFTRANSRULE"]},

{#[OID 1559 "INDEXKEYGENERATOR"]}, {#[OID 1578 "SSFPARAGENERATOR"]}}})

which is a result of the following simple query:

get_type_structure(typenamed('OBJECT'));

To handle such complex objects, the READER method recursively processes each received S-expression and constructs a corresponding Java representation. The READER methods used in the PJI detect normal strings, numbers, vectors, and special characters, as well as the special SCSQ representation of object identifiers. Hence, the Lisp interpreter un-marshals the received CQ results and iteratively delivers them to the application as a stream of Java objects.
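The following sketch shows how such a recursive reader can be written in Java for the subset of the syntax visible in the example above: parenthesized and braced sequences, double-quoted strings, numbers, and `#[OID n "NAME"]` object identifiers. The names `ResultReader` and `Oid` are hypothetical, and mapping both `(...)` and `{...}` to Java lists is a simplifying assumption; the actual PJI READER may distinguish them and supports more cases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a SCSQ object identifier such as #[OID 8 "NUMBER"]. */
final class Oid {
    final long id;
    final String name; // null if the OID has no printed name
    Oid(long id, String name) { this.id = id; this.name = name; }
}

/** Minimal recursive-descent reader for the result syntax shown above. */
final class ResultReader {
    private final String s;
    private int pos = 0;

    private ResultReader(String s) { this.s = s; }

    /** Parses one complete value: a List, String, Long, Double or Oid. */
    static Object read(String text) {
        ResultReader r = new ResultReader(text);
        Object value = r.readValue();
        r.skipSeparators();
        if (r.pos != r.s.length()) throw new IllegalArgumentException("trailing input at " + r.pos);
        return value;
    }

    // Whitespace and commas only separate elements, so both are skipped.
    private void skipSeparators() {
        while (pos < s.length() && (Character.isWhitespace(s.charAt(pos)) || s.charAt(pos) == ',')) pos++;
    }

    private Object readValue() {
        skipSeparators();
        if (pos >= s.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = s.charAt(pos);
        if (c == '(') return readSequence(')');
        if (c == '{') return readSequence('}');
        if (c == '"') return readString();
        if (s.startsWith("#[OID", pos)) return readOid();
        return readNumber();
    }

    // Recursion through readValue() handles arbitrarily deep nesting.
    private List<Object> readSequence(char close) {
        pos++; // opening delimiter
        List<Object> items = new ArrayList<>();
        while (true) {
            skipSeparators();
            if (pos >= s.length()) throw new IllegalArgumentException("missing " + close);
            if (s.charAt(pos) == close) { pos++; return items; }
            items.add(readValue());
        }
    }

    private String readString() {
        StringBuilder sb = new StringBuilder();
        pos++; // opening quote
        while (pos < s.length() && s.charAt(pos) != '"') {
            char c = s.charAt(pos++);
            if (c == '\\' && pos < s.length()) c = s.charAt(pos++); // keep the escaped character
            sb.append(c);
        }
        if (pos >= s.length()) throw new IllegalArgumentException("unterminated string");
        pos++; // closing quote
        return sb.toString();
    }

    private Oid readOid() {
        pos += "#[OID".length();
        skipSeparators();
        int start = pos;
        while (pos < s.length() && Character.isDigit(s.charAt(pos))) pos++;
        long id = Long.parseLong(s.substring(start, pos));
        skipSeparators();
        String name = (pos < s.length() && s.charAt(pos) == '"') ? readString() : null;
        skipSeparators();
        if (pos >= s.length() || s.charAt(pos) != ']') throw new IllegalArgumentException("expected ]");
        pos++;
        return new Oid(id, name);
    }

    private Object readNumber() {
        int start = pos;
        while (pos < s.length() && "+-.0123456789eE".indexOf(s.charAt(pos)) >= 0) pos++;
        String token = s.substring(start, pos);
        if (token.isEmpty()) throw new IllegalArgumentException("unexpected character at " + start);
        if (token.matches("[+-]?\\d+")) return Long.valueOf(token);
        return Double.valueOf(token);
    }
}
```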

The PJI communicates with SCSQ over TCP/IP sockets. To query SCSQ, an SCSQL query must be sent. The user defines the query as a string, but it has to be wrapped in a SCSQ statement of the form:


(open-query-scan-server "<the query, terminated by ';'>")

The server reads this message as a Lisp expression in which the query is a string literal delimited by double quotes. If the query itself contains special characters, such as double quotes or backslashes, and is inserted verbatim, the server will not read the string correctly: an unescaped double quote, for example, terminates the string literal prematurely. Hence, such a query cannot simply be inserted into the statement above as plain text.

To send a double quote inside the string, it has to be escaped with a backslash.

The PJI must handle these special characters in order to deliver correct SCSQL queries to the SCSQ server. Conversely, the responses coming from the SCSQ server as strings may contain special characters too. This is where the PJI uses marshalling: the CQ string is divided into smaller strings, which are parsed, processed, and sent over the connection.

The marshal mechanism implemented in the PJI splits the string at the point where there is a special character, processes the special character and adds the pieces to the buffer. Once the whole string is completely added to the buffer, the buffer is flushed into the socket channel.
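A minimal sketch of this marshalling step is shown below. It assumes that double quotes and backslashes are the only characters needing escapes and that the whole statement is written to the socket as one flushed message; it processes the string character by character, which yields the same result as splitting at special characters. The class `QueryMarshaller` is hypothetical; the exact set of characters handled by the PJI, and whether a message terminator is required, are not specified here.

```java
import java.io.IOException;
import java.io.Writer;

final class QueryMarshaller {
    /** Prefixes every double quote and backslash with a backslash. */
    static String escape(String query) {
        StringBuilder buf = new StringBuilder(query.length() + 16);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"' || c == '\\') {
                buf.append('\\');
            }
            buf.append(c);
        }
        return buf.toString();
    }

    /** Wraps a query in the open-query-scan-server form, adding the terminating ';' if missing. */
    static String wrap(String query) {
        String q = query.trim();
        if (!q.endsWith(";")) {
            q = q + ";";
        }
        return "(open-query-scan-server \"" + escape(q) + "\")";
    }

    /** Writes the complete statement into a (buffered) writer and flushes it once. */
    static void send(Writer out, String query) throws IOException {
        out.write(wrap(query));
        out.flush();
    }
}
```

For example, `wrap("select name(t) from type t")` yields `(open-query-scan-server "select name(t) from type t;")`.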

3.2.2 Name server

The name server contains general meta-information. This information is managed without explicit operator intervention; its content is updated when connections are established [RJK03].

Figure 3.3: Connections in original Java interface.


Figure 3.3 illustrates the Java client connections to Amos II servers when the original Java interface is used. The Amos II servers have to be defined and initialized in the name server, and thereafter all traffic goes through the name server. As mentioned earlier, under heavy traffic (for instance, when each client registers and communicates with several Amos II servers), the name server is the point where communication may slow down or crash.

The new implementation in the PJI is shown in Figure 3.4. Since the Amos II servers still have to be defined and initialized in the name server, the client must communicate with the name server to get the port number assigned to an Amos II server. These port numbers are assigned dynamically and randomly to Amos II servers by the name server, and the only way to obtain them is to query the name server using an AMOSQL query.

Therefore, each Java client that is interfaced to Amos II via the PJI connects to the name server, gets the port number of the Amos II server, closes the connection to the name server, and then connects directly to the Amos II server. This implementation reduces the traffic through the name server.

Figure 3.4: Connections in the PJI.

3.2.3 Client Server connection

The connections to Amos II databases are represented by a Java class in the PJI called “Connection”, which has the following constructors:


public Connection(String serverName)

public Connection(String serverAddress, String serverName)

public Connection(String serverAddress, String serverName, String dbName)

The first constructor, which takes only the server name, is expected to connect to a local database. If dbName is empty, the connection is considered a tight connection to the embedded database; otherwise, dbName must contain the name of an Amos II mediator peer that is defined in the Amos II name server and runs on the same computer as the Java application. If the Java application is to connect to an Amos II server that is managed by a name server and runs on a remote computer, the other two constructors are used. These constructors take the server name and network address, and determine the network port using the following mechanism:

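Scan sc = null;
int port = 0;
// 1. Connect to the name server, which always listens on port 35021
this.scToSCSQ = new SocketToSCSQ(serverAddress, serverName, 35021);
try {
    // 2-3. Ask the name server for (server, computer, port) of all registered servers
    sc = this.execute("all_amosinfo();");
    // 4. Extract the port number of serverName from the returned Scan
    spf = new ServerPortFinder(sc, serverName);
    port = spf.portNumber;
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // 5. The name-server connection is no longer needed
    this.scToSCSQ.closeSocketToSCSQ();
}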

1. Create a socket to the name server. The name server always uses the static port number 35021.

2. Send the following query to the name server:

"all_amosinfo();"

3. Receive a Scan containing information about all the connected servers, in the form:

[(<Server name>, <Computer name>, <Port number>) ...]

where the brackets delimit the whole Scan and each parenthesized tuple contains the information about one server. The next step describes how this information is parsed and used.

4. Parse the Scan, and get the port number of the desired server.


5. Close the socket to the name server.
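Step 4 can be sketched as a lookup over the (server name, computer name, port number) tuples returned by `all_amosinfo()`. The sketch assumes the tuples have already been parsed into Java lists, that server names compare case-insensitively, and that -1 signals an unregistered server; `PortLookup` is a hypothetical stand-in for the PJI's `ServerPortFinder`, whose actual code is not shown in this section.

```java
import java.util.List;

final class PortLookup {
    /**
     * Returns the port number registered for serverName in the (server name, computer name,
     * port number) tuples, or -1 if the server is not registered.
     */
    static int findPort(List<List<Object>> tuples, String serverName) {
        for (List<Object> tuple : tuples) {
            if (tuple.size() >= 3 && serverName.equalsIgnoreCase(String.valueOf(tuple.get(0)))) {
                Object port = tuple.get(2);
                // Accept the port either as an already parsed number or as text
                if (port instanceof Number) {
                    return ((Number) port).intValue();
                }
                return Integer.parseInt(port.toString().trim());
            }
        }
        return -1;
    }
}
```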

To terminate a connection the following method can be used:

void disconnect() throws AmosException;

Connections are also terminated automatically when the Java garbage collector deallocates them.

3.2.4 CQ results

CQs run over data streams and deliver results continuously, so they can produce an infinite number of results. Here is a simple CQ:

create function filter() -> bag of number

as for each number x where x in sin(in(heartbeat(0.001))) if x > -1 then return x;

The function heartbeat(0.001) indefinitely delivers a sequence of floating-point numbers that starts at 0 and increases by 0.001 every 0.001 seconds. Each number is passed as input to the sin function. The filter function inspects each result of sin as soon as heartbeat delivers a new value, and returns it if it is above the threshold (here -1).
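The semantics of this CQ can be mimicked in plain Java, as sketched below. The sketch is a simplification under stated assumptions: it computes a fixed number of heartbeat ticks instead of running forever, ignores real-time pacing, and the class `HeartbeatFilterDemo` is hypothetical. It also makes visible that, since sin(x) is never below -1, the condition `x > -1` lets practically every value through.

```java
import java.util.ArrayList;
import java.util.List;

final class HeartbeatFilterDemo {
    /**
     * Emulates the first n ticks of filter(): x runs through sin(0), sin(step), sin(2*step), ...
     * and is kept only if it exceeds the threshold. No real-time pacing is done.
     */
    static List<Double> run(int n, double step, double threshold) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            double tick = i * step;          // i-th value of heartbeat(step)
            double x = Math.sin(tick);       // x in sin(in(heartbeat(step)))
            if (x > threshold) {             // if x > threshold then return x
                out.add(x);
            }
        }
        return out;
    }
}
```

Computing each tick as `i * step`, rather than repeatedly adding `step`, avoids accumulating floating-point rounding errors over long runs.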

The results of the CQ are delivered as scans. As the result of the CQ grows over time, the scan would have to grow indefinitely as well. The CQ in the above example is quite simple and delivers at most one floating-point number every 0.001 seconds. In real cases, a CQ may deliver more complex results at a much higher rate. The question is therefore how SCSQ and the PJI can handle the results of CQs. As mentioned in the Background chapter, the key concept in SCSQ is the remote scan. The following section describes how the remote scan is implemented in the PJI.

3.2.4.1 Remote scan

Like Amos II, SCSQ returns a scan as the result of either an embedded query or a function call through the fast-path interface. A scan can be regarded as a serialized table, and the tuples it contains as the rows of that table.

The difference between SCSQ and Amos II in handling scans is that in Amos II a scan is a piece of memory that is filled once by a static query, whereas SCSQ, since CQ results arrive continuously, defines a dynamic buffer in memory and fills it with the result data coming from CQs. These scans in SCSQ are called remote-scans. As the result of a CQ execution, SCSQ always returns the pointer address of the beginning of a remote-scan.

The PJI implements remote-scans. When it is asked to execute a CQ and return the result, it executes the CQ and receives the pointer address of the beginning of the remote-scan. It then fetches the data from the remote-scan and delivers them as Java objects. When the interface is asked for the next row of the remote-scan, it looks into the remote-scan buffer and fetches the next row if available. Thus, when a CQ execution produces an indefinite result set, SCSQ keeps re-filling the remote-scan buffer as long as data arrives, and the PJI delivers the contents of each re-filled buffer, which may contain one or more tuples, to the client at once.
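The buffering behaviour described above can be modelled as follows: the client keeps a local buffer, serves rows from it, and only makes another round trip to the server when the buffer is empty. In this hypothetical sketch, `BufferedScan` stands in for the PJI scan, a Java `Iterator` stands in for the server-side remote-scan, and the batch size is a parameter (the PJI's local scan holds up to 20 tuples); a real CQ would wait for new data rather than report end of scan when the source is momentarily empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical client-side view of a remote scan: rows are fetched in batches into a local buffer. */
final class BufferedScan<T> {
    private final Iterator<T> remote;       // stands in for the server-side remote-scan
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int fetches = 0;                // number of round trips to the "server"

    BufferedScan(Iterator<T> remote, int batchSize) {
        this.remote = remote;
        this.batchSize = batchSize;
        refill();
    }

    /** One round trip: moves up to batchSize available rows into the local buffer. */
    private void refill() {
        fetches++;
        for (int i = 0; i < batchSize && remote.hasNext(); i++) {
            buffer.addLast(remote.next());
        }
    }

    boolean eos() {
        return buffer.isEmpty();
    }

    T getRow() {
        if (buffer.isEmpty()) {
            throw new IllegalStateException("end of scan");
        }
        return buffer.peekFirst();
    }

    void nextRow() {
        buffer.pollFirst();
        if (buffer.isEmpty()) {
            refill(); // fetch the next batch only when the local buffer runs dry
        }
    }

    int fetchCount() {
        return fetches;
    }
}
```

Batching trades a little latency for far fewer round trips: with 45 available rows and a batch size of 20, the sketch needs four fetches (20, 20, 5, and a final empty one) instead of 45.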

To access the content of a remote-scan, the following mechanism is used by the pure Java Interface:

1. Create and fill a local scan by sending the following Lisp statements to the server:

"(setq <"s" + scanPointer> (gethash <scanPointer> _scan-list_))"

"(scan-fillbuffer <"s" + scanPointer>)"

"(setq <"b" + scanPointer> (scan-buffer <"s" + scanPointer>))"

"(scan-setcurrent <"s" + scanPointer> (buffer-peek <"b" + scanPointer>))"

Every local scan and its buffer need a unique name so that the results of different CQs are not mixed. Therefore, <"s" + scanPointer> is used as the unique name of the local scan, and <"b" + scanPointer> as the unique name of its buffer. The local scan contains up to 20 tuples. Once tuples have been popped from the local scan, it is refilled with the next tuples from the remote-scan.

2. To pop the next row of the local scan, the following method can be used:

void nextRow() throws AmosException

It requests the next row from SCSQ by sending:

"(scan-nextrow <"s" + scanPointer>)"

The following methods of the Java Scan class allow the client to fetch data from the created scan:

1. The method to get the current row of the local scan:

Tuple getRow() throws AmosException

2. The method to check for the end of the scan:

boolean eos() throws AmosException
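The Lisp statements that create the local scan and advance it can be assembled mechanically from the scan pointer. The helper below is a hypothetical sketch (the class and method names are not from the PJI); only the Lisp text follows the description of steps 1 and 2 above, with the unique local-scan and buffer names derived from the pointer as "s" + scanPointer and "b" + scanPointer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the Lisp statements sent to the SCSQ server. */
final class ScanStatements {
    private ScanStatements() {}

    /** Unique name of the local scan, e.g. "s1234" for pointer 1234. */
    static String scanName(long scanPointer) {
        return "s" + scanPointer;
    }

    /** Unique name of the local scan's buffer, e.g. "b1234". */
    static String bufferName(long scanPointer) {
        return "b" + scanPointer;
    }

    /** Statements that create and fill the local scan (step 1). */
    static List<String> open(long scanPointer) {
        String s = scanName(scanPointer);
        String b = bufferName(scanPointer);
        return Arrays.asList(
            "(setq " + s + " (gethash " + scanPointer + " _scan-list_))",
            "(scan-fillbuffer " + s + ")",
            "(setq " + b + " (scan-buffer " + s + "))",
            "(scan-setcurrent " + s + " (buffer-peek " + b + "))");
    }

    /** Statement that pops the next row of the local scan (step 2). */
    static String nextRow(long scanPointer) {
        return "(scan-nextrow " + scanName(scanPointer) + ")";
    }
}
```

For a scan pointer 1234, for instance, the first statement becomes (setq s1234 (gethash 1234 _scan-list_)) and the statement sent by nextRow becomes (scan-nextrow s1234).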

Here is an example that shows how to iterate through all tuples in a scan:

Scan theScan = theConnection.execute("select name(t) from type t;");
while (!theScan.eos()) {
    Tuple row = theScan.getRow();       // current row of the scan
    String str = row.getStringElem(0);  // first element as a Java String
    theScan.nextRow();                  // advance to the next row
}

The Java scan is closed automatically when all rows of the remote-scan have been extracted or when the garbage collector deallocates it. However, calling the following method explicitly is safer and cleaner:

void closeScan() throws AmosException

This method releases the memory both locally and on the server, whereas leaving the scan to be closed by the garbage collector could cause memory to accumulate and eventually overflow in the Amos II servers.
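One way to make the explicit close hard to forget is to wrap the eos/getRow/nextRow/closeScan protocol in a class that implements both Iterable and AutoCloseable, so that a try-with-resources block closes the scan even when iteration stops early or throws. This is a sketch, not part of the PJI: `ScanLike` is a hypothetical stand-in for the PJI Scan class (whose methods also declare AmosException), and `ClosingScan` is likewise a hypothetical name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the PJI Scan protocol (checked exceptions omitted). */
interface ScanLike<T> {
    boolean eos();
    T getRow();
    void nextRow();
    void closeScan();
}

/** Adapts a ScanLike to Iterable and guarantees closeScan() via try-with-resources. */
final class ClosingScan<T> implements Iterable<T>, AutoCloseable {
    private final ScanLike<T> scan;
    private boolean closed = false;

    ClosingScan(ScanLike<T> scan) {
        this.scan = scan;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !closed && !scan.eos();
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T row = scan.getRow();  // read the current row
                scan.nextRow();         // then advance, as in the while loop above
                return row;
            }
        };
    }

    /** Releases the scan exactly once (in the real PJI, locally and on the server). */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            scan.closeScan();
        }
    }
}
```

Because a CQ result is typically unbounded, a client usually leaves the loop with break; with the wrapper declared in a try-with-resources header, closeScan() still runs at that point.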

In the PJI, there exist two constructors to create a scan:

public Scan()

public Scan(int theScan, Connection theConnection)

The first constructor creates an empty Java scan, which is only a Java object. This empty Java scan does not belong to any connection, and no scan identifier or buffer in Amos II is assigned to it.

3.2.4.2 Tuple

A tuple is both an SCSQ object and an Amos II data type, and it can contain elements of any SCSQ data type. In the pure Java Interface, tuples are represented by the Java class Tuple, which holds an ordered sequence of Amos II database objects. Tuples are used in several situations [DR00]:

• When iterating over scans, each row is represented by a Tuple.

• As argument lists in function calls.

• As an array of data values.

In the PJI, a tuple is implemented with a Java Vector. The Tuple class has several constructors:

Tuple()


This creates an empty Tuple. Since tuples are used as arguments in function calls, the PJI also provides constructors that create a single-value Tuple:

Tuple(Oid obj) throws AmosException

Tuple(String str) throws AmosException

A Tuple keeps the number of its elements in a variable called Arity. The methods getArity() and setArity() get and set the arity of a Tuple.

Moreover, the following constructor creates a new Tuple with a given arity:

Tuple(int arity);

The PJI includes several methods for converting Tuple elements to and from Java data types. The set methods can be used either to fill a new Tuple with values of a specific data type, e.g. String, or to set an element of a specific data type in an existing Tuple. The get methods read Tuple elements as a specific Java data type. These methods are listed in Appendix A.
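Since a tuple is described as a Java Vector together with an arity, a minimal sketch of such a class might look as follows. It is not the PJI's Tuple class: the name `TupleSketch` is hypothetical, only a subset of the constructors and accessors is shown, and a plain IllegalArgumentException replaces AmosException for type and range errors.

```java
import java.util.Vector;

/** Hypothetical, simplified Vector-backed tuple (not the PJI Tuple class). */
final class TupleSketch {
    private final Vector<Object> elems = new Vector<>();

    TupleSketch() {}                // empty tuple, arity 0

    TupleSketch(int arity) {        // tuple with `arity` empty (null) slots
        setArity(arity);
    }

    TupleSketch(String str) {       // single-value tuple
        this(1);
        setElem(0, str);
    }

    int getArity() {
        return elems.size();
    }

    void setArity(int arity) {
        if (arity < 0) throw new IllegalArgumentException("negative arity");
        elems.setSize(arity);       // pads with null or truncates
    }

    private void checkPos(int pos) {
        if (pos < 0 || pos >= elems.size())
            throw new IllegalArgumentException("position " + pos + " out of range");
    }

    // Overloaded setters: the static type of the argument selects the method.
    void setElem(int pos, String str) { checkPos(pos); elems.set(pos, str); }
    void setElem(int pos, int i)      { checkPos(pos); elems.set(pos, i); }
    void setElem(int pos, double d)   { checkPos(pos); elems.set(pos, d); }

    String getStringElem(int pos) {
        checkPos(pos);
        Object o = elems.get(pos);
        if (!(o instanceof String)) throw new IllegalArgumentException("element " + pos + " is not a string");
        return (String) o;
    }

    int getIntElem(int pos) {
        checkPos(pos);
        Object o = elems.get(pos);
        if (!(o instanceof Integer)) throw new IllegalArgumentException("element " + pos + " is not an integer");
        return (Integer) o;
    }
}
```

In this sketch the arity is simply the size of the underlying Vector, so setArity() reduces to Vector.setSize(), and every typed getter checks both the position and the runtime type of the stored element.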

3.2.5 Embedded Query Interface

The embedded query interface accepts AMOSQL statements as strings and sends them to the Amos II kernel to be executed:

Scan execute(String query) throws AmosException;

The following example shows the execution of an AMOSQL statement:

theScan = theConnection.execute("select name(t) from type t;");

The query is an AMOSQL select statement terminated by ';'. The execute method returns a Scan object associated with the connection. A Scan object is an iterable stream of tuples, and the interface provides methods to iterate over it.

Since a scan is a continuously growing stream, it may contain a large number of tuples (rows). When the size of the returned Scan object must be restricted, a variant of the execute method can be used in which the parameter stopAfter specifies the maximum number of tuples in the returned Scan:

Scan execute(String query, int stopAfter) throws AmosException;

Executing the query from the previous example with stopAfter set to 10 returns at most 10 type names:

theScan = theConnection.execute("select name(t) from type t;", 10);

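The effect of stopAfter on a conceptually unbounded result can be mimicked with a lazily evaluated Java stream, where limit(n) plays the role of stopAfter. This is only an analogy, not how the PJI implements stopAfter: the class `StopAfterDemo` and the generated names "type0", "type1", ... are hypothetical, and the truncation here happens entirely on the client side.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Analogy only: stopAfter as a limit on a lazily produced, unbounded result. */
final class StopAfterDemo {
    private StopAfterDemo() {}

    /** An unbounded source: "type0", "type1", ... produced on demand. */
    static Stream<String> unboundedNames() {
        return Stream.iterate(0, i -> i + 1).map(i -> "type" + i);
    }

    /** Returns at most stopAfter elements; no more than that are ever generated. */
    static List<String> execute(Stream<String> result, int stopAfter) {
        return result.limit(stopAfter).collect(Collectors.toList());
    }
}
```

As with stopAfter, the limit is an upper bound: a finite result with fewer elements is returned whole.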

3.2.5.1 Single tuple results

In some cases the result of a query is a single tuple. No iteration is then required, and one of the get element methods matching the expected result type can be used directly. For instance, the result of the following query is a single tuple containing one string element, the concatenation of 'a' and 'b':

String str = theConnection.execute("concat('a','b');", 1).getRow().getStringElem(0);

3.2.5.2 String elements

The following method returns a Tuple element as a Java String:

String getStringElem(int pos) throws AmosException;

It fetches the element at position pos of the Tuple and returns it as a Java String. It raises an error if the element is not a string or if the position is out of range. The following overloaded set method sets a String element at the given position of a Tuple:

void setElem(int pos, String str) throws AmosException;

It will raise an error if the position is out of range. The following example shows how a Tuple containing a single string is created:

Tuple tp = new Tuple(1);

tp.setElem(0,"HELLO WORLD");

3.2.5.3 Integer elements

For Integer elements, there are get and set functions as well. The get function is as follows:

int getIntElem(int pos) throws AmosException;

The argument pos is the position in the Tuple from which the integer is fetched. The method raises an error if the element at that position is not an integer or if the position is out of range.

The following overloaded set function can set an Integer value at the given position pos.

void setElem(int pos, int integer) throws AmosException;

It raises an error if the position is out of range.


3.2.5.4 Floating point elements

There are different functions to get and set a double precision floating point value at a specific position. Here is the get function:

double getDoubleElem(int pos) throws AmosException;

An error is raised if the specified Tuple element is not a real number or if the position is out of range. The following overloaded set method stores the given value at position pos of the Tuple:

void setElem(int pos, double dbl) throws AmosException;

3.2.5.5 Object elements

The following method returns a proxy for the object located at position pos of the Tuple:

Oid getOidElem(int pos) throws AmosException;

The returned Amos II object proxy can represent any data structure supported by the Amos II storage manager, including literals and OIDs.

An error is raised if the specified Tuple element is not an object or if the position is out of range.

The following overloaded set method stores the given object proxy at position pos of the Tuple:

void setElem(int pos, Oid obj) throws AmosException;

3.2.5.6 Sequences

Amos II provides ordered collections of objects, called sequences, which are represented by the Amos II database type Vector. In the pure Java Interface, sequences are represented as instances of the Java Tuple class.

The get function for getting a sequence item at a position pos of the Tuple is as follows:

Tuple getSeqElem(int pos) throws AmosException;

An error is raised if the specified Tuple element is not a sequence or if the position is out of range. The following overloaded set method stores a Tuple as a sequence at position pos of a Tuple:

void setElem(int pos, Tuple tpl) throws AmosException;


3.2.5.7 Data type tests

The pure Java Interface offers the following methods to check the data type of the element at position pos of a Tuple:

boolean isString(int pos) throws AmosException;

boolean isInteger(int pos) throws AmosException;

boolean isDouble(int pos) throws AmosException;

boolean isObject(int pos) throws AmosException;

boolean isTuple(int pos) throws AmosException;
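A client that receives tuples of unknown shape typically combines these type tests with the matching get methods. The sketch below illustrates that dispatch pattern under simplifying assumptions: a row is modelled as a plain Object[] rather than a PJI Tuple, a nested Object[] stands for a sequence (in the PJI, getSeqElem returns a Tuple), isObject is omitted because object proxies have no stand-in here, and the class name `ElemTypes` is hypothetical.

```java
import java.util.StringJoiner;

/** Hypothetical sketch: type tests over a row held as Object[]; a nested Object[] models a sequence. */
final class ElemTypes {
    private ElemTypes() {}

    private static Object at(Object[] row, int pos) {
        if (pos < 0 || pos >= row.length) throw new IllegalArgumentException("position " + pos + " out of range");
        return row[pos];
    }

    static boolean isString(Object[] row, int pos)  { return at(row, pos) instanceof String; }
    static boolean isInteger(Object[] row, int pos) { return at(row, pos) instanceof Integer; }
    static boolean isDouble(Object[] row, int pos)  { return at(row, pos) instanceof Double; }
    static boolean isTuple(Object[] row, int pos)   { return at(row, pos) instanceof Object[]; }

    /** Renders a row by dispatching on the type tests, recursing into sequences. */
    static String render(Object[] row) {
        StringJoiner out = new StringJoiner(", ", "{", "}");
        for (int pos = 0; pos < row.length; pos++) {
            if (isString(row, pos))       out.add("\"" + row[pos] + "\"");
            else if (isInteger(row, pos)) out.add(Integer.toString((Integer) row[pos]));
            else if (isDouble(row, pos))  out.add(Double.toString((Double) row[pos]));
            else if (isTuple(row, pos))   out.add(render((Object[]) row[pos]));
            else                          out.add(String.valueOf(row[pos])); // anything else, e.g. an object proxy
        }
        return out.toString();
    }
}
```

For example, the row {"a", 1, 2.5, {"b", 3}} is rendered as {"a", 1, 2.5, {"b", 3}}, with the nested sequence handled by the recursive call.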

3.2.5.8 The Fast-Path Interface

The fast-path interface allows Java developers to call Amos II functions directly. It provides primitives for building argument lists from Java data types and passing them as parameters to Amos II function calls. As with an embedded query call, the result of a fast-path call is always a scan. Here is an example of an Amos II function call from Java using the fast-path interface:

Tuple argl = new Tuple(1);          // argument list with arity 1
argl.setElem(0, "HELLO WORLD");     // the single string argument
String res = theConnection.callFunction(
    "charstring.lower->charstring", argl).getRow().getStringElem(0);

In this example, the Amos II function charstring.lower->charstring, which converts a string to lower case, is called. A string is stored in argl, a Java Tuple, which is passed as the parameter of the Amos II function call.

Argument lists for fast-path Amos II function calls are represented by instances of the Java Tuple class. Note that the arity of the Tuple sent as the argument list must match the Amos II function being called. Moreover, its elements must be set with the correct overloaded set methods and in the order expected by the Amos II function.

After preparing the argument list, the Amos II function fnName can be called by one of the following functions:

Scan callFunction(String fnName, Tuple fnArgs) throws AmosException;

Scan callFunction(String fnName, Tuple fnArgs, int stopAfter) throws AmosException;


The first method returns the whole result of the function call in a Scan. As with embedded query execution, the second variant limits the number of tuples in the returned Scan to at most stopAfter.

Note that the PJI caches the association between an Amos II function name and the corresponding function proxy, which represents the database function. Therefore:

• The first call to callFunction for a given function name accesses the database to get the function proxy; subsequent calls are then very fast.

• callFunction is not affected if the function is renamed between two calls, since the cached proxy is still used.

When required, the following Java function can clear the function proxy cache:

void clearFunctionCache();

It is possible to get the function proxy using the following Java function:

Oid getFunction(String fnName) throws AmosException;

which accepts the function name as parameter and returns the corresponding function proxy. It will raise an error if the function does not exist.
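The function-proxy cache described above can be sketched as a name-to-proxy map that is consulted before the database. In this hypothetical sketch, `FunctionCache` and its lookup function stand in for the PJI's internal cache and its database lookup, and the proxy type is a type parameter rather than Oid; ConcurrentHashMap.computeIfAbsent performs the lookup only for names that are not cached yet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a function-proxy cache keyed by function name. */
final class FunctionCache<P> {
    private final Map<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> lookup; // stands in for the database round trip

    FunctionCache(Function<String, P> lookup) {
        this.lookup = lookup;
    }

    /** The first call per name hits the database; later calls are served from the map. */
    P getFunction(String fnName) {
        return cache.computeIfAbsent(fnName, lookup);
    }

    /** Forgets all cached proxies, e.g. after functions have been redefined. */
    void clearFunctionCache() {
        cache.clear();
    }
}
```

If the lookup throws, for instance because the function does not exist, computeIfAbsent rethrows the exception and records no mapping, so unknown names are not cached.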

The callFunction method is overloaded; other variants of callFunction accept the function proxy instead of the function name as parameter:

Scan callFunction(Oid fnObject, Tuple fnArgs) throws AmosException;

Scan callFunction(Oid fnObject, Tuple fnArgs, int stopAfter) throws AmosException;

3.2.5.9 Creating and Deleting Objects

The pure Java Interface offers a method to create a new Amos II database object of the type named TypeName:

Oid createObject(String TypeName) throws AmosException;

A type proxy can be obtained with the getType method:

Oid getType(String typeName) throws AmosException;

which accepts the type name as parameter and returns the corresponding type proxy. It will raise an error if the type does not exist.

The createObject method is also overloaded; another variant accepts the type proxy instead of the type name as parameter:

Oid createObject(Oid type) throws AmosException;

It accepts the type proxy as parameter, creates a new database object of that type and returns its object proxy.

Here is the method to delete a database object:

void deleteObject(Oid theObject) throws AmosException;


Chapter 4

Reflections and Evaluation

As described earlier, the PJI aims to address two major limitations of the original Amos II Java interface: its dependency on JNI libraries, and the routing of data traffic through the name server. This chapter analyzes how the PJI addresses these two limitations and evaluates its performance.

4.1 Reflections

The following sections briefly describe how the PJI addresses the limitations mentioned above.

4.1.1 JNI dependencies

In the original Amos II Java interface, JNI libraries are used to call the native C functions of Amos II that handle the transactions.

Although this is a reliable solution, it prevents users from using the original Amos II Java interface in web applications. The PJI communicates with Amos II servers over network sockets to handle the transactions and uses no non-Java libraries. Therefore, the PJI can be used freely in any Java application.

4.1.2 Data traffic in name server

A Java application that uses the original Amos II Java interface communicates with an Amos II server through the name server with which the Amos II server is registered. When the Amos II server is a DBMS handling few database requests, both the Amos II server and the name server can easily handle the data traffic. However, this setup may place a high traffic load on the name server when the Amos II server is a DSMS that handles CQs over high-rate streams. The situation can deteriorate further when several Amos II servers with multiple clients handle CQs over high-rate streams.

As described in the preceding chapter, the PJI communicates with the name server only once, to fetch the information needed to connect to the Amos II server. It then establishes a direct socket connection to the Amos II server and no longer communicates with the name server.
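The connection pattern can be summarized as "resolve once, then talk directly". In the sketch below, `NameServer` is a hypothetical interface that maps a server name to a host and port, and `DirectClient` is a hypothetical client that asks it once, caches the endpoint, and reuses that endpoint for every later request without involving the name server. The actual socket I/O is left out so that the example stays self-contained.

```java
import java.net.InetSocketAddress;

/** Hypothetical lookup service: server name -> address of the Amos II server. */
interface NameServer {
    InetSocketAddress lookup(String serverName);
}

/** Resolves the server address once, then reuses it for all requests. */
final class DirectClient {
    private final NameServer nameServer;
    private final String serverName;
    private InetSocketAddress endpoint; // cached after the first lookup

    DirectClient(NameServer nameServer, String serverName) {
        this.nameServer = nameServer;
        this.serverName = serverName;
    }

    /** Address to which the client would open its direct socket. */
    InetSocketAddress endpoint() {
        if (endpoint == null) {
            endpoint = nameServer.lookup(serverName); // the only name-server contact
            if (endpoint == null) throw new IllegalStateException("unknown server " + serverName);
        }
        return endpoint;
    }
}
```

However many CQ results the client later reads, the name server is contacted only for the first call to endpoint().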

4.2 Evaluations

In this section, the performance of the PJI is evaluated by simulating a data stream and executing a CQ over it. The test is performed first with a Java application that uses the original Amos II Java interface and then with a Java application that uses the PJI, in order to compare their performance. Both Java applications run on the same hardware and operating system.

Both the stream simulation and the CQ are implemented with the following AMOSQL code:

create function myiota(Number l, Number u, Number s) -> Bag of Integer
  as begin
    declare Integer r;
    set r = l;
    while true
    do
      return r;
      set r = r + s;
      if r > u
      then set r = l;
    end while;
  end;

create function FilterFunc() -> bag of number
  as for each number x where x in myiota(-100,100,1)
    if x > -95 and x < 95 then return x;

The stream consists of the integers from -100 to 100 in steps of 1, repeated indefinitely. The CQ is a simple filter that discards incoming values that are not strictly between -95 and 95.
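To make the test workload concrete, the generator and the filter can be mirrored in Java. This is an illustrative re-implementation, not part of the test setup, and it assumes the semantics read from the AMOSQL code above: myiota counts from l to u in steps of s and wraps around to l once the value exceeds u, and FilterFunc keeps values strictly between -95 and 95. The class name `StreamWorkload` is hypothetical.

```java
import java.util.Iterator;

/** Java mirror of the AMOSQL test stream (illustration only). */
final class StreamWorkload {
    private StreamWorkload() {}

    /** Like myiota(l, u, s): l, l+s, ..., wrapping back to l once the value exceeds u; never ends. */
    static Iterator<Integer> myiota(int l, int u, int s) {
        return new Iterator<Integer>() {
            private int r = l;

            @Override
            public boolean hasNext() {
                return true; // an infinite stream
            }

            @Override
            public Integer next() {
                int out = r;
                r = r + s;
                if (r > u) r = l;
                return out;
            }
        };
    }

    /** Like the condition in FilterFunc(): keeps x with -95 < x < 95. */
    static boolean passes(int x) {
        return x > -95 && x < 95;
    }
}
```

Each cycle of 201 generated values (-100 to 100) therefore yields 189 CQ results (-94 to 94), and the cycle repeats forever, which is why the Java client has to bound its own loop.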

For each Java interface, the Java application connects to the Amos II server, registers the Amos II functions above and then iterates over the CQ results. Although the stream runs indefinitely, the Java demo application iterates over the CQ results only for a given number of iterations. The following Java code performs the iterations and measures their total elapsed time:

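int n = Integer.parseInt(this.nofIterations);   // number of CQ results to read
int i = 0;
long startTime = Instant.now().toEpochMilli();   // requires java.time.Instant
while (!s.eos()) {
    Tuple tpl = s.getRow();   // fetch the current CQ result
    s.nextRow();              // advance the scan
    i++;
    if (i >= n) break;        // stop after n results
}
long endTime = Instant.now().toEpochMilli();
System.out.println("The elapsed time for " + n + " iterations is "
    + (endTime - startTime) + " ms");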

4.2.1 Elapsed time

Two main tests were performed to evaluate the response time of both Java interfaces. Both tests measure the elapsed time in the Java client to iterate over the CQ results. The first test compares the response times of the original Amos II Java interface and the PJI when there are multiple Amos II servers, each communicating with one client. The second test compares the response times of both interfaces when four clients communicate with the same Amos II server.

4.2.1.1 Multiple servers

The first test measures elapsed times for different numbers of iterations over the CQ results while four servers run at the same time, each communicating with one client. Table 4.1 shows the elapsed times when the clients use the original Amos II Java interface, and Table 4.2 shows the elapsed times when the clients use the PJI to communicate with the Amos II servers.

For small numbers of iterations, i.e. 10 and 100, the elapsed times differ only slightly in absolute terms. From 1000 iterations upwards, the PJI is clearly faster: averaged over the four servers, it is about 7 times faster than the original Amos II Java interface at 1000 and 10000 iterations, and about 1.7 times faster at 100000 iterations.

The result of this experiment indicates how the PJI helps to eliminate a possible bottleneck by creating a direct connection between a client and a server. As described in earlier chapters, the name server has to handle all data traffic between clients and servers when the original Amos II Java interface is used.


Iterations Server A Server B Server C Server D

10 14 13 14 15

100 144 142 141 147

1000 1421 1419 1426 1427

10000 12147 11874 11982 12187

100000 116264 116188 116154 116204

Table 4.1: The elapsed times, in milliseconds, of four Java client applications, each using the original Amos II Java interface to communicate with its own Amos II server.

Iterations Server A Server B Server C Server D

10 11 12 9 11

100 60 68 63 65

1000 207 198 203 206

10000 1657 1634 1668 1683

100000 68084 68267 68821 67102

Table 4.2: The elapsed times, in milliseconds, of four Java client applications, each using the PJI to communicate with its own Amos II server.
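The speed-up of the PJI can be computed directly from Tables 4.1 and 4.2 as the ratio of the mean elapsed times over the four servers. The short Java class below does this arithmetic; the numbers are copied from the two tables, and the class name `SpeedUp` is only illustrative.

```java
/** Ratio of mean elapsed times (original interface / PJI) from Tables 4.1 and 4.2. */
final class SpeedUp {
    private SpeedUp() {}

    static final int[] ITERATIONS = {10, 100, 1000, 10000, 100000};

    // Elapsed times in ms for servers A-D, original Amos II Java interface (Table 4.1).
    static final int[][] ORIGINAL = {
        {14, 13, 14, 15}, {144, 142, 141, 147}, {1421, 1419, 1426, 1427},
        {12147, 11874, 11982, 12187}, {116264, 116188, 116154, 116204}};

    // Elapsed times in ms for servers A-D, PJI (Table 4.2).
    static final int[][] PJI = {
        {11, 12, 9, 11}, {60, 68, 63, 65}, {207, 198, 203, 206},
        {1657, 1634, 1668, 1683}, {68084, 68267, 68821, 67102}};

    /** Arithmetic mean of one table row. */
    static double mean(int[] xs) {
        long sum = 0;
        for (int x : xs) sum += x;
        return (double) sum / xs.length;
    }

    /** Speed-up for row i (ITERATIONS[i]): mean(original) / mean(PJI). */
    static double speedUp(int i) {
        return mean(ORIGINAL[i]) / mean(PJI[i]);
    }
}
```

Evaluated this way, the speed-up is about 1.3 at 10 iterations, 2.2 at 100, 7.0 at 1000, 7.3 at 10000 and 1.7 at 100000 iterations.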

Moreover, this test shows that the client/server connection made with the PJI keeps a faster response time than the original Amos II Java interface when the stream runs for a longer time. With the original interface, the longer the stream runs, the higher the load on the name server process, which slows the name server down over time.

4.2.1.2 Multiple clients

In this test, one Amos II server communicates with four Java client applications. Table 4.3 contains the elapsed times for each Java client application using the PJI to communicate with the server, and Table 4.4 contains the elapsed times for each Java client application using the original Amos II Java interface.

The PJI again has a faster response time, though the difference is smaller than in the previous test: averaged over the four clients, the PJI is about 1.8 times faster at 10000 iterations. This indicates that the direct client/server connections made by the PJI handle the traffic faster. With the original Amos II Java interface, the load on the name server is highest when multiple clients connect to different servers, since the name server then has to route traffic between several clients and several servers. In this test all clients share a single server, so the name server carries less load, and the PJI cannot gain as much as in the preceding experiment.


Iterations Client 1 Client 2 Client 3 Client 4

10000 64590 64572 64302 64460

Table 4.3: The elapsed times of four Java clients each using the PJI to communicate with the same Amos II server in milliseconds

Iterations Client 1 Client 2 Client 3 Client 4

10000 119204 118820 118852 118748

Table 4.4: The elapsed times of four Java clients each using the original Amos II Java interface to communicate with the same Amos II server in milliseconds.

4.2.2 Resource usage

In this test, the CPU and memory usage of the name server and of two Amos II servers, each communicating with one Java client, is measured.

Figure 4.1 shows the CPU and memory usage of two Amos II servers, each communicating with a Java client application that uses the PJI. The memory usage is negligible, while the CPU usage is very low at the beginning and then increases gradually.

Figure 4.2 shows the CPU and memory usage of two Amos II servers, each communicating with a Java client application that uses the original Amos II Java interface. As with the PJI, the memory usage is negligible, but the CPU usage behaves slightly differently: it is higher at the beginning but increases more slowly.

4.2.3 Summary

As the test results suggest, the PJI handles CQs over streams faster than the original Amos II Java interface. However, the PJI does not offer a considerable improvement in CPU and memory usage.


Figure 4.1: The CPU and memory usage of two Amos II servers, each communicating with one Java client that uses the PJI.


Figure 4.2: The CPU and memory usage of two Amos II servers, each communicating with one Java client that uses the original Amos II Java interface.


Chapter 5

Conclusion and Future Work

5.1 Conclusion

The Amos II project offers interfaces to a number of commonly used programming languages, which makes the Amos II DBMS easier to use.

Due to the increasing demand for handling data streams in different fields, interest in DSMSs is likely to grow considerably. Therefore, SCSQ as a DSMS needs interfaces to existing programming languages. The PJI fills this gap for Java developers who build pure Java applications such as applets. Moreover, care has been taken not to change the existing Amos II Java API.

From the user's point of view, the methods provided by the pure Java Interface look the same as those of the Amos II Java API. This makes it easy for developers who already use the Amos II Java API to replace the Amos II Java library with the pure Java Interface library.

This thesis delivers the PJI, an Amos II Java interface that offers the same API as the original Amos II Java interface but is implemented solely with pure Java libraries. In addition, the PJI improves the communication between Java clients and Amos II servers by creating direct client-server connections instead of connections through the name server.

5.2 Future work

The main focus of this thesis was to implement a Java interface for SCSQ.

The callin interface of the Amos II Java API has been adapted and implemented here; integrating the callout interface into the PJI is a possible future task.

Although the current implementation improves the query speed, it can likely be optimized further. To this end, it would also be useful to benchmark the query speed of the current implementation.


Appendix A

Pure Java interface

This appendix includes the implemented Pure Java interface API.

A.1 Connection class

The available functions of the Connection class are listed in the following:

public class Connection {
    public Connection(String serverAddress, String serverName)
    public Connection(String serverAddress, String serverName, String dbName)
    public Connection(String serverName)
    public static Connection localConnection()
    public void disconnect()
    public void init(String dbName)
    public void initializeAmos(String imageName)
    public Scan execute(String query, int stopAfter)
    public Scan executeCustom(String query, String options)
    public Scan execute(String query)
    public void rollback()
    public void commit()
    public Oid getFunction(String fnName)
    public Oid getFunctionInternal(String fnName)
    public static void clearFunctionCache()
    public void getFunctionArgumentsAsString(Tuple arg)
    public Scan callFunction(Oid fnObject, Tuple fnArgs)
    public Scan callFunction(Oid fnObject, Tuple fnArgs, int stopAfter)
    public Scan callFunction(String fnName, Tuple fnArgs)
    public Scan callFunction(String fnName, Tuple fnArgs, int stopAfter)
    public String getString(Scan theScan)
    public Scan callFunctionCustom(Oid fnObject, Tuple fnArgs, String options)
    public Oid callOidFunction(String fname, Oid arg)
    public Tuple callTupleTupleFunction(String fname, Oid arg)
    public String callStringFunction(String fname, Oid arg)
    public String callStringFunction(String fname)
    public Scan callFunction(String name)
    public Scan callFunction(String name, Oid arg)
    public Scan callFunction(String name, String arg)
    public Scan callFunction(Oid fn, Oid arg)
    public Oid createObject(Oid type)
    public Oid createObject(String typeName)
    public Oid getType(String typeName)
    public void deleteObject(Oid theObject)
    public String createVectorFromTuple(Tuple arg)
    public void setFunction(Oid fn, Tuple argList, Tuple resList)
    public void setFunction(String fnName, Tuple argList, Tuple resList)
    public void addFunction(Oid fn, Tuple argList, Tuple resList)
    public void addFunction(String fnName, Tuple argList, Tuple resList)
    public void remFunction(Oid fn, Tuple argList, Tuple resList)
    public void remFunction(String fnName, Tuple argList, Tuple resList)
    public Oid getObjectNumbered(int idno)
    public boolean isInteger(String input)
}

A.2 Scan class

The available functions of the Scan class are listed in the following:

public class Scan {
    public Tuple getRow()
    public Vector toVector()
    public void nextRow()
    public boolean eos()
    public void closeScan()
    public void close()
    public Scan(int theScan, Connection theConnection)
    public Scan(String scanString, Connection theConnection)
    public Scan()
}

A.3 Tuple class

The available functions of the Tuple class are listed in the following:

public class Tuple {
    public Tuple()
    public Tuple(int arity)
    public Tuple(Oid obj)
    public Tuple(String str)
    public Vector toVector()
    public Vector toRemoteVector()
    public int getArity()
    public void setArity(int arity)
    public String getStringElem(int pos)
    public int getIntElem(int pos)
    public double getDoubleElem(int pos)
    public byte[] getBinaryElem(int pos)
    public boolean getBooleanElem(int pos)
    public Oid getOidElem(int pos)
    public Tuple getSeqElem(int pos)
    public Object getElem(int pos)
    public void setElem(int pos, String str)
    public void setElem(int pos, byte[] str, int len)
    public void setBinaryElem(int pos, byte[] arr, int size)
    public void addElem(int pos, String str)
    public void addElem(int pos, byte[] str, int len)
    public void setElem(int pos, int integer)
    public void setElem(int pos, double dbl)
    public void setElem(int pos, boolean z)
    public void setElem(int pos, Oid obj)
    public void setElem(int pos, Tuple tpl)
    public void setElem(int pos, Object obj)
    public boolean isString(int pos)
    public boolean isInteger(int pos)
    public boolean isDouble(int pos)
    public boolean isBinary(int pos)
    public boolean isObject(int pos)
    public boolean isTuple(int pos)
    public boolean isBoolean(int pos)
    public boolean isNull(int pos)
    public Tuple(int theTuple, Connection theConnection)
}

A.4 OID class

The available functions of the OID class are listed in the following:

public class Oid {
    public Oid()
    private Oid(int oidtype, int typetag, Connection theConnection)
    public Oid(int oidtypeHandle)
    public Oid(String oidName, int oidHandle)
    public boolean equals(Object theObject)
    public String getName()
    public final String toString()
    public Oid CopyProps(Oid o)
    public int getID()
    public void print()
    public String getType()
}


Bibliography

[ABW03] Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL continuous query language: Semantic foundations and query execution. Technical Report 2003-67, Stanford InfoLab, 2003. An earlier version of this technical report, titled "An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations", appeared as technical report number 2002-57. A short version of technical report 2002-57 also appears in the proceedings of the 9th International Conference on Data Base Programming Languages (DBPL 2003).

[Bo10] Yang Bo. Querying JSON streams. Master's thesis, Uppsala University, 2010.

[COD] Relational databases. http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/reldb/.

[DR00] D. Elin and T. Risch. Amos II Java interfaces, 2000.

[ea03] N. Tatbul et al. Load shedding in a data stream manager. 2003.

[ea11] T. Risch et al. Query language survey and selection criteria. 2011.

[EN99] Ramez A. Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 3rd edition, 1999.

[E.Z11a] E. Zeitler. Scalable Parallelization of Expensive Continuous Queries over Massive Data Streams. PhD thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 836, 2011.

[E.Z11b] E. Zeitler. SCSQ user's guide, 2011.

[FR08] Gustav Fahl and Tore Risch. Amos II tutorial, 2008.

[FRS93] Gustav Fahl, Tore Risch, and Martin Sköld. AMOS - an architecture for active mediators. In Intl. Workshop on Next Generation Information Technologies and Systems (NGITS '93), pages 47–53, 1993.

[GO03] Lukasz Golab and M. Tamer Ozsu. Data stream management issues – a survey. Technical report, 2003.

[Keg09] Stefan Kegel. Streamed verification of a data stream management benchmark. Master’s thesis, Otto-von-Guericke-University Magdeburg, 2009.

[LR1] Lofar. http://www.lofar.nl/.

[RG99] Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. McGraw-Hill, Inc., New York, NY, USA, 3rd edition, 1999.

[Ris13] Tore Risch, ed. SCSQ Linear Road website, 2013.

http://user.it.uu.se/ udbl/lr.html (Last visit on 01/08/2013).

[RJK03] Tore Risch, Vanja Josifovski, and Timour Katchaounov. Functional data integration in a distributed mediator system. In Functional Approach to Data Management - Modeling, Analyzing and Integrating Heterogeneous Data, Springer, ISBN, pages 3–540, 2003.

[Shi81] David W. Shipman. The functional data model and the data language DAPLEX. ACM Trans. Database Syst., 6(1):140–173, March 1981.

[web14] Wikipedia website. Marshaling. http://en.wikipedia.org/wiki/Marshalling-(computer-science), 2014.

[web16] Wikipedia website. S-expression. https://en.wikipedia.org/wiki/S-expression, 2016.

[ZR06] Erik Zeitler and Tore Risch. Processing high-volume stream queries on a supercomputer. In Proc. ICDE 2006 Workshops, 2006.

[ZR07] E. Zeitler and T. Risch. Using stream queries to measure communication performance of a parallel computing environment. In Distributed Computing Systems Workshops, 2007. ICDCSW '07. 27th International Conference on, pages 65–65, June 2007.
