
Linköping University | Department of Computer Science

Bachelor thesis, 16 ECTS | Datateknik

2017 | LIU-IDA/LITH-EX-G--17/008--SE

An Approach to Achieve DBMS Vendor Independence for Ides AB’s Platform

Niklas Blomqvist

Philip Johansson

Examiner: Olaf Hartig



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Niklas Blomqvist, Philip Johansson


Abstract

Software that is developed with few integration capabilities towards different user interfaces or database vendors might lose market share in the long run. To stay competitive, companies that find themselves in situations like these might need to look at options to increase their alternatives. This thesis aims to present and evaluate how Ides AB could achieve vendor independence with respect to database integration. The proposed solution is based on pre-existing code from an existing product and thus includes theory about methods for reading, understanding, and analysing code. The outcome is presented with code examples to give the reader a clear and concise understanding. In the evaluation phase, we take related work into consideration as it relates to our thesis focus. The proposed approach consists of a class to represent different database vendors, together with abstract functions handling the interaction with different databases. Which database the class interacts with is determined by the connection established. The approach also covers what is possible to make database agnostic, verified by an evaluation.


Acknowledgments

We would like to thank Ides AB for giving us the opportunity to do this thesis. Thanks to everyone at Ides for being very helpful and inviting. Special thanks go to Kristian Sjöström, who has been our supervisor at Ides. Thanks also to Olaf Hartig, who has been our examiner and supervisor. Olaf provided us with feedback and guidelines that made this thesis possible.


Contents

Abstract
Acknowledgments
Contents
List of Figures

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Background
1.5 Delimitations

2 Theory
2.1 Reading, understanding, and analyzing code
2.2 Software development models
2.3 Maintainability
2.4 Refactoring code
2.5 Database management systems
2.6 Layer architecture
2.7 Object-relational mapping
2.8 Software testing

3 Method
3.1 Pre-study
3.2 Implementation
3.3 Software verification

4 Results
4.1 Background
4.2 Implementation
4.3 Analysis of the layer architecture

5 Evaluation
5.1 Stored procedure
5.2 Remote comparison
5.3 Local comparison

6 Discussion
6.1 Method
6.2 Results
6.3 Source criticism

7 Conclusion

List of Figures

2.1 Code example of refactoring with the extract method.
2.2 Code example of refactoring with the inline method.
2.3 Architecture of a database.
2.4 Example of a table in a database.
2.5 Relations in a table.
2.6 Result from query.
2.7 Three-layer architecture.
2.8 ORM architecture.
4.1 List over IDbConnection interface properties and methods.
4.2 Example of parameters in the IDbCommand interface.
4.3 Call to the database using Oracle.DataAccess.Client.
4.4 Enumeration representing database types. Could be extended further.
4.5 The constructor of the implemented class.
4.6 The overwritten CommandText property for the IDbCommand interface.
4.7 The character different DBMSs use for parameters.
4.8 Accessing Oracle’s parameter character.
4.9 Snippet from the ConvertStatement function that translates the parameter character.
4.10 Example of writing a standard SQL statement.
4.11 Statement with the unique Oracle functions decode and dbms_lob.getlength.
4.12 Command with separate statements for different DBMSs.
4.13 List over functions to be translated.
4.14 Implementation after implementing parse functions.
4.15 Function header of helper function GetStatement.
4.16 Statement supported by multiple DBMSs.
4.17 Statement expected to execute on all DBMSs.
4.18 Function for adding a parameter to a statement.
4.19 Functions for reading binary data in Oracle and SQL Server.
4.20 Function for reading binary data.
4.21 Function for reading data.
4.22 Call to the database using the new implementation.
4.23 Example of a statement in the user interface.
4.24 Example of application logic in the user interface.
5.1 Snippet from a stored procedure.
5.2 Snippet from a stored procedure written in C#.
5.3 Chart of stored procedure.
5.4 Chart of the memory usage evaluation on a remote server.
5.5 Chart of CPU evaluation on a remote server.
5.6 Chart of the execution time evaluation of statement 1 on a remote server.
5.9 Chart of the execution time evaluation of statement 4 on a remote server.
5.10 Chart of the execution time evaluation of statement 5 on a remote server.
5.11 Chart of the memory usage evaluation on a local server.
5.12 Chart of CPU evaluation on a local server.
5.13 Chart of the execution time evaluation of statement 1 on a local server.
5.14 Chart of the execution time evaluation of statement 2 on a local server.
5.15 Chart of the execution time evaluation of statement 3 on a local server.
5.16 Chart of the execution time evaluation of statement 4 on a local server.


1 Introduction

1.1 Motivation

The ability to choose and customize software has become something that customers take for granted. Companies whose software was originally designed with only narrow options are now in a critical phase where they are losing potential customers. How the software stores its data is often one of the things the customer wants to choose.

Databases are one way of handling data. Software developers face a choice of which database management system (DBMS) to interact with, but every customer has their own individual situation with a unique solution that possibly cannot be adapted to another customer. Keeping up with the market and being able to deliver what customers demand is essential in developing software. One way of extending the flexibility is to make the software database-agnostic.

1.2 Aim

The purpose of this thesis project is to investigate and to evaluate the most appropriate way to make Ides AB’s software platform database-agnostic without sacrificing performance.

1.3 Research questions

1. In what way could Ides become independent of database provider without affecting the current functionality?

• How could this be implemented without affecting execution time, CPU usage, or memory usage by more than 10%?

2. How should Ides keep working with their current layer architecture, with respect to multiple database providers?


1.4 Background

Ides AB is developing a platform for version control of engineering data, CAD integrations, applications, and Microsoft Office documents. The product is named MasterConcept and more about it can be read at http://www.imaster.se/en/. Today, Ides AB can exclusively offer their platform running with Oracle’s database management system. To increase their flexibility towards customers, Ides would like to make their software platform functional with different database managers. The reason is that they have lost procurements, and thereby market share, because they cannot offer another DBMS.

1.5 Delimitations

• Ides is developing their software in a .NET environment with the language C#; therefore, this thesis will focus on alternatives relevant for the .NET Framework.


2 Theory

This chapter presents theory and related work used in writing this thesis.

2.1 Reading, understanding, and analyzing code

This section is relevant because MasterConcept is over 30 years old. Over the years its code base has grown very large, and understanding it is a key factor in being able to contribute to it.

"As the amount of code in a project grows, it gradually surpasses understanding" [5]. There are methods to help get the fundamental understanding that one need to contribute in an effective manner. Feathers [5], explains three methods, Notes/Sketching, Listing markup, and Breakpoints, all presented below.

Notes/Sketching

When reading code repeatedly it can become confusing. One way to gain further understanding is to write notes and draw sketches. Start by writing down the names of the most important things, classes or functions, continuing with the second most important. If there is a relation between two of them, draw a line between them. The sketches do not have to be formal, like a full UML diagram. This method can help you see things from a different perspective, since drawing things out tends to do that [5].

Listing markup

Another method Feathers [5] presents for code understanding is listing markup. Print out the code and mark up the listing to get a deeper understanding of its purpose. Depending on whether you want to further understand dependencies or method structure, apply one of the following techniques:

• Separating responsibilities

Use a marker to group things. If things belong together, mark them with a special symbol next to them so it is possible to identify parts that belong together.


• Understanding method structure

Line up the code in blocks since bad indentation in long methods can make them very difficult to read. Draw lines from the beginning of the blocks to the end of the blocks. Look over the different blocks and see if there exists complicated logic that could be split up into separate functions using a refactoring method described in section 2.4.

Breakpoints

This method requires an Integrated Development Environment (IDE), a software application that provides comprehensive facilities to programmers for software development. Find the entry point of the logic you want to deepen your knowledge about. Set a breakpoint on the first line and step through the code line by line. This technique traverses every function call made from the point of the break and can help to see how the code is interrelated.

2.2 Software development models

This section presents the different software development models that were considered during development.

When it comes to software development and development models, Patton [18] describes the four most frequently used models. Patton [18] says that each model has its own advantages and disadvantages, and when testing and trying to understand code, one will most likely encounter them all. The models Patton [18] describes are the following:

Big-bang model

The big-bang model originates from the theory that the universe was created in a single huge explosion. In software development, this principle is applied in a similar manner: a very large amount of both people and money is put together to develop the software. The development is very fast paced and testing is not considered. This results in a large program that might or might not work perfectly, because the way of working was very rushed. The resulting structure is in most cases not optimal.

Testing and understanding a product that was developed with the big-bang model is very difficult, because it is very hard to go back and fix things that are broken when there is no real structure in the code.

Code-and-fix model

The code-and-fix model is a step up from the big-bang model. The idea is that the development team starts out with a simple design of what they want. This then proceeds into a long repeating cycle of testing, fixing bugs, and developing new features. Finally, at some point, the software is released.

Testing and understanding a product developed with the code-and-fix model is easier than with the big-bang model. This is because the development cycle includes a testing phase, and thus the released software is easier to maintain due to improved code quality.

Waterfall model

The waterfall model builds upon a series of steps starting from an initial idea to a final product. After each step, the development team reviews whether or not they are ready to move on


to the next step in the model. If not, the team stays at their current level until they are ready to move on. This model requires a well-done product definition and design specification. The coding phase is usually only a single block, and once the process has moved from one block to the next there is no turning back in the waterfall model.

Testing and understanding a product developed with the waterfall model is easy because every single step in the development cycle is very well and thoroughly specified.

Spiral model

The spiral model addresses many of the problems with the previously described models while contributing additional features. The general idea behind the spiral model is to start small with defining the most important features, try them out, get feedback from customers, and then move on to the next level. This is repeated until the final product is achieved. Every cycle in the spiral model involves:

1. Determine objectives, alternatives, and constraints.

2. Identify and resolve risks.

3. Evaluate alternatives.

4. Develop and test the current level.

5. Plan the next level.

6. Decide on the approach for the next level.

The spiral model can be viewed as a bit of both waterfall and code-and-fix: waterfall because of the phases of analyzing, designing, developing, and testing; code-and-fix because each cycle around the spiral works in a very similar way.

Strictly following the spiral model results in a code base of high quality and allows for easier understanding by future developers [18].

2.3 Maintainability

This section contains information to be applied both when writing code and when trying to understand existing code and design decisions.

Maintainability is the ease with which a system can undergo changes. Changes could impact components, services, features, and interfaces when fixing errors or meeting new requirements from customers [13]. Maintainability is a function of the overall quality. The most common things that affect maintainability include the following [13]:

• Excessive dependencies between layers.

The different layers, classes, or functions in the application depend heavily on each other; they are tightly coupled. This makes it very hard to swap one class for another without having to change a lot of other code.

Affects: Prevents easy replacements, updates, and changes.

Possible solution: A typical approach to prevent this is to design the system as well-defined layers with a clear layout, as mentioned in section 2.6.


• Direct communication.

This applies to components and layers located on different physical servers. In general, when designing for communication between such components, there is a choice between direct communication and message-based communication.

Affects: Prevents changes to physical deployment of components and layers.

Possible solution: Choosing an appropriate communication model and designing a pluggable architecture that allows for easy maintenance could avoid this.

• Relying on custom implementations.

Writing and relying on custom implementations for things such as authentication.

Affects: Prevents reuse and hinders maintenance.

Possible solution: Instead use the built-in platform functions and features where possible.

• Code logic of components is not cohesive.

Cohesion refers to the degree to which the elements inside a component belong together. If components are not cohesive, they contain code that does not belong to the component.

Affects: Makes the component harder to maintain and replace. Also causes unnecessary dependencies.

Possible solution: Design components with high cohesion and low coupling.

• Code base is large and complex.

Large amounts of code written without a proper structure.

Affects: Makes changes very unmanageable, fragile and burdensome.

Possible solution: Again, having clear and structured layers and perhaps using a business workflow could avoid these situations.

• Lack of documentation.

There is a lack of documentation, regarding either comments in the code or manuals on how to use the program and its features.

Affects: Hinders usage, management, and future upgrades.

Possible solution: Make sure to document each function with comments and to use explanatory variable names.

2.4 Refactoring code

This section contains information about how to refactor code. We used refactoring to get a better understanding of how the code worked.

Software that consists of a lot of code that is constantly changed by different developers can be difficult to keep clean and uniform. Usually in software development there is a code design to maintain a standard, but both programming languages and development technology keep evolving. It may therefore be useful to maintain the existing code with new technologies so that the whole software stays coherent [14].

Refactoring code is a technique for developers to improve the code. It is also a technique to improve the understanding of how the code is interrelated. The main goal of refactoring is to improve the code after it has been developed, as new features get added to the programming language and philosophies for software development change [6]. The extract method is one useful technique; its core is to break methods into smaller methods [6].

Extract method

The extract method has seven steps to follow to achieve the desired effect:

1. Find the code you want to refactor and comment it out.

2. Come up with a name that describes what the new method does. It is important to distinguish between what it does and how it does it; the name should relate to the former.

3. Place a call to the new method inside the old method.

4. Copy the code that you want to refactor into the new method.

5. Adjust the method declaration to accommodate the parameters and return value (if any).

6. Run tests to validate that it works.

7. Remove the commented code from the first step.

Figure 2.1 shows an example of the extract method.

public void printOwing(Customer Customer, List<Order> Orders)
{
    double outstanding = 0.0;

    //Print banner
    Console.WriteLine("---");
    Console.WriteLine("-- Customer Owes --");
    Console.WriteLine("---");

    //Calculate outstanding
    foreach (Order order in Orders)
        outstanding += order.GetAmount();

    //Print details
    Console.WriteLine(String.Format("Name: {0}", Customer.GetName()));
    Console.WriteLine(String.Format("Amount: {0}", outstanding));
}

After extracting printBanner:

public void printOwing(Customer Customer, List<Order> Orders)
{
    double outstanding = 0.0;
    printBanner();

    //Calculate outstanding
    foreach (Order order in Orders)
        outstanding += order.GetAmount();

    //Print details
    Console.WriteLine(String.Format("Name: {0}", Customer.GetName()));
    Console.WriteLine(String.Format("Amount: {0}", outstanding));
}

private void printBanner()
{
    //Print banner
    Console.WriteLine("---");
    Console.WriteLine("-- Customer Owes --");
    Console.WriteLine("---");
}

Figure 2.1: Code example of refactoring with the extract method.

When using the extract method, local variables may become a problem. There are guidelines both to avoid and to handle local variables. First, if a variable is read but not changed, it can simply be passed as a parameter. The real problem occurs when a local variable is assigned. If only one variable is assigned, the new method can return that value. If several values are assigned, it is possible to encapsulate them in a wrapper class. This may not be optimal considering that the initial purpose was clear and easy code. In the latter case, Fowler et al. [6] state that the refactoring may be avoided or written in another way [6, 14].
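As an illustration of the single assigned variable case, the outstanding calculation from figure 2.1 could be extracted into a method that returns the computed value. The sketch below is our own illustration of this guideline, not an example taken from Fowler et al.:

public void printOwing(Customer Customer, List<Order> Orders)
{
    printBanner();

    //Single assigned local variable: let the extracted method return it
    double outstanding = calculateOutstanding(Orders);

    //Print details
    Console.WriteLine(String.Format("Name: {0}", Customer.GetName()));
    Console.WriteLine(String.Format("Amount: {0}", outstanding));
}

private double calculateOutstanding(List<Order> Orders)
{
    double outstanding = 0.0;
    foreach (Order order in Orders)
        outstanding += order.GetAmount();
    return outstanding;
}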

Inline method

The inline method is the opposite of the extract method. When the code is no longer easy to understand, or it is badly factored, the inline method is used.

The inline method has five steps to follow to achieve the desired effect:

1. Check that the method is not polymorphic. Do not inline if subclasses override the method; they cannot override a method that is not there.

2. Find all calls to the method.

3. Replace each call with the method body.

4. Compile and test.

5. Remove the method definition.


public int getRating()
{
    return (moreThanFiveLateDeliveries()) ? 2 : 1;
}

private bool moreThanFiveLateDeliveries()
{
    return _numberOfLateDeliveries > 5;
}

After inlining moreThanFiveLateDeliveries:

public int getRating()
{
    return (_numberOfLateDeliveries > 5) ? 2 : 1;
}

Figure 2.2: Code example of refactoring with the inline method.

2.5 Database management systems

MasterConcept is using a DBMS for storing and retrieving data. This section gives relevant information about DBMSs.

A database is a structured collection of data, usually very large. Relational databases structure data in tables consisting of rows and columns. Each column has a column name, which is used to access that particular column [4].

Databases are structured using database schemas. A database schema represents the logical view of the entire database. It defines how the data is organized within the database as well as the relations between the data. A database schema can be divided into two categories, both listed below.

• Logical database schema

Defines the logical constraints that need to be applied when data is stored. A constraint could be a column data type such as numbers (integers) or text (strings).

• Physical database schema

The actual storage of data and its form of storage, like files. It defines how the data will be stored in a secondary storage device.


Database systems contain a lot of data in addition to the actual data to be saved. This data is known as metadata, data about data, which helps to locate and retrieve data in an easy way. Similar to the database schema, data independence follows a layered architecture, listed below.

• Logical data independence

If changes occur to the format of a table, the data on disk should not be affected.

• Physical data independence

If changes occur to the physical data, the schema or logical data should not be affected.

To keep data independence and efficient access, and to allow for administration, databases are managed by a database management system (DBMS) [19]. A database management system is a software application that interacts with both the user and the database itself to capture and analyze data [4].

Persons

ID | Firstname | Lastname
1  | Niklas    | Blomqvist
2  | Philip    | Johansson

Figure 2.4: Example of a table in a database.

Figure 2.4 shows an example of a possible table in a database. This is a table about persons and consists of three columns. The first column, ID, is only there to uniquely identify each person, since two people could have the same name. The second and third columns, Firstname and Lastname, store the name of each person.

Relations in databases

A relation is a set of tuples (d1, d2, ..., dn) where each element di is a member of a data domain Di [2]. A tuple is a finite ordered list of elements, and each element is termed an attribute value [2]. An attribute is a name paired with a data type [2]. An attribute value is an attribute paired together with an element of that attribute's domain [2]. A tuple is a set of attribute values in which no two distinct elements have the same attribute name [2]. Figure 2.5 shows relation, tuple, and attribute represented as table, row, and column respectively.

SQL term | Relational database term | Description
Row      | Tuple (or record)        | A data set representing a single item.
Column   | Attribute (or field)     | A labeled element of a tuple (e.g. Firstname).
Table    | Relation                 | A set of tuples sharing the same attributes; a set of columns and rows.

Figure 2.5: Relations in a table.
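For reference, the definition above can be restated compactly in set notation. This is our restatement of the cited definition, not a formula taken from [2]:

\[
R \subseteq D_1 \times D_2 \times \cdots \times D_n, \qquad t = (d_1, d_2, \ldots, d_n) \in R \text{ with } d_i \in D_i
\]

That is, a relation R over the domains D1, ..., Dn is a subset of their Cartesian product, and every tuple t in R draws its i-th attribute value from the domain Di.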


Structured Query Language

A database management system allows users to create, read, update, and delete (CRUD) data in a database [4]. This is achieved by interacting with the DBMS through the Structured Query Language (SQL) [4]. SQL allows users to manage databases by deleting/creating tables and inserting/updating/deleting data within the tables [19]. Communication between the DBMS and users is done via a statement. The fundamental statements are the following [4]:

• SELECT

For retrieving data from a table.

• INSERT

For inserting data into a table.

• UPDATE

For updating existing data in a table.

• DELETE

For deleting data from a table.

A statement that retrieves data is called a query and is built upon a SELECT-FROM-WHERE structure [4].

SELECT <attribute list>
FROM <table list>
WHERE <condition>

• <attribute list> - names of columns whose values are to be retrieved by the query.

• <table list> - relation names required to process the query.

• <condition> - conditional expression that identifies the data to be retrieved.

By using figure 2.4 as an example, a query for retrieving persons whose first name is Niklas would look like:

SELECT Firstname, Lastname
FROM Persons
WHERE Firstname = 'Niklas'

This query would produce the following result:

Firstname | Lastname
Niklas    | Blomqvist

Figure 2.6: Result from query.

Similar expressions can be used with UPDATE/INSERT/DELETE to modify data in tables.


Stored procedures

Most DBMSs (including Oracle and SQL Server) support stored procedures. A stored procedure is a subroutine which can consolidate and centralize logic which originally was implemented in applications. Very extensive or complex processing which would require the execution of multiple SQL statements could instead be implemented as a stored procedure to save both time and memory [4]. Different DBMSs utilize different dialects of SQL, which means a procedure written for Oracle’s DBMS would possibly not be functional in SQL Server [16, 17]. Stored procedures are registered in the DBMS and not in the application code.
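While the body of a stored procedure is written in the DBMS's own dialect, invoking one from .NET is uniform across providers. The following sketch is our illustration of this; the procedure name get_version_count and its parameter are hypothetical, not part of MasterConcept:

using System;
using System.Data;

static int GetVersionCount(IDbConnection connection, string id)
{
    using (IDbCommand cmd = connection.CreateCommand())
    {
        //Execute a stored procedure instead of an inline SQL statement
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.CommandText = "get_version_count"; //hypothetical procedure name

        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = "id";
        p.Value = id;
        cmd.Parameters.Add(p);

        //ExecuteScalar returns the first column of the first row
        return Convert.ToInt32(cmd.ExecuteScalar());
    }
}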

Triggers

A trigger is procedural code in a DBMS that automatically executes at certain events. Possible events differ from DBMS to DBMS, but the following are more or less standard for the most well-known ones (including Oracle and SQL Server) [4]:

• BEFORE UPDATE/DELETE/INSERT

Execute code before updating/deleting/inserting data into a table.

• AFTER UPDATE/DELETE/INSERT

Execute code after updating/deleting/inserting data into a table.

Triggers are bound to specific tables and are automatically executed when any of the bound events occur.

Oracle

For stored procedures and triggers, Oracle utilizes Procedural Language/SQL (PL/SQL) [17], which is an extension of SQL. One of the main differences between PL/SQL and other SQL dialects for expressing stored procedures is the ability to group procedures into packages. A package declares the types, variables, constants, exceptions, cursors, and subprograms that can be referenced from outside the package. It contains information about the content of the package but excludes the code for the subprograms. PL/SQL also has a lot of unique built-in functions.

SQL Server

For stored procedures and triggers, SQL Server utilizes Transact-SQL (T-SQL) [16], which is also an extension of SQL. The main difference between PL/SQL and T-SQL is how they handle variables, stored procedures, and built-in functions [16, 17].

2.6 Layer architecture

This section contains information to understand how code is structured and to understand our result of how Ides should keep working with their layer architecture in the future.

Like any other complex structure, software should be structured in some way. There are different approaches to structuring software, but the primary goal is the same: keep the software alive and updatable. Without some structure, it would be hard to debug and expand the software. Software should be structured in a way that takes the user, system, software, and business goals into consideration [13]. Questions that come up when designing a system might be:


• How can the software be designed so it can flexibly be updated?

• What is the next phase of the software? (i.e. features, platform, etc.)

At the highest level, software can be decomposed into three layers, presented below.

• Presentation

This layer takes care of the interaction with the user. It receives what to render and might also take input and pass it to the business layer. This layer is normally the traditional application or the Web-based application [22].

• Business

This layer takes care of the business logic of the application; it works as a bridge between the presentation and data layers.

• Data

This layer takes care of the data access and the persistent data store.

Figure 2.7: Three-layer architecture.

Layers take care of the logical representation and do not care about the location of the physical hardware. There can be different tiers of layers, but layers can also reside on the same tier. The primary goal of having individual layers is to make each layer independent of the others. By doing that, the developer can replace and update one layer without having to make a lot of changes to the others [10, 20, 21].

When designing an application, the first thing to focus on is the highest level of abstraction. Start by collecting functionality into different layers. There is a granularity between too few and too many layers, and this balance is important to keep in mind to avoid unnecessary complexity. When grouping functionality into layers, the interaction between the layers must be taken into consideration. Meier et al. [13] specify three rules, top-down interaction, strict interaction, and loose interaction, for how the layers may interact with each other. Interacting in this context is the same as communicating with each other. These three rules are to be followed throughout the code to avoid circular dependency, which is a relation where two or more components are directly or indirectly dependent on each other to function properly. A small code sketch follows the list below.

• Top-down interaction.

Higher level layers can interact with layers below. This implies that a lower layer can never interact with the ones above. It is possible to use events to make communication in the upward direction possible without introducing a direct dependency.


• Strict interaction.

Each layer must interact with only the layer directly below. This rule will enforce a strict separation where each layer only knows about the layer directly below it. If the software is to be modified over time, this is the approach to choose, because modifications will then only affect the layer directly above the modified one [13].

• Loose interaction.

Higher level layers can bypass layers to interact with lower layers directly. This may improve performance, but it can also affect the dependencies negatively, meaning that the code can get effects that were not sought. This is the approach to choose if the application is not to be distributed across physical tiers [13].

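As a minimal sketch of strict top-down interaction, consider the interfaces below. The names are ours and purely illustrative; the point is that each layer depends only on the interface of the layer directly below it, so a layer can be replaced without touching the layers above:

public interface IDataLayer
{
    string LoadDocumentTitle(int id); //data access only
}

public interface IBusinessLayer
{
    string GetDisplayTitle(int id); //business rules only
}

public class BusinessLayer : IBusinessLayer
{
    private readonly IDataLayer data_;

    public BusinessLayer(IDataLayer data) { data_ = data; }

    public string GetDisplayTitle(int id)
    {
        //Business logic applied on top of the raw data
        return data_.LoadDocumentTitle(id).Trim().ToUpper();
    }
}

//The presentation layer would depend only on IBusinessLayer and
//never on IDataLayer directly.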

2.7 Object-relational mapping

ORM was our first thought of a solution to the problem and is discussed in later chapters.

Object-Relational Mapping (ORM) is an existing technique for converting data between object-oriented programming languages and relational databases. Typically, an object-oriented program uses an API that in turn calls a driver to communicate commands to the database [9]. When handling a result set, this means the programmer must manually set each attribute of the object in question to the corresponding returned data. "This code differs little from class to class and is largely the same even from application to application and is often tedious to write" [9].

An object-relational mapper functions in between the Data Access Layer and the Relational Database as seen in figure 2.8.

Figure 2.8: ORM architecture

Using an ORM is one way of creating abstraction in code, as you write statements against the ORM of choice instead of the database. The ORM is configured to talk to a specific DBMS, but multiple such configurations can extend the functionality for use with multiple different DBMSs.

ORMs work by using drivers for different DBMSs, and they implement an open database connection where the targeted database is irrelevant for the ORM itself [11]. Typically, you initialize an ORM with a connection string which tells the ORM which driver you would like to use. The mappers have their own style of writing statements and then utilize the drivers to translate these into something appropriate for the targeted DBMS.
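As a hedged sketch of the idea, in Entity Framework style, the same LINQ query works regardless of which provider the connection string points at. The Person class and the PersonsDb connection string name are our own examples, not part of MasterConcept:

using System.Data.Entity;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

public class PersonContext : DbContext
{
    //The named connection string decides which driver/DBMS the ORM talks to
    public PersonContext() : base("name=PersonsDb") { }

    public DbSet<Person> Persons { get; set; }
}

//Usage: the query below is translated to the dialect of the configured DBMS.
//var lastnames = context.Persons
//                       .Where(p => p.Firstname == "Niklas")
//                       .Select(p => p.Lastname)
//                       .ToList();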


2.8 Software testing

This section presents what testing means and contains related work on evaluating solutions involving DBMSs.

When developing software, there is usually a product specification that works like an agreement within the software team. It specifies what the product will be and how it will act, and what it will not do. Problems with the software code are often referred to as bugs. Ron Patton [18] defines five different bugs:

1. The software does not do something that the product specification says it should do.

2. The software does something that the product specification says it should not do.

3. The software does something that the product specification does not mention.

4. The software does not do something that the product specification does not mention but should.

5. The software is difficult to understand, hard to use, slow or - in the software tester's eyes - will be viewed by the end user as just plain not right.

Verification and Validation

When developing software, there are at minimum two stakeholders: the customer and the specified requirements. There are different ways to check that the software meets the specified requirements. Verification and validation are two terms that relate to the specification and the customer. "Verification is the process confirming that something–software–meets its specification. Validation is the process of confirming that it meets the user's requirements." [18]

Black-box and White-box testing

In black-box testing, the tester only knows what the software is supposed to do and is not able to see how it operates. The tester provides some input and gets some output, but cannot explain how it happened. In white-box testing, the tester has access to the software code and can examine the code for clues that will help with constructing the tests. The tester can also evaluate what went wrong when tests fail.

Comparative study on databases

Youssef Bassil [1] presents a comparative study in which he evaluates the performance of the top DBMSs: MS SQL Server 2008, Oracle 10g, IBM DB2, MySQL 5.5, and MS Access 2010. He evaluates how the DBMSs perform in terms of CPU utilization, memory usage, virtual memory usage, and thread count. The databases are populated with 1 000 000 records each. He executes five different statements with different complexity and compares the results. Eventually he comes to the conclusion that SQL Server has a lower execution time when executing queries compared to Oracle. Oracle, however, showed a slight advantage when performing UPDATE statements compared to SQL Server. On an average of the ten statements he executed, Oracle had a higher CPU utilization as well as memory usage compared to SQL Server.

3 Method

3.1 Pre-study

MasterConcept [12] contains 666 000 lines of code. To get an understanding of how the program works and how it is interrelated, we searched for methods that would help us. We found three useful methods, notes/sketching, listing markup, and breakpoints, which are presented in section 2.1.

Ides' software development team uses Visual Studio [23], an Integrated Development Environment (IDE), as their development environment, and this is what we used during development. Visual Studio enabled us to set breakpoints in the code to follow the flow. To further improve our understanding, we sketched down the important classes and functions. Some functions contained over 400 lines of code, so to fully understand what the functions did and how the classes worked we applied the extract method presented in section 2.4.

When gathering knowledge about the code, we got an insight into how the layered architecture is structured. We later used breakpoints and notes/sketching to get a full picture of how the program is structured. This was later used to answer our second research question: How should Ides keep working with their current layer architecture, with respect to multiple database providers?

To structure our work, we decided to follow a software development model. Since the product already existed and was constantly being developed, we looked for a model that would work for us. It was also important that we could submit the code in an easy way. For these reasons, we evaluated the development models and eventually decided to use the spiral model.

3.2 Implementation

To answer our research question, In what way could Ides become independent of database provider without affecting the current functionality?, we looked for previous work and found that ORM might be a possible solution. In validation meetings with the chief software developer, we were informed that they had already tried implementing an ORM solution, but the performance was not good enough, and therefore they were not interested in a solution based on ORM.

During our research, we did not find any method that was directly applicable to our problem. Therefore, we needed to combine methods to achieve the desired result. We focused on a solution that could work for our sub-research question: How could this be implemented without affecting execution time, CPU usage, or memory usage by more than 10%?

By doing that, we found that our solution needed to include theory and related work from:

• Maintainability

When analyzing the code as mentioned in section 3.1, we found three direct quality aspects that we needed to include in our solution.

Excessive dependencies between layers

We needed a structured layer architecture to prevent the effects that might otherwise be caused.

Direct communication

We needed the solution to have an appropriate communication model to allow for easy maintenance.

Code base is large and complex

The code base that we worked with contained a large amount of code, and we needed to have clear and structured layers.

• Layer architecture

Meier et al. [13] stated three rules, presented in section 2.6. When developing our solution, we based the interaction between the layers on strict interaction, since it was the interaction type that best fit our requirements. This is because the code will be modified over time, and according to Meier et al. [13], strict interaction is then the interaction type to choose.

• Database management systems

When investigating the Oracle DBMS further, we realized that it contained a lot of stored procedures and logic that we somehow needed to implement for use with other DBMSs. Because of this, we wondered if there was a way to write this code only once. Eventually, we decided to try translating a stored procedure from PL/SQL code to C# code.

3.3 Software verification

To fully answer our sub-research question, How could this be implemented without affecting execution time, CPU usage, or memory usage by more than 10%?, we needed to verify how our solution performed. We decided to do white-box testing based on Youssef Bassil's [1] study. We chose white-box testing because we had access to the code and could discuss with the developers at the company which parts of the software were interesting to test. The reason we built our tests on Bassil's is to be able to compare our results with his.

The testing we performed contained three different parts:

• Old Oracle solution vs new Oracle solution

This comparison was done in a network with a server-client communication.

• Old Oracle solution vs new Oracle solution

This comparison was done on a local database.

• New Oracle solution vs new SQL Server solution

The statements that were tested were the following:

Statements

S1_insert: INSERT INTO epc_vault_data (id, version, data, data_length) VALUES (:id, :version, :data, :data_length)

S2_delete: DELETE FROM epc_vault_data WHERE id = :id AND version = :version AND data = :data AND data_length = :data_length

S3a_select: SELECT dbms_lob.getlength(utl_compress.lz_uncompress(data)) FROM epc_vault_data WHERE id = :id AND version = :version

S3b_select: SELECT DATALENGTH(DECOMPRESS(data)) FROM epc_vault_data WHERE id = :id AND version = :version

S4_update: UPDATE epc_vault_data SET data = :data, data_length = :length WHERE id = :id AND version = :version

S5_select: SELECT sytuser_sytitems.sytitem_id id, sytuser_sytitems.sytitem_rev rev, sytuser_sytitems.cotstd_stdid stdid FROM epc_activities LEFT JOIN epc_activities_doc ON ((epc_activities.id = epc_activities_doc.activities_id OR epc_activities.parallel_sign = epc_activities_doc.activities_id) AND epc_activities.levelseq = epc_activities_doc.activities_lvl_seq AND epc_activities.activityseq = epc_activities_doc.activities_activity_seq) LEFT JOIN epc_activities_item ON (epc_activities.id = epc_activities_item.activities_id AND epc_activities.levelseq = epc_activities_item.activities_lvl_seq AND epc_activities.activityseq = epc_activities_item.activities_activity_seq) LEFT JOIN epc_activities_task ON (epc_activities.id = epc_activities_task.activities_id AND epc_activities.levelseq = epc_activities_task.activities_lvl_seq AND epc_activities.activityseq = epc_activities_task.activities_activity_seq) JOIN sytuser_sytitems ON (epc_activities.sign_user = sytuser_sytitems.shortname AND sytuser_sytitems.cotstd_stdid = (SELECT cotstd_stdid FROM cotprojs WHERE projid = nvl(nvl(epc_activities_doc.dotdoc_projid, epc_activities_item.item_projid), epc_activities_task.taskrequest_projid))) WHERE id = :id AND levelseq = :levelseq AND activityseq = :activityseq

The statements were chosen with diversity in mind. They are a representation of the most executed statements in MasterConcept. In addition, they are based upon the statements Youssef Bassil [1] executed in his comparative study, to best be able to compare the outcome of our tests with his. The statements were also chosen with respect to the fact that different DBMSs utilize CPU and memory differently; to evaluate the differences between the DBMSs, a variety of commands of different complexity was chosen. In addition to the tests above, we decided to evaluate the performance of the stored procedure against the corresponding C# code we wrote. This was done both over the network and on a local database.

Testing environment

All the statements were executed on the same computer running Windows 10 Pro. The computer's hardware consisted of an Intel Core i7-3720QM CPU @ 2.60 GHz (8 cores), 16.0 GB RAM, and a 500 GB Samsung 850 EVO SSD. Three different databases were set up: one local SQL Server database running SQL Server 2016, one local Oracle database running Oracle 11g, and one remote Oracle database also running Oracle 11g. The remote Oracle database had an average latency of 16 ms during the tests.

To execute the statements, a small application was written in C#, using the .NET Framework namespace System.Diagnostics to retrieve information about the hardware. The application runs a statement 10 000 times and records the execution time, the memory usage, and the CPU utilization. This is done four times, calculating the average execution time, memory usage, and CPU utilization over all the runs. The test application is executed on a local machine, measuring its performance. The retrieved values were then compared with each other. The results are presented in the chapter Evaluation.
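A measurement loop in the spirit of the test application could look like the sketch below, using Stopwatch and Process from System.Diagnostics. This is our reconstruction of the idea, not the actual test code; ExecuteStatement stands in for running one of the statements above:

using System;
using System.Diagnostics;

static void Measure(Action ExecuteStatement, int iterations)
{
    Process proc = Process.GetCurrentProcess();
    TimeSpan cpuBefore = proc.TotalProcessorTime;
    Stopwatch clock = Stopwatch.StartNew();

    for (int i = 0; i < iterations; i++)
        ExecuteStatement(); //run the statement, e.g. 10 000 times

    clock.Stop();
    proc.Refresh(); //re-read the process counters

    TimeSpan cpuUsed = proc.TotalProcessorTime - cpuBefore;
    long workingSet = proc.WorkingSet64;

    Console.WriteLine("Execution time: {0} ms", clock.ElapsedMilliseconds);
    Console.WriteLine("CPU time: {0} ms", cpuUsed.TotalMilliseconds);
    Console.WriteLine("Memory (working set): {0} MB", workingSet / (1024 * 1024));
}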

4 Results

The thesis has two main research questions:

1. In what way could Ides become independent of database provider without affecting the current functionality?

2. How should Ides keep working with their current layer architecture, with respect to multiple database providers?

To simplify the understanding of the results, this chapter has been divided into two parts, each covering one of the questions. The performance evaluation of the implementation is presented in the chapter Evaluation.

4.1 Background

The proposed implementation utilizes the IDbConnection, IDbCommand, and IDataReader interfaces. These interfaces are a part of the .NET Framework. They represent an open connection and are implemented by .NET Framework data providers for accessing relational databases. Below is a list of four drivers used to connect to various data sources.

• Oracle.DataAccess.Client

Driver for accessing data from an Oracle database.

• System.Data.SqlClient

Driver for accessing data from a SQL Server database.

• System.Data.OleDb

Driver for accessing data from a variety of sources in a uniform manner.

• MySql.Data.MySqlClient

Driver for accessing data from a MySQL database.

The properties and methods of the IDbConnection interface are listed in figure 4.1 [8].

string ConnectionString { get; set; }
//Gets or sets the string used to open a database.

int ConnectionTimeout { get; }
//Gets the time to wait while trying to establish a connection before terminating the attempt and generating an error.

string Database { get; }
//Gets the name of the current database or the database to be used after a connection is opened.

ConnectionState State { get; }
//Gets the current state of the connection.

IDbTransaction BeginTransaction()
//Begins a database transaction.

void ChangeDatabase(string databaseName)
//Changes the current database for an open Connection object.

void Close()
//Closes the connection to the database.

IDbCommand CreateCommand()
//Creates and returns a Command object associated with the connection.

void Dispose()
//Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.

void Open()
//Opens a database connection with the settings specified by the ConnectionString property of the provider-specific Connection object.

Figure 4.1: List over IDbConnection interface properties and methods.

The properties and methods above are only the definitions; the different providers have their own implementations of these methods and properties. One of the functions above is CreateCommand(), which returns an IDbCommand.

The IDbCommand is the interface for executing statements against the DBMS. It has properties like CommandText, which represents the statement to execute. To be able to add dynamic parameters to a statement, the interface contains a list of parameters. An example usage is shown in figure 4.2.

using (DbCommand command = connection.CreateCommand())
{
    command.CommandText = "SELECT Firstname, Lastname FROM Persons WHERE Firstname = :name";

    string name = "Dynamic Name";
    command.Parameters.Add(":name", name);

    //ExecuteScalar returns the first column of the first row as an object
    string res = (string)command.ExecuteScalar();
}

Figure 4.2: Example of parameters in the IDbCommand interface.

MasterConcept uses both System.Data.OleDb and Oracle.DataAccess.Client. An example call to the database is presented in figure 4.3.


using (OracleCommand cmd = conn.CreateCommand())
{
    cmd.CommandText = @"select decode(dbms_lob.getlength(epc_vault_data.data),
                               0, 'FALSE', null, 'FALSE', 'TRUE') hasfile
                        from epc_vault_data
                        where epc_vault_data.id = :id_
                          and epc_vault_data.version = :ver_";

    OracleParameter id = new OracleParameter();
    id.ParameterName = ":id_";
    id.Value = vid.VaultId;
    cmd.Parameters.Add(id);

    OracleParameter version = new OracleParameter();
    version.ParameterName = ":ver_";
    version.Value = vid.VaultVersion;
    cmd.Parameters.Add(version);

    using (OracleDataReader reader = cmd.ExecuteReader())
    {
        if (reader.Read())
        {
            return reader.GetString(0) == "TRUE";
        }
    }

    closeConnection(conn);
}

Figure 4.3: Call to the database using Oracle.DataAccess.Client.

Technically, an OleDb connection could be used to connect to both Oracle and SQL Server. The problem is that the OleDb driver is deprecated [15]. OleDb also lacks support for CJK Unicode characters, which might be present in documents.

4.2 Implementation

Our implementation is a class that utilizes the IDbConnection interface. Thus, executing statements with the new interface is very similar to before. To differentiate between the different connection types (DBMS vendors), we implemented an enumeration representing different types of connections. Figure 4.4 shows the enumeration.


public enum DatabaseType
{
    Oracle,
    SQLServer
}

DatabaseType.Oracle    // Represents an Oracle connection.
DatabaseType.SQLServer // Represents a SQL Server connection.

Figure 4.4: Enumeration representing database types. Could be extended further.

The constructor of the implemented class takes a DatabaseType as parameter to determine which connection to create. Figure 4.5 shows the constructor.

public DatabaseInterface(DatabaseType Type)
{
    string provider = null;

    switch (Type)
    {
        case DatabaseType.Oracle:
            provider = "Oracle.DataAccess.Client";
            break;
        case DatabaseType.SQLServer:
            provider = "System.Data.SqlClient";
            break;
    }

    dbType_ = Type;
    dbFactory_ = DbProviderFactories.GetFactory(provider);
    dbConnection_ = dbFactory_.CreateConnection();
}

Figure 4.5: The constructor of the implemented class.

Depending on the passed DatabaseType value, the appropriate connection is created: for Oracle, Oracle.DataAccess.Client is used; for SQL Server, System.Data.SqlClient is used.
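For illustration, switching vendor with the implemented class then only requires a different enumeration value. The sketch below is ours: the connection string is a placeholder, and we assume the class exposes the usual IDbConnection members it wraps:

//Create a connection for SQL Server instead of Oracle
var connection = new DatabaseInterface(DatabaseType.SQLServer);
connection.ConnectionString = "..."; //provider-specific connection string
connection.Open();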

Parameters

One thing we noticed is that the drivers handle parameters in different ways. The Oracle driver uses ':' to mark a parameter in a statement, while the SQL Server driver uses '@'. To solve this problem, the CommandText property of the IDbCommand interface was modified; figure 4.6 shows the result. Before setting the string to the new value, we pass it to the ConvertStatement function, which handles the translation between the different connections.


public string CommandText
{
    get => dbCommand_.CommandText;
    set
    {
        param.Clear();
        dbCommand_.CommandText = ConvertStatement(value);
    }
}

Figure 4.6: The overwritten CommandText property for the IDbCommand interface.

To extend the flexibility to add more DBMSs, the implemented class has a private property with all parameter character types. Figure 4.7 shows the property. The parameter characters are ordered in the same way as the enumeration shown in figure 4.4. Accessing the parameter character for a DBMS is shown in figure 4.8.

private static char[] ParameterType = { ':', '@' };

Figure 4.7: The character different DBMSs use for parameters.

ParameterType[(int)DatabaseType.Oracle];

Figure 4.8: Accessing Oracle’s parameter character.

The function ConvertStatement, displayed earlier in figure 4.6, translates all the parameter characters in the statement to the appropriate one for the current DBMS in use. Figure 4.9 shows part of the implementation.

private string ConvertStatement(string statement)
{
    ...
    statement = statement.Replace(ParameterType[(int)DatabaseType.Oracle],
                                  ParameterType[(int)dbType_]);
    ...
    return sqlParser_.Parse(statement, dbType_);
}

Figure 4.9: Snippet from the ConvertStatement function that translate the parameter character.

Writing statements

As explained earlier in section 2.5, there exist unique Oracle and SQL Server functions to be used in statements. This means that occasionally one would have to write separate statements for different DBMSs. When writing standard SQL statements, without DBMS specific functions, one would not want to write the same statement multiple times. To handle this issue, the implemented interface uses a helper function for determining which statement to execute depending on the current connection type. Writing a statement with the implemented class is displayed in figure 4.10.


using (var command = (DatabaseCommand)connection.CreateCommand())
{
    command.CommandText = command.GetStatement(
        "UPDATE epc_vault_data SET data = :data, data_length = :length WHERE id = :id AND version = :version");
    ...
}

Figure 4.10: Example of writing a standard SQL statement.

Figure 4.10 shows an example of a standard SQL statement without any unique Oracle or SQL Server functions, like decode for Oracle or CONVERT for SQL Server. Because of this, only one statement needs to be supplied and it works for both Oracle and SQL Server.

Figure 4.11 shows a statement that would only execute correctly on an Oracle DBMS because of the unique Oracle functions decode and dbms_lob.getlength. Our implementation gives two possible ways to make this statement work with a SQL Server DBMS. The first one is to write a statement which uses the SQL Server equivalents of the functions. This solution is displayed in figure 4.12. It shows how the function GetStatement can take multiple parameters. The first parameter is a string and represents the statement to execute. The second parameter tells the function which database type the previous statement is expected to run on. Figure 4.12 only shows one database type for each statement, but it can be any number. Following our second parameter is the third, which is another statement. As we can see, this statement is expected to run on SQL Server, based on the fourth parameter. More on the function GetStatement in the section Processing the statement.

using( var command = (DatabaseCommand)connection.CreateCommand() )
{
    command.CommandText = command.GetStatement(@"SELECT
        decode(dbms_lob.getlength(epc_vault_data.data), 0, 'FALSE', null, 'FALSE', 'TRUE')
        FROM epc_vault_data
        WHERE epc_vault_data.id = :id_
        AND epc_vault_data.version = :ver_");
    ...
}

Figure 4.11: Statement with the unique Oracle functions decode and dbms_lob.getlength.

using( var command = (DatabaseCommand)connection.CreateCommand() )
{
    command.CommandText = command.GetStatement(
        "decode(dbms_lob.getlength(data), 0, 'FALSE', 'TRUE')", DatabaseType.Oracle,
        "CASE DATALENGTH(data) WHEN 0 THEN 'FALSE' ELSE 'TRUE' END", DatabaseType.SQLServer);
    ...
}

Figure 4.12: Separate statements written for Oracle and SQL Server.


The other way to write agnostic statements is to use the implemented SQL parser. The parser makes it possible to overwrite DBMS-specific functions and translate them for use in another DBMS. Figure 4.13 shows an example of defining functions to be translated. Using figure 4.13 as a reference, the first string is the function to be translated: decode and dbms_lob.getlength. The second entry is a function, Decode and GetLength in this case, which is called whenever the corresponding string is found within a statement. The last parameter is the database type the translation is meant for. Each function is expected to take a List<string> as its only parameter. The implementations of Decode and GetLength are also visible in figure 4.13.

private TupleList<string, Func<List<string>, string>, DatabaseInterface.DatabaseType>
    Functions = new TupleList<string, Func<List<string>, string>,
                              DatabaseInterface.DatabaseType>
{
    {"decode", Decode, DatabaseType.SQLServer},
    {"dbms_lob.getlength", GetLength, DatabaseType.SQLServer},
    ...
};

private static string Decode(List<string> Parameters)
{
    string Statement = String.Format("CASE {0}", Parameters.ElementAt(0));
    Parameters.RemoveAt(0);

    int index = 0;
    for (; index < Parameters.Count - 1; index += 2)
        Statement += String.Format(" WHEN {0} THEN {1}",
            Parameters.ElementAt(index), Parameters.ElementAt(index + 1));

    Statement += String.Format(" ELSE {0} END", Parameters.ElementAt(index));
    return Statement;
}

private static string GetLength(List<string> Parameters)
{
    return String.Format("DATALENGTH({0})", Parameters.ElementAt(0));
}

Figure 4.13: List of functions to be translated.
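
TupleList is not a built-in .NET type. A minimal definition that would enable the collection-initializer syntax used in figure 4.13, under our assumption that it simply extends List<Tuple<...>>, could look as follows:

// Assumed helper type (its definition is not shown in the thesis): a List of
// Tuples with an Add overload, which is what makes the {a, b, c} initializer
// entries above legal.
public class TupleList<T1, T2, T3> : List<Tuple<T1, T2, T3>>
{
    public void Add(T1 item1, T2 item2, T3 item3)
    {
        Add(Tuple.Create(item1, item2, item3));
    }
}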

Figure 4.9 displayed the function ConvertStatement. As seen in the figure, before returning the statement, it is passed to the parser, which parses the statement and translates every function present in the TupleList shown in figure 4.13.
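
As a worked example (ours, not a figure from the thesis): given the Oracle fragment decode(dbms_lob.getlength(data), 0, 'FALSE', 'TRUE'), the parser would first rewrite the inner call through GetLength and then hand the outer argument list to Decode:

// Worked example of the translation functions from figure 4.13.
var inner = GetLength(new List<string> { "data" });
// inner == "DATALENGTH(data)"

var outer = Decode(new List<string> { inner, "0", "'FALSE'", "'TRUE'" });
// outer == "CASE DATALENGTH(data) WHEN 0 THEN 'FALSE' ELSE 'TRUE' END"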

Looking back at figure 4.12, two separate statements were supplied to make the command compatible with both Oracle and SQL Server. Since decode and dbms_lob.getlength were implemented for use with SQL Server in figure 4.13, the code in figure 4.12 can be changed to what is presented in figure 4.14.


using( var command = (DatabaseCommand)connection.CreateCommand() )
{
    command.CommandText = command.GetStatement(
        "decode(dbms_lob.getlength(data), 0, 'FALSE', 'TRUE')");
    ...
}

Figure 4.14: The statement after the parse functions have been implemented.

Processing the statement

The function GetStatement mentioned in section Writing statements is a helper function for returning the correct statement to execute depending on the current DBMS connection type. The function header is displayed in figure 4.15.

public string GetStatement(params object[] Statements)

Figure 4.15: Function header of the helper function GetStatement.

The function takes one parameter, params object[] Statements; in C#, this is the way to let a function accept an unlimited number of arguments. This allows the caller to provide as many statements as wanted. The function expects a string followed by any number of database types on which that statement can be executed. An example where a statement is supported by both SQL Server and MySQL is shown in figure 4.16.

using( var command = (DatabaseCommand)connection.CreateCommand() )
{
    command.CommandText = command.GetStatement(
        "decode(dbms_lob.getlength(data), 0, 'FALSE', 'TRUE')", DatabaseType.Oracle,
        "CASE data WHEN 0 THEN 'FALSE' ELSE 'TRUE' END",
        DatabaseType.SQLServer, DatabaseType.MySQL);
    ...
}

Figure 4.16: Statement supported by multiple DBMSs.

If there is only one statement provided, as in figure 4.17, it is expected to execute on all DBMSs.

using( var command = (DatabaseCommand)connection.CreateCommand() )
{
    command.CommandText = command.GetStatement("SELECT * FROM table");
    ...
}

Figure 4.17: Statement expected to execute on all DBMSs.
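
The thesis describes GetStatement's contract, but figure 4.15 only shows its header. A minimal sketch of the selection logic, under the convention described above (each statement string is followed by zero or more DatabaseType values) and assuming access to the class's dbType_ field, could look like this; it is our interpretation, not the actual body:

// Sketch only: pick the statement that either lists the active connection
// type or lists no types at all (and therefore runs on all DBMSs).
public string GetStatement(params object[] Statements)
{
    string current = null;     // statement most recently encountered
    bool unrestricted = true;  // no DatabaseType has followed it yet

    foreach (var item in Statements)
    {
        if (item is string s)
        {
            if (current != null && unrestricted)
                return current;  // previous statement had no type list
            current = s;
            unrestricted = true;
        }
        else if (item is DatabaseType type)
        {
            unrestricted = false;
            if (type == dbType_)  // matches the active connection
                return current;
        }
    }
    return unrestricted ? current : null;
}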

If a command consists of multiple statements, the statements might have different numbers of parameters. Thus, a way of adding parameters in a uniform way is needed. The implemented class has a function called AddParameter. Figure 4.18 shows the function. By default, it takes two parameters, but a third one can be supplied if needed. The third parameter is used to add a parameter to a specific statement.

public void AddParameter(string Parameter, object Value,
                         DatabaseType DbType = DatabaseType.None)
{
    if (dbType_ != DbType && DbType != DatabaseType.None)
        return;

    IDbDataParameter par = dbCommand_.CreateParameter();
    par.ParameterName = ConvertStatement(Parameter);
    par.Value = Value;
    dbCommand_.Parameters.Add(par);
}

Figure 4.18: Function for adding a parameter to a statement.
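
A hypothetical usage (our own example, not from the code base) where the Oracle and SQL Server variants of a command share one parameter but differ in another:

// The unqualified call is added for every connection type, while a
// qualified call is silently skipped unless it matches the active DBMS.
// ParameterName is passed through ConvertStatement, so ":extra_" becomes
// "@extra_" on a SQL Server connection.
command.AddParameter(":id", vId.VaultId);                     // all DBMSs
command.AddParameter(":extra_", 10, DatabaseType.Oracle);     // Oracle only
command.AddParameter(":extra_", 10, DatabaseType.SQLServer);  // SQL Server only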

IDataReader

To read data from a statement, the IDataReader interface is used. MasterConcept stores binary data in the database. In Oracle, such data is stored as binary large objects (BLOB); in SQL Server, the corresponding type is SqlBinary. Both the Oracle and the SQL Server drivers have a function for retrieving binary data. Figure 4.19 shows the functions.

OracleDataReader.GetOracleBlob(int);
SqlDataReader.GetSqlBinary(int);

Figure 4.19: Functions for reading binary data in Oracle and SQL Server.

To optimize the reading of binary data, the IDataReader interface was implemented with a custom function for reading data. The function utilizes the functions mentioned in figure 4.19 depending on the current connection. If the connection type is neither Oracle nor SQL Server, a more general way of reading data is used. Figure 4.20 shows the custom implementation.

public byte[] ReadBlob(int Ordinal)
{
    if (dbType_ == DatabaseType.Oracle)
    {
        var rdr = NativeReader<OracleDataReader>();
        var blob = rdr.GetOracleBlob(Ordinal);
        return blob.IsNull ? null : blob.Value;
    }
    else if (dbType_ == DatabaseType.SQLServer)
    {
        var rdr = NativeReader<SqlDataReader>();
        var blob = rdr.GetSqlBinary(Ordinal);
        return blob.IsNull ? null : blob.Value;
    }

    var blob = Read<byte[]>(Ordinal);
    return blob;
}

Figure 4.20: Custom function for reading binary data.
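
The helper NativeReader<T> used above is not shown in the thesis. A minimal sketch, assuming the wrapper keeps the driver's native reader in a field called dataReader_, might be:

// Assumed helper (body not shown in the thesis): expose the wrapped
// driver-specific reader, failing loudly on a type mismatch.
private T NativeReader<T>() where T : class, IDataReader
{
    var native = dataReader_ as T;
    if (native == null)
        throw new InvalidOperationException(
            "Active connection does not provide a " + typeof(T).Name);
    return native;
}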


A problem that occurred when reading data from various queries is that the returned type could vary between Oracle and SQL Server. One example of this is SELECT COUNT(*), which in Oracle returns a decimal and in SQL Server returns an integer. If a decimal is declared in C# and an integer is returned, the program will crash. Because of this, a way of reading data that returns the same type for both SQL Server and Oracle was implemented. Figure 4.21 shows both the old and the new function for reading data. The new function is a template function and tries to convert the read data type to the wanted data type. If this fails, an exception is thrown.

public object GetValue(int); // Used to read data.

public T Read<T>(int); // To make sure the correct type is returned.

Figure 4.21: Functions for reading data.
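
Figure 4.21 shows only the signatures. A minimal sketch of Read<T>, assuming it wraps GetValue and relies on Convert.ChangeType (which throws if the conversion fails, matching the behavior described above), could be:

// Sketch only, not the thesis' exact body: normalize driver-specific return
// types (e.g. Oracle's decimal for COUNT(*)) to the type the caller declared.
public T Read<T>(int ordinal)
{
    object value = GetValue(ordinal);
    if (value == null || value is DBNull)
        return default(T);
    return (T)Convert.ChangeType(value, typeof(T));
}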

Usage

Figure 4.22 shows the same example scenario as in figure 4.3 using the new interface. The syntax is very similar between the two.

using (var command = (DatabaseCommand)connection.CreateCommand())
{
    string statement = command.GetStatement(@"SELECT
        decode(dbms_lob.getlength(data), 0, 'FALSE', null, 'FALSE', 'TRUE') hasfile
        FROM epc_vault_data
        WHERE id = :id AND version = :version");

    command.CommandText = statement;
    command.AddParameter(":id", vId.VaultId);
    command.AddParameter(":version", vId.VaultVersion);

    using( var reader = (DatabaseReader)command.ExecuteReader() )
    {
        if( reader.Read() )
        {
            return reader.Read<string>(0) == "TRUE";
        }
    }

    closeConnection(connection);
}

Figure 4.22: Call to the database using the new implementation.

4.3 Analysis of the layer architecture

When we analyzed the code, we found parts that did not follow the guidelines suited for the development of Ides' software.

Direct communication

The interaction between the layers was loosely disciplined, in the sense that there were calls directly from the presentation layer to the data layer. An example is shown in figure 4.23.


public void CompareDrawings(DocumentInfo id1, bool doGen)
{
    const string cSQLPrevRev =
        "select * " +
        "from dotdocs " +
        "where cotproj_projid = :proj_ " +
        "and projphase = :phase_ " +
        "and id = :id_ " +
        "and epc_sort(replace(rev,'-','0')) < epc_sort(replace(:rev_,'-','0')) " +
        "order by epc_sort(replace(rev,'-','0')) desc ";

    IdesQueryCommand cmd = new IdesQueryCommand();
    cmd.CommandText = cSQLPrevRev;
    cmd.Parameters.Add(new IdesParameter(":proj_", id1.ProjID));
    cmd.Parameters.Add(new IdesParameter(":phase_", id1.ProjectPhase));
    cmd.Parameters.Add(new IdesParameter(":id_", id1.Id));
    cmd.Parameters.Add(new IdesParameter(":rev_", id1.Revision));
}

Figure 4.23: Example of a statement in the user interface.

Code logic of components is not cohesive

There was business logic in the presentation layer that should not be there.

private string getChildItemFolderName(ItemDocumentStructureRelation relation)
{
    var empty = Session.Translator.ReadString("IDS_EMPTY");
    var number = String.IsNullOrWhiteSpace(relation.Child_Keya01)
        ? empty : relation.Child_Keya01;
    var folderName = number.Replace(" ", "-");
    return folderName;
}

Figure 4.24: Example of application logic in the user interface.

Excessive dependencies between layers

Overall, the layers are structured with strong dependencies between them. This is one of the reasons this thesis exists, and the company was of course aware of it beforehand.

Our analysis is that the poorly designed interaction between the layers, together with logic placed in the wrong layers, is what causes most problems for Ides' layer design when it comes to becoming database-agnostic. The fact that logic is located in incorrect layers also makes the code hard to maintain. Our solution presented in section 4.2 is built with the idea of keeping the code maintainable and making it easy to add new database vendors.

5 Evaluation

This chapter presents the outcome of the evaluation described in section 3.3. The tests were written by us and evaluate the performance in terms of execution time, CPU usage, and memory usage. The tests were carried out on a local machine and a remote server. The bars in the charts below are grouped by the statements presented in section 3.3. When it is visually unclear to have the bars in the same chart, they are separated for the reader's clarity.

The tests were executed from a local computer, and we tried to make them as fair as possible. We started the tests and let them finish before using the computer again, to make sure no other programs could affect the outcome. To the best of our ability, this was the best environment we could access to perform and evaluate the tests.

5.1 Stored procedure

This test was done, as mentioned earlier, to determine whether translating stored procedures to C# code is a valid way of creating abstraction. MasterConcept utilizes multiple stored procedures, but for this test we decided to use the most central one, in terms of calls, for the document part. Figure 5.1 below shows a snippet from the stored procedure written for Oracle, and figure 5.2 shows the corresponding C# code. Since a stored procedure is executed within the DBMS and not on the local computer, there was no way for us to measure CPU utilization or memory usage correctly. Because of that, the following test only measures execution time.
