Business Value of the “Data Warehouse Appliance” Technology
Affärsvärde med tekniken "Data Warehouse Appliance"
Saga Undén
Eric Westerlund
ABSTRACT
The recent increase in the amount of stored company data and the growing interest in data analysis have resulted in new requirements on Data Warehousing solutions. This has led to the development of Data Warehouse Appliances, whose business value this research project aims to investigate. The result is intended to support companies that are considering an investment and to give them an understanding of the technology’s benefits.
The research project was conducted in two parts. Vendors of the Appliance technology were interviewed, as well as their customers. The results from the vendor interviews together with a literature study provided a knowledge base for the analysis of the user companies’ interviews. The results clearly indicate that there is value in the technology for larger companies.
The research shows that although the main benefits advocated by the vendors match the perceived ones of the user companies, there are other aspects which they value even more. Examples of this include a reduced amount of administrative tasks and support from a single source. The research also reveals that the benefits estimated by the customer at the time of purchase were not their most valued benefits in hindsight.
SUMMARY (SAMMANFATTNING)
Companies store ever larger volumes of data and use them as the basis for complex data analyses, which places new demands on their existing Data Warehouse solutions. This has led to the development of the Data Warehouse Appliance, whose business value this project aims to investigate. The result will provide decision support for companies that are considering an investment in the technology.
The study was carried out in two steps. Interviews were conducted with vendors that provide the technology as well as with the customers that use it. The results from the vendor interviews, together with an extensive literature study, then formed the basis for the analysis of the interviews with the user companies. The results point to real value in the technology for companies with large data volumes.
The study shows that the benefits the vendors put forward as the technology's foremost are confirmed by their customers, but that there are other gains the customers value even more. These include reduced technical complexity, a reduced amount of administrative tasks and support from a single source. The study also shows that the factors that played the largest role at the time of investment are not the same as those assigned the greatest value in hindsight.
PREFACE
This thesis is written for companies considering an investment in the Data Warehouse Appliance technology, in an attempt to provide them with objective information on the subject. It might also be of interest to professionals within the field of Business Intelligence, as well as any novice who is curious about and looking for an introduction to Business Intelligence, Data Warehousing or Data Warehouse Appliances.
Working on this thesis has been very interesting, enjoyable and worthwhile. We would like to thank Affecto for their support and confidence in us. Special thanks go to our tutor (and mentor) Tomas Nabel, who has acted as an excellent sounding board and with whom we have had many interesting and valuable discussions during the project. We would also like to thank our examiner Anders Sjögren, who has been of great help in all administrative formalities, and Richard Nordberg, who has provided guidance and support throughout the writing process.
Finally, we would like to thank the house of Nymble, which has provided us with not only great coffee, lunch and ‘fika’, but also super comfortable armchairs and ‘Musikrummet’, which has acted as our office these two months.
TABLE OF CONTENTS
1. Introduction
1.1 Problem definition
1.2 Purpose and goal
1.3 Scope and delimitation
1.4 Project method
2. Theoretical background
2.1 Business Intelligence
2.1.1 Data Warehousing
2.1.2 Data Warehouse Appliances
2.1.3 Data Warehouse Appliance architecture
2.2 Measuring business value
2.2.1 Business value of an IT investment
2.2.2 Value of Business Intelligence
2.2.3 Cost and value of information
3. Research Method
3.1 Choice of method
3.2 Seven stages of interview investigation
3.3 Question types – when to ask what and how
3.4 How to conduct an interview of great quality
3.5 What to consider when conducting an interview
3.6 What to consider when analyzing the interview results
4 Results
4.1 Vendor interview results
4.1.1 Top business values of Data Warehouse Appliances
4.1.1.1 Performance
4.1.1.3 Scalability
4.1.1.4 Simplicity
4.2 User interviews
4.2.1 Thoughts on Data Warehouse Appliance before implementation
4.2.2 Thoughts on Data Warehouse Appliance after implementation
5. Analysis
5.1 Vendor interviews
5.1.1 Vendor truths
5.1.1 Analysis of vendor truths
5.2 User companies interviews
5.2.2 The difference between expected and perceived business value of Data Warehouse Appliances
5.2.3 What drives an investment in Data Warehouse Appliance technology
5.2.4 Delivering value is more important than lowering costs
5.2.5 Focus on information rather than technology
6. Conclusion
6.1 Considerations
6.2 Further research
7. References
7.1 Further reading
7.2 Figures
Appendix A
1 Vendor interview question framework
2 Vendor question form
3 Using companies interview question framework
TABLE OF FIGURES
Figure 1: Data Warehouse architecture
Figure 2: Shared everything architecture
Figure 3: Shared nothing architecture
Figure 4: Business value of Business Intelligence
Figure 5: Advantages of Data Warehouse Appliances according to the vendors
Figure 6: Factors that contribute to the performance of Data Warehouse Appliances, according to the vendors
Figure 7: Hardware components of a Data Warehouse Appliance
Figure 8: Pricing of a Data Warehouse Appliance
Figure 9: Administrative tasks, before and after an implementation of Data Warehouse Appliances
1. INTRODUCTION
Over the years, companies have come to increasingly value their stored information. This realization is related to the fact that today, almost all company information is stored electronically in databases. Companies strive to use this accumulated information as a basis for various decision support tools. This has led to the development of Business Intelligence (BI) tools and Data Warehousing (DW), which help companies get more out of what they already possess by analyzing data and transforming it into information. The best results are obtained when implementing a customized solution that fits into the company’s unique business processes. Among other things, this enables ad hoc reports and forecasts that support employees at all levels in their decision-making. While a few years ago the use of Business Intelligence tools gave a company business leverage, today it has become nearly mandatory.
The Business Intelligence concept of Data Warehousing aims to collect data from multiple sources and store it in one common database, used for reporting and other BI tools (Porter & Rome, 1995). Today, as the amount of collected data grows, some companies are outgrowing their Data Warehouse solutions. For them, a pre-packaged, optimized, large-scale Data Warehouse solution, a Data Warehouse Appliance, might be of interest.
1.1 PROBLEM DEFINITION
In businesses such as finance, telecommunication and retail, extremely large amounts of data are generated every day. This could serve as a perfect source for Business Intelligence tools and applications, which analyze the data and produce analyses that support business decisions. However, a problem arises when the generated data grows to a volume that can no longer be loaded into the system quickly enough. For example, the weekly sales statistics might not be completely loaded into the BI applications during the weekend. This, in turn, would mean that the results from the BI tools would never be based on fresh data, but instead on older and in some cases irrelevant data.
1.2 PURPOSE AND GOAL
This thesis aims to investigate the business value of the Data Warehouse Appliance technology, in order to help companies that are considering an investment in making their decision.
1.3 SCOPE AND DELIMITATION
The study will focus on the Data Warehouse Appliance market in Sweden. The following suppliers and their respective products will be considered:
● Teradata Enterprise Data Warehouse, Teradata 13.10
● IBM Netezza
● Oracle Exadata Database Machine
● Microsoft/HP Enterprise Data Warehouse Appliance
● SAP HANA
Other suppliers of the technology that have no market presence in Sweden have been placed out of scope for this research project.
1.4 PROJECT METHOD
2. THEORETICAL BACKGROUND
In order to provide relevant background information on the research subject, this section presents information compiled from a literature study. The first section presents the Business Intelligence and Data Warehousing areas. The second deals with business value: its definition and the ways it can be assessed. The information about Business Intelligence, Data Warehousing and business value has been collected from books and academic articles. The information about Data Warehouse Appliances is based on interviews with Appliance vendors as well as their documentation.
2.1 BUSINESS INTELLIGENCE
Business Intelligence (BI) is a concept that can be described as the usage of business information and business analysis in key business processes in order to take actions and make decisions that increase performance or profit. It is not a specific product, technology or methodology but rather a combination of the three (Williams & Williams, 2007).
There has been an increased interest in Business Intelligence over the past few years. What was business leverage five or ten years ago is today mandatory in order to keep up with the competition. Every year since 2004, Business Intelligence has been among the top ten priorities of CIOs. This year, 2012, it is the top priority (Gartner, 2004-2012).
Today, as more and more information is stored electronically, the foundation on which BI tools rely becomes greater. One reason for this is the fact that prices on hardware have dropped, allowing companies to store not only their current data but historical data as well (Chaudhuri, Dayal & Narasayya, 2011). The technology for storing this historical data is commonly called Data Warehousing.
2.1.1 DATA WAREHOUSING
Data Warehousing (DW) is a term for the collection of decision support technologies enabling companies to make better and faster decisions (Chaudhuri & Dayal, 1997). In order to understand its definition, one must first know the basics of operational databases.
Operational databases are digital storage areas for computer applications. They are a solution for handling large amounts of data for many users. When new data is created within an application, it is sent to the database, which writes it to its memory. When a user wants information in an application, a request, or query, for the relevant data is sent to the database. The read and write operations of an operational database are typically simple and numerous. Every query that is sent consumes part of the database’s capacity, meaning that the capacity needed depends on the number of queries and their complexity. Therefore, companies with many users or a large number of complex queries need a database with a lot of capacity (Abiteboul et al., 1995).
In the 1990s, when companies were starting to analyze data stored in their databases, they realized some important differences between operational and analytical needs:
● The data serving needs were physically different
● The supporting technology needs were fundamentally different
● The user communities were different
● The processing characteristics were fundamentally different
These findings led to the separation of operational databases and databases with historical data intended for analysis. The latter were named Data Warehouses, and the main characteristics of a Data Warehouse are (Inmon, 2005):
• It has a longer time horizon than operational databases
• It integrates data from many heterogeneous sources
• It is organized around subjects such as customer, product or sales
• Its data is not changed over time; the only permitted change is to add new data
In later years Data Warehousing has come to mean different things. One meaning is the database itself and another, broader meaning is the entire Data Warehouse environment. The reason for this is that in the beginning a Data Warehouse consisted of just one database. As it often ended up overly complicated and hard to understand and navigate, it evolved into an architecture consisting of both a large integrated database and smaller databases targeted only to support a few
Surrounding this architecture are processes to handle the flow of data from operational systems to analytic applications. This is needed because the data stored often differs between the source systems. Examples of differences are:
• Label of the information, such as a person being labeled as a customer in one system and a user in another
• Structure of data, such as forename and surname stored separately in one system and together in another
• Formatting of data, such as a zip code saved as a number in one system and as a text string in another
The term used to describe this flow of information is the Extract-Transform-Load (ETL) process. Figure 1 displays a typical Data Warehouse architecture with source systems, Data Warehouse, Data Marts, analytical tools, as well as the ETL process.
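The three kinds of differences listed above can be illustrated with a small transform step. The following is a minimal sketch in Python and is not taken from any vendor's ETL tooling. The two source systems ("CRM" and "billing"), all field names and the sample rows are hypothetical, chosen only to mirror the label, structure and formatting examples.

```python
def transform_crm(row):
    """Transform a row from the hypothetical CRM system."""
    return {
        "person_id": f"crm-{row['customer_id']}",  # label: "customer" becomes a neutral "person"
        "forename": row["forename"],               # structure: names are already separate
        "surname": row["surname"],
        "zip_code": str(row["zip"]),               # formatting: number -> text string
    }


def transform_billing(row):
    """Transform a row from the hypothetical billing system."""
    forename, _, surname = row["name"].partition(" ")  # structure: split the combined name
    return {
        "person_id": f"billing-{row['user_id']}",      # label: "user" becomes "person"
        "forename": forename,
        "surname": surname,
        "zip_code": row["zip"].replace(" ", ""),       # formatting: remove spaces from the text
    }


def extract_transform_load(crm_rows, billing_rows):
    """Transform rows from both sources and load them into one list (the 'warehouse')."""
    warehouse = [transform_crm(r) for r in crm_rows]
    warehouse += [transform_billing(r) for r in billing_rows]
    return warehouse


# Hypothetical sample data, one row per source system.
crm_rows = [{"customer_id": 17, "forename": "Anna", "surname": "Berg", "zip": 11428}]
billing_rows = [{"user_id": 42, "name": "Lars Holm", "zip": "114 28"}]
warehouse = extract_transform_load(crm_rows, billing_rows)
```

After the transform, rows from both systems share one set of field names and one zip code format, which is what allows them to be stored and queried together.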
2.1.2 DATA WAREHOUSE APPLIANCES
This section is a compilation of information extracted from interviews with Data Warehouse Appliance vendors and a number of published documents. As an introduction, here is the definition of appliance by the New Oxford American Dictionary:
appliance |əˈplīəns|
noun
1 a device or piece of equipment designed to perform a specific task, typically a domestic one. See note at TOOL.
• an apparatus fitted by a surgeon or a dentist for corrective or therapeutic purpose : electrical and gas appliances.
2 Brit. the action or process of bringing something into operation : the appliance of science could increase crop yields.
The definition of Data Warehouse Appliance is, according to one vendor, a complete and optimized software and hardware solution for large-scale Data Warehousing purposes. Others used the analogy of a kitchen appliance and argued that all appliances have one thing in common: an appliance is defined not by what it consists of, but by what it is meant to do. While you could describe a toaster as a metal box containing heating elements and a spring timer, the common way is to say it is a tool for toasting bread. Ergo, an appliance is a tool or product with a specific purpose.
According to vendors, companies that have invested in Appliance technology are in one of the following categories:
● Companies with large amounts of data
● Companies with complex queries
● Companies with many queries
Targeted areas are retail, telecommunications and banking. What they have in common is the large amount of operational data that is generated every day. Banks register every transaction from every customer, retail companies register every item sold in every store and telephone companies register every call and message of every customer.
However, the vendors differ in that they target companies of various sizes. While one vendor states that those who consider the DW Appliance technology are usually among the five largest in their industry, others imply that their solutions fit the needs of smaller companies as well. Another vendor claims that there are clear breaking points in data volume that indicate when an Appliance is applicable. This vendor states that at six to ten terabytes of stored data, an Appliance becomes more beneficial in terms of hardware price and performance, while other vendors mention one terabyte as this breaking point.
2.1.3 DATA WAREHOUSE APPLIANCE ARCHITECTURE
When DW Appliance vendors are asked how the technology works, it is clear that the solution is complex. One component that is essential to the concept of Data Warehouse Appliance is the overall architectural design.
Data Warehouse Appliances focus on two architectural types of design: Symmetric Multi-Processing (SMP) and Massively Parallel Processing (MPP). Both intend to speed up the input/output (I/O) of the database, but they work in slightly different ways. The SMP design revolves around multiple processing units connected to a single shared memory and storage area. This design is often called a shared everything design, and is shown in Figure 2. The MPP design has parallel processing units which all have their own data source and memory. This is called shared nothing architecture, and is shown in Figure 3.
Both designs use a query planner, which distributes the incoming tasks across the different processing units. Each unit does its part of the work and the result is then assembled at the end. On top of the query planner is an interface, which typically is able to understand most database query languages.
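The division of labor in a shared nothing design can be sketched in a few lines of Python. This is an illustrative toy, not a description of any vendor's implementation. The `Node` class, the round-robin distribution and the sample sales rows are all assumptions made for the example, and threads stand in for physically separate processing units.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """One processing unit that owns a private slice of the data (shared nothing)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def partial_sum(self, column):
        # Each node only ever reads its own rows.
        return sum(row[column] for row in self.rows)


def distribute(rows, node_count):
    """Spread the rows over the nodes round-robin (a hypothetical distribution rule)."""
    return [Node(rows[i::node_count]) for i in range(node_count)]


def planned_sum(nodes, column):
    """Toy query planner: send the same task to every node, then assemble the result."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.partial_sum(column), nodes))
    return sum(partials)


# Hypothetical sales rows.
sales = [
    {"store": 1, "amount": 100},
    {"store": 2, "amount": 250},
    {"store": 1, "amount": 50},
    {"store": 3, "amount": 400},
]
nodes = distribute(sales, node_count=2)
total = planned_sum(nodes, "amount")
```

In a shared everything design the same planner could be used, but all units would read from one common `rows` collection instead of private slices.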
All DW Appliance systems use some kind of security for handling hardware malfunctions. The most common setup is RAID 1, which means that every disc has a mirror somewhere, containing the exact same information. The system is usually configured in a way that prevents two mirror partitions from being on the same physical machine. The risk of inaccessible data is therefore further reduced.
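The effect of keeping a partition and its mirror on different machines can be shown with a small sketch. The placement rule below (the mirror of the partition on machine i is stored on machine i + 1, wrapping around) is a hypothetical choice for illustration only; the vendors' actual placement strategies are not described in the source.

```python
def place_mirrors(machine_count):
    """Map each primary partition (named by its machine index) to a mirror machine."""
    if machine_count < 2:
        raise ValueError("mirroring across machines needs at least two machines")
    # Hypothetical rule: store the mirror on the next machine, wrapping around.
    return {primary: (primary + 1) % machine_count for primary in range(machine_count)}


def readable_partitions(placement, failed_machine):
    """Return the partitions that still have at least one copy on a working machine."""
    readable = set()
    for primary, mirror in placement.items():
        if primary != failed_machine or mirror != failed_machine:
            readable.add(primary)
    return readable


placement = place_mirrors(4)
still_readable = readable_partitions(placement, failed_machine=2)
```

Because no partition shares a machine with its own mirror, the loss of any single machine leaves every partition readable in this sketch.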
2.2 MEASURING BUSINESS VALUE
In order to investigate how an investment in Data Warehouse Appliances can be valued, it is important to first understand what business value is. The economic formula for defining value is rather straightforward: “Economical value occurs when the benefit derived from a resource’s application is greater than the costs incurred from its planning, acquisition, maintenance, and disposition.” This means that value can roughly be translated into benefits minus costs (English, 1999).
The possible outcomes of any successful investment are lowered costs, improved productivity and increased revenue, all of which lead to more money being generated than was spent. This is called return on investment (ROI) (Adelman & Moss, 2000).
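As a worked illustration of "benefits minus costs", the sketch below computes the value and a return on investment for one made-up investment. Expressing ROI as net gain divided by cost is a common convention assumed here, not a formula quoted from the cited sources, and the amounts are hypothetical.

```python
def economic_value(benefits, costs):
    """Value as benefits minus costs."""
    return benefits - costs


def return_on_investment(benefits, costs):
    """ROI as net gain relative to cost (assumed convention)."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs


# Hypothetical amounts in an arbitrary currency.
benefits = 1_500_000
costs = 1_000_000
value = economic_value(benefits, costs)
roi = return_on_investment(benefits, costs)
```

With these figures the value is 500,000 and the ROI is 0.5, meaning every unit spent returned one and a half units. Intangible benefits would have to be estimated before they could enter such a calculation.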
Benefits can be divided into two categories: tangible and intangible. Tangible benefits are those that are considered easily quantifiable, such as higher productivity or fewer returned products. Intangible benefits are harder to measure and create value indirectly. Examples of intangible benefits are goodwill and customer relationships. Costs are also usually divided into two categories: fixed and variable. Fixed costs are described as the costs involved with creating the capacity to
time (English, 1999). These concepts should be kept in mind while reading further about value in IT and Business Intelligence.
2.2.1 BUSINESS VALUE OF AN IT INVESTMENT
Business value is the difference between the perceived value of the company's product or service and the cost for it. In order to sell a product or service, a company will need to create business value and then capture it. There have been many attempts to describe the value of IT in an organization, and the main issue is to describe how a general IT infrastructure contributes to the overall benefits and costs.
There are several reasons why companies wish to do value assessments of their IT investments. An assessment can not only help justify the money spent, but can also function as a way of engaging the employees and future users. The assessment process focuses on what creates value and is important for the company. This thought process is said to foster creativity and motivation (Dahlgren, Lundgren & Stigberg, 1998; Keeney, 1994). But there is a real challenge in assessing the value of an IT investment. Studies indicate that there is no absolute method of measuring the value of an IT investment that is applicable to all companies. Instead, while some companies try to quantify the value and turn everything into dollars and cents, others consider a list of intangible values as a reason for an investment (Renkema, 2000).
One reason that assessments of IT investments are difficult to conduct is the fact that different parts of an organization might not consider the same things to be of value. From the business management point of view, factors such as higher margins and improved efficiency are prioritized. But from a technological perspective, availability, performance and security are of greater interest (Gammelgård, 2007).
2.2.2 VALUE OF BUSINESS INTELLIGENCE
The true value of Business Intelligence occurs when business information is combined with
Figure 4: Business value of Business Intelligence
Business Intelligence affects the business value to a very large extent. Companies and organizations have been using information as a foundation for decisions and performance control for a long time. This comes from the basic assumption that an informed decision tends to have a higher chance of leading to good results than an uninformed one. This is a straightforward reason for gathering information in a business. The less uncertainty we have about the current state and future outcomes, the better chance we have to make decisions with good outcomes (Clemen & Reilly, 2011).
To assess the value of information that will influence decisions and actions, Clemen & Reilly (2011) introduce the term expected value of information. This term describes what we expect to gain from acquiring more information on how to act. Only by considering the expected value of information can we decide whether to invest in obtaining it. The worst-case scenario is that no new input is acquired on how to make the decision, and in this case the expected value of the new information is zero. The best case is when the acquired information always leads to a decision with the best possible outcome. According to Clemen & Reilly, this is called perfect information. Putting this together, the expected value of any information source is somewhere between zero and the value of perfect information. Additionally, the expected value of information is critically dependent on the particular decision or problem at hand. This means that different people, in different situations, place different value on the same information.
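The upper bound described above, the expected value of perfect information, can be computed for a small decision problem. The sketch below is an illustration under assumed numbers: the two actions, the two demand states, their probabilities and the payoffs are hypothetical and do not come from Clemen & Reilly or from the interviews.

```python
def best_expected_payoff(payoffs, probabilities):
    """Expected payoff of the best single action chosen without further information."""
    return max(
        sum(probabilities[state] * payoff for state, payoff in by_state.items())
        for by_state in payoffs.values()
    )


def expected_payoff_with_perfect_information(payoffs, probabilities):
    """Expected payoff when the state is known before the action is chosen."""
    return sum(
        probability * max(by_state[state] for by_state in payoffs.values())
        for state, probability in probabilities.items()
    )


# Hypothetical decision: expand capacity or hold, under uncertain demand.
probabilities = {"high_demand": 0.5, "low_demand": 0.5}
payoffs = {
    "expand": {"high_demand": 100, "low_demand": -40},
    "hold": {"high_demand": 20, "low_demand": 20},
}

without_information = best_expected_payoff(payoffs, probabilities)
with_perfect_information = expected_payoff_with_perfect_information(payoffs, probabilities)
value_of_perfect_information = with_perfect_information - without_information
```

Here the best uninformed choice is to expand, with an expected payoff of 30. Knowing the demand in advance raises the expected payoff to 60, so perfect information is worth 30. Any real, imperfect information source for this decision would be worth between 0 and 30.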
investments of 179 larger companies were studied. The findings were that companies that had adopted data-driven decision-making had 5-6% higher output and productivity. They also found a correlation between making decisions based on data and asset utilization, return on equity and market value. In a study made by Park (2006), it was concluded that a full data warehouse solution increases the performance of Decision Support System users.
The main focus of BI is to enable profit making and to make non-profit-making business processes more efficient. This is done by identifying the information that the business processes need and obtaining that very information. Therefore, every BI environment should be developed around the company's business processes (Williams & Williams, 2007).
2.2.3 COST AND VALUE OF INFORMATION
A common approach to Data Warehousing is that all stored data is valuable, meaning that the more information is saved, the more valuable it is. This is not entirely true. Although a Data Warehouse could potentially be more valuable when filled with a greater amount of information, it is not until the information is used in the organization that it becomes valuable (English, 1999).
In business there are typically two types of costs, fixed and variable. However, when discussing the cost of information there are two other areas that categorize costs: the cost basis and the value basis. The cost basis of information is the cost of developing and maintaining the infrastructure that supports collecting information. This includes developing information and technology architecture, as well as the cost of designing applications and databases. The value basis of information is the cost of applying information. This means the cost for applications that access or retrieve data and use it to perform work or to solve a business problem (English, 1999).
Before the information can create value, it must go through a process containing various steps which are all tied to costs. This process is called the Resource life cycle. IT systems designed to capture data and turn it into information are regarded as a company resource. The first step of this cycle is the planning. This step consists of planning what software and hardware to buy. The second step is the acquisition step, where the company buys and installs its purchase in the
3. RESEARCH METHOD
This section presents the research and interview methods used in the study, to justify the soundness of the conducted interviews and their function as research material.
3.1 CHOICE OF METHOD
The interviews were conducted according to the principles stated by Kvale in ’Interviews – an introduction to qualitative research interviewing’ (1996). A qualitative research method was chosen for several reasons. First, since the existing research is extremely limited, the interviews serve as the main source of information on the subject and therefore need to be in-depth. Second, the target interviewees were too few to serve as a reasonable basis for a quantitative study. Third, since a comparison of the answers from the different interviewees was to be conducted, there were reasons for using a predefined set of questions.
However, it is important to create an interview environment where the interviewee feels secure and comfortable and therefore answers the questions openly. Therefore, the interviews were conducted in a semi-structured way, using a framework of topics to be discussed instead of questions to be answered. The framework is found in Appendix A. Prior to each interview, these topics were adapted to fit the specific interviewee. The interview was then recorded, which allowed the researchers to participate actively and take notes when specific subjects of interest were discussed, to enable revisits to them later. Transcription was made easier by carrying it out on the day of the interview, while the conversation was still fresh in mind.
3.2 SEVEN STAGES OF INTERVIEW INVESTIGATION
Kvale introduces the following seven stages of interview investigation:
1. Thematizing
2. Designing
3. Interviewing
4. Transcribing
5. Analyzing
6. Verifying
7. Reporting
The first stage, thematizing, results in a well-formulated purpose of the investigation and a description of the main topic. This is to be done before any of the interviews take place, in order to gain an understanding of what is to be done during the research, and why. It is followed by a designing phase where the research study is planned in detail with regard to all seven stages as a whole. The interviewing is then conducted in the chosen manner, according to the interview guide that was developed during the designing phase. The transcribing phase follows, which aims to prepare the material for analysis. Kvale stresses the importance of this stage, claiming that rather than being a simple clerical task, transcription is itself an interpretative process. Through careful analyzing, conclusions can be drawn. This is done systematically, using a chosen method that is in line with the previously stated purpose of the project. By verifying the collected material, the generalizability, reliability and validity of the conducted interviews are ascertained. Finally, the reporting is done to communicate the findings.
3.3 QUESTION TYPES – WHEN TO ASK WHAT AND HOW
Kvale also introduces how and when different types of interview questions are asked:
● Introducing questions are used to open up a conversation broadly, e.g. ’can you tell me something about…’
● Follow-up questions are used to keep the conversation going, either by asking a direct question on the subject already touched upon, by repeating keywords or by signaling agreement through nodding or making affirmative sounds
● Probing questions are used to make the interviewee elaborate on the subject already touched upon
● Specifying questions are used to drill down into a detailed subject and the opinions of the interviewee, e.g. ’what did you think then?’
● Direct questions are used to openly introduce a new topic or dimension to the discussion
● Indirect questions can be used either to discreetly introduce a new topic or dimension to the
● Structuring questions are used to close an already exhausted topic or interrupt a long answer which is not relevant to the research
● Silence between the questions is used to make the interviewee more comfortable and give him/her time to collect his/her thoughts without feeling rushed
● Interpreting questions are asked to confirm that what you have interpreted from the answers really is what the interviewee meant, e.g. ’So it is true that you mean that…?’
Moreover, the question of how and when to use leading questions is discussed. According to Kvale, leading questions suit qualitative research interviews particularly well, as it is important not only to repeatedly check the reliability of the interviewees’ answers but also to verify the interviewers’ interpretations. As a qualitative research study generally comprises a smaller number of interviews than a quantitative one, this is of especially great importance. Kvale stresses that the interviewer should focus not on whether to lead, but on where the interview questions should lead: in important directions that result in relevant findings for the research study.
3.4 HOW TO CONDUCT AN INTERVIEW OF GREAT QUALITY
A great quality interview requires not only well planned and asked questions, but also an interviewer who possesses the following qualities:
● Knowledgeable
● Structuring
● Clear
● Gentle
● Sensitive
● Open
● Steering
● Critical
● Remembering
● Interpreting

● The answers should be spontaneous, rich, specific, and relevant
● The questions asked should be short and clear, allowing the answers to be long and in focus
● The interviewer should take care to clarify the meanings of relevant terms used in the interview
● The interpretation of the answers should begin already during the interview
● The interviewer should strive to verify his or her interpretations of the interviewee’s answers during the interview
● The interview should be self-communicative and therefore be understandable without extensive knowledge of or introduction to the subject
3.5 WHAT TO CONSIDER WHEN CONDUCTING AN INTERVIEW
While performing the analysis, there are two crucial factors of which the researchers need to be aware. The first is their own theoretical presuppositions and the role these play in the interpretation of the material. The second is the choice between the miner's and the traveler's approach. When using the former, the researcher must take care not to affect the interviewee’s answer in any way, much like a botanist collecting flowers in nature without damaging the environment. When using the latter, the opposite applies and the questions asked are answered collaboratively.
3.6 WHAT TO CONSIDER WHEN ANALYZING THE INTERVIEW RESULTS
To gain a high level of reliability, validity and generalizability, there are a number of things to consider when analyzing the interview results, especially when the interviews are qualitative and semi-structured.
Generalizability describes the degree to which the conclusions drawn from the analysis apply in general. This is crucial when a small number of interviews are conducted, as they will represent a much larger group. According to Kvale, this is achieved through examining relevant attributes only.
Reliability concerns the consistency of the research findings. The more sources say the same thing, the more reliable the findings are.
Validity concerns the degree to which the observations reflect the variables that are of true importance to the research. This is achieved through the researchers’ capabilities and craftsmanship, and involves agreeing with the interviewee on the meanings of the terms that are used. It also concerns the truth and correctness of the interviewee’s statements, which must be carefully evaluated by the researcher.
4 RESULTS
This section presents a summary of what was said during the interviews with the vendors and user companies.
4.1 VENDOR INTERVIEW RESULTS
Vendor interviews were conducted in order to gain an insight into what the technology aims to solve as well as to analyze the current position of the appliance technology vendors.
In-depth, semi-structured interviews were conducted in person. The question framework can be found in Appendix A. Follow-up questions were asked, when necessary, via email and telephone. Afterwards, a form was sent out to enable comparisons between the vendors and to collect short, clear and specific answers. The question form and the collected answers can be found in Appendix A.
4.1.1 TOP BUSINESS VALUES OF DATA WAREHOUSE APPLIANCES
According to Data Warehouse Appliance vendors, there are many reasons to invest in the technology. They mention the benefits seen in Figure 5.
● Cost of hardware, the cost incurred when buying hardware to support the Data Warehouse Appliance
● Cost of maintenance, the cost of administration and development tasks
● Performance, the speed at which the Data Warehouse Appliance operates
● Support, the external help received when maintaining or troubleshooting
● Time of implementation, the time it takes to set up and configure the Data Warehouse Appliance
● Read performance, the speed with which the answer to a query against the Data Warehouse Appliance is retrieved
● Write performance, the speed with which an update batch is inserted into the Data Warehouse Appliance
● Scalability, the Data Warehouse Appliance’s ability to expand or contract in order to fit the changing needs of the user company
● Other, including shortened ‘latency’ in information, meaning a reduced time for information to flow between the operational systems and analysis, and fewer systems to administrate
The following sections cover what the vendors say about these benefits.
4.1.1.1 PERFORMANCE
In terms of performance, every vendor gives numerous reasons why appliances are fast. The vendors mention many factors that contribute to Appliance performance, as seen in Figure 6.
When discussing performance, vendors explain that the main issue with large scale Data Warehousing is the input/output (I/O). I/O can be described as the flow of data between processing and storage. Today the processing speed is much higher than the reading speed of storage disks. In order to make up for the slow reading speed, Appliance products use parallel processing of data. This is, as shown in Figure 6, an essential part of why appliances have high performance. The goal has been to retrieve as little unnecessary data as possible from the database. To achieve this, the DW Appliance has several processing units directly linked to the location of the data. These processing units each process their own part of a query and filter out unneeded rows and columns. Many Appliance products also use compression to further reduce the traffic between processing units and storage. This parallel processing technology is controlled by software developed especially for Appliances. Vendors say that software that handles query planning and optimizing is central in building a parallel Data Warehouse solution.
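The idea of processing units that each filter their own part of the data can be illustrated with a minimal Python sketch. It is not based on any vendor's implementation: the table contents, the partitioning and the names `scan_partition` and `parallel_query` are assumptions made purely for illustration, and Python threads only mimic the structure of the approach, not the hardware-level parallelism of an Appliance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows, split into partitions. Each partition stands in
# for the slice of data stored next to one processing unit.
PARTITIONS = [
    [{"region": "north", "year": 2010, "amount": 120, "note": "a"},
     {"region": "south", "year": 2011, "amount": 80, "note": "b"}],
    [{"region": "north", "year": 2011, "amount": 200, "note": "c"},
     {"region": "west", "year": 2011, "amount": 50, "note": "d"}],
    [{"region": "north", "year": 2011, "amount": 30, "note": "e"}],
]


def scan_partition(rows, predicate, columns):
    """Drop unneeded rows and columns close to the data."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def parallel_query(partitions, predicate, columns):
    """Run the same filter on every partition and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(
            lambda part: scan_partition(part, predicate, columns), partitions
        )
        return [row for part in partials for row in part]


# Only the 'amount' column of matching rows travels back to the coordinator.
result = parallel_query(
    PARTITIONS,
    lambda r: r["region"] == "north" and r["year"] == 2011,
    ["amount"],
)
total = sum(r["amount"] for r in result)
```

In this sketch the coordinating step only ever sees the filtered, single-column rows, which is the reduction in traffic between storage and processing that the vendors describe.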
Because of the highly increased performance in Appliances, the structure of the Data Warehouse can be changed. The potential benefit is shortened latency between registered information in source systems and information ready for analysis. Vendors explain that with increased performance of queries, the traditional architecture with a large Data Warehouse and several Data Marts can be changed. The result is a structure where all of the data is stored in the Data Warehouse and the Data Marts are built as views of that data. According to a vendor this has several benefits, such as less duplicated data, less development effort and more flexibility in report design.
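A Data Mart built as a view can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in and is not an Appliance product; the table `sales` and the view `mart_sales_by_region` are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory database standing in for the central Data Warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 2011, 200.0), ("north", 2011, 30.0), ("south", 2011, 80.0)],
)

# The 'Data Mart' is a view: no data is copied out of the warehouse table.
conn.execute(
    """
    CREATE VIEW mart_sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """
)

query = "SELECT region, total FROM mart_sales_by_region ORDER BY region"
before = conn.execute(query).fetchall()

# A new row in the warehouse is visible through the view immediately,
# without a separate load step into the mart.
conn.execute("INSERT INTO sales VALUES ('south', 2012, 20.0)")
after = conn.execute(query).fetchall()
```

Because the view is evaluated at query time, this structure depends on queries against the full warehouse being fast enough, which is the precondition the vendors attribute to the Appliance.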
When talking about DW Appliance business value, one of the benefits most commonly mentioned by vendors is the change in maintenance. Since the architecture can be changed and consolidated in one place, the administrative work is reduced. Vendors argue that since less physical modeling is needed to create Data Marts, indexes and aggregated views, less development is required from a Business Intelligence perspective. The eliminated need to construct Data Marts also contributes to a more flexible environment for the developers.
on this is that it is easier to build an optimized solution with hardware that fits well together. Another reason mentioned is that prices can be lowered.
Figure 7: Hardware components of a Data Warehouse Appliance
All vendors provide a unified source of support. Since large scale Data Warehouse architectures can be very complex, it is often hard to specify exactly what is causing errors or performance issues. The problem lies in the many different components that constitute the architecture. Vendors argue that with a standardized product, it is far easier to duplicate the environment and run tests to find a solution. There is also an issue of responsibility: in a solution with many vendors, each vendor can blame low performance or errors on the others.
4.1.1.2 SCALABILITY
Appliance solutions are in many ways targeted at companies with large amounts of data. This means that the products must be able to grow. Vendors talk about the concepts of linear scalability and modular expansion. Linear scalability means that performance, price, and administration will increase linearly when expanding the Data Warehouse Appliance. Modular expansion means that expansion of the Data Warehouse Appliance is done in modules – a company
4.1.1.3 SIMPLICITY
There are a number of benefits which are less tangible. One reason to invest in an Appliance is, according to the vendors, the reduced number of systems included in the Data Warehouse architecture. This benefit is accompanied by fewer administrative tasks. Together, these effects would result in a less complex Data Warehouse environment. Another aspect taken into consideration when marketing DW Appliances is the pricing. Vendors have learned that pricing of large scale systems can be very confusing to the customer. They have therefore developed a simple pricing method with either a price per complete product or a price per amount of storage needed. This is shown in Figure 8. The column ‘Other’ represents the answer that a combination of the pricing methods can be offered.
Figure 8: Pricing of a Data Warehouse Appliance
4.2 USER INTERVIEWS
User interviews were conducted in order to gain an insight into the decision process of a Data Warehouse Appliance investment:
What were the grounds for the investment? How was the vendor chosen? How was the implementation managed? And most importantly: what is the perceived business value?
In-depth, semi-structured interviews were conducted in person. A framework of questions and topics was sent to the interviewees beforehand. This can be found in Appendix A. Follow-up