
In the document by Vanja Josifovski (pages 102-105)


6.1.1 Data source types and functions with multiple implementations

While some of the functions used in AMOSQL queries are implemented in, and can be executed in, exactly one data source, there are also functions that can be executed in more than one data source. According to this criterion, the functions in AMOSII are classified into:

• single implementation functions (SIFs)
• multiple implementations functions (MIFs)
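The SIF/MIF distinction depends only on how many data sources can execute a function. The Python sketch below illustrates this. It assumes a hypothetical catalog that maps each function name to the set of sources able to execute it. The catalog contents, the source names other than EMPLOYEE_DB, and the `classify` helper are illustrative and are not part of AMOSII.

```python
from enum import Enum


class FunctionKind(Enum):
    SIF = "single implementation function"
    MIF = "multiple implementations function"


def classify(executable_at: dict[str, set[str]]) -> dict[str, FunctionKind]:
    """Label each function by how many data sources can execute it."""
    kinds = {}
    for fn, sources in executable_at.items():
        if not sources:
            raise ValueError(f"{fn!r} is not executable at any data source")
        kinds[fn] = FunctionKind.SIF if len(sources) == 1 else FunctionKind.MIF
    return kinds


# Hypothetical catalog: a local function known in one mediator only,
# and a comparison operator that several kinds of sources can evaluate.
catalog = {
    "name": {"EMPLOYEE_DB"},
    "<": {"EMPLOYEE_DB", "relational_db", "storage_manager"},
}
for fn, kind in classify(catalog).items():
    print(fn, "->", kind.name)
```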

6.1 Query decomposition 91

The user-defined local functions, as well as the proxy functions, are single implementation functions. For example, the function name: Person → string is a SIF defined over the instances of the type Person in EMPLOYEE_DB. The implementation of this function is known only in that mediator, and therefore it can be executed only there. The second category contains functions that are executable in more than one data source. Examples are the comparison operators (e.g. <, >, etc.), which are executable in AMOSII servers, relational databases, certain storage managers, etc. MIFs can also be user-defined. However, since in our framework each user-defined type is defined in only one data source, a MIF may take only literal-typed arguments. A framework supporting replicated user-defined types and MIFs with user-defined type arguments would require the state (value) of the instances of these types to be shipped among the mediators, so that it can be used at the data source where the MIF is executed. In the framework presented in this thesis, only OIDs and the needed portions of the instances' state are shipped among the mediators and the data sources. Replicated user-defined types can be simulated by stringifying the state of the instances and handling them in the mediators as character strings. The wrappers translate the stringified instances to and from the representations required in the data sources. Extending the integration framework to handle replicated user-defined types is one of the topics of our current research.
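Two restrictions appear above: MIFs take only literal-typed arguments, and replicated types are simulated by stringified state. The sketch below illustrates both. The literal type names, the `Point` type and the JSON encoding are assumptions made for illustration; the thesis does not prescribe a particular string format.

```python
import json
from dataclasses import asdict, dataclass

# Assumed names of literal types; a MIF may only take arguments of these types.
LITERAL_TYPES = {"Integer", "Real", "Charstring", "Boolean"}


def mif_signature_allowed(arg_types: list[str]) -> bool:
    """Reject MIF signatures that mention a user-defined type."""
    return all(t in LITERAL_TYPES for t in arg_types)


@dataclass
class Point:
    """Hypothetical user-defined type whose instances are 'replicated'."""
    x: float
    y: float


def stringify(p: Point) -> str:
    """Mediator side: carry the instance state as an ordinary character string."""
    return json.dumps(asdict(p), sort_keys=True)


def unstringify(s: str) -> Point:
    """Wrapper side: rebuild a local representation (here simply a Point)."""
    return Point(**json.loads(s))


encoded = stringify(Point(1.0, 2.0))
print(encoded)               # a plain string the mediators can ship around
print(unstringify(encoded))  # turned back into a Point by the wrapper
```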

Depending on the set of MIFs implemented at a data source, the data sources are classified into several data source types (DSTs). Conversely, the set of MIFs associated with a DST is called the generic capability of the DST.
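The sketch below makes the relationship between a source's MIFs and its DSTs concrete. It assumes that a source belongs to every DST whose whole generic capability it implements. The DST names and MIF sets loosely follow figure 6.2, and the matching rule itself is an illustrative assumption rather than the thesis's definition.

```python
# Generic capability of each DST: the set of MIFs associated with it.
# Names loosely follow figure 6.2; the MIF sets are illustrative.
GENERIC_CAPABILITY: dict[str, frozenset[str]] = {
    "Scan": frozenset(),  # root: executes no MIFs
    "Comparison": frozenset({"<", ">", "="}),
    "Arithmetic": frozenset({"+", "-", "*", "/"}),
    "Aggregation": frozenset({"sum", "avg", "max"}),
}


def matching_dsts(implemented_mifs: set[str]) -> list[str]:
    """DSTs whose entire generic capability the source implements (assumed rule)."""
    return sorted(
        dst for dst, cap in GENERIC_CAPABILITY.items() if cap <= implemented_mifs
    )


print(matching_dsts({"<", ">", "=", "like"}))
```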

Besides the generic capability defined by its type, each data source can have a specific capability, defined by the types and functions it exports to the AMOSII mediators or translators. To simplify the presentation, in the rest of this chapter the term capability refers only to the generic capability of a source or a DST.

In order to reuse capability specifications, the DSTs are organized in a hierarchy in which the more specific DSTs inherit the capabilities of the more general ones. This hierarchy is separate from the AMOSII type hierarchy and is used only during query decomposition, as described below.
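Capability inheritance in the DST hierarchy amounts to taking the union of a DST's own MIFs with those of all its ancestors. The sketch below allows several parents per DST, which is an assumption about the shape of the hierarchy. `CompArith` is a hypothetical DST used only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DST:
    """A data source type: its own MIFs plus links to more general DSTs."""
    name: str
    own_mifs: frozenset = frozenset()
    parents: tuple = ()

    def capability(self) -> frozenset:
        """Own MIFs together with everything inherited from the ancestors."""
        cap = set(self.own_mifs)
        for parent in self.parents:
            cap |= parent.capability()
        return frozenset(cap)


scan = DST("Scan")  # root of every hierarchy
comparison = DST("Comparison", frozenset({"<", ">", "="}), (scan,))
arithmetic = DST("Arithmetic", frozenset({"+", "-", "*", "/"}), (scan,))
comp_arith = DST("CompArith", parents=(comparison, arithmetic))  # hypothetical

print(sorted(comp_arith.capability()))
```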

Figure 6.2 shows an example of an AMOSII DST hierarchy. All DST hierarchies are rooted in a node representing data sources with only the basic capability to execute one simple calculus predicate that invokes a single function/operation in the source and returns the result to the translator. This corresponds to the sequential scan capability in some other mediation frameworks [36]. Data sources of this type cannot execute MIFs. At the next capability level, DSTs with capabilities to perform arithmetic, comparison and join operations are defined. The arithmetic and comparison DSTs are defined using the usual sets of MIFs, shown in the figure. A MIF in the capability of a certain DST can be defined in one of two ways. It can be defined as a generic function, when all of its resolvents are executable at the sources of the specified DST. Alternatively, it can be defined as a specific resolvent, when only a particular resolvent can be scheduled for execution at the specified type of sources.

[Figure 6.2: Data source capabilities hierarchy. Nodes: Scan (root), Join, Single Col. Join, Aggregation, Comparison, Arithmetic, Relational, Matrix, Amos. Example MIFs: arithmetic +, -, *, /, ...; comparison >, <, =, ...; aggregation sum, avg, max, ...; matrix matrix_mult, matrix_add, ...]
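A MIF can be listed in a capability either as a generic function or as a single resolvent. The difference can be sketched as a membership test. The `Call` and `Capability` records and resolvent names such as `lt_real_real` are hypothetical; AMOSII's internal naming of resolvents is not shown in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    generic: str    # generic function, e.g. "<"
    resolvent: str  # type-specific implementation chosen for this call (hypothetical name)


@dataclass(frozen=True)
class Capability:
    generic_functions: frozenset = frozenset()  # every resolvent is executable
    resolvents: frozenset = frozenset()         # only these resolvents are executable


def schedulable(call: Call, cap: Capability) -> bool:
    """True if the call may be scheduled at sources of a DST with this capability."""
    return call.generic in cap.generic_functions or call.resolvent in cap.resolvents


comparison = Capability(generic_functions=frozenset({"<", ">", "="}))
numeric_only = Capability(resolvents=frozenset({"lt_real_real"}))

print(schedulable(Call("<", "lt_charstring_charstring"), comparison))
print(schedulable(Call("<", "lt_charstring_charstring"), numeric_only))
```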

Unlike all the other DST capabilities, the two join capabilities, the single collection join (SC join) and the general join, are not specified using MIFs.

In the calculus used in AMOSII, equi-joins are represented implicitly by a variable appearing in more than one query predicate. Accordingly, the join capabilities express that a data source (and its wrapper) can handle several predicates connected by common variables as a single unit. The predicates executed in such data sources can be grouped together before being sent to the wrapper, which allows a more efficient translation into expressions in the local language. For example, relational databases and AMOSII servers can perform joins, and therefore it might be favorable to allow join operations to be pushed to data sources of these types.
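Equi-joins are implicit in shared variables. Finding the predicates that a join-capable source can handle as one unit is therefore a connected-components problem over the variables. The union-find sketch below assumes that all the predicates have already been assigned to the same join-capable source. The predicate strings are illustrative.

```python
from collections import defaultdict


def join_groups(predicates: list[tuple[str, tuple[str, ...]]]) -> list[list[str]]:
    """Group predicates that are transitively connected through a common variable.

    Each predicate is given as (text, variables it mentions)."""
    parent = list(range(len(predicates)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}
    for i, (_, variables) in enumerate(predicates):
        for v in variables:
            if v in first_seen:
                parent[find(i)] = find(first_seen[v])  # shared variable = equi-join
            else:
                first_seen[v] = i

    groups: dict[int, list[str]] = defaultdict(list)
    for i, (text, _) in enumerate(predicates):
        groups[find(i)].append(text)
    return list(groups.values())


query = [
    ("Employee(e)", ("e",)),
    ("dept(e) = d", ("e", "d")),
    ("Dept(d)", ("d",)),
    ("Project(p)", ("p",)),
]
print(join_groups(query))
```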

Based on the properties of commonly used data sources, two types of join capabilities need to be distinguished. First, some sources can combine and execute conditions over only a single collection of data items in the source (e.g. a table in a storage manager). Such sources are described by a DST that inherits only the SC join capability. An example of this kind of DST is a storage manager storing several data tables, each represented in the AMOSII translator as a proxy type. Each table can be scanned with associated conditions.

The conditions on a single table can be combined. However, operations over different tables need to be submitted separately. Therefore, for each table, the MIF operations are grouped together with the typecheck predicate of the proxy type and submitted to the wrapper. One such grouped predicate is submitted for each different collection. A system with these properties fits the capability description of the comparison DST in figure 6.2.
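The per-collection grouping used for single collection join sources can be sketched as follows. The sketch assumes that typecheck predicates have the form `Type(var)` and that each condition mentions exactly one collection variable. The table and condition names are made up for the example. Joining with ` and ` merely stands in for whatever format the wrapper expects.

```python
def sc_join_groups(typechecks: dict[str, str],
                   conditions: list[tuple[str, str]]) -> list[str]:
    """Build one grouped predicate per collection (proxy type).

    typechecks: collection variable -> proxy type it ranges over.
    conditions: (collection variable, MIF condition) pairs; each condition is
    assumed to involve a single collection, since SC join sources cannot
    evaluate conditions across collections.
    """
    groups = {var: [f"{ptype}({var})"] for var, ptype in typechecks.items()}
    for var, cond in conditions:
        if var not in groups:
            raise ValueError(f"{cond!r} refers to unknown collection variable {var!r}")
        groups[var].append(cond)
    # One submission to the wrapper per collection.
    return [" and ".join(preds) for preds in groups.values()]


subqueries = sc_join_groups(
    {"e": "Employee", "d": "Dept"},
    [("e", "salary(e) > 1000"), ("d", "floor(d) = 3"), ("e", "age(e) < 40")],
)
for sq in subqueries:
    print(sq)
```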

The general join capability is inherited by DSTs capable of processing joins over multiple collections (e.g. relational database sources). The decomposer sees each collection as a proxy type. When a join capability is present, it combines the operations over several proxy types into a single subquery sent to the data source.
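For a general join source such as a relational database, the operations over several proxy types can be emitted as one subquery. The sketch below assumes that each proxy type maps to a table of the same name and that conditions are already written in SQL syntax. It illustrates the idea and is not the translator used in AMOSII.

```python
def single_subquery(typechecks: dict[str, str],
                    conditions: list[str],
                    columns: list[str]) -> str:
    """Combine predicates over several proxy types into one SQL query.

    typechecks: range variable -> table (assumed to share the proxy type's name).
    conditions: SQL conditions, including join conditions between tables.
    """
    if not typechecks:
        raise ValueError("at least one collection is required")
    from_clause = ", ".join(f"{table} {var}" for var, table in typechecks.items())
    sql = f"SELECT {', '.join(columns)} FROM {from_clause}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


print(single_subquery(
    {"e": "Employee", "d": "Dept"},
    ["e.dept = d.id", "d.floor = 3"],
    ["e.name"],
))
```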

New DSTs are defined by inserting them into the DST hierarchy. For example, a DST representing software capable of matrix operations is named Matrix and placed under the root node of the DST hierarchy in figure 6.2. This implies that it supports the execution of only one operation at a time. A source that allows a combination of several matrix operations would be defined as a child of the Join DST.
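The sketch below uses a parent map to show how a DST can be inserted and how to decide whether its sources may receive several operations at once. The map contains only a few nodes, and the node name `CombinedMatrix` is hypothetical. Treating "inherits from Join" as the test for combining operations is an assumption that mirrors the Matrix example in the text.

```python
# Parent links of a small, partial DST hierarchy (child -> parents).
PARENTS: dict[str, list[str]] = {
    "Scan": [],       # root: one operation at a time
    "Join": ["Scan"],
}


def add_dst(name: str, parents: list[str]) -> None:
    """Insert a new DST below existing nodes of the hierarchy."""
    missing = [p for p in parents if p not in PARENTS]
    if missing:
        raise KeyError(f"unknown parent DST(s): {missing}")
    PARENTS[name] = list(parents)


def inherits(dst: str, ancestor: str) -> bool:
    """True if dst is the ancestor itself or lies (transitively) below it."""
    return dst == ancestor or any(inherits(p, ancestor) for p in PARENTS[dst])


def can_combine_operations(dst: str) -> bool:
    """Assumed rule: only DSTs below Join accept several operations per request."""
    return inherits(dst, "Join")


add_dst("Matrix", ["Scan"])
add_dst("CombinedMatrix", ["Join"])  # hypothetical name

print(can_combine_operations("Matrix"), can_combine_operations("CombinedMatrix"))
```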
