5.3 Performance measurements
6.1.1 Data source types and functions with multiple implementations
While some of the functions used in AMOSQL queries are implemented, and can be executed, in exactly one data source, other functions can be executed in more than one data source. According to this criterion, the functions in AMOSII are classified into:
single implementation functions (SIFs)
multiple implementations functions (MIFs)
6.1 Query decomposition 91
The user-defined local functions as well as the proxy functions are single implementation functions. For example, the function name: Person → string is a SIF, defined over the instances of the type Person in EMPLOYEE_DB. The implementation of this function is known only in that mediator and therefore it can be executed only there. The second category contains functions that are executable in more than one data source, for example the comparison operators (e.g. <, >, etc.), which are executable in AMOSII servers, relational databases, certain storage managers, etc. The MIFs can also be user-defined. However, since in our framework each user-defined type is defined in only one data source, a MIF may take only literal-typed arguments. A framework that supported replicated user-defined types and MIFs taking user-defined type arguments would require that the state (value) of the instances of these types be shipped among the mediators, so that it can be used at the data source where the MIF is executed. In the framework presented in this thesis, only OIDs and the needed portions of the instances' state are shipped among the mediators and the data sources. Replicated user-defined types can be simulated by stringifying the state of the instances and handling them in the mediators as character strings. The wrappers translate the stringified instances from and to the representations required in the data sources. Extending the integration framework to handle replicated user-defined types is one of the topics of our current research.
Depending on the set of MIFs implemented at a data source, the data sources are classified into several data source types (DSTs). Conversely, the set of MIFs associated with a DST is called the generic capability of the DST.
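The SIF/MIF distinction can be illustrated with a minimal Python sketch. It is not taken from AMOSII: the function `classify_functions`, the dictionary layout, and the source names `amos_server`, `relational_db` and `storage_manager` are hypothetical and serve only to show that the classification depends on how many data sources can execute a function.

```python
from typing import Dict, Set


def classify_functions(implementations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each function 'SIF' or 'MIF' by the number of sources that can execute it."""
    labels = {}
    for name, sources in implementations.items():
        if not sources:
            raise ValueError(f"function {name!r} has no implementation")
        labels[name] = "SIF" if len(sources) == 1 else "MIF"
    return labels


# Hypothetical catalogue: function name -> data sources able to execute it.
implementations = {
    "name": {"EMPLOYEE_DB"},  # known only in one mediator
    "<": {"amos_server", "relational_db", "storage_manager"},
    ">": {"amos_server", "relational_db", "storage_manager"},
}

labels = classify_functions(implementations)
```

Here `name` is labelled a SIF because only one source implements it, while the comparison operators are labelled MIFs.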
Besides the generic capability defined by its type, each data source can have a specific capability defined by the types and functions exported to the AMOSII mediators or translators. To simplify the presentation, in the rest of this chapter the term capability refers only to the generic capability of a source or a DST.
In order to reuse the capability specifications, the DSTs are organized in a hierarchy where the more specific DSTs inherit the capabilities of the more general ones. This hierarchy is separate from the AMOSII type hierarchy and is used only during the query decomposition as described below.
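Capability inheritance along such a hierarchy can be sketched as follows. This is an illustration under stated assumptions, not the AMOSII implementation: the class `DST`, its fields, and in particular the choice of parents for `Relational` are hypothetical. Only MIF-based capabilities are modelled; the join capabilities, which are not specified through MIFs, are left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class DST:
    """A data source type: its own MIFs plus those inherited from its parents."""
    name: str
    parents: List["DST"] = field(default_factory=list)
    own_mifs: FrozenSet[str] = frozenset()

    def capability(self) -> FrozenSet[str]:
        # Generic capability = own MIFs united with every ancestor's MIFs.
        result = set(self.own_mifs)
        for parent in self.parents:
            result |= parent.capability()
        return frozenset(result)


# Hypothetical hierarchy; the root can only execute one predicate at a time.
scan = DST("Scan")
arithmetic = DST("Arithmetic", [scan], frozenset({"+", "-", "*", "/"}))
comparison = DST("Comparison", [scan], frozenset({"<", ">", "="}))
matrix = DST("Matrix", [scan], frozenset({"matrix_mult", "matrix_add"}))
# Assumed placement: a relational source inherits both operator sets.
relational = DST("Relational", [arithmetic, comparison])

relational_capability = relational.capability()
```

A new DST is added by creating a node and naming its parents; nothing else in the hierarchy has to change.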
Figure 6.2 shows an example of an AMOSII DST hierarchy. All DST hierarchies are rooted in a node representing data sources with only the basic capability to execute one simple calculus predicate that invokes a single function/operation in this source and returns the result to the translator. This corresponds to a sequential scan capability in some other mediation frameworks [36]. Data sources of this type cannot execute MIFs. At the next capability level, DSTs with capabilities to perform arithmetic, comparison and join operations are defined. The arithmetic and comparison DSTs are defined using the usual sets of MIFs, shown in the figure. A MIF in the capability of a certain DST can be defined as a generic function, when all of its resolvents are executable at the sources of the specified DST, or as a specific resolvent, when only a particular resolvent can be scheduled for execution at the specified type of sources.
[Figure 6.2: Data source capabilities hierarchy. Node labels: Scan (root); Single Col. Join; Join; Arithmetic (+, -, *, /, ...); Comparison (>, <, =, ...); Aggregation (sum, avg, max, ...); Relational; Matrix (matrix_mult, matrix_add, ...); Amos.]
The two join capabilities, the single collection join (SC join) and the general join, are not specified using MIFs, unlike all the other DST capabilities.
In the calculus used in AMOSII, the equi-joins are represented implicitly by a variable appearing in more than one query predicate. Accordingly, the join capabilities represent that a data source (and its wrapper) can handle several predicates connected by common variables as a single unit. The predicates executed in such data sources can be grouped together before sending them to the wrapper to achieve more efficient translation to expressions in the local language. For example, relational databases and AMOSII servers can perform joins, and therefore it might be favorable to allow join operations to be pushed to data sources of these types.
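Grouping predicates that are connected by common variables amounts to finding connected components. The sketch below is only an illustration: the tuple representation of a predicate, the predicate names, and the function `group_by_shared_variables` are hypothetical, and it assumes that all listed predicates are already known to be executable at the same join-capable source.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (function name, variables it mentions).
Predicate = Tuple[str, Tuple[str, ...]]


def group_by_shared_variables(predicates: List[Predicate]) -> List[List[Predicate]]:
    """Group predicates that are linked, directly or transitively, by a common variable."""
    parent = list(range(len(predicates)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}
    for idx, (_, variables) in enumerate(predicates):
        for var in variables:
            if var in first_seen:
                parent[find(idx)] = find(first_seen[var])
            else:
                first_seen[var] = idx

    groups: Dict[int, List[Predicate]] = {}
    for idx, pred in enumerate(predicates):
        groups.setdefault(find(idx), []).append(pred)
    return list(groups.values())


predicates = [
    ("Person", ("p",)),
    ("name", ("p", "n")),
    ("Department", ("d",)),
    ("budget", ("d", "b")),
]
separate = group_by_shared_variables(predicates)
# A predicate mentioning both p and d acts as an implicit equi-join.
joined = group_by_shared_variables(predicates + [("works_in", ("p", "d"))])
```

Without the `works_in` predicate the two pairs share no variable and form two units; adding it connects everything into one unit that could be sent to the wrapper as a whole.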
Based on the properties of the commonly used data sources, there is a need to distinguish between two types of join capabilities. First, there are sources that are capable of combining and executing conditions over only a single collection of data items in the source (e.g. a table in a storage manager). These types of sources are defined by using a DST that inherits only the SC join capability. An example of this kind of DST is a storage manager storing several data tables, each represented in the AMOSII translator as a proxy type. Each table can be scanned with associated conditions. The conditions on a single table can be combined. However, operations over different tables need to be submitted separately. Therefore, for each table, the MIF operations are grouped together with the proxy type typecheck predicate, and submitted to the wrapper. One such grouped predicate is submitted for each different collection. A system with these properties fits the capability description of the comparison DST in figure 6.2.
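The per-collection grouping for an SC join source can be sketched in a few lines. The representation is hypothetical: `typechecks` maps an instance variable to its proxy type, each condition is a tuple of variable, attribute, operator and literal, and `group_per_collection` is an illustrative name, not an AMOSII function.

```python
from typing import Any, Dict, List, Tuple

Condition = Tuple[str, str, str, Any]  # (variable, attribute, operator, literal)


def group_per_collection(
    typechecks: Dict[str, str], conditions: List[Condition]
) -> Dict[str, List[tuple]]:
    """Build one grouped predicate per proxy type: typecheck plus its conditions."""
    subqueries: Dict[str, List[tuple]] = {}
    for var, proxy_type in typechecks.items():
        subqueries.setdefault(proxy_type, []).append(("typecheck", var, proxy_type))
    for var, attribute, op, literal in conditions:
        # Each condition joins the group of the table its variable ranges over.
        subqueries[typechecks[var]].append((attribute, var, op, literal))
    return subqueries


typechecks = {"e": "Employee", "d": "Department"}
conditions = [
    ("e", "salary", ">", 1000),
    ("d", "name", "=", "Sales"),
    ("e", "salary", "<", 5000),
]
subqueries = group_per_collection(typechecks, conditions)
```

Both salary conditions end up in the `Employee` group, while the `Department` condition is submitted separately, which mirrors the restriction that operations over different tables cannot be combined.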
The general join capability is inherited by DSTs capable of processing joins over multiple collections (e.g. relational database sources). The decomposer sees each collection as a proxy type and, given a join capability, combines the operations over several proxy types into a single subquery sent to the data source.
New DSTs are defined by inserting them into the DST hierarchy. For example, a DST representing software capable of matrix operations is named Matrix and placed under the root node of the DST hierarchy in figure 6.2. This implies that it supports the execution of one operation at a time. A source that allowed a combination of several matrix operations would instead have been defined as a child of the Join DST.