
5.2 EXTENDING AMOS WITH LINEAR MATRIX ALGEBRA

5.2.4 The array foreign data source

Many scientific and engineering applications involve one- or multi-dimensional sequences of numerical data expressing arrays or matrices of numerical values. These are usually used to represent different mathematical concepts, such as scalar-, vector-, or dyadic-valued quantities. Thus, the ability to represent and do computations on sequences of numerical data is of great importance for scientific and engineering software tools. Several commercial “tool kits” for scientific and engineering computations, such as MATLAB¹, Mathcad², and HiQ³, support this capability.

Likewise, the database community has emphasised the ability to represent sequences of numerical data. This capability can, for instance, be found in commercial products like Illustra [6] and eBASE⁴. It has also been developed for research DBMSs including EXODUS [99]. Furthermore, the standard proposals of SQL3 [16] and OQL [17] include data structures for numerical sequences in their object models. By providing data structures for numerical sequences in a database environment, it becomes possible to combine this type of numerical data with other data types, such as simple integer and real number data and character strings. Hence, an extended set of data structures is made available to facilitate modelling and manipulation of the rich set of scientific and engineering data.

1. MATLAB is a product of MathWorks, Inc.

2. Mathcad is a product of MathSoft, Inc.

3. HiQ is a product of National Instruments, Inc.

4. eBASE is a product of Universal Analytics, Inc.

In this work, the AMOS object-relational DBMS has been extended with a numerical array data structure. At the query language level, three different numerical, one-dimensional (linear) array types have currently been defined and implemented. The array types (or classes) are defined for sequences of integers, floats, and doubles, and are denoted:

• iarray

• farray

• darray

where iarray, farray, and darray represent fixed sequences of 4-byte integer numbers, 4-byte real numbers, and 8-byte real numbers, respectively. These three types are organised in a type hierarchy as shown in Figure 27, where the array type is a subtype of the usertypeobject type.

Eight basic operations have currently been defined for arrays at the query language level:

construct(charstring typename, vector settings) -> boolean

initialise(array arr, vector settings) -> boolean

destruct(array arr) -> boolean

name(array arr) -> charstring aname

size(array arr) -> integer asize

ref(array arr, integer index) -> number value

set(array arr, integer index, number value) -> boolean

where name is a stored function representing an optional name attribute and size is a foreign function that returns the length of the array. The construct and destruct operations are general constructor and destructor procedures for implementing tailored creation and deletion operations. The construct operator takes the name of a type and a vector of initial settings and creates an object of that type. It then applies the initialise operator, overloaded for different types, which defines specific initialisations for the created object. Finally, the construct operation returns the newly created and initialised object.

Figure 27. The type hierarchy for numerical arrays where the array type is a subtype of usertypeobject.

The current implementations of the initialise and the construct functions for objects are as follows:

create function initialise(object obj, type tp,
                           vector settings) -> object as
begin
  /* type-specific initialisations */
  result obj
end;

create function initialise(array arr, type arrtype,
                           vector settings) -> array as
begin
  allocate(arr, arrtype, settings);
  set name(arr) = vref(settings, 1);
  result arr
end;

create function construct(charstring typename,
                          vector settings) -> object obj as
begin
  declare type tp;
  set tp = typenamed(typename);
  set obj = new_object(tp);
  initialise(obj, tp, settings);
  result obj
end;
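The two-step pattern in these definitions (create an object of the named type, then dispatch to a type-specific initialiser) can also be illustrated outside AMOSQL. The following is a minimal C sketch, not AMOS code: every name prefixed with sk_ is hypothetical, the settings vector is reduced to an assumed two-field struct, and overloading is emulated with one function pointer per type name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for AMOS objects and settings; not the AMOS API. */
typedef struct sk_object {
    const char *type_name;  /* name of the object's type */
    const char *name;       /* optional name attribute */
    size_t size;            /* number of elements (arrays only) */
    float *data;            /* element storage (arrays only) */
} sk_object;

typedef struct sk_settings {
    const char *name;
    size_t size;
} sk_settings;

typedef int (*sk_init_fn)(sk_object *, const sk_settings *);

/* Default initialiser: no type-specific work. */
static int sk_init_object(sk_object *obj, const sk_settings *s) {
    (void)obj;
    (void)s;
    return 0;
}

/* Array initialiser: allocate zero-filled storage and set the name. */
static int sk_init_array(sk_object *obj, const sk_settings *s) {
    obj->data = calloc(s->size, sizeof(float));
    if (obj->data == NULL && s->size > 0) return -1;
    obj->size = s->size;
    obj->name = s->name;
    return 0;
}

/* One initialiser per type name emulates the overloaded initialise. */
static const struct { const char *type_name; sk_init_fn init; } sk_types[] = {
    {"object", sk_init_object},
    {"farray", sk_init_array},
};

/* Look up the type, create the object, then apply its initialiser. */
sk_object *sk_construct(const char *type_name, const sk_settings *s) {
    assert(type_name != NULL && s != NULL);
    for (size_t i = 0; i < sizeof sk_types / sizeof sk_types[0]; i++) {
        if (strcmp(sk_types[i].type_name, type_name) == 0) {
            sk_object *obj = calloc(1, sizeof *obj);
            if (obj == NULL) return NULL;
            obj->type_name = sk_types[i].type_name;
            if (sk_types[i].init(obj, s) != 0) {
                free(obj);
                return NULL;
            }
            return obj;
        }
    }
    return NULL;  /* unknown type name */
}

void sk_destruct(sk_object *obj) {
    if (obj != NULL) {
        free(obj->data);
        free(obj);
    }
}
```

A lookup table keeps the constructor itself generic: supporting a further type only requires one more table entry, which loosely mirrors how a new overloaded initialise function extends construct without changing it.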

As an example of how tailored constructor functionality can be accomplished, the second initialise function above is designed for the array type by overloading the initialise function for a corresponding array type signature. The values of the size and name attributes are extracted from the settings vector and initialised through the allocate and set operations. More specifically, the allocate operation is responsible for allocating a literal array object (a basic data storage structure) of a specified size and associating it with an object of type array. The constructor and destructor operations were mainly introduced to provide tailored creation and deletion for array types and for the subsequent extensions to matrices and domain-specific types presented in Section 5.3.3. Parallel work on AMOS has generalised the applicability of constructors to any user-defined type [93]. Similar functionality is specified in the SQL3 proposal.

The ref and set operators are foreign functions for retrieving and updating single array values. Additional retrieval and update operators are required as well, but these are still provided by the FEA application. These include operators for accessing subparts of an array and similar operations on other aggregation data structures based on arrays. Operations of this type are not normally supported in array representations for databases, but they are of great importance to engineering applications and are supplied in “tool kits” such as MATLAB. As we have seen in the previous section, this basic array type structure is further extended by adding subtypes of arrays and by adding operators that are required in specific applications.

Below the query language level, the new array data source is defined by a literal array data type. The array data type is complemented by a set of basic operations that are accessible from both LISP and C. Foreign functions declared at the query language level are implemented by means of these operations, which operate on literal arrays. Since it is possible to dereference the array data structure from the literal object, critical and kernel array operations can be implemented as efficiently as in conventional programming languages.

Hence, as described in Section 4.4, C record templates for the literal array data types are defined for iarray, farray, and darray. The techniques for defining and registering new storage data types that were described in the same section have been used. An example of the record template for the farray data type (float array) is provided below.

struct farray {
    objtags tags;
    char filler1[2];
    int len;
    char filler2[4];
    char cont[sizeof(float)];
};

The objtags field includes type information and reference counters for storage management, len is the size of the array, and the last char declaration serves as a pointer to the start of the data segment. The basic data structure is of the same type as arrays in C and Fortran.
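The layout can be made concrete with a small C sketch. It is an illustration under stated assumptions rather than the AMOS implementation: objtags is not defined in this section, so a hypothetical two-byte sk_objtags stands in for it, and the allocation and accessor helpers (sk_farray_bytes, sk_farray_alloc, sk_farray_data) are invented names. The sketch shows how the header and the elements can share one allocation, with cont marking the first element. The exact field offsets depend on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in: the real objtags (type information, reference
   counters) is defined elsewhere and is not shown in this section. */
typedef struct {
    unsigned char type_tag;
    unsigned char ref_count;
} sk_objtags;

struct sk_farray {
    sk_objtags tags;
    char filler1[2];
    int len;
    char filler2[4];
    char cont[sizeof(float)];  /* first bytes of the data segment */
};

/* Bytes needed for a record holding n floats: header up to cont, then data. */
size_t sk_farray_bytes(size_t n) {
    size_t bytes = offsetof(struct sk_farray, cont) + n * sizeof(float);
    /* Never allocate less than the declared struct itself. */
    return bytes < sizeof(struct sk_farray) ? sizeof(struct sk_farray) : bytes;
}

/* Allocate header and zero-filled elements in a single block. */
struct sk_farray *sk_farray_alloc(int n) {
    assert(n >= 0);
    struct sk_farray *a = calloc(1, sk_farray_bytes((size_t)n));
    if (a != NULL) a->len = n;
    return a;
}

/* View the data segment as an ordinary C float array. Indexing past the
   declared size of cont relies on the traditional "struct hack". */
float *sk_farray_data(struct sk_farray *a) {
    return (float *)(void *)a->cont;
}
```

Because the elements follow the header contiguously, the pointer returned by sk_farray_data can be handed to any routine that expects a plain float array.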

There are currently five basic operations defined on literal arrays, all implemented in C. Here, they are exemplified for float arrays:

mkfarrayfn(arraysize) creates a literal float array object where the number of elements is determined by arraysize.

farraysizefn(array) retrieves the size of the literal array object.

farrayreffn(array, arrayel) retrieves the value of the float, in the literal array object, at the position determined by arrayel.

setfarrayfn(array, arrayel, elvalue) assigns the value elvalue to the float, in the literal array object, at the position determined by arrayel.

These operations are also registered to LISP using a standardised naming convention, where they are accessed as mkfarray, farraysize, farrayref, and setfarray.
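As an illustration of the four operations listed above, the following C sketch implements counterparts over a simplified record. It rests on several assumptions: the sk_ names are hypothetical, the header is reduced to the len field, a C99 flexible array member replaces the cont declaration, and indices are taken to be zero-based, which the text does not state.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified literal float array: header fields other than len are omitted. */
typedef struct {
    int len;
    float cont[];  /* C99 flexible array member holding the elements */
} sk_farray;

/* Counterpart of mkfarrayfn: allocate a zero-filled array of arraysize floats. */
sk_farray *sk_mkfarray(int arraysize) {
    assert(arraysize >= 0);
    sk_farray *a = calloc(1, sizeof *a + (size_t)arraysize * sizeof(float));
    if (a != NULL) a->len = arraysize;
    return a;
}

/* Counterpart of farraysizefn: number of elements. */
int sk_farraysize(const sk_farray *a) {
    return a->len;
}

/* Counterpart of farrayreffn: read the element at position arrayel. */
float sk_farrayref(const sk_farray *a, int arrayel) {
    assert(arrayel >= 0 && arrayel < a->len);
    return a->cont[arrayel];
}

/* Counterpart of setfarrayfn: store elvalue at position arrayel. */
void sk_setfarray(sk_farray *a, int arrayel, float elvalue) {
    assert(arrayel >= 0 && arrayel < a->len);
    a->cont[arrayel] = elvalue;
}
```

The bounds checks use assert only to keep the sketch short; how the actual operations report an out-of-range index is not described here.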

There is an additional operation:

floatcont(farray-oid) that returns the dereferenced array of floats (the basic array data storage structure) of the literal array object.

The floatcont operation is a C macro. It is a low-level operation that should only be used within other operations, to be able to operate on and index the basic array data structure.
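The role of such a macro can be sketched as follows. SK_FLOATCONT is a hypothetical counterpart of floatcont, defined over an assumed simplified record, and sk_dot and sk_from_values are invented helpers. The point of the sketch is that a kernel operation fetches the raw element storage once and then runs an ordinary C loop over it, instead of calling an accessor for every element.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified literal float array (assumed layout, not the AMOS record). */
typedef struct {
    int len;
    float cont[];  /* C99 flexible array member holding the elements */
} sk_farray;

/* Hypothetical counterpart of floatcont: expose the raw element storage. */
#define SK_FLOATCONT(a) ((a)->cont)

/* Helper for the example: build an array from n given values. */
sk_farray *sk_from_values(const float *v, int n) {
    assert(n >= 0);
    sk_farray *a = malloc(sizeof *a + (size_t)n * sizeof(float));
    if (a == NULL) return NULL;
    a->len = n;
    memcpy(SK_FLOATCONT(a), v, (size_t)n * sizeof(float));
    return a;
}

/* Kernel operation: dot product computed directly on the raw storage,
   over the common length of the two arrays. */
double sk_dot(const sk_farray *x, const sk_farray *y) {
    const float *px = SK_FLOATCONT(x);
    const float *py = SK_FLOATCONT(y);
    int n = x->len < y->len ? x->len : y->len;
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += (double)px[i] * (double)py[i];
    }
    return sum;
}
```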

The numerical array data structures can be further developed to include dynamic arrays and possibly multi-dimensional (at least two-dimensional) arrays. Numerical analysis applications usually rely on some form of tailored two-dimensional representation that takes advantage of domain-specific characteristics to provide more compact representations of multi-dimensional arrays, e.g. skyline matrix representations. This is one major reason behind the decision to use the linear array representation to implement the matrix representation, as discussed in the previous section. The performance of the current array representation is discussed in Section 5.4, which treats related performance issues as well.
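To illustrate why a linear array is a natural building block for such compact representations, the sketch below shows one common form of skyline (profile) storage for a symmetric matrix. It is a generic textbook layout under assumed names (sk_skyline, sk_skyline_get), not the matrix representation used in this work: each column stores only the entries from its topmost nonzero row down to the diagonal, and all columns are packed into one linear array.

```c
#include <assert.h>

/* Skyline storage of a symmetric n-by-n matrix (hypothetical layout). */
typedef struct {
    int n;                 /* matrix order */
    const int *first;      /* first[j]: row of the topmost stored entry in column j */
    const int *colptr;     /* colptr[j]: offset of column j within values */
    const double *values;  /* stored entries, column by column, top to diagonal */
} sk_skyline;

/* Return element (i, j); entries above the skyline are implicit zeros. */
double sk_skyline_get(const sk_skyline *m, int i, int j) {
    assert(i >= 0 && i < m->n && j >= 0 && j < m->n);
    if (i > j) {  /* symmetry: map to the upper triangle */
        int t = i;
        i = j;
        j = t;
    }
    if (i < m->first[j]) return 0.0;  /* above the skyline */
    return m->values[m->colptr[j] + (i - m->first[j])];
}
```

For a tridiagonal 4-by-4 matrix, for example, this layout stores 7 values in the linear array instead of the 16 required by a dense representation.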