
5.4 CQ Management

5.4.1 CQ Compilation

1: function PARALLEL(n, tmpl, params)
2:     d ← dataflow(n, n)
3:     for i ← 0, n − 1 do    ▷ Create branch i
4:         APPLY_TEMPLATE(tmpl, params, {i}, {i}, d, nil)
5:     end for
6:     return d
7: end function

Algorithm 6: Parallel template constructor

• input_points(d) is a function that returns all nodes that are input points of the data flow graph d.

• thecopy(nd,map) is a function that returns the copy of the node nd as specified by the mapping map.

1: function PIPE(d1, d2)
2:     d ← dataflow(arity(d1), width(d2))
3:     map ← nil
4:     for all nd ∈ copy_nodes(d1, map) do
5:         flow(nd) ← d
6:     end for
7:     for all nd ∈ copy_nodes(d2, map) do
8:         flow(nd) ← d
9:     end for
       ▷ Connect copies of input points of d2 to the copies of output points of d1
10:    for all nd2 ∈ input_points(d2) do
11:        v ← inputs(nd2)
12:        for i ← 0, count(v) − 1 do
13:            nd1 ← get_output(d1, v[i])
14:            cnd1 ← thecopy(nd1, map)
15:            cnd2 ← thecopy(nd2, map)
16:            inputs(cnd2)[i] ← cnd1
17:            outputs(cnd1) ← outputs(cnd1) + cnd2
18:        end for
19:    end for
20:    return d
21: end function

Algorithm 7: Pipe
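The pipe construction of Algorithm 7 can be illustrated with a small Python sketch. It is not the GSDM implementation: the classes `Node` and `DataFlow`, the helper `is_input_point`, and the list-based representation are assumptions made for the example. In particular, the sketch assumes that an input point lists input stream numbers (integers) in its inputs, that `output_points[k]` plays the role of get_output(d, k), and it leaves out the copying of sites.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class Node:
    name: str
    inputs: list = field(default_factory=list)   # stream numbers (int) or producer nodes
    outputs: list = field(default_factory=list)  # consumer nodes


@dataclass(eq=False)
class DataFlow:
    arity: int   # number of input streams
    width: int   # number of output streams
    nodes: list = field(default_factory=list)
    output_points: list = field(default_factory=list)  # output_points[k] produces output k


def is_input_point(nd):
    """Assumption: an input point refers only to input stream numbers."""
    return bool(nd.inputs) and all(isinstance(x, int) for x in nd.inputs)


def copy_nodes(d, mapping):
    """Copy the nodes of d and record original -> copy in mapping."""
    copies = []
    for nd in d.nodes:
        mapping[nd] = Node(nd.name)
        copies.append(mapping[nd])
    for nd in d.nodes:  # rewire the edges between the copies
        c = mapping[nd]
        c.inputs = [x if isinstance(x, int) else mapping[x] for x in nd.inputs]
        c.outputs = [mapping[x] for x in nd.outputs]
    return copies


def pipe(d1, d2):
    """Connect output k of d1 to every input point of d2 that reads stream k."""
    d = DataFlow(arity=d1.arity, width=d2.width)
    mapping = {}
    d.nodes = copy_nodes(d1, mapping) + copy_nodes(d2, mapping)
    d.output_points = [mapping[nd] for nd in d2.output_points]
    for nd2 in d2.nodes:
        if not is_input_point(nd2):
            continue
        cnd2 = mapping[nd2]
        for i, stream_no in enumerate(nd2.inputs):
            cnd1 = mapping[d1.output_points[stream_no]]
            cnd2.inputs[i] = cnd1      # replace the stream number by the producer copy
            cnd1.outputs.append(cnd2)
    return d


# Hypothetical one-node graphs: "a" reads input stream 0, "b" reads output 0 of "a".
a = Node("a", inputs=[0])
d1 = DataFlow(arity=1, width=1, nodes=[a], output_points=[a])
b = Node("b", inputs=[0])
d2 = DataFlow(arity=1, width=1, nodes=[b], output_points=[b])
piped = pipe(d1, d2)
```

Because the composition works on copies, the argument graphs `d1` and `d2` stay unchanged and can be reused in other compositions.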

1. Determine node dependencies. The dependencies between nodes, specified through the inputs and outputs functions, are analyzed and the result is stored in the level attribute of each node.

2. Bind the nodes that are input points of the data flow graph to the input streams of the query. The binding uses the input stream numbers in the inputs function.

3. For each node, starting from the input points of the graph and following an increasing order of the level function values, compile the node. The compilation of a node is given below.

4. Connect the SQFs that are output points of the data flow graph to the result streams of the CQ.
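The level-based ordering used in steps 1 and 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the GSDM code: the dictionary representation and the names `compute_levels` and `compile_order` are hypothetical, and the sketch assumes that an input point has level 0 and that any other node has a level one higher than the maximum level of its producers.

```python
from collections import defaultdict


def compute_levels(inputs):
    """Assign a level to every node of an acyclic data flow graph.

    `inputs` maps each node name to the list of its producer nodes.
    Assumption: input points (no producers) get level 0, every other
    node gets 1 + the maximum level of its producers.
    """
    levels = {}

    def level(node):
        if node not in levels:
            producers = inputs[node]
            levels[node] = 0 if not producers else 1 + max(level(p) for p in producers)
        return levels[node]

    for node in inputs:
        level(node)
    return levels


def compile_order(inputs):
    """Group the nodes by increasing level, the order in which step 3 visits them."""
    by_level = defaultdict(list)
    for node, lvl in compute_levels(inputs).items():
        by_level[lvl].append(node)
    return [sorted(by_level[lvl]) for lvl in sorted(by_level)]


# Hypothetical graph: two input points feed a join, which feeds a filter.
graph = {"src_a": [], "src_b": [], "join": ["src_a", "src_b"], "filter": ["join"]}
levels = compute_levels(graph)
order = compile_order(graph)
```

Visiting the groups in `order` from first to last guarantees that every producer of a node has been processed before the node itself, which is the property the type check in the node compilation relies on.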

Algorithm 9 uses the following functions and procedures:

• max_level(d) returns the maximum among the levels of the nodes in a data flow graph d;

• result_streams(n) is a multi-value function that stores the result stream objects created for the node n.

1: function COPY_NODES(d, map)
2:     for all s ∈ sites(d) do
3:         news ← copy_site(s, map)
4:     end for
5:     for all nd ∈ nodes(d) do
6:         newnd ← copy_node(nd, map)
7:         assigned_to(newnd) ← thecopy(assigned_to(nd), map)
8:         return newnd
9:     end for
10: end function

Algorithm 8: Copying of a data flow

The compilation of a node presented by the pseudocode in Algorithm 10 has the following steps:

1. Perform a type check of the parameters of the SQF in order to derive the type of the result stream (line 2). The stream parameters are specified in the inputs function, while non-stream parameters are given in the params attribute of the node. The type check requires that the input streams of the node are already bound when the node is processed. We ensure this by starting with the data flow input points, whose level value equals 0, and following the SQF dependencies, i.e. increasing level values. The compiler infers the type of the result stream from the SQF definition. The type is needed by the result stream constructor.

2. Group the consumers of the compiled node according to their site assignments. For each group create a logical stream connecting the current node to the group of consumers. In this way the consumers in the group share the input stream produced by the node.

3. If both the node and the group of its consumers are assigned to the same site, the logical stream is implemented by a stream object with a main-memory stream interface assigned to the same site (lines 8-14).

If the group of consumers is assigned to a different logical site than the current node, the logical stream is implemented by a pair of dual stream objects using the inter-GSDM stream interface (lines 15-24). The first stream object (line 16), assigned to the current node’s site, sends data from the current node to the site where the consumer group is assigned. The second stream object (line 17) is assigned to the consumers’ site and receives data from the current node.
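Steps 2 and 3 can be illustrated by a short Python sketch that groups the consumers of a node by site and decides which stream objects each group needs. It is only a sketch under assumptions: the function `plan_streams`, the tuple encoding of stream objects, and the dictionary arguments are hypothetical and stand in for the stream constructors of the system. The stream type is omitted for brevity.

```python
from collections import defaultdict
from itertools import count


def plan_streams(node, site_of, consumers, ids):
    """Describe one logical stream per consumer site of `node`.

    site_of:   maps a node name to the name of its assigned site
    consumers: maps a node name to the list of its consumer nodes
    ids:       iterator of stream numbers (stand-in for a stream id generator)
    """
    # Step 2: group the consumers by the site they are assigned to.
    groups = defaultdict(list)
    for c in consumers[node]:
        groups[site_of[c]].append(c)

    producer_site = site_of[node]
    plans = []
    for site, group in groups.items():
        name = f"s{next(ids)}"
        if site == producer_site:
            # Step 3, same site: a single main-memory stream object.
            objects = [("mm", name, site)]
        else:
            # Step 3, different site: two dual objects sharing one name,
            # given as (kind, name, own site, peer site).
            objects = [("out", name, producer_site, site),
                       ("in", name, site, producer_site)]
        plans.append({"site": site, "consumers": group, "objects": objects})
    return plans


# Hypothetical assignment: "fft" runs on site A, its consumers on sites A and B.
site_of = {"fft": "A", "avg": "A", "max": "B", "min": "B"}
consumers = {"fft": ["avg", "max", "min"]}
plans = plan_streams("fft", site_of, consumers, count(1))
```

In this example the two consumers on site B share one inter-site stream, so the data produced by `fft` crosses the site boundary once rather than once per consumer.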

Algorithm 10 uses the following functions and procedures:

• derive_result_type(n) checks the types of the SQF parameters and returns the type of the result stream using the definition of the SQF annotating the node.

1: function COMPILEDFG(d)
2:     cq ← plan_of(d)
3:     LEVELS(d)
       ▷ Connect all input points to the input streams
4:     BIND_INPUTS(d, inputs(cq))
5:     success ← true
6:     for i ← 0, max_level(d) do
7:         for all n ∈ nodes(d) do
8:             if level(n) = i ∧ success then
9:                 success ← compile(n)
10:            end if
11:        end for
12:    end for
13:    if success then
           ▷ Connect the output point to the CQ result streams
14:        n ← get_output(d, 0)
15:        for all outs ∈ outputs(cq) do
16:            result_streams(n) ← result_streams(n) + outs
17:        end for
18:    end if
19:    return success
20: end function

Algorithm 9: Algorithm for compilation of a data flow graph (DFG)

• consumer_sites(n) is a derived function that returns all objects of type Site to which some node is assigned that is a consumer of the argument node n;

• nextstreamid() is the system generator of stream identifiers;

• mm_stream(name,stype,st) is a stream constructor using the default parameters for streams in main memory inside a working node. The parameters specify the stream name, type, and a Site object representing the site where the stream object will be created;

• inGSDM_stream(name,stype,st,src) and outGSDM_stream(name,stype,st,dest) are stream constructors using the default parameters for inter-GSDM streams. The first three parameters are as described above. The last parameter is a source address for an input stream object and a destination address for an output stream object. Note that the dual streams have the same name but are assigned to different sites;

• subst(v,n,s) is a procedure that substitutes the object n with the object s in the vector v. It is used to bind a node n to an input stream s2 produced by the producer node nprod, through the call subst(inputs(n), nprod, s2).
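As an illustration of the substitution step, the following Python sketch shows one possible reading of subst. It assumes that the vector is a mutable list, that every occurrence of the object is replaced, and that objects are compared by identity; none of these details is stated in the text above.

```python
def subst(v, n, s):
    """Replace, in place, every element of the list v that is the object n by s."""
    for i, item in enumerate(v):
        if item is n:
            v[i] = s


# Hypothetical consumer whose first input is still the producer node itself.
nprod = "node:fft"
other = "stream:raw"
s2 = "stream:s7"
inputs_of_n = [nprod, other]
subst(inputs_of_n, nprod, s2)
```

After the call, the consumer no longer refers to the producer node but to the stream object that carries the producer's result.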

1: function COMPILE(n)
2:     stype ← derive_result_type(n)
3:     if stype then
4:         stprod ← assigned_to(n)
5:         for all st ∈ consumer_sites(n) do
6:             name ← nextstreamid()
7:             if st = stprod then    ▷ Create stream in main memory
8:                 s1 ← mm_stream(name, stype, stprod)
9:                 result_streams(n) ← result_streams(n) + s1
10:                for all n1 ∈ outputs(n) do
11:                    if st = assigned_to(n1) then
12:                        SUBST(inputs(n1), n, s1)
13:                    end if
14:                end for
15:            else    ▷ Create inter-GSDM stream of two dual stream objects
16:                s1 ← outGSDM_stream(name, stype, stprod, name(st))
17:                s2 ← inGSDM_stream(name, stype, st, name(stprod))
18:                result_streams(n) ← result_streams(n) + s1
19:                for all n1 ∈ outputs(n) do
20:                    if st = assigned_to(n1) then
21:                        SUBST(inputs(n1), n, s2)
22:                    end if
23:                end for
24:            end if
25:        end for
26:        return true
27:    else
28:        return false
29:    end if
30: end function

Algorithm 10: Node Compilation

• result_streams(n) is a multi-value function that stores the result streams of a node n. It is used later on by the installation. The function contains one stream object for each group of consumers assigned to a different site.