
UPMAIL Technical Report No.  Mars ,  


The Luther WAM Emulator

Version 1.1 (Parallel) Johan Bevemyr

Computing Science Dept., Uppsala University Box 311, S-751 05 Uppsala, Sweden

Electronic mail: bevemyr@csd.uu.se

Abstract. This manual describes the implementation of the Luther WAM Emulator, both the sequential version and the recursion-parallel version. The Luther WAM Emulator is a portable C implementation of Warren's abstract machine with extensions for recursion-parallel execution.

It is designed to be easily modified and understood, but at the same time reasonably efficient.

Implementation details are given for the instruction set as well as the storage model and the runtime system. We also describe how to proceed when extending the emulator in several different directions.


1 Introduction

This document describes the internal structure and implementation of the Luther WAM emulator. The intention of this manual is to give the reader enough knowledge about how the emulator works to understand its finer points and to modify it for his or her own needs.

The sequential parts of the Luther WAM emulator are inspired by the SICStus Prolog emulator.

A number of things present in the SICStus emulator have been left out, e.g., the shallow backtracking scheme and the support for native code.

This version of the emulator is the first step towards a recursion-parallel implementation. Since we plan to run the emulator on various machines and architectures we have tried to take this into account when constructing the emulator. The current emulator is capable of recursion-parallel execution on shared memory machines, such as the Symmetry Sequent and the Sun Galaxy. The sequential emulator has also been tested on the HP 9000/700, the Macintosh, and the Sun 4 family.

It has also been modified for executing bounded quantifications on the Connection Machine model 200.

The manual is split into a number of sections. We start off by giving an overview of the storage model: the heap, the stack, the trail and various other data areas. Thereafter we have a short outline of the structure of the emulator, followed by a detailed description of each WAM instruction. The section following contains the macros used when describing the implementation of the instruction set. The WAM code format in its three forms is described and the builtin predicates are listed. Finally, a few hints are given on how to extend the emulator in several different directions.

This manual is based on SICStus Prolog Internals Manual.

1.1 Notational Conventions

The semantics of the various instructions will be given in a style similar to the C programming language. The descriptions of the instructions will closely follow the actual emulator code. However, a number of optimizations have been omitted from this exposition in order to (we hope) make it clearer.

Whenever we give C code for an algorithm that behaves differently in the parallel version and in the sequential version, we use the #ifdef notation from C. An example is given below.

#ifdef PARALLEL

<C code used in the parallel version>

#else

<C code used in the sequential version>

#endif /* PARALLEL */

We shall use the following notation for address arithmetic:

• The expression X++ denotes postincrement with respect to the growth direction of the relevant memory area.

• The expression --X denotes predecrement with respect to the growth direction of the relevant memory area.

• The expression *X denotes a contents-of operation.

• The expression &X denotes an address-of operation.

Macros are given names starting with a capital letter.


2 Storage Model

The abstract machine described herein is a modified Warren abstract machine. Our model addresses certain issues not treated in the original WAM, e.g. arithmetic, cut, generic objects and garbage collection. It inherits all major properties of Warren's model, such as structure copying, separate choicepoints and environments, and tagged pointers.

The sequential storage model is similar to the original WAM. We have three stacks, the heap, the stack, and the trail.

The storage model of the parallel engine is slightly different. The parallel engine consists of a set of sequential engines (workers), each with its own heap, stack, and trail. All workers have shared access to all heaps, with the right to bind any variable on any heap. The stack and trail are accessible only to the worker that manages them.

The parallel engine also gives the workers shared access to common areas such as the atom table and the database.

2.1 Terms and their Representation

The main types of terms are variables, constants, compound terms and generic objects.

A temporary variable is a variable that fulfills all of the following conditions.

• The first occurrence of the variable is in the head, in a structure, or in the last goal's argument.

• The variable does not occur in two distinct goals.

• If the variable occurs in the head, it does not occur in any goal other than the first.

For example, in:

p([X|Xs],Z) :- q(Xs,Rs), r(U,U), s(Rs,[X],Z).

X, Rs, and Z are permanent, and U and Xs are temporary.

In essence, the value of a temporary variable does not have to be preserved across procedure calls, while the value of a permanent variable must be preserved across some procedure calls.

Clauses that contain permanent variables will store their values in an environment at run time.

In WAM notation, registers that contain temporary variables are denoted by Xn, and permanent variable registers by Yn.

A variable can be unbound, conditionally bound or unconditionally bound to another term. A process known as dereferencing follows a chain of bound variables until an unbound variable or a non-variable is encountered. A binding is conditional if there can be another execution path which may bind the variable to something else. Conditional bindings can be undone and are recorded on the trail stack.

A constant is an atom or a number. A compound term is composed of a functor and some arguments (arbitrary terms). A list is a special case of compound term with the functor ./2. A generic object is composed of a method table and a data array. The method table contains functions that perform operations on the generic object.

Terms are represented as tagged pointers to objects. A tagged pointer has the following parts (on a 32 bit architecture):

111 111111111111111111111111111 11

=== =========================== ==

tag value gc

The tag field distinguishes the type of the term, the value field is usually a pointer to an object, and the gc field is used by the garbage collector.

On some systems the memory allocated by malloc starts at a high address and the tag may thus interfere with the address. On those systems a constant (MALLOCBASE) is added to the pointer when converting from a tagged to an untagged object. In this implementation stack variables and heap variables have different tags, instead of a common REF tag. The reason behind this is that if the same tag is used for both stack and heap variables then the stack has to be physically allocated at a higher address in the memory. Using different tags for heap and stack variables allows the implementation to allocate the stack independently of the heap.

It would be more efficient to store the tag in the lower part of a TAGGED pointer. The reason is that untagging and dereferencing of a pointer can sometimes be compiled into a single machine instruction. For example, suppose stack variables were tagged with 1; then untagging a stack variable could be done by subtracting 1. Further, suppose X is such a variable; accessing its value would require the operation *(X-1). This can be compiled into move.l A2, -1(A1), if X is stored in register A1. If the tag is stored in the topmost bits of a word, then three machine instructions are needed to perform the same operation.

Why do we not store the tag in the lower bits? The reason is that there is only room for two tag bits. This can be worked around by using subtags, but this complicates the implementation and makes it harder to experiment with different term representations.

Type of term                 Tag  Value
============                 ===  =====
Heap Variable (HVA)          0    term pointer
#ifdef CONSTR
Constrained Variable (CVA)   1    term pointer
#endif
Stack Variable (SVA)         2    term pointer
Small Integer (NUM)          3    integer value
Floating point (FLT)         4    object pointer
Atom (ATM)                   5    table index
List (LST)                   6    object pointer
Compound Term (STR)          7    object pointer
Generic Object (GEN)         8    object pointer
#if defined(PARALLEL) && defined(UNBOUND)
Unbound Variable (UVA)       9    timestamp
#endif
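The word layout described above (tag in the top bits, value in the middle, two gc bits at the bottom) can be sketched with a few macros. This is only an illustration under stated assumptions, not the contents of term.h: the macro and function names, the 3-bit tag width, the zero MALLOCBASE, and the restriction to non-negative small integers are all choices made for this sketch, and only tag numbers that fit in three bits are used.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TAGGED;

/* Tag numbers from the table; only the ones that fit in 3 bits. */
enum { HVA = 0, SVA = 2, NUM = 3, ATM = 5, LST = 6, STR = 7 };

#define TAG_SHIFT   29u            /* tag: top 3 bits                  */
#define GC_BITS     2u             /* gc:  low 2 bits                  */
#define GC_MASK     0x00000003u
#define VALUE_MASK  0x1FFFFFFCu    /* value: the 27 bits in between    */
#define MALLOCBASE  0x00000000u    /* assumed to be zero in this sketch */

#define TagOf(T)      ((unsigned)((T) >> TAG_SHIFT))
#define AddressOf(T)  (((T) & VALUE_MASK) + MALLOCBASE)
#define GcBitsOf(T)   ((T) & GC_MASK)

/* Tag a word-aligned address; the gc bits start out cleared. */
static TAGGED Tagify(uint32_t address, unsigned tag)
{
    uint32_t value = address - MALLOCBASE;
    assert((value & ~VALUE_MASK) == 0);   /* must fit in the value field */
    assert(tag <= 7u);
    return ((TAGGED)tag << TAG_SHIFT) | value;
}

/* Small non-negative integers are stored directly in the value field. */
static TAGGED MakeNum(uint32_t n)
{
    assert(n <= (VALUE_MASK >> GC_BITS));
    return ((TAGGED)NUM << TAG_SHIFT) | (n << GC_BITS);
}

static uint32_t GetNum(TAGGED t)
{
    assert(TagOf(t) == (unsigned)NUM);
    return (t & VALUE_MASK) >> GC_BITS;
}
```

With the tag in the top bits, extracting the tag is a single shift, while recovering the address needs a mask and (on some systems) an addition, which matches the cost argument made in the text.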

Functors are represented as tagged pointers to entries in the functor table. Functors are uniquely stored in the table. The name and the arity of a functor are accessed using the macros FunctorToAtom(Term) and ArityOf(Term). Functors are constructed using the macro StoreFunctor(Name,Arity). The macros are defined in term.h.

2.2 Parallel Engine Specifics

In the WAM the position of a variable in the heap is used for determining whether binding it has to be recorded on the trail. The criterion for recording a variable binding is that the binding has to be undone on backtracking. This is determined by comparing the variable's address with the heap top saved in the last choicepoint. If the variable resides in the area below the saved heap top pointer then it was created before the choicepoint and any modification to it has to be recorded. Now, in a parallel setting with multiple heaps it is no longer possible to use a single saved heap top pointer for determining whether a variable was created before a given choicepoint or not. We solve this problem by extending all unbound variables with a timestamp. We have experimented with two methods for doing this.

1. The representation of variables is extended with an extra word in which the timestamp is stored.

2. A special tag is used for representing unbound variables and the value field is used for storing the timestamp.

The first solution results in twice as much heap memory being used by a program. For the second solution to work, the mechanism for dereferencing variables has to be modified; this modification results in an overall execution overhead of approximately 7%. Using a special tag for unbound variables requires 4 tag bits instead of 3.

Both timestamp solutions require a slight modification to the trailing test, and each choicepoint has to be extended with an extra field.
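A minimal sketch of a timestamp-based trailing test follows. It is an assumption-laden illustration, not the emulator's code: the type and function names are hypothetical, and so is the clock discipline (the worker's clock is advanced when a choicepoint is pushed, so a variable needs trailing exactly when its timestamp is smaller than that of the youngest choicepoint).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t s32;

typedef struct { s32 time; } clock_worker;        /* hypothetical worker clock      */
typedef struct { s32 timestamp; } variable_info;  /* creation time of a variable    */
typedef struct { s32 timestamp; } choice_info;    /* the extra choicepoint field    */

/* A new unbound variable is stamped with the worker's current time. */
static variable_info new_variable(const clock_worker *w)
{
    variable_info v;
    v.timestamp = w->time;
    return v;
}

/* Pushing a choicepoint advances the clock, so that everything
 * created afterwards is strictly younger than the choicepoint. */
static choice_info push_choicepoint(clock_worker *w)
{
    choice_info c;
    assert(w->time < INT32_MAX);
    w->time++;
    c.timestamp = w->time;
    return c;
}

/* Trailing test: the binding must be recorded if the variable is
 * older than the youngest choicepoint, whichever heap it lives on. */
static bool must_trail(variable_info v, choice_info youngest)
{
    return v.timestamp < youngest.timestamp;
}
```

Unlike the address comparison used in the sequential WAM, this test does not depend on which heap holds the variable.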

2.3 Data Areas

The data areas are divided into the static areas, for information which is saved from one query to another, and the dynamic areas, for information which is not needed upon backtracking. The dynamic areas are operated as stacks; the static areas as a memory pool in which objects of arbitrary size can be allocated. The address order of the various areas is not critical, and neither is the growth direction of the stacks.

In contrast to Warren's model there is no explicit PDL, just the implicit C PDL.

A brief description of each memory area follows, listing for each area the kind of objects that it may contain.

2.3.1 Basic Type Definitions

We define shorthands for some C declarations as follows:

typedef long s32; /* signed 32 bit */

typedef unsigned long u32; /* unsigned 32 bit */

typedef u32 TAGGED; /* terms */

typedef u32 UNTAGGED; /* terms */

typedef enum { /* false or true */

FALSE = 0, TRUE = 1 } BOOL;

typedef u32 code; /* instructions with arguments */

2.3.2 The Code Area

The code area contains WAM code. There may be pointers from the code area into the database area. The following global variables define the bounds of the code area. These variables are defined in a data structure called globalvar. In the parallel version all workers have shared access to these variables.

code *code_start, /* low bound of area */

*code_end, /* high bound of area */

*code_current; /* first unused element */

2.3.3 The Static Area

The static area contains a variety of objects described below.

Atom Definition

Atoms are stored in a fixed size hash table. Each entry in the table is a bucket and the buckets are stored in the atom area. Each bucket contains a TAGGED pointer to the printname of the atom and a pointer to the next atom in the bucket.

typedef struct atom_bucket {
    TAGGED atom;
    struct atom_bucket *next;
} atom_bucket;
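The chained hash table described above can be sketched as follows. This is a simplified illustration, not the emulator's atom table: the table size, the hash function, and the names hash_name and intern_atom are assumptions, and the bucket stores a plain C string where Luther stores a TAGGED pointer to the atom.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ATOM_TABLE_SIZE 256u   /* hypothetical fixed table size */

typedef struct atom_bucket {
    char *pname;                 /* printname (a TAGGED pointer in Luther) */
    struct atom_bucket *next;    /* next atom hashing to the same slot     */
} atom_bucket;

static atom_bucket *atomtable[ATOM_TABLE_SIZE];

/* Simple multiplicative string hash, assumed for this sketch. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h % ATOM_TABLE_SIZE;
}

/* Return the unique bucket for name, creating it on first use.
 * Returns NULL only if memory allocation fails. */
static atom_bucket *intern_atom(const char *name)
{
    unsigned slot;
    size_t len;
    atom_bucket *b;

    assert(name != NULL);
    slot = hash_name(name);
    for (b = atomtable[slot]; b != NULL; b = b->next)
        if (strcmp(b->pname, name) == 0)
            return b;                    /* already interned */

    b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    len = strlen(name) + 1;
    b->pname = malloc(len);
    if (b->pname == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->pname, name, len);
    b->next = atomtable[slot];           /* push on the front of the chain */
    atomtable[slot] = b;
    return b;
}
```

Because each name maps to exactly one bucket, two atoms can be compared by comparing bucket addresses instead of strings.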

The printname and the mode of each atom are stored here. The mode indicates whether the atom has to be quoted etc.

struct atom {
    TAGGED mode;
    char *pname;
};

Permanent Floats

Floating point numbers appearing in the compiled code are stored in this area. They cannot be stored on the heap since backtracking would remove them. A floating point number is represented as a boxed object.

struct float {
    TAGGED start;
    double value;
    TAGGED end;
};

The start and end objects are tagged as atoms with a flag bit set to TRUE to indicate that they surround a boxed object. The reason for all this is that when we store a float on the heap there is no room for the gc bits in the machine representation of a double. It has to be represented as a bit string. The start and end parts contain an atom tagged object with the box bit set and the static/dynamic bit set depending on whether it appears in compiled code (in the static area) or not, the size of the box, and bits reserved for the garbage collector. The static/dynamic distinction has to be made in order for the garbage collector to know whether the object resides in the heap or not.

111 1 1 1111111111111111111111111 11

=== = = ========================= ==

ATM | | size of box gc

/ \

box static/dynamic

Predicate Definitions

Predicate definition structures are stored in this area. There are three kinds of predicates: compiled clauses, C predicates, and interpreted predicates.

typedef struct definition {

enter_instruction enter_instruction;

TAGGED name;

TAGGED module;

union definfo code;

struct definition *next;

} definition;

union definfo {

code *incoreinfo; /* pointer to start of WAM code */

BOOL (*cinfo)(); /* pointer to C function */

in_switch *indexinfo; /* pointer to indexing information for interpreted predicates */

};

typedef enum {

ENTER_INTERPRETED, ENTER_EMULATED, ENTER_C,

ENTER_UNDEFINED } enter_instruction;

2.3.4 The Heap

This area is sometimes called the global stack. It consists of variables, constants, lists, structures, and generic objects. Each object contains a sequence of words representing terms. It grows towards increasing addresses.

The bounds of the heap are defined by the current worker.

It may contain the objects described below.

floats

Float objects residing in the heap are represented in the same way as float objects residing in the static area (see float above).

lists

They consist of the two arguments of the list (car and cdr).

struct list {
    TAGGED car;
    TAGGED cdr;
};

typedef struct list *list;

structures

They consist of the functor followed by the arguments of the structure. The representation of the functor is described above.

struct structure {
    TAGGED functor;
    TAGGED arg[ARITY];
};

typedef struct structure *structure;

generic objects

They consist of a pointer to a method table containing methods for operations on the object, followed by object data. The sub-tag of the generic object is the pointer to the method table. All objects of the same kind have the same method table pointer.

struct generic {

struct method *method;

TAGGED data[ANY];

};

typedef struct generic *generic;

typedef struct method {

int (*size)(); /* size of object */

BOOL (*unify)(); /* unifies two objects */

void (*print)(); /* prints object */

SIZE (*compare)(); /* compare two objects */

void (*undo)(); /* undo on backtracking */

void (*gc)(); /* used by the gc */

TAGGED (*deref)(); /* used when dereferencing */

} method;

typedef enum { LESS = -1, EQUAL = 0, GREATER = 1 } SIZE;

The undo method is called on backtracking, and the unify method is called when trying to unify the generic object with any other object. The print method is called when printing the object, and compare is called when comparing two generic objects with the same tag. The gc method is called when doing garbage collection.
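Dispatch through a method table can be sketched as below. This is a reduced illustration under stated assumptions: only two of the methods are modelled, the "pair" object kind and all function names are hypothetical, and unify_generic only covers the case where both terms are generic objects, whereas the text says unify is called for any other object as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TAGGED;
struct generic;

/* Reduced method table: just size and unify for this sketch. */
typedef struct method {
    int (*size)(const struct generic *);
    int (*unify)(const struct generic *, const struct generic *);
} method;

struct generic {
    const method *method;   /* doubles as the sub-tag of the object */
    TAGGED data[2];
};

/* A hypothetical object kind holding two words of data. */
static int pair_size(const struct generic *g)
{
    (void)g;
    return 2;
}

static int pair_unify(const struct generic *a, const struct generic *b)
{
    return a->data[0] == b->data[0] && a->data[1] == b->data[1];
}

static const method pair_methods = { pair_size, pair_unify };

/* Objects of the same kind share one method table, so comparing the
 * table pointers tells us whether the two objects are comparable. */
static int unify_generic(const struct generic *a, const struct generic *b)
{
    assert(a != NULL && b != NULL);
    if (a->method != b->method)
        return 0;
    return a->method->unify(a, b);
}
```

New kinds of objects can then be added by supplying another method table, without touching the code that calls through it.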

variables

They consist of just one word: the value cell, where the binding is stored. An unbound variable is represented as if bound to itself, unless a special tag is used for unbound variables (in the parallel version only). In that case unbound variables are represented by UVA tagged objects, and the value field is used to store the creation time (timestamp) of the object.

#if defined(PARALLEL) && defined(TIMESTAMP)
struct variable {
    TAGGED val;
    u32 timestamp;
};
#else
typedef TAGGED variable;
#endif /* PARALLEL && TIMESTAMP */

Constrained variables occupy two words, one to store the binding and one to store the constraints (a pointer to a list of frozen goals). Bindings of constrained variables are always trailed and a counter, wake_counter, is incremented (if it is the first constrained variable that is bound after a call then an event (EVENT_WAKE) is signaled). At the next call the woken goals are found by searching the trail for CVA entries. The wake_counter is used for restricting the search.

struct constrained_var {
    TAGGED val;
    TAGGED constr;
};

A heap variable may be bound to any term except to a stack variable. A process known as globalizing creates a new heap variable and binds a stack variable to it, ensuring that the stack variable henceforth dereferences to the heap.

This area grows during forward execution and contracts on backtracking.
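Dereferencing with the self-reference convention for unbound variables can be sketched as follows. The sketch makes several simplifying assumptions: the heap is modelled as an array, the value field holds an array index rather than an address, only the HVA and NUM tags are used, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TAGGED;

enum { HVA = 0, NUM = 3 };           /* tag numbers as in the tag table */

#define TAG_SHIFT     29u
#define TagOf(T)      ((unsigned)((T) >> TAG_SHIFT))
#define ValueOf(T)    ((T) & 0x1FFFFFFFu)
#define Make(Tag, V)  (((TAGGED)(Tag) << TAG_SHIFT) | (TAGGED)(V))

/* Follow a chain of bound heap variables until an unbound variable
 * (a cell that refers to itself) or a non-variable is reached. */
static TAGGED deref(const TAGGED *heap, uint32_t heap_size, TAGGED t)
{
    while (TagOf(t) == (unsigned)HVA) {
        TAGGED next;
        assert(ValueOf(t) < heap_size);
        next = heap[ValueOf(t)];
        if (next == t)
            break;                   /* self-reference: unbound */
        t = next;
    }
    return t;
}
```

With a dedicated UVA tag the self-reference test would be replaced by a tag test, which is the dereferencing change mentioned for the parallel version.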

2.3.5 The Stack

This area contains choicepoints and environments. A choicepoint is established when entering a procedure Q with arity n which has more than one clause that can match the goal. When no alternatives remain, the choicepoint is discarded. An environment represents a list of goals still to be executed. It consists of a number of variables which the compiler has classified as permanent, occurring in the body of a clause, plus a pointer into the body of a continuation clause and its environment.

This area grows at recursive calls and contracts on determinate calls and on backtracking. It grows towards increasing addresses.

The bounds of the stack are defined by the current worker.

A choicepoint consists of a snapshot of the crucial abstract machine registers:

typedef struct choicepoint {

TAGGED *trail_top; /* (TR) top of trail stack */

TAGGED *global_top; /* (H) top of global stack */

struct choicepoint *last_choice; /* (B) previous choice pt. */

struct environment *cont_env; /* (CE) cont. environment */

code *next_instr; /* (CP) cont. code */

code *next_clause; /* (BP) next clause to try */

indx arity; /* (n) number of saved aregs */

#if defined(PARALLEL) && defined(TIMESTAMP)

s32 timestamp; /* current time */

#endif

TAGGED areg[ANY]; /* (Ai) saved argument reg. */

} choicepoint;

An environment is represented by the following record:

typedef struct environment {

struct environment *cont_env; /* (CE) cont. environment */

code *next_instr; /* (CP) cont. code */

TAGGED yreg[ANY];

} environment;

The permanent variables are often abbreviated as Y0, Y1, ... A permanent variable Yi may be bound to any term, except to another variable on the environment stack if the other variable is located at a higher address.

At procedure calls, the current environment's active size is found at an offset from the continuation pointer. This information is needed by certain instructions and is denoted FrameSize(L).

2.3.6 The Trail

The main use of this area is to record conditional variable bindings. A variable is conditionally bound if and only if the variable is older than the youngest choicepoint. Upon backtracking, entries are simply popped off the trail stack and the bound variables are reset to unbound.

If a generic object is encountered on the trail stack during backtracking, its undo method is called. It can be used to ensure that a side effect that had effect over a finitely failed subcomputation is undone.

The value trail is interleaved with the normal trail. A value trail entry is identified by a list tag. The following byte is the old value of the word pointed to by the list tagged pointer.

In the parallel version the trail has been extended to handle two more situations. NUM tagged entries are used by parallel workers to indicate that a new parallel computation has started. On backtracking each parallel worker unwinds the trail until a NUM tagged entry is found. The value field is used for storing the heap top before the parallel computation started. On backtracking the heap is reset to this value. On the sequential trail, STR tagged entries are used to indicate that a parallel computation has taken place. When the sequential worker finds a STR tagged entry it forces all parallel workers to unwind their trails until they find a NUM tagged entry.

The trail stack grows towards increasing addresses and its bounds are defined by the current worker.
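The basic sequential use of the trail (conditional trailing on binding, unwinding on backtracking) can be sketched as follows. This is a toy model, not the emulator's code: the machine structure, the cell encoding (even values are variable references, odd values are integers), and all function names are hypothetical, and the value trail and the NUM and STR entries of the parallel version are left out.

```c
#include <assert.h>
#include <stddef.h>

#define AREA_SIZE 64

/* Hypothetical cell encoding: even = variable reference, odd = integer. */
typedef size_t cell;
#define VarRef(I)   ((cell)(I) * 2u)
#define IntCell(N)  ((cell)(N) * 2u + 1u)

typedef struct {
    cell   heap[AREA_SIZE];   size_t heap_top;
    size_t trail[AREA_SIZE];  size_t trail_top;
} machine;

static void machine_init(machine *m)
{
    m->heap_top = 0;
    m->trail_top = 0;
}

/* A new variable is unbound, i.e. it refers to itself. */
static size_t new_var(machine *m)
{
    size_t i = m->heap_top++;
    assert(i < AREA_SIZE);
    m->heap[i] = VarRef(i);
    return i;
}

static int is_unbound(const machine *m, size_t var)
{
    return m->heap[var] == VarRef(var);
}

/* Bind var; record it on the trail only if it is older than the
 * youngest choicepoint (it lies below the saved heap top). */
static void bind(machine *m, size_t var, cell value, size_t saved_heap_top)
{
    if (var < saved_heap_top) {
        assert(m->trail_top < AREA_SIZE);
        m->trail[m->trail_top++] = var;
    }
    m->heap[var] = value;
}

/* Pop trail entries down to the saved trail top, resetting each
 * recorded variable to unbound, then discard the younger heap cells. */
static void backtrack(machine *m, size_t saved_trail_top, size_t saved_heap_top)
{
    while (m->trail_top > saved_trail_top) {
        size_t var = m->trail[--m->trail_top];
        m->heap[var] = VarRef(var);
    }
    m->heap_top = saved_heap_top;
}
```

Bindings of variables created after the choicepoint are not trailed, since those variables disappear anyway when the heap top is reset.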

2.4 WAM Registers

All global WAM registers are stored in a worker structure. This makes it possible to have several concurrent workers running at the same time. It also makes for a cleaner interface. On the other hand we might lose some efficiency, but it turned out that no performance decrease can be observed if we cache relevant variables in registers inside procedure calls, e.g. in the engine.

The worker structure looks like this.

typedef struct worker {

code *pc; /* program counter */

code *next_instr; /* continuation pointer */

choicepoint *choice; /* last choicepoint */

choicepoint *choice0; /* cut pointer */

environment *frame; /* environment pointer */

u32 event_flag; /* event flag */

int lut_trace; /* trace mode flag */

#if defined(PARALLEL) && defined(TIMESTAMP)

s32 time; /* current time */

s32 uncond; /* saved timestamp */

#else

TAGGED *uncond; /* first uncond variable */

#endif

TAGGED *s; /* structure arg. reg. */

TAGGED *regs; /* argument and X reg. */

TAGGED *trail_top,

*trail_start,

*trail_end; /* Trail stack bounds */

TAGGED *heap_margin,

*heap_top,

*heap_start,

*heap_end; /* Heap bounds */

TAGGED *stack_start,

*stack_end; /* Stack bounds */

statistics *stats; /* statistics structure */

globalvar *global; /* global variables */

gc_info_t gc_info; /* info needed by GC */

u32 wake_count; /* nr bound CVAs to wake */

#ifdef PARALLEL

int pid; /* worker number */

s32 *level; /* current iteration */

s32 direction; /* direction of iteration */

#endif

} worker;

Not all fields are necessarily kept up to date at all times in the emulator; their values might be cached in registers. When calling C predicates or inlineable predicates all cached registers are flushed, with the exception of the pc register, which is never used outside the emulator loop.

In addition to the worker structure there are variables which are shared between different workers (in the sequential version there is only one worker). These variables are stored in the globalvar structure.

typedef struct globalvar {

heap *atom_start, /* Atom table bounds */

*atom_end,

*atom_current,

*atomoffset;

heap *patch_start, /* Area used by parser */

*patch_current,

*patch_end;

code *code_start, /* Code area bounds */

*code_end,

*code_current;

struct definition **predtable; /* Predicate database */

atom_bucket **atomtable; /* Atom database */

s32 active_workers; /* Number of active workers*/

prologflag flags; /* Execution flags */

struct definition *interrupt_code;/* Interrupt handler */

#ifdef PARALLEL

TAGGED *global_regs; /* Global registers */

worker_command parallel_start; /* worker activation record */

BOOL global_fail; /* global fail flag */

s32 scheduling; /* static/dynamic sched. */

s32 scheduling_mode; /* vertical/horizontal sched. */

s32 sched_level; /* dynamic sched. queue */

double *reduction_results; /* parallel reduction res. */

TAGGED *collect; /* compiled reduc. result. */

s32 debugsig; /* WAM-level debug flag */

#endif

} globalvar;

2.5 The Database

There are three kinds of predicates which have to be stored in the database: emulated predicates (WAM code), C predicates (C code), and interpreted predicates (clauses). The first two predicate types are described elsewhere and will only briefly be described here.

All predicates in the emulator are associated with a definition structure which contains information about its type, its name (and arity), which module it belongs to, and a pointer to the code for the predicate (the definfo field). The definfo field of an emulated predicate contains a pointer to the WAM code that defines the predicate; for a C predicate there is a pointer to the C function that defines it; and for an interpreted predicate we have a pointer to code that does the head unification for clause/2 (a dynamic, or interpreted, predicate can be seen as a number of clause/2 assertions).

The code for an interpreted predicate is executed by a specialized version of the WAM emulator loop called match_term. The predicate code is originally constructed, when a clause is asserted, by a one pass compiler written in C (compile_clause). The instruction set is designed to do depth first unification. A complete description of the method can be found in "Fast Head Unification" by Johan Bevemyr and Thomas Lindgren (UPMAIL Technical Report).

2.5.1 Instruction Set for Assert

For a complete description see the chapter "Instruction Set".

a_get_x_variable <X reg> <X reg>

a_get_x_value <X reg> <X reg>

a_get_constant <Constant> <X reg>

a_read_list_top <X reg> <K>

a_read_struct_top <Functor> <X reg> <K>

a_read_list <K>

a_read_struct <Functor> <K>

a_read_list_tail <K>

a_read_struct_tail <Functor> <K>

a_unify_constant_up <Constant>

a_unify_x_variable_up <X reg>

a_unify_x_value_up <X reg>

a_unify_x_local_value_up <X reg>

a_unify_void_up

a_unify_void

a_unify_x_variable <X reg>

a_unify_x_value <X reg>

a_unify_x_local_value <X reg>

a_unify_constant <Constant>


3 Emulator Overview

3.1 Usage

The emulator is started by typing either lut or luther. lut is a shell script which in turn calls luther with the minimal arguments needed to get a working Prolog. The following arguments are accepted by luther.

-debug Starts the emulator with the WAM level debugger in action.

-list_all

Tells the emulator to list all predicates in the database.

-list Tells the emulator to list all Prolog defined predicates in the database.

-memory <bytes>

Tells the emulator how much memory it is allowed to allocate for the sequential worker. The default is 3072 kbytes.

-wmemory <bytes>

Tells the emulator how much memory each parallel worker is allowed to allocate. The default is 1536 kbytes.

-opcode Tells the emulator to output opcode tables for WAM instructions and quickload instructions, and the inline predicate table.

-ql <filename>

(quick load) Tells the emulator to read predicate definitions from <filename> in quickload format.

-verbose Tells the emulator to notify the user when a predicate is added to the database.

-w <N> Tells the emulator to use N parallel workers. The default is to use one parallel worker.

-wam <filename>

(wam load) Tells the emulator to read predicate definitions from <filename> in WAM code format.

3.2 Booting

Booting is done in two stages. First the internals of the machine are initialized by calling initialize (initial.c). initialize in turn calls init_once, which allocates memory areas and initializes the database with the builtin predicates. Each worker is initialized by init_worker. In the second stage, code is read from the boot files given by the -wam or -ql argument switch, e.g.

luther -ql boot1.ql -wam boot2.ql

When all boot files have been processed the WAM is started by calling the predicate start/0.

3.3 Internal Overview

The instruction set in the emulator is based on the SICStus WAM emulator with extensions for recursion-parallel execution (see Bevemyr, Lindgren and Millroth, "Reform Prolog: The Language and its Implementation", in ICLP'93, MIT Press, 1993).

The kernel of the emulator consists of two switch statements, one for read mode and one for write mode. This eliminates the need to check whether we are in read or write mode. We know that once we have entered write mode we will remain there while we are executing unify instructions.

If the case statements inside the switch are arranged in a proper way (depending on how advanced your C compiler is) the following code will be compiled to an efficient table lookup and a jump instruction. The table contains a mapping from opcode to memory address.

/***********************************************************************
 * Main emulator loop. Instruction is decoded then switched on.
 */

instructions:
    switch(Get_Op(pc)) {
    case SWITCH_ON_TERM:
    switch_on_term:
        ...
    }

write_instructions:
    switch(Get_Op(instruction)) {
    case SWITCH_ON_TERM:
        goto switch_on_term;
        ...
    }
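The two-switch arrangement can be illustrated with a toy interpreter. This is a sketch under stated assumptions rather than the Luther instruction set: the four opcodes, the use of plain int cells, and the function name run are all hypothetical. The point is that the current mode is encoded by which switch is executing, so unify_constant never has to test a mode flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for this sketch only. */
enum { OP_ENTER_READ, OP_ENTER_WRITE, OP_UNIFY_CONSTANT, OP_PROCEED };

/* Runs code against the cells at s. Returns 1 on success and 0 on
 * failure (a mismatch in read mode, or an unknown opcode). */
static int run(const int *pc, int *s)
{
    int c;

    assert(pc != NULL && s != NULL);

read_instructions:
    switch (*pc++) {
    case OP_UNIFY_CONSTANT:          /* read mode: compare */
        c = *pc++;
        if (*s++ != c)
            return 0;
        goto read_instructions;
    case OP_ENTER_WRITE:
        goto write_instructions;
    case OP_ENTER_READ:
        goto read_instructions;
    case OP_PROCEED:
        return 1;
    default:
        return 0;
    }

write_instructions:
    switch (*pc++) {
    case OP_UNIFY_CONSTANT:          /* write mode: construct */
        c = *pc++;
        *s++ = c;
        goto write_instructions;
    case OP_ENTER_READ:
        goto read_instructions;
    case OP_ENTER_WRITE:
        goto write_instructions;
    case OP_PROCEED:
        return 1;
    default:
        return 0;
    }
}
```

For example, the code sequence { OP_ENTER_WRITE, OP_UNIFY_CONSTANT, 5, OP_PROCEED } stores 5 in the first cell, while the same unify_constant opcode in read mode compares the cell against its operand.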

A more efficient method for instruction decoding is to use threaded code. The idea is to let the opcode point directly at the memory address for the instruction. This way the cost of doing the table lookup is eliminated. On the other hand, since we do not have two lookup tables, we cannot eliminate the write mode flag. The overall result is a speed improvement of approximately 25%.

goto Get_Op(pc); /* initially */

switch_on_term:
    ...
    goto Get_Op(pc);

switch_on_structure:
    ...

unify_constant:
    if(write_mode) {
        ...
    } else {
        ...
    }
    ...

The drawback is that most C compilers do not treat labels as first class objects; it is not in the ANSI standard. However, GCC has this feature for the purpose of supporting efficient emulation.

The emulator can be compiled to use threaded code if the compiler allows labels as first class objects.

3.3.1 The Parallel Machine

[Figure: the parallel machine. Each worker 1, ..., n has its own stack, trail, and registers (E, B0, B, TR, P, Time, HT). Heaps 1, ..., n, the data base, and the global arguments reside in shared memory.]

The parallel machinery consists of a set of workers numbered 0, 1, ..., n-1, one per processor. Each worker is implemented as a separate process running a WAM-based Prolog engine with extensions to support parallel execution. The execution of a program alternates between two modes: sequential execution and parallel execution. A phase of sequential execution is referred to as a sequential phase and a phase of parallel execution as a parallel phase. One worker is responsible for sequential execution (the sequential worker). During sequential execution all other workers (the parallel workers) are idle; during parallel execution the sequential worker is idle. The terms created by the sequential worker must be accessible to the parallel workers during parallel phases. This includes variables, numbers, structures, and lists. Terms created by the parallel workers must likewise be accessible to other workers during the parallel phase, and to the sequential worker during the next sequential phase. Sharing of data is implemented by letting all workers have restricted access to each other's heaps.

It is the sequential worker's responsibility to initiate parallel execution and to resume sequential execution when the parallel workers have executed all recursion levels. The sequential worker sets up the arguments for each recursion level in its own argument registers, which are globally accessible.

3.4 The WAM-code Debugger

It is possible to follow the execution of the emulator with the WAM level debugger. Spy points and break points may be set, clauses skipped, and variables examined. A call to wamdebug/0 starts the debugger. The debugger can also be invoked by interrupting the execution (using CTRL-C).

The following commands are available:

+            -- spy on instruction
++ <P>       -- spy on instruction P
+p <P> <A>   -- spy on predicate P/A
-            -- remove spy on instruction
-- <P>       -- remove spy on instruction P
-p <P> <A>   -- remove spy on predicate P/A
a            -- abort
b            -- breakpoint on this instruction
c            -- creep
e            -- toggle display env info
db           -- display break points
ds           -- display spy points
dt           -- toggle display arguments
da <Nr>      -- display a register nr Nr in choicepoint
df           -- toggle display fail
dx <Nr>      -- display x register nr Nr
dy <Nr>      -- display y register nr Nr
dp           -- display as prolog terms
f            -- toggle break at fail
g <Nr>       -- force garbage collection (Nr live X)
l            -- leap
n            -- turn off debugger
o <file>     -- redirect error info to <file>
py           -- print y register
pa           -- print a register in choicepoint
px <Nr>      -- print x register to reg.nr Nr
q            -- quit
r            -- remove breakpoint at this instruction
s            -- skip (break, leap)
t            -- trace (leap and print debug info.)
tc           -- trace (leap and print calls)

When examining variables and registers it is important that they are initialized or set to NULL; otherwise there is a good possibility of a fault occurring in the display function.

If you 'skip' a call or execute to a clause that fails you will lose control of the trace.

3.5 Handling Events

The emulator contains a mechanism for handling asynchronous events such as signals, garbage collection, and waking frozen goals. Events are signaled by setting the appropriate bits in the event flag. This flag is checked at each call (when executing the call and execute instructions).

For example, if the emulator receives an interrupt signal, then the EVENT_INTERRUPT bit is set in the event flag of the worker. At the next call the event is handled by calling the Prolog predicate interrupt_handler/0. Likewise, an event is signaled when constrained variables are bound, and the frozen goals are executed at the next call. Interrupts are only handled at calls (when the call or execute instruction is executed). Interrupts are not handled when builtin predicates are executed (e.g., is/2 and !).

d(a) :- !, ...

d(b) :- ...

p(X) :- freeze(X,X=b),d(X).

This program would fail since X=b is evaluated after the cut. The programmer can still force waking of X=b before the cut by inserting a call to a predicate (e.g., true/0).
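The flag-and-poll scheme described in this section can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the emulator's code: the bit values and the names event_flag, on_interrupt, poll_events_at_call and run_builtin are hypothetical, and EVENT_GC is an invented bit. Only the names EVENT_INTERRUPT and EVENT_WAKE occur in the emulator listings.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical bit assignment; EVENT_GC is invented for illustration. */
enum { EVENT_INTERRUPT = 1 << 0, EVENT_WAKE = 1 << 1, EVENT_GC = 1 << 2 };

/* The event flag of one worker (a single global in this sketch). */
static volatile sig_atomic_t event_flag = 0;

/* Signal handler: only records the event, the real work is deferred. */
static void on_interrupt(int signo) {
    (void)signo;
    event_flag |= EVENT_INTERRUPT;
}

/* Polled when a call or execute instruction is about to run.
   Returns the pending events and clears the flag. */
static int poll_events_at_call(void) {
    int pending = event_flag;
    event_flag = 0;
    return pending;
}

/* Builtins such as is/2 and ! do not poll, so an event raised while
   they run stays pending. Returns the number of events handled. */
static int run_builtin(void) {
    return 0;
}
```

In this sketch an interrupt that arrives while a builtin runs is still pending afterwards and is only picked up by the next poll_events_at_call, which mirrors why X=b in the example above is woken after the cut.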


3.5.1 Waking Frozen Goals

We want to interrupt the current execution and execute a frozen goal. When the goal succeeds we want to continue executing the interrupted goal. Clearly, we can only interrupt the execution when the WAM is in a controlled state; remember that we must be able to handle garbage collection during the execution of the woken goal. One of the points at which the WAM is in a controlled state is when a call (or execute) instruction is about to be executed. At this point it is known which X and Y registers are used, and the machine is not in the middle of building a structure on the heap (this could be the case if we allowed interrupts at each instruction).

Suppose the call p/3 is about to be executed and a frozen goal s(b,A) has been woken. We would like to first execute s(b,A) and then p/3. This can be done by executing the conjunction of s(b,A) and p/3 using a meta interpreter. Instead of calling p/3, we construct the structure p(X0,X1,X2) on the heap and call the built-in predicate ','/2 with X0 pointing to the structure s(b,A) and X1 pointing to p(X0,X1,X2). There is a slight complication when the interrupted predicate belongs to another module than the current one. In that case we have to wrap the current call in a call to SYSCALL/1 (actually we would like to do module:goal, but our module system is currently too primitive for this).

The code for doing all this is:

    case EVENT_INTERRUPT:
      { register TAGGED current_goal;
        register int arity, i;

        /* save the interrupted goal as a structure on the heap */
        Make_STR(w->heap_top,current_goal,def->name);

        for(i=0, arity = ArityOf(def->name) ; i != arity ; i++)
          { if(IsSVA(X(i)))
              { WriteLocalValue(w->heap_top,X(i));
              }
            else
              { PushOnHeap(w->heap_top,X(i));
              }
          }

        /* wrap the goal in SYSCALL/1 */
        if(def->module == module_prolog)
          { register TAGGED sgoal = current_goal;
            Make_STR(H,current_goal,functor_syscall);
            PushOnHeap(H,sgoal);
          }

        X(0) = atom_interrupt_handler;
        X(1) = current_goal;

        choice0 = w->choice;
        pc = interpret_conj->entry_code.incoreinfo; /* call ','/2 */
        goto instructions;
      }
      break;

    case EVENT_WAKE:
    #ifdef CONSTR
      /* First, find all woken CVAs on the trail */
      { register TAGGED woken = atom_nil;
        register TAGGED *tc;

        tc = w->trail_top;

        while(w->wake_count > 0)
          { tc--;
            switch(TagOf(*tc)) {
            case HVA:
    #ifdef UNBOUND
              tc--;
    #endif /* UNBOUND */
              break;
            case CVA:
              { if(IsCVA(*RemoveTag(*tc,CVA)))
                  { register TAGGED ctl;
                    LastTail(ctl,GetCVAGoals(*RemoveTag(*tc,CVA)));
                    Bind(ctl,GetCVAGoals(*tc),goto fail;);
                  }
                else
                  { register TAGGED ctl;
                    LastTail(ctl,GetCVAGoals(*tc));
                    Bind(ctl,woken,goto fail;);
                    woken = GetCVAGoals(*tc);
                    w->wake_count--;
                  }
    #ifdef UNBOUND
                tc--;
    #endif /* UNBOUND */
              }
            case SVA:
            case NUM:
              break;
            case LST:
              tc--;
              break;
            case GEN:
              break;
            default:
              break;
            }
          }

        /* Second, if there is anything to wake, save current goal in
         * structure on heap
         */
        if (woken != atom_nil)
          { register TAGGED goal;
            register int arity, i;

            Make_STR(H,goal,def->name);

            for(i=0, arity = ArityOf(def->name) ; i != arity ; i++)
              { if(IsSVA(X(i)))
                  { WriteLocalValue(H,X(i));
                  }
                else
                  { PushOnHeap(H,X(i));
                  }
              }

            if(def->module == module_prolog)
              { register TAGGED sgoal = goal;
                Make_STR(H,goal,functor_syscall);
                PushOnHeap(H,sgoal);
              }

            X(1) = goal;
            X(0) = woken;

            choice0 = w->choice;
            pc = interpret_wake->entry_code.incoreinfo; /* call 'wake'/2 */
            goto instructions;
          }
      }
    #endif /* CONSTR */
      break;


4 Instruction Set

The descriptions of the instructions will need several support macros which are defined in the chapter "Emulator Macros". They attempt to hide implementation details which may obscure the logic of the various instructions. They also hide the underlying memory model from the emulator code.

Most of the differences between the sequential and the parallel implementation are hidden in the macro definitions. We have chosen to describe only one version of the parallel implementation, the one using an atomic swap instruction for binding variables. We have also experimented with a version that uses explicit locking of variables to ensure that only one worker may bind a variable.

In order to support both versions we had to use a slightly different set of macros. Due to this, the actual code in engine.c is slightly different. The main difference is that we use a macro DerefLockSwitch(Term,Vcode,Ccode) for dereferencing a term, locking it if it is a variable, and finally executing Vcode if it dereferences to a variable and Ccode otherwise.

4.1 Indexing Instructions

switch_on_term(Lvar,Latom,Lnum,Llist,Lstr)

This represents choosing between a set of clauses depending on the tag of the first argument. It is used to make the first selection of clauses that can possibly match the call. If X0 is a variable then execution proceeds at label Lvar, if it is an atom then it proceeds at label Latom, etc. X0 is left dereferenced.

    Deref(X(0),X(0));

    switch(TagOf(X(0))) {
    case HVA:
    #ifdef CONSTR
    case CVA:
    #endif /* CONSTR */
    case SVA:
      Dispatch(pc,Lvar);
      goto instructions;
    case NUM:
      Dispatch(pc,Lnum);
      goto instructions;
    case FLT:
      Dispatch(pc,Lnum);
      goto instructions;
    case ATM:
      Dispatch(pc,Latom);
      goto instructions;
    case LST:
      Dispatch(pc,Llist);
      goto instructions;
    case STR:
      Dispatch(pc,Lstr);
      goto instructions;
    case GEN:
      /* This could be extended so that switch_on_term has a special
       * field for generic objects, and switch_on_generic could also
       * be done. For now, we use the same as for variables as they
       * are the most general. (might be trouble with cut optimization)
       */
      Dispatch(pc,Lvar);
      goto instructions;
    }
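The first-argument dispatch can be tried out in isolation with the following sketch. It assumes a 3-bit tag stored in the low bits of a machine word, which is not necessarily the emulator's encoding. The tag names follow the listing above (GEN is left out so the tags fit in three bits), while make_term, tag_of and struct switch_labels are hypothetical helpers, and labels are plain integers instead of code addresses.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t TAGGED;

/* Assumed encoding: tag in the low 3 bits, payload in the rest. */
enum tag { HVA, CVA, SVA, NUM, FLT, ATM, LST, STR };
#define TAG_BITS 3
#define TAG_MASK ((1u << TAG_BITS) - 1)

static TAGGED make_term(enum tag t, uintptr_t payload) {
    return (payload << TAG_BITS) | (uintptr_t)t;
}

static enum tag tag_of(TAGGED t) {
    return (enum tag)(t & TAG_MASK);
}

/* The five branch targets of switch_on_term. */
struct switch_labels { int lvar, latom, lnum, llist, lstr; };

/* Returns the label to continue at for a dereferenced first argument. */
static int switch_on_term(TAGGED x0, const struct switch_labels *l) {
    switch (tag_of(x0)) {
    case HVA: case CVA: case SVA: return l->lvar;  /* all variable kinds */
    case NUM: case FLT:           return l->lnum;  /* integers and floats */
    case ATM:                     return l->latom;
    case LST:                     return l->llist;
    case STR:                     return l->lstr;
    }
    assert(!"unreachable: every 3-bit tag is handled above");
    return l->lvar;
}
```

As in the emulator code, integers and floats share the Lnum branch and every kind of variable goes to Lvar.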

switch_on_constant(i,(C1-L1...Ci-Li),Ldefault)

This represents choosing between a set of i clauses which all have a constant as their first argument. It is used to make the second selection of clauses that can possibly match the call. If X0 is the constant C1 then execution proceeds at label L1, if it is the constant C2 then execution proceeds at label L2, etc. If none of the constants match X0 then execution continues at the label Ldefault. If there are 5 or more alternatives then binary search is used, otherwise the table is searched linearly.

    { if(i < 5) { /* linear search */
        while(i--) {
          if(X(0) == Ci) {
            Dispatch(pc,Li);
            goto instructions;
          } else {
            Inc_Label(pc);
          }
        }
      } else { /* binary search */
        register int x, l, r;

        l = 0; r = i-1;

        do {
          x = (l + r) / 2;
          if (X(0) < Cx)
            r = x - 1;
          else if (X(0) > Cx)
            l = x + 1;
          else {
            Dispatch(pc,Lx);
            goto instructions;
          }
        } while (r >= l);
      }
      Dispatch(pc,Ldefault);
      goto instructions;
    }
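The choice between linear and binary search can be illustrated with a small stand-alone function. This is a sketch only: struct entry, LINEAR_LIMIT and the integer labels are hypothetical, and the sketch assumes that tables with 5 or more entries are sorted by constant, which binary search requires.

```c
#include <assert.h>
#include <stddef.h>

/* One (constant, label) pair of the switch table. */
struct entry { unsigned long constant; int label; };

#define LINEAR_LIMIT 5  /* tables smaller than this are scanned linearly */

/* Returns the label for x0, or ldefault if x0 is not in the table.
   Tables with LINEAR_LIMIT or more entries must be sorted by constant. */
static int switch_on_constant(unsigned long x0, const struct entry *table,
                              size_t n, int ldefault) {
    assert(table != NULL || n == 0);
    if (n < LINEAR_LIMIT) {
        for (size_t k = 0; k < n; k++)
            if (table[k].constant == x0)
                return table[k].label;
        return ldefault;
    }
    size_t lo = 0, hi = n;  /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (x0 < table[mid].constant)
            hi = mid;
        else if (x0 > table[mid].constant)
            lo = mid + 1;
        else
            return table[mid].label;
    }
    return ldefault;
}
```

For example, with the sorted table {1, 3, 5, 7, 9, 11} a lookup of 7 is found at the first probe, while a lookup of 4 narrows the range to nothing and falls through to the default label.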

switch_on_structure(i,(F1-L1...Fi-Li),Ldefault)

This represents choosing between a set of i clauses which all have a structure as their first argument. It is used to make the second selection of clauses that can possibly match the call. If X0 is the structure F1 then execution proceeds at label L1, if it is the structure F2 then execution proceeds at label L2, etc. If none of the structures match X0 then execution continues at Ldefault. If 5 or more alternatives exist then binary search is used, otherwise linear search.

    { if(i < 5) { /* linear search */
        while(i--) {
          if(GetFunctor(X(0)) == Fi) {
            Dispatch(pc,Li);
            goto instructions;
          } else {
            Inc_Label(pc);
          }
        }
      } else { /* binary search on the functor of X0 */
        register int x, l, r;

        l = 0; r = i-1;

        do {
          x = (l + r) / 2;
          if (GetFunctor(X(0)) < Fx)
            r = x - 1;
          else if (GetFunctor(X(0)) > Fx)
            l = x + 1;
          else {
            Dispatch(pc,Lx);
            goto instructions;
          }
        } while (r >= l);
      }
      Dispatch(pc,Ldefault);
      goto instructions;
    }

try(L)

This is used only in procedures which include multiple alternatives. The first alternative is located at L.

    { register choicepoint *newchoice;
      register int i;

      newchoice = (choicepoint *) Get_Local_Stack_Top;
    #ifdef TIMESTAMP
      w->time += TIMEUNIT;
    #endif
      newchoice->trail_top = w->trail_top;
      newchoice->global_top = w->heap_top;
      newchoice->last_choice = w->choice;
      newchoice->cont_env = w->frame;
      newchoice->next_instr = w->next_instr;
      newchoice->next_clause = pc+1;
      newchoice->arity = w->arity;
      for(i=0; i!=w->arity ; i++)
        newchoice->areg[i] = X(i);
    #ifdef TIMESTAMP
      newchoice->timestamp = w->time;
      w->uncond = w->time;
    #else
      w->uncond = w->heap_top;
    #endif
      w->choice = newchoice;
      pc = DispatchLabel(pc,L);
      goto instructions;
    }

retry(L)

In a procedure that contains more than one alternative this must precede each clause but the first and the last. The next alternative is located at L.

    w->choice->next_clause = pc+1;
    pc = DispatchLabel(pc,L);
    goto instructions;

trust(L)

This precedes the last clause in a procedure.

    w->choice = w->choice->last_choice;
    #ifdef TIMESTAMP
    w->uncond = w->choice->timestamp;
    #else
    w->uncond = w->choice->global_top;
    #endif
    pc = DispatchLabel(pc,L);
    goto instructions;

try_me_else(L)

This is used only in procedures which include multiple alternatives. The next alternative is located at L.

    { register choicepoint *newchoice;
      register int i;

    #ifdef TIMESTAMP
      w->time += TIMEUNIT;
    #endif
      newchoice = (choicepoint *) Get_Local_Stack_Top;
      newchoice->trail_top = w->trail_top;
      newchoice->global_top = w->heap_top;
      newchoice->last_choice = w->choice;
      newchoice->cont_env = w->frame;
      newchoice->next_instr = w->next_instr;
      newchoice->next_clause = DispatchLabel(pc,L);
      pc++;
      newchoice->arity = w->arity;
      for(i=0; i!=w->arity ; i++)
        newchoice->areg[i] = X(i);
    #ifdef TIMESTAMP
      newchoice->timestamp = w->time;
      w->uncond = w->time;
    #else
      w->uncond = w->heap_top;
    #endif
      w->choice = newchoice;
      goto instructions;
    }


retry_me_else(L)

In a procedure that contains more than one alternative this must precede each clause but the first and the last. The following alternative is located at L.

    w->choice->next_clause = DispatchLabel(pc,L);
    pc++;
    goto instructions;

trust_me

This precedes the last alternative in a procedure.

    w->choice = w->choice->last_choice;
    #ifdef TIMESTAMP
    w->uncond = w->choice->timestamp;
    #else
    w->uncond = w->choice->global_top;
    #endif
    goto instructions;
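The life cycle of a choicepoint under try, retry and trust can be followed in a simplified model. The sketch below is an assumption-laden illustration: the struct layouts are reduced to a few fields, clause addresses are plain integers, the caller supplies the choicepoint storage (the emulator allocates it on the local stack), and do_try, do_retry, do_trust and backtrack are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 4

struct choicepoint {
    struct choicepoint *last_choice; /* previous choicepoint */
    int next_clause;                 /* alternative to try on backtracking */
    size_t global_top, trail_top;    /* machine state to restore */
    int arity;
    long areg[MAX_ARGS];             /* saved argument registers */
};

struct worker {
    struct choicepoint *choice;      /* newest choicepoint */
    size_t heap_top, trail_top;
    int arity;
    long x[MAX_ARGS];                /* argument registers */
};

/* try / try_me_else: save the machine state in a new choicepoint. */
static void do_try(struct worker *w, struct choicepoint *cp, int next) {
    assert(w->arity <= MAX_ARGS);
    cp->last_choice = w->choice;
    cp->next_clause = next;
    cp->global_top = w->heap_top;
    cp->trail_top = w->trail_top;
    cp->arity = w->arity;
    for (int i = 0; i < w->arity; i++)
        cp->areg[i] = w->x[i];
    w->choice = cp;
}

/* retry / retry_me_else: only the next alternative changes. */
static void do_retry(struct worker *w, int next) {
    w->choice->next_clause = next;
}

/* trust / trust_me: no alternatives remain, drop the choicepoint. */
static void do_trust(struct worker *w) {
    w->choice = w->choice->last_choice;
}

/* On failure: restore the saved state, return the clause to resume at. */
static int backtrack(struct worker *w) {
    struct choicepoint *cp = w->choice;
    w->heap_top = cp->global_top;
    w->trail_top = cp->trail_top;
    for (int i = 0; i < cp->arity; i++)
        w->x[i] = cp->areg[i];
    return cp->next_clause;
}
```

A procedure with three clauses would thus run do_try for the first clause, do_retry when backtracking into the second, and do_trust when backtracking into the third, after which the worker's newest choicepoint is again the one that existed before the call.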

get_constant_x0(C)

This represents a first head argument that is the constant C. X0 is already dereferenced and is either uninstantiated or instantiated to C.

    { if (IsVar(X(0))) {
        Bind(X(0),C,goto fail;);
      } else if (X(0) != C)
        goto fail;
      goto instructions;
    }

In the parallel version X0 is not guaranteed to be dereferenced since some other processor may have bound the variable. We therefore have to deal with the situation that some other processor may bind X0 at any time. This is the reason for the third argument to Bind.
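The race between two workers binding the same variable can be illustrated with C11 atomics. Note that this sketch swaps in a compare-and-exchange where the emulator uses an atomic swap instruction, and that it represents an unbound variable by the sentinel 0 rather than by the emulator's tagged cells. UNBOUND_CELL and bind_or_check are hypothetical names, and the decision to accept a cell that already holds the same value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t TAGGED;

/* Assumed representation: 0 marks an unbound cell. */
#define UNBOUND_CELL ((TAGGED)0)

/* Tries to bind *cell to value. Returns true if the cell ends up holding
   value, either because this call bound it or because another worker had
   already bound it to the same value. Returns false if another worker
   won the race with a different value; the caller would then fail. */
static bool bind_or_check(_Atomic TAGGED *cell, TAGGED value) {
    TAGGED expected = UNBOUND_CELL;
    assert(value != UNBOUND_CELL);
    if (atomic_compare_exchange_strong(cell, &expected, value))
        return true;
    /* expected now holds the value some other worker stored first */
    return expected == value;
}
```

The false result corresponds to the failure code passed as the third argument in Bind(X(0),C,goto fail;).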

get_nil_x0

This represents a first head argument that is the empty list. X0 is already dereferenced and is either uninstantiated or instantiated to the empty list.

    { if (IsVar(X(0))) {
        Bind(X(0),atom_nil,goto fail;);
      } else if (X(0) != atom_nil)
        goto fail;
      goto instructions;
    }

get_structure_x0(F)

This represents a first head argument that is a structure whose functor is F. X0 is already dereferenced and is either uninstantiated or instantiated to a structure whose functor is F. The instruction is followed by a sequence of unify instructions.

    { if(IsVar(X(0))) {
        register TAGGED new;

        Make_STR(w->heap_top,new,F);
        Bind(X(0),new,goto fail;);
        goto write_instructions;
      } else {
        if(GetFunctor(X(0)) != F)
          goto fail;
        s = GetArg(X(0),0);
        goto instructions;
      }
    }

get_list_x0

This represents a first head argument that is a list. X0 is already dereferenced and is either uninstantiated or instantiated to a list. The instruction is followed by a sequence of unify instructions.

    { if (IsVar(X(0))) {
        register TAGGED l;

        Make_LST(w->heap_top,l);
        Bind(X(0),l,goto fail;);
        goto write_instructions;
      } else if (IsLST(X(0))) {
        s = TagToPointer(X(0));
        goto instructions;
      } else
        goto fail;
    }

4.2 Utility Instructions

choice_x(n)

This represents the presence of a cut operator in a negation, disjunction or implication which is the first procedure call of this clause. The cut will reset the current choicepoint from the temporary variable Xn.

X(n) = PointerToTerm(w->choice0);

goto instructions;


choice_y(n)

This represents the presence of a cut operator after the first procedure call of this clause. The cut will reset the current choicepoint from the permanent variable Yn.

Y(n) = PointerToTerm(w->choice0);

goto instructions;

cut

This represents a cut operator before the first procedure call not occurring in a negation, disjunction, or implication.

    if(w->choice > w->choice0) {
      w->choice = w->choice0;
    #ifdef TIMESTAMP
      w->uncond = w->choice->timestamp;
    #else
      w->uncond = w->choice->global_top;
    #endif
      TidyTrail;
    }
    goto instructions;

cut_x(n)

This represents a cut operator before the first procedure call occurring in a negation, disjunction, or implication. The cut will reset the current choicepoint from the temporary variable Xn.

    w->choice = (choicepoint *) TermToPointer(X(n));
    #ifdef TIMESTAMP
    w->uncond = w->choice->timestamp;
    #else
    w->uncond = w->choice->global_top;
    #endif
    TidyTrail;

cut_y(n)

This represents a cut operator after the first procedure call. The cut will reset the current choicepoint from the permanent variable Yn.

    w->choice = (choicepoint *) TermToPointer(Y(n));
    #ifdef TIMESTAMP
    w->uncond = w->choice->timestamp;
    #else
    w->uncond = w->choice->global_top;
    #endif
    TidyTrail;
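The effect of a cut on the choicepoint stack can be modelled with array indices in place of stack pointers. The following is a sketch under assumptions: choicepoints live in a fixed array, choice and choice0 are indices (with -1 meaning no choicepoint), trail tidying is omitted, and the fallback value 0 for uncond when no choicepoint remains is an invention of the sketch. push_choicepoint, call_predicate and cut are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHOICEPOINTS 16

struct choicepoint { size_t global_top; };

struct worker {
    struct choicepoint stack[MAX_CHOICEPOINTS];
    int choice;     /* index of the newest choicepoint, -1 if none */
    int choice0;    /* value of choice when the current predicate was called */
    size_t heap_top;
    size_t uncond;  /* heap mark recorded by the newest choicepoint */
};

/* Models try: push a choicepoint that records the current heap top. */
static void push_choicepoint(struct worker *w) {
    assert(w->choice + 1 < MAX_CHOICEPOINTS);
    w->choice++;
    w->stack[w->choice].global_top = w->heap_top;
    w->uncond = w->heap_top;
}

/* Models call/execute: remember which choicepoint was newest on entry. */
static void call_predicate(struct worker *w) {
    w->choice0 = w->choice;
}

/* Models cut: discard every choicepoint created since the predicate
   was entered and restore uncond from the surviving choicepoint. */
static void cut(struct worker *w) {
    if (w->choice > w->choice0) {
        w->choice = w->choice0;
        w->uncond = (w->choice >= 0) ? w->stack[w->choice].global_top : 0;
    }
}
```

The choice_x(n) and choice_y(n) instructions correspond to copying choice0 into a register so that a later cut_x(n) or cut_y(n) can still find it after further calls have changed choice0.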

builtin(Fnk,n1,...,nn)

This represents a call to the inlineable builtin procedure Fnk with the first argument in Xn1, the second in Xn2, and so on.

    { if ((GetInlineFnk(Fnk))(w,Get_UseArgs(pc)) == FALSE)
        goto fail;
      pc += GetInlineArity(Fnk);
      goto instructions;
    }

inline(Faillabel,Fnk,n1,...,nn)

This represents a call to the inlineable builtin procedure Fnk with the first argument in Xn1, the second in Xn2, and so on. If the procedure fails, then the execution proceeds at Faillabel.

    { if ((GetInlineFnk(Fnk))(w,Get_UseArgs(pc)) == FALSE) {
        pc = DispatchLabel(pc,Faillabel);
      } else {
        pc += GetInlineArity(Fnk);
      }
      goto instructions;
    }
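The difference between builtin and inline lies only in what happens on failure, which the following sketch makes explicit. It is an illustration with hypothetical names: struct inline_entry, less_than, do_builtin, do_inline and PC_FAIL are not part of the emulator, the program counter is an index into an array of operand words, and registers are plain long values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An inlineable builtin receives the X registers and a pointer to the
   register numbers n1,...,nn stored in the code after the instruction. */
typedef bool (*inline_fn)(const long *x, const int *arg_regs);

struct inline_entry { inline_fn fn; int arity; };

/* Example builtin: succeeds if X[n1] < X[n2]. */
static bool less_than(const long *x, const int *arg_regs) {
    return x[arg_regs[0]] < x[arg_regs[1]];
}

#define PC_FAIL (-1)  /* stands for "goto fail", i.e. ordinary backtracking */

/* builtin: on failure backtrack, on success skip the operands. */
static int do_builtin(const struct inline_entry *e, const long *x,
                      const int *code, int pc) {
    assert(e->fn != NULL);
    if (!e->fn(x, &code[pc]))
        return PC_FAIL;
    return pc + e->arity;
}

/* inline: on failure continue at fail_label instead of backtracking. */
static int do_inline(const struct inline_entry *e, const long *x,
                     const int *code, int pc, int fail_label) {
    assert(e->fn != NULL);
    if (!e->fn(x, &code[pc]))
        return fail_label;
    return pc + e->arity;
}
```

With X0 = 5 and X1 = 9, the operands (0,1) make less_than succeed and both instructions advance past the two operand words, while the operands (1,0) make do_builtin return PC_FAIL and do_inline return the supplied label.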

meta_call(i,k)

This represents a meta call of the temporary variable Xi not occurring last in a clause. The environment has size k.

pc++; /* skip environment size */

w->next_instr = pc+1;

goto meta_execute;

meta_execute(i)

This represents a meta call of the temporary variable Xi occurring last in a clause.

    { register TAGGED goal;
      register definition *def;
      register int k;

      Deref(goal,X(i));

      /* Get definition */
      if (IsSTR(goal))
        { def = get_definition(GetFunctor(goal));
          k = GetArity(goal);
        }
      else if (IsATM(goal))
        { def = get_definition(StoreFunctor(goal,0));
          k = 0;
        }
      else if (IsLST(goal))
        { def = get_definition(functor_list);
          k = 2;
        }
      else if (IsNumber(goal))
        { goto fail;
        }
      else
        { luther_error(E_ILLEGAL_GOAL, goal);
          goto fail;
        }

      /* check for events */

      switch (def->enter_instruction) {
      case ENTER_INTERPRETED:
        w->choice0 = w->choice;
        X(0) = goal;
        if(w->lut_trace == 1) {
          pc = interpret_goal_trace->entry_code.incoreinfo;
        } else {
          pc = interpret_goal->entry_code.incoreinfo;
        }
        break;

      case ENTER_SPY:
        w->choice0 = w->choice;
        X(0) = goal;
        pc = interpret_goal_trace->entry_code.incoreinfo;
        break;

      case ENTER_EMULATED:
        /* Copy arguments from structure */
        if(IsLST(goal)) {
          X(0) = Ref(GetCar(goal));
          X(1) = Ref(GetCdr(goal));
        } else {
          while(k--) {
            X(k) = Ref(GetArg(goal,k));
          }
        }
        /* save the program counter in the continuation */
        w->choice0 = w->choice;
        pc = def->entry_code.incoreinfo;
        break;

      case ENTER_C:
        /* Copy arguments from structure, must be struct or atom
           since ','(_,_) is not a C defined predicate */
        while(k--) {
          X(k) = Ref(GetArg(goal,k));
        }
        switch((def->entry_code.cinfo)(w)) {
        case FALSE:
          goto fail;
        case TRUE:
          pc = w->next_instr;
          break;
        }
        break;

      case ENTER_UNDEFINED:
        luther_error(E_PRED_NOT_DEF, (TAGGED) def);
        goto fail;
      }
      goto instructions;
    }

4.3 Procedural Instructions

These instructions are used to deal with recursive procedure calls and managing the temporary data areas needed on the stack.
