[m-dev.] For Review: Bytecode interpreter
Fergus Henderson
fjh at cs.mu.OZ.AU
Thu Jan 25 21:30:31 AEDT 2001
On 24-Jan-2001, Levi Cameron <l.cameron2 at ugrad.unimelb.edu.au> wrote:
>
> +++ mb_machine.h 2001/01/24 07:44:34
> +struct MB_Machine_State_Tag;
> +typedef struct MB_Machine_State_Tag MB_Machine_State;
For consistency with code elsewhere, s/_Tag/_Struct/g
> @@ -193,6 +71,11 @@
> +/* Create a bytecode machine */
> +void MB_machine_create(MB_Word* new_ip, MB_Machine_State* ms);
What's the meaning of the `new_ip' parameter?
> +/* Execute a bytecode machine */
> +MB_Word* MB_machine_exec(MB_Machine_State* ms);
You should document the meaning of the return value.
Also, when does execution terminate?
> +++ mb_machine_show.c 2001/01/24 07:44:34
> +#include <mercury_imp.h>
"..." not <...>.
> /* Display the current state of the machine */
> void
> MB_show_state(MB_Machine_State* ms, FILE* fp)
> {
...
> /* show the surrounding lines of code */
> - MB_listing(ms, fp, ip-2, ip+4);
> + MB_listing(ms, fp, ip-2, ip+4, 73);
s/-/ - /
s/+/ + /
What's the magic number 73?
> fprintf(fp, "\n");
>
> {
> + MB_Word i;
>
> + /* Show the registers */
> + for (i = 0; i < NREGS; i++) {
> + fprintf(fp, "reg[%02d] = %08x ",
s/%08x/0x%08x/
It may also be useful to precede this with the decimal value.
> + fprintf(fp, " sp = %08x "
> + " maxfr = %08x\n",
Likewise here.
> + (int)MB_sp,
> + (int)MB_maxfr);
> + for (i = 0; i < NSTACK; i++) {
> + fprintf(fp, "%cdet[%02d] = %08x "
> + "%cnondet[%02d] = %08x\n",
And here.
> Index: mb_mem.c
> /* Imports */
> #include <stdlib.h>
> #include <string.h>
> +#include <mercury_tags.h>
<> vs "".
> @@ -32,6 +29,33 @@
>
> /* Implementation */
>
> +#ifndef MB_NO_GC
> +# pragma message "Garbage collection is on"
What's the `# pragma message'? That is non-portable, and will result
in warnings from gcc's `-Wunknown-pragmas'.
> +# define GC_DEBUG
That should probably be defined by setting CFLAGS
in Mmake.params rather than being unconditionally
enabled here.
> +# include <gc.h>
<> vs "".
> +/*
> +** Initialise memory allocation
> +*/
> +void
> +MB_mem_init()
s/()/(void)/
> +{
> +#ifndef MB_NO_GC
> + int i;
> +
> + /* Initialise the garbage collector */
> + GC_INIT();
> +
> + for (i = 0; i < (1 << TAGBITS); i++) {
> + GC_REGISTER_DISPLACEMENT(i);
> + }
> +#endif
> +}
The Mercury runtime library startup code will do that for you anyway.
Why is that code duplicated here?
Are you not linking in the Mercury runtime library?
> void*
> MB_malloc(size_t size)
> {
What's the reason for this layer of abstraction?
It should be documented.
> Index: mb_mem.h
> /*
> ** Do not use MB_malloc() or MB_realloc() directly, unless you want
> ** to allocate raw memory. Normally you should use the macros
> ** MB_new(), MB_new_array(), and MB_resize_array() instead.
s/new/NEW/ and likewise for new_array and resize_array.
These should be in capitals since they are macros
(and not function-like).
> +/*
> +** Garbage collected versions of the above
> +**
> +*/
> +
> +/* Atomic == doesn't contain any pointers */
> +
> +void* MB_GC_malloc(size_t size);
> +
> +void* MB_GC_malloc_atomic(size_t size);
> +
> +void* MB_GC_realloc(void* mem, size_t size);
Is it OK to call MB_GC_realloc() for something allocated with
MB_GC_malloc_atomic()? The names don't make it clear.
Likewise for MB_GC_resize_array().
> Index: mb_stack.h
> +typedef struct MB_Stack_Tag {
s/_Tag/_Struct/
> +/* allocates space for a new stack. gc indicates whether the stack region
> +** should be allocated with the garbage collector or with the C malloc
> +** (the garbage collector won't follow references from the c malloc area)
> +**
> +** For the garbage collector, assumes that data is not atomic
> +*/
The comment layout does not match our C coding guidelines.
Sentences should start with a capital letter or quote.
(For sentences starting with e.g. `gc', put it in single
quotes and/or rephrase, e.g. "The `gc' parameter indicates ...".)
Which garbage collector is "the" garbage collector?
Is that the Boehm et al conservative collector?
Or the Mercury accurate garbage collector?
Or perhaps even the .NET garbage collector or the Java garbage collector? ;-)
> +MB_Stack MB_stack_new(MB_Word init_size, MB_Bool gc);
> /* get number of words already pushed on stack */
> MB_Word MB_stack_size(MB_Stack* s);
> -/* pushes a value onto the stack */
> -void MB_stack_push(MB_Stack* s, MB_Word x);
> +/* pushes a value onto the stack. Return index of pushed word */
> +MB_Word MB_stack_push(MB_Stack* s, MB_Word x);
> /* removes a value off the stack */
> MB_Word MB_stack_pop(MB_Stack* s);
> /* allocates space for multiple places on the stack */
> @@ -40,7 +43,7 @@
> /* peek at an item index items away from the top of the stack */
> MB_Word MB_stack_peek_rel(MB_Stack* s, MB_Word idx);
> /* get the address for the item at index
> -** Note: if you add or remove items, this value could change */
> +** NOTE: if you add or remove items, this value could change */
> MB_Word* MB_stack_peek_p(MB_Stack* s, MB_Word idx);
> /* get the address for the item at index relative to the top of the
> stack */
> MB_Word* MB_stack_peek_rel_p(MB_Stack* s, MB_Word idx);
> @@ -51,6 +54,16 @@
> /* deallocate space for the stack */
> void MB_stack_delete(MB_Stack* s);
>
> -#endif /* MB_STACK_H */
> +/*
> +** Uses the stack to allocate num elements of type, returns pointer to
> first
> +** element (rounds total memory allocated up to a multiple of
> sizeof(MB_Word))
> +*/
> +#define MB_stack_alloc_mem(stack, type, num) \
> + MB_stack_alloc((stack), \
> + ( (num) \
> + * (sizeof(type)+sizeof(MB_Word)-1) \
> + / sizeof(MB_Word) ) \
> + )
The argument to MB_stack_alloc() is a size in words.
For type=char, num=10, sizeof(MB_Word)=4,
this will allocate 10 * (1+4-1) / 4 = 10 * 4 / 4 = 40 / 4 = 10
words. That is more than necessary, since for 10 characters
you only need allocate 3 words.
I suggest you use a bit of abstraction/code reuse, e.g. use the
MR_round_up() macro, or define something similar yourself and use that.
(Using the MR_round_up() macro might not be completely appropriate
since that assumes that you're rounding up to a power of two,
and it would be better to avoid that assumption if possible.
But it would make sense to define something similar.)
> Index: mb_util.c
> +void SAY(const char* fmt, ...)
That is not namespace-clean. It should be MB_SAY().
> Index: mb_util.h
>
> +void SAY(const char* fmr, ...);
That should be documented.
s/SAY/MB_SAY/
> +++ mb_interface.h Wed Jan 24 18:44:34 2001
> +typedef struct {
> + /*
> + ** if cached_adr is NULL, this function hasn't been looked up yet
> + */
> +
> + MB_Word* cached_ip;
> + const char* module_name;
> + const char* pred_name;
> + MB_Word proc_num;
> + MB_Word arity;
> + MB_Byte is_func;
> +} MB_Call;
You should document what the MB_Call structure is for.
I'm a bit confused about the MB_CString stuff,
but is there some particular reason why you're using
`const char *' here and `MB_CString_Const' elsewhere?
It might be a good idea to make greater use of typedefs
for the other fields, e.g. `MB_CodeAddr' instead of
`MB_Word*'.
> +/* Entry point for a native code call to det bytecode */
> +MB_Word* MB_bytecode_call_entry(MB_Call* bytecode_call);
Document the meaning of the return value.
s/det/model_det/
> +/* Return to deterministic code after call to native code */
> +MB_Word* MB_bytecode_return_det(void);
Likewise.
> +/* Returns pointer to the stub that calls bytecode_return_det */
> +MB_Word* MB_native_get_return_det(void);
> +
> +/* Find the native code entry point for a procedure */
> +MB_Word* MB_code_find_proc_native(MB_CString_Const module,
> + MB_CString_Const pred, MB_Word proc,
> + MB_Word arity, MB_Byte is_func);
It would be a good idea to use a more meaningfully named
typedef rather than `MB_Word*' for the return type.
> +/**************************************************************/
> +/*
> +** det stack
> +**
> +** Each det stack frame looks like the following:
> +** sp-0: [succip]
> +** sp-1: [semidet success indicator - may or may not be there]
> +** sp-2: [var 0]
> +** sp-3: [var 1]
> +** ...
> +** sp-n: [temp 0]
> +** sp-n-1: [temp 1]
> +** ...
> +**
> +** Note the addition of [number of vars] so that MB_func_type_check
> +** can determine where the temp list begins
> +*/
> +
> +#define MB_DETFRAME_SUCCIP (0) /* saved succip */
> +#define MB_DETFRAME_BC_SUCCIP (1) /* bytecode return address for stub */
The code here doesn't match the comments.
Is sp-1 the success indicator, or the "bytecode return address for stub"?
> +
> +#define MB_DETFRAME_SIZE (2) /* size of det stack frame */
> +
> +/*
> +** nondet stack
> +**
> +** An ordinary stack frame looks like so:
> +** curfr[ 0] prevfr
> +** curfr[-1] redoip
> +** curfr[-2] redofr
> +** curfr[-3] succip
> +** curfr[-4] succfr
> +** then follows var[0] to var[n]
> +** then follows temp[0] to temp[n]
> +**
> +*/
> +#define MB_FRAME_PREVFR (0)
> +#define MB_FRAME_REDOIP (1)
> +#define MB_FRAME_REDOFR (2)
> +
> +#define MB_FRAME_DETFR (3)
> +
> +#define MB_FRAME_SUCCIP (3)
> +#define MB_FRAME_SUCCFR (4)
> +
> +#define MB_FRAME_SIZE 5
> +#define MB_FRAME_TEMP_DET_SIZE 4
> +#define MB_FRAME_TEMP_SIZE 3
What are all these macros for?
Their meaning should be documented.
The comments above don't seem to relate to
the macros defined below, at least for MB_FRAME_DETFR.
And what are FRAME_TEMP_SIZE and FRAME_TEMP_DET_SIZE?
> +/**************************************************************/
> +/* register definitions */
> +#define MB_reg(n) MR_virtual_reg(n)
> +#define MB_succip MR_virtual_succip
> +#define MB_sp MR_virtual_sp
> +#define MB_curfr MR_virtual_curfr
> +#define MB_maxfr MR_virtual_maxfr
It would be good to have a comment here explaining that while in the
bytecode interpreter, you use the copies of the registers that have
been saved in the MR_fake_reg array, rather than the real machine registers.
> +#define MB_incr_sp(x) ( \
> + MB_sp = MB_sp + (x), \
Use `+='.
> +/**************************************************************/
> +/* tags */
> +#include "mercury_tags.h"
> +#define MB_mktag(t) MR_mktag(t)
> +#define MB_mkbody(b) MR_mkbody(b)
> +#define MB_tag(t) MR_tag(t)
> +#define MB_body(w,t) MR_body(w,t)
> +#define MB_mkword(t,b) MR_mkword(t,b)
> +#define MB_strip_tag(w) MR_strip_tag(w)
What's the purpose of the extra level of abstraction here?
I think you may as well just use the MR_* versions directly
in the bytecode interpreter.
> +#endif /* MB_INTERFACE_H */
> ===================================================================
> --- /dev/null Wed Nov 22 17:39:10 2000
> +++ mb_interface.c Wed Jan 24 18:44:34 2001
> @@ -0,0 +1,172 @@
> +
> +/*
> +** Copyright (C) 2000-2001 The University of Melbourne.
> +** This file may only be copied under the terms of the GNU Library
> General
> +** Public License - see the file COPYING.LIB in the Mercury
> distribution.
> +**
> +*/
> +
> +
> +#include "mercury_imp.h"
> +#include "mercury_regs.h"
> +#include "mercury_trace.h"
> +#include "mercury_trace_tables.h"
> +
> +#include "mb_interface.h"
> +#include "mb_module.h"
> +#include "mb_machine.h"
> +
> +/* Exported definitions */
> +
> +/* Local declarations */
> +
> +/* Implementation */
> +
> +/*
> +** A native code procedure wishes to call a deterministic bytecode
> procedure
> +*/
> +
> +#include "mb_machine_def.h" /* needed to instantiate MB_Machine_State
> */
Why isn't this #include with the other #includes?
> +/* This is called when native code wishes to call bytecode */
> +MB_Word*
> +MB_bytecode_call_entry(MB_Call* bytecode_call)
The comment there seems to substantially duplicate the one above.
> + MB_fatal("Unable to find procedure\n"
> + "(Is the native code and bytecode consistent?)");
> + }
s/Is the/Are the/
> +/*
> +** When a det procedure wishes to call native code:
> +** Bytecode sets SUCCIP to MB_bytecode_return_det,
> +** Bytecode sets MB_DETFRAME_BC_SUCCIP to bytecode return address
> +** returns to native code, passing the address of the native code
> +** to be called
> +**
> +** After native code is finished, returns to this function which
> +** reads MB_DETFRAME_BC_SUCCIP and returns to that location
> +*/
> +MB_Word*
> +MB_bytecode_return_det(void)
> +{
> + MB_Word* ip = (MB_Word*)MB_stackitem(MB_DETFRAME_BC_SUCCIP);
> +
> + MB_Machine_State ms;
> + MB_machine_create(ip, &ms);
> +
> + ip = MB_machine_exec(&ms);
> +
> + return ip;
> +}
I think that function needs to be defined using MR_Define_Entry()
rather than being a C function.
> +MB_Word*
> +MB_native_get_return_det(void)
> +{
> + static MB_Word* return_det_stub = NULL;
> + if (return_det_stub == NULL) {
> + return_det_stub =
> + MB_code_find_proc_native(
> + "mb_interface_stub",
> + "mb_native_return_det",
> + 0, 1, FALSE);
> + }
> + return return_det_stub;
> +}
> +
> +
> +MB_Word*
> +MB_code_find_proc_native(MB_CString_Const module, MB_CString_Const
> pred,
> + MB_Word proc, MB_Word arity, MB_Byte is_func)
> +{
> + MR_Matches_Info matches;
> + MR_Proc_Spec spec;
> +
> + MR_register_all_modules_and_procs(stderr, TRUE);
> + SAY("\n");
> +
> + spec.MR_proc_module = module;
> + spec.MR_proc_name = pred;
> + spec.MR_proc_arity = arity;
> + spec.MR_proc_mode = proc;
> + spec.MR_proc_pf = (is_func) ? MR_FUNCTION : MR_PREDICATE;
> +
> + SAY("Looking for procedures .... ");
> + matches = MR_search_for_matching_procedures(&spec);
> +
> + {
> + MB_Word i;
> + for (i = 0; i < matches.match_proc_next; i++) {
> + SAY("Match %d: %s %s__%s/%d (%d)",
> + i,
> + (matches.match_procs[i]
> + ->MR_sle_proc_id.MR_proc_user
> + .MR_user_pred_or_func == MR_PREDICATE) ?
> + "pred" : "func",
> + matches.match_procs[i]
> + ->MR_sle_proc_id.MR_proc_user
> + .MR_user_def_module,
> + matches.match_procs[i]
> + ->MR_sle_proc_id.MR_proc_user
> + .MR_user_name,
> + matches.match_procs[i]
> + ->MR_sle_proc_id.MR_proc_user
> + .MR_user_arity,
> + matches.match_procs[i]
> + ->MR_sle_proc_id.MR_proc_user
> + .MR_user_mode
> + );
> + }
> + }
> +
> + switch (matches.match_proc_next) {
> + case 0:
> + return NULL;
> + case 1:
> + {
> + MB_Word adr = (MB_Word)
> + matches.match_procs[0]->
> + MR_sle_traversal.MR_trav_code_addr;
> + SAY("Adr %08x", adr);
> + }
> + return matches.match_procs[0]->
> + MR_sle_traversal.MR_trav_code_addr;
> + default:
> + MB_fatal("More than one native code entry found!");
> + return NULL;
> + }
> +}
> +
> ===================================================================
> --- /dev/null Wed Nov 22 17:39:10 2000
> +++ mb_basetypes.h Wed Jan 24 18:44:34 2001
> @@ -0,0 +1,48 @@
> +
> +/*
> +** Copyright (C) 2000-2001 The University of Melbourne.
> +** This file may only be copied under the terms of the GNU Library
> General
> +** Public License - see the file COPYING.LIB in the Mercury
> distribution.
> +**
> +** This file contains the basic type definitions
> +*/
> +
> +#ifndef MB_BASETYPES_H
> +#define MB_BASETYPES_H
> +
> +#include <stdio.h>
> +
> +#include <mercury_conf.h>
> +#include <mercury_types.h>
> +#include <mercury_float.h>
<> vs "".
> +/* XXX expects sizeof(unsigned char) == 1 */
> +typedef unsigned char
> + MB_Byte;
The C standard guarantees that sizeof(unsigned char) == 1,
so delete that XXX.
> +typedef MR_Word
> + MB_Word;
> +
> +typedef MR_Unsigned
> + MB_Unsigned;
What's the difference between MB_Word and MR_Word?
What's the difference between MB_Unsigned and MR_Unsigned?
Is there any advantage to the extra layer of abstraction here?
> +/* XXX Assume char is 8 bits (Is there a system where is isn't) */
> +#define MB_WORD_BITS ((sizeof(MB_Word)/sizeof(char))*8)
Use CHAR_BIT from <limits.h> rather than hard-coding 8.
sizeof(char) is guaranteed to be 1, so just write
that as (sizeof(MB_Word) * CHAR_BIT).
Or you could just use `#define MB_WORD_BITS MR_WORDBITS',
since MR_WORDBITS is defined in runtime/mercury_tags.h.
> +:- module mb_interface_stub.
> +
> +:- interface.
> +
> +:- pred mb_native_return_det_stub(int).
> +:- mode mb_native_return_det_stub(out) is det.
> +
> +:- implementation.
> +
> +:- pragma c_code(
> + mb_native_return_det_stub(Garbage::out),
> + [may_call_mercury],
> +" {
> + }
> + Garbage = 4;
> +"
That code is clearly unfinished.
You should document what it is for.
Get rid of the Garbage::out parameter.
And it would be a good idea to declare the predicate as impure.
> +++ mb_machine_def.h Wed Jan 24 18:44:34 2001
> +struct MB_Machine_State_Tag {
s/_Tag/_Struct/
> + MB_Module* module;
> + MB_Word* ip; /* next instruction pointer*/
Is that a pointer to a machine instruction or to a byte code instruction?
> + /* The native code address to return to at finish */
> + MB_Word* native_return;
That should probably have type `MR_Code *' rather than `MB_Word *'.
> + /* The following proc information is all set by MB_func_type_check()
> */
> + struct {
> + /* The determinism of the currently executing function */
> + /* (set to a return value from MB_code_get_det) */
> + MB_Word is_det;
s/function/procedure/
Is that field the determinism (i.e. an enum of 8 alternatives),
the code_model (i.e. an enum of 3 alternatives),
or is it just true iff the procedure is model_det?
The comment isn't clear.
> +#include <mercury_std.h>
<> vs "".
> +#if (MR_VARIABLE_SIZED > 0)
That is a syntax error in the case when
MR_VARIABLE_SIZED is defined to /* nothing */
> +# define MB_CLOSURE_SIZE(x) (sizeof(MB_Closure) \
> + - sizeof(((MB_Closure*)(NULL))-> \
> + closure_hidden_args \
> + + sizeof(MB_Word)*(x))
> +#else
> +# define MB_CLOSURE_SIZE(x) (sizeof(MB_Closure) \
> + + sizeof(MB_Word)*(x))
> +#endif
Instead of those contortions, use offsetof().
> +typedef struct {
> + MB_Word code_adr;
> + MB_Word num_hidden_args;
> + MB_Word closure_hidden_args[MR_VARIABLE_SIZED];
> +} MB_Closure;
Why don't you use the same structure for closures as is used in the C
back-end?
> +++ mb_module.h Wed Jan 24 18:44:34 2001
> +struct MB_Module_Tag;
> +typedef struct MB_Module_Tag MB_Module;
s/_Tag/_Struct/g
> +/*
> +** Special code addresses. INVALID_ADR must be last as the asserts
> +** assume that any address above INVALID_ADR is a special code
>
> +** If you alter these, ensure MB_ip_special reflects this
> +*/
> +#define MB_CODE_DO_FAIL ((MB_Word*)(-1))
> +#define MB_CODE_DO_REDO ((MB_Word*)(-2))
> +#define MB_CODE_NATIVE_RETURN ((MB_Word*)(-3))
> +#define MB_CODE_INVALID_ADR ((MB_Word*)(-4))
Pointer comparisons may be either signed or unsigned.
Don't rely on either.
> +/*
> +** A native code procedure wishes to call a deterministic bytecode
> procedure
> +*/
> +MB_Module* MB_module_load_name(MB_CString_Const module_name);
> +MB_Module* MB_module_load(MB_CString_Const module_name, FILE* fp);
> +void MB_module_unload(MB_Module* module);
The comment there looks like a cut-and-paste error.
> +/*
> +** returns a pointer to a given module
> +** If the module is not loaded, loads it
> +*/
Full stops at the end of sentences, please.
Sentences should start with a capital letter.
> +/* Return the number of bytecodes there are */
> +MB_Word MB_code_size(void);
The comment is unclear.
Do you mean the number of bytecodes in the module? In the program?
The number of bytecodes that have been loaded so far?
Or the number of difference bytecodes that the bytecode interpreter
knows how to interpret?
Why is the return type `MB_Word' rather than e.g. `MB_Unsigned'
(or just `int', for that matter)?
> +#define MB_ISDET_NO 0
> +#define MB_ISDET_YES 1
> +/* Get the determinism of the procedure a code at a given address is in
> */
> +MB_Byte MB_code_get_det(MB_Word*adr);
Does this return the determinism, or does it return
a boolean which is true iff the procedure uses the det stack?
The comment seems to indicate the former, but it is unclear, given the
#defines that precede it, and the return type doesn't help.
> +/* Get the bytecode argument at a given address */
> +MB_Bytecode_Arg*MB_code_get_arg(MB_Word*adr);
> +/* Get the predicate in which the following address resides */
> +/*MB_Bytecode MB_code_get_pred(MB_Word* adr);*/
> +MB_Word* MB_code_get_pred_adr(MB_Word* adr);
> +/* Get the procedure in which the following address resides */
> +/*MB_Bytecode MB_code_get_proc(MB_Word* adr);*/
> +MB_Word* MB_code_get_proc_adr(MB_Word* adr);
Please use some more blank lines.
Why are the commented-out prototypes commented out?
> +/* Allocate memory in the code argument data array */
> +#define MB_code_data_alloc(type, number) \
> + ((type*)(MB_code_data_alloc_words(((sizeof(type)*(number))+3) / 4)))
What are the magic numbers 3 and 4 doing here?
You should use the round_up macro that I mentioned earlier.
This macro should be in capitals, since its argument is
a type rather than an expression, so it's not a function-like macro.
> +++ mb_module.c Wed Jan 24 18:44:34 2001
> +#define MB_BCID_MAKE(id, arg) ( ((id) & ((1 << CHAR_BIT) - 1)) | \
> + (((MB_Word*)(arg) - code_arg_data) << CHAR_BIT)\
> + )
Complicated bit hacking like this should generally be avoided if possible.
> +/* get the bytecode id (with determinism flag) */
> +#define MB_BCID_IDDET(x) ((x) & ((1 << CHAR_BIT)-1))
> +/* get the bytecode id (without determinism flag) */
> +#define MB_BCID_ID(x) ((x) & ((1<<(CHAR_BIT-1)) - 1))
> +
> +/* get the determinism flag for the given bytecode */
> +#define MB_BCID_ISDET ((1) << (CHAR_BIT-1))
> +#define MB_BCID_DET(x) ((x) & MB_BCID_ISDET)
> +/* get the bytecode argument pointer */
> +#define MB_BCID_ARG(x) ((MB_Bytecode_Arg*) \
> + (code_arg_data + \
> + ((MB_Unsigned)(x) >> CHAR_BIT)) \
> + )
> +
> +/* XXX: not thread safe */
> +static MB_Word code_count = 0;
> +static MB_Word code_id[MAX_CODE_COUNT];
> +
> +#define CODE_DATA_NONE 0 /* 0 is reserved for indicating no data */
> +static MB_Word code_data_count = 1;
> +static MB_Word code_arg_data[MAX_CODE_DATA_COUNT];
> +
> +struct MB_Module_Tag {
> + /* XXX: Hash the module & predicate names */
> + /* The name of the module */
> + MB_CString module_name;
> +
> + /*
> + ** The following should not be directly accessed unless
> + ** absolutely necessary; use one of the (many) wrapper functions
> + */
> +
> + /*
> + ** The code indices of all the predicates in this module
> + ** If this is empty, then it means we tried to load
> + ** the module but we couldn't find bytecodes for it
> + */
> + /* XXX: This really should be hashed too */
> + MB_Stack pred_index_stack;
> +
> +};
> +
> +/* XXX: not thread safe */
> +static MB_Word module_count = 0;
> +static MB_Module* module_arr[MAX_MODULES];
> +
> +static MB_Bool translate_calls(MB_Word* bc, MB_Word number_codes);
> +static MB_Bool translate_labels(MB_Word* bc, MB_Word number_codes,
> + MB_Stack* label_stack);
> +static MB_Bool translate_detism(MB_Word* bc, MB_Word number_codes);
> +static MB_Bool translate_switch(MB_Word* bc, MB_Word number_codes);
> +static MB_Bool translate_temps(MB_Word* bc, MB_Word number_codes);
> +
> +/* Implementation */
> +
> +/*
> +** Translates calls from a predicate name/procedure to an actual code
> address
> +** Translates call & higher_or
> +*/
> +static MB_Bool
> +translate_calls(MB_Word*bc, MB_Word number_codes)
> +{
> + /* XXX: temporarily table the procs, instead of re-searching
> + ** each time, but since there is usually only one proc per predicate,
> + ** don't bother for now
> + */
> +
> + MB_Word i;
> + for (i = 0; i < number_codes; i++, bc++) {
> + /* proc to be called attributes */
> + MB_CString module_name = NULL;
> + MB_CString pred_name = NULL;
> + MB_Word arity;
> + MB_Byte is_func;
> + MB_Word proc_id;
> + /* location to store the proc to be called */
> + MB_Byte* target_is_native = NULL;
> + MB_Word** target_adr = NULL;
> +
> + /* Get the information about the procedure to call */
> + MB_Byte call_id = MB_code_get_id(bc);
> + if (call_id == MB_BC_call) {
> + MB_Bytecode_Arg* call_arg = MB_code_get_arg(bc);
> + module_name = call_arg->call.module_id;
> + arity = call_arg->call.arity;
> + is_func = call_arg->call.is_func;
> + pred_name = call_arg->call.pred_id;
> + proc_id = call_arg->call.proc_id;
> + target_is_native=&call_arg->call.is_native;
> + target_adr = &call_arg->call.adr;
> +
> + } else if (call_id == MB_BC_construct) {
> + MB_Bytecode_Arg* construct_arg =
> + MB_code_get_arg(bc);
> + if (construct_arg->construct.consid.id ==
> + MB_CONSID_PRED_CONST)
> + {
> + MB_fatal("Unable to translate predicate constructs");
> + #if 0
> + module_name = construct_arg->construct.
> + consid.opt.pred_const.module_id;
> + arity = construct_arg->construct.
> + consid.opt.pred_const.arity;
> + is_func = construct_arg->construct.
> + consid.opt.pred_const.is_func;
> + pred_name = construct_arg->construct.
> + consid.opt.pred_const.pred_id;
> + proc_id = construct_arg->construct.
> + consid.opt.pred_const.proc_id;
> + target_adr = &construct_arg->construct.
> + consid.opt.pred_const.adr;
> + #endif
> + }
> + }
> +
> +
> + if (pred_name != NULL) {
> + SAY("Looking for %s %s__%s/%d (%d)",
> + (is_func) ? "func" : "pred",
> + module_name,
> + pred_name,
> + arity,
> + proc_id);
> + }
> +
> + /* Find the predicate start */
> + if (pred_name != NULL) {
> + /* First check if we can find it in the bytecode */
> + MB_Word* adr = MB_code_find_proc(module_name,
> + pred_name, proc_id,
> + arity, is_func);
> +
> + if (adr == MB_CODE_INVALID_ADR) {
> + SAY(" Not found in bytecode");
> + /* Otherwise look in the native code */
> + adr = MB_code_find_proc_native(module_name,
> + pred_name, proc_id, arity, is_func);
> +
> + SAY(" Address from native: %08x", adr);
> + if (adr == NULL) {
> + MB_util_error(
> + "Reference in bytecode %08x"
> + " to unknown"
> + " %s %s__%s/%d (%d)",
> + (int)i,
> + is_func ? "func" : "pred",
> + module_name,
> + pred_name,
> + (int)arity,
> + (int)proc_id);
> + MB_fatal("(Are you sure the module"
> + " was compiled with trace"
> + " information enabled?)");
> + } else {
> + *target_is_native = FALSE;
> + }
> +
> + } else {
> + *target_is_native = TRUE;
> + }
> + *target_adr = adr;
> + }
> + }
> +
> + return TRUE;
> +} /* translate_calls */
> +
> +
> +/* Translates labels to code addresses for those instructions that need
> it
> +** those translated are:
> +** enter_if, endof_then, enter_disjunction, enter_disjunct,
> endof_disjunct
> +** enter_switch, enter_switch_arm, endof_switch_arm, enter_negation,
> enter_proc
> +*/
> +static MB_Bool
> +translate_labels(MB_Word* bc, MB_Word number_codes, MB_Stack*
> label_stack)
> +{
> + MB_Word i;
> + MB_Bytecode_Arg* cur_proc_arg = NULL;
> +
> + for (i = 0; i < number_codes; i++, bc++) {
> + MB_Bytecode_Arg* cur_arg =
> + MB_code_get_arg(bc);
> + #define XLATLABEL(bytecodetype, lbl) \
> + cur_arg->bytecodetype.lbl.adr = \
> + ((cur_arg->bytecodetype.lbl.index < \
> + cur_proc_arg->enter_proc.label_count) \
> + && (cur_arg->bytecodetype.lbl.index > 0) \
> + ? (MB_Word*)MB_stack_peek(label_stack, \
> + cur_proc_arg->enter_proc.label_index + \
> + cur_arg->bytecodetype.lbl.index) \
> + : MB_CODE_INVALID_ADR)
> +
> + switch (MB_code_get_id(bc)) {
> +
> + case MB_BC_enter_proc:
> + cur_proc_arg = cur_arg;
> + XLATLABEL(enter_proc, end_label);
> + break;
> +
> + case MB_BC_enter_if:
> + XLATLABEL(enter_if, else_label);
> + XLATLABEL(enter_if, end_label);
> + break;
> +
> + case MB_BC_endof_then:
> + XLATLABEL(endof_then, follow_label);
> + break;
> +
> + case MB_BC_enter_disjunction:
> + XLATLABEL(enter_disjunction, end_label);
> + break;
> +
> + case MB_BC_enter_disjunct:
> + XLATLABEL(enter_disjunct, next_label);
> + break;
> +
> + case MB_BC_endof_disjunct:
> + XLATLABEL(endof_disjunct, end_label);
> + break;
> +
> + case MB_BC_enter_switch:
> + XLATLABEL(enter_switch, end_label);
> + break;
> +
> + case MB_BC_enter_switch_arm:
> + XLATLABEL(enter_switch_arm, next_label);
> + break;
> +
> + case MB_BC_endof_switch_arm:
> + XLATLABEL(endof_switch_arm, end_label);
> + break;
> +
> + case MB_BC_enter_negation:
> + XLATLABEL(enter_negation, end_label);
> + break;
> +
> + default:
> +
> + }
> + }
> + return TRUE;
> +} /* translate_labels */
> +
> +
> +/* Store the procedure's determinism that each instruction is executing
> under */
> +static MB_Bool
> +translate_detism(MB_Word* bc, MB_Word number_codes)
> +{
> + MB_Word i;
> + MB_Byte bc_id;
> + MB_Byte cur_detism = MB_BCID_ISDET;
> +
> + for (i = 0; i < number_codes; i++, bc++) {
> + bc_id = MB_code_get_id(bc);
> + if (bc_id == MB_BC_enter_proc) {
> + switch (MB_code_get_arg(bc)->enter_proc.det) {
> + case MB_DET_DET:
> + case MB_DET_SEMIDET:
> + cur_detism = MB_BCID_ISDET;
> + break;
> + case MB_DET_MULTIDET:
> + case MB_DET_NONDET:
> + cur_detism = 0;
> + break;
> + case MB_DET_INVALID:
> + cur_detism = 0;
> + break;
> + default:
> + assert(FALSE);
> + }
> + }
> + if (cur_detism) {
> + *bc |= cur_detism;
> + }
> +
> + if (bc_id == MB_BC_endof_proc) cur_detism = 0;
> + }
> + return TRUE;
> +} /* translate_detism */
> +
> +
> +/*
> +** Fill in the variable that each switch arm is using
> +*/
> +static MB_Bool
> +translate_switch(MB_Word* bc, MB_Word number_codes)
> +{
> + MB_Word i;
> + MB_Bytecode_Arg* cur_switch = NULL;
> + for (i = 0; i < number_codes; i++, bc++) {
> + switch (MB_code_get_id(bc)) {
> + case MB_BC_enter_switch:
> + cur_switch = MB_code_get_arg(bc);
> + break;
> +
> + case MB_BC_enter_switch_arm: {
> + MB_Bytecode_Arg* cur_arg
> + = MB_code_get_arg(bc);
> +
> + cur_arg->enter_switch_arm.var =
> + cur_switch->enter_switch.var;
> +
> + break;
> + }
> + }
> + }
> + return TRUE;
> +} /* translate_switch */
> +
> +/*
> +** transform temporary stack slot numbers into variable slot numbers
> +** for all bytecodes that use a temporary stack slot
> +*/
> +#define XLATTEMP(name) case MB_BC_##name: \
> + cur_arg = MB_code_get_arg(bc); \
> + cur_arg->name.frame_ptr_tmp += \
> + cur_proc_arg->enter_proc.list_length; \
> + break;
> +static MB_Bool
> +translate_temps(MB_Word* bc, MB_Word number_codes)
> +{
> + MB_Word i;
> + MB_Bytecode_Arg* cur_arg ;
> + MB_Bytecode_Arg* cur_proc_arg = NULL;
> + MB_Word code_size = MB_code_size();
> + for (i = 0; i < number_codes; i++, bc++) {
> + switch (MB_code_get_id(bc)) {
> + case MB_BC_enter_proc:
> + cur_proc_arg = MB_code_get_arg(bc);
> + break;
> + XLATTEMP(enter_if);
> + XLATTEMP(enter_then);
> + XLATTEMP(enter_negation);
> + XLATTEMP(endof_negation_goal);
> + XLATTEMP(enter_commit);
> + }
> + }
> + return TRUE;
> +}
> +
> +/*
> +** A native code procedure wishes to call a deterministic bytecode
> procedure
> +*/
> +MB_Module*
> +MB_module_load_name(MB_CString_Const module_name)
> +{
> + MB_Module* module;
> + MB_CString filename = MB_str_new_cat(module_name, ".mbc");
> +
> + FILE* fp = fopen(filename, "rb+");
> +
> + module = MB_module_load(module_name, fp);
> +
> + MB_str_delete(filename);
> + return module;
> +} /* MB_module_load_name */
> +
> +
> +/*
> +** Gets a module. Loads the module if it is not already loaded.
> +** If there is no bytecode information for this module, returns NULL
> +*/
> +MB_Module*
> +MB_module_get(MB_CString_Const module_name)
> +{
> + /* Search for the module */
> + MB_Word i;
> + SAY(" Looking for %s among %d modules", module_name, module_count);
> + for (i = 0; i < module_count; i++) {
> + SAY(" Testing module %d", i);
> + if (!MB_str_cmp(module_name, module_arr[i]->module_name)) {
> + SAY(" Module %s found", module_name);
> + return module_arr[i];
> + }
> + }
> +
> + SAY(" module %s not found, attempting to load", module_name);
> +
> + /* We didn't find it so load it */
> + return MB_module_load_name(module_name);
> +} /* MB_module_get */
> +
> +
> +#define ARGSIZE(name) (sizeof(((MB_Bytecode*)NULL)->opt.name) + \
> + sizeof(MB_Word)-1) \
> + / sizeof(MB_Word)
> +/* XXX ORDER */
> +/* the size of the arguments in a MB_Bytecode struct, in number of
> MB_Words*/
> +static const MB_Word argument_size[] = {
> + ARGSIZE(enter_pred),
> + ARGSIZE(endof_pred),
> + ARGSIZE(enter_proc),
> + ARGSIZE(endof_proc),
> + ARGSIZE(label),
> + ARGSIZE(enter_disjunction),
> + ARGSIZE(endof_disjunction),
> + ARGSIZE(enter_disjunct),
> + ARGSIZE(endof_disjunct),
> + ARGSIZE(enter_switch),
> + ARGSIZE(endof_switch),
> + ARGSIZE(enter_switch_arm),
> + ARGSIZE(endof_switch_arm),
> + ARGSIZE(enter_if),
> + ARGSIZE(enter_then),
> + ARGSIZE(endof_then),
> + ARGSIZE(endof_if),
> + ARGSIZE(enter_negation),
> + ARGSIZE(endof_negation),
> + ARGSIZE(enter_commit),
> + ARGSIZE(endof_commit),
> + ARGSIZE(assign),
> + ARGSIZE(test),
> + ARGSIZE(construct),
> + ARGSIZE(deconstruct),
> + ARGSIZE(complex_construct),
> + ARGSIZE(complex_deconstruct),
> + ARGSIZE(place_arg),
> + ARGSIZE(pickup_arg),
> + ARGSIZE(call),
> + ARGSIZE(higher_order_call),
> + ARGSIZE(builtin_binop),
> + ARGSIZE(builtin_unop),
> + ARGSIZE(builtin_bintest),
> + ARGSIZE(builtin_untest),
> + ARGSIZE(semidet_succeed),
> + ARGSIZE(semidet_success_check),
> + ARGSIZE(fail),
> + ARGSIZE(context),
> + ARGSIZE(not_supported),
> + ARGSIZE(enter_else),
> + ARGSIZE(endof_negation_goal)
> +}; /* argument_size */
> +
> +/*
> +** Load a module
> +** If fp is NULL then that means there is no bytecode information
> +** for this module -- revert to native code.
> +*/
> +MB_Module* MB_module_load(MB_CString_Const module_name, FILE* fp) {
> + MB_Short version;
> + MB_Word module_code_count = 0;
> + MB_Word*module_start = code_id + code_count;
> +
> + /* Array of indexes for label translation (used only during load) */
> + MB_Stack label_stack = MB_stack_new(128, FALSE);
> +
> + /* Create the new module */
> + MB_Module* module = MB_GC_new(MB_Module);
> + module->pred_index_stack= MB_stack_new((fp == NULL) ? 0 : 64, FALSE);
> + module->module_name = MB_str_dup(module_name);
> +
> + /* XXX adding to the array like this is not thread safe */
> + if (module_count >= MAX_MODULES) {
> + MB_fatal("Too many modules");
> + }
> + module_arr[module_count++] = module;
> +
> + if (fp == NULL) return module;
> +
> + /* Check the file version is ok */
> + if (!MB_read_bytecode_version_number(fp, &version)) {
> + MB_util_error("Unable to read version number\n");
> + return NULL;
> + }
> +
> + if (version != FILEVERSION) {
> + MB_util_error("Unknown file format version\n");
> + return NULL;
> + }
> +
> + {
> + MB_Bytecode bc;
> + MB_Bytecode_Arg*cur_proc_arg = NULL;
> + MB_Word* cur_proc = MB_CODE_INVALID_ADR;
> +
> + /* read in each bytecode */
> + while (MB_read_bytecode(fp, &bc)) {
> + MB_Bytecode_Arg* cur_arg;
> +
> + if (bc.id == MB_BC_label) {
> + /* XXX: we strictly don't actually need to save the
> + ** labels but it makes label translations a lot faster.
> + ** After translation, the label stack is deleted
> + */
> + if (cur_proc_arg == NULL) {
> + MB_fatal("Label outside proc\n");
> + }
> +
> + /* Add the label to the current proc's list of labels */
> + MB_stack_poke(&label_stack,
> + cur_proc_arg->enter_proc.label_index
> + + bc.opt.label.label,
> + (MB_Word)(code_id + code_count));
> + } else if (bc.id == MB_BC_not_supported) {
> + /*
> + ** We came across unsupported code. Mark this proc as
> + ** unusable
> + */
> +
> + if (cur_proc_arg == NULL) {
> + MB_fatal("Code outside proc\n");
> + }
> +
> + cur_proc_arg->enter_proc.det = MB_DET_INVALID;
> + }
> +
> + /* copy the bytecode arguments into the code.data
> + ** structure, save the index & increment code.data
> + ** counters
> + */
> + if (bc.id < sizeof(argument_size)/sizeof(argument_size[0]))
> + {
> + if (argument_size[bc.id] == 0) {
> + /* If bytecode has no arguments, skip alloc */
> + cur_arg = NULL;
> + } else {
> + /* Allocate space for bytecode's arguments */
> + cur_arg = MB_code_data_alloc(MB_Bytecode_Arg,
> + argument_size[bc.id]);
> +
> + /* Copy arguments onto argument data stack */
> + memcpy(cur_arg,
> + &(bc.opt),
> + argument_size[bc.id]*sizeof(MB_Word));
> +
> + /* Check if we just entered/exited a procedure*/
> + switch (bc.id) {
> + case MB_BC_enter_proc:
> + /*
> + ** Save the new current proc (so
> + ** labels know where they are)
> + */
> + cur_proc = code_id + code_count;
> + cur_proc_arg = cur_arg;
> +
> + /*
> + ** and mark where the label indexes
> + ** will begin
> + */
> + cur_proc_arg->enter_proc.label_index =
> + MB_stack_size(&label_stack);
> +
> + MB_stack_alloc(&label_stack,
> + cur_proc_arg->
> + enter_proc.label_count);
> + break;
> +
> + case MB_BC_endof_proc: {
> + /*
> + ** Save the proc we were in
> + */
> + cur_arg->endof_proc.proc_start =
> + cur_proc;
> +
> + cur_proc_arg = NULL;
> + break;
> + }
> +
> + case MB_BC_enter_pred:
> + MB_stack_push(&module->pred_index_stack,
> + code_count);
> + break;
> + }
> + }
> +
> + /* actually save the bytecode id & argument index*/
> + code_id[code_count] = MB_BCID_MAKE(bc.id, cur_arg);
> + } else {
> + MB_util_error("Unknown op code");
> + MB_module_unload(module);
> + MB_stack_delete(&label_stack);
> + return NULL;
> + }
> + code_count++;
> + module_code_count++;
> + }
> +
> + }
> +
> + if (feof(fp) &&
> + (module_code_count > 0) &&
> + (translate_labels(module_start, module_code_count,
> + &label_stack)) &&
> + (translate_calls(module_start, module_code_count)) &&
> + (translate_detism(module_start, module_code_count)) &&
> + (translate_switch(module_start, module_code_count)) &&
> + (translate_temps(module_start, module_code_count)))
> + {
> + /* delete the label stack; we've done all the translations*/
> + MB_stack_delete(&label_stack);
> +
> + return module;
> + } else {
> + MB_fatal("Error reading bytecode file");
> + }
> + return NULL;
> +
> +} /* MB_module_load */
> +
> +
> +/*
> +** free memory associated with module structure itself
> +** (does not unload bytecodes from code array)
> +*/
> +void
> +MB_module_unload(MB_Module* module)
> +{
> + if (module != NULL) {
> + /* the stacks will always be allocated since it will
> + ** have aborted if their allocation failed
> + */
> + MB_str_delete(module->module_name);
> + MB_stack_delete(&module->pred_index_stack);
> + MB_GC_free(module);
> + }
> +}
> +
> +/* Get the actual size of a program, in bytecodes */
> +MB_Word
> +MB_code_size(void)
> +{
> + return code_count;
> +}
> +
> +#if 0
> +/* Get the bytecode at a given address */
> +MB_Bytecode
> +MB_code_get(MB_Word* adr)
> +{
> + MB_Bytecode bc;
> +
> + assert(MB_ip_normal(adr));
> +
> + bc.id = MB_code_get_id(adr);
> +
> + assert(bc.id < sizeof(argument_size)/sizeof(argument_size[0]));
> +
> + if (argument_size[bc.id] > 0) {
> +
> + memcpy(&(bc.opt),
> + MB_code_get_arg(adr),
> + argument_size[bc.id]*sizeof(MB_Word));
> + }
> + return bc;
> +} /* MB_code_get */
> +#endif
> +
> +/* Get the bytecode type at a given address */
> +MB_Byte
> +MB_code_get_id(MB_Word* adr)
> +{
> + if (!MB_ip_normal(adr))
> + return MB_BC_debug_invalid;
> +
> + /* return the code with the determinism flag stripped away */
> + return MB_BCID_ID(*adr);
> +}
> +
> +/* Get a bytecode's procedure's determinism */
> +MB_Byte
> +MB_code_get_det(MB_Word* adr)
> +{
> + assert(MB_ip_normal(adr));
> +
> + /* return the determinism flag */
> + return MB_BCID_DET(*adr) ? MB_ISDET_YES : MB_ISDET_NO;
> +}
> +
> +/* Get the bytecode argument at a given address */
> +MB_Bytecode_Arg*
> +MB_code_get_arg(MB_Word* adr)
> +{
> + MB_Bytecode_Arg* data_p;
> +
> + if (!MB_ip_normal(adr)) return NULL;
> +
> + data_p = MB_BCID_ARG(*adr);
> + if (data_p == (MB_Bytecode_Arg*)code_arg_data) {
> + return NULL;
> + } else {
> + return data_p;
> + }
> +} /* MB_code_get_arg */
> +
> +MB_Word*
> +MB_code_get_proc_adr(MB_Word* adr)
> +{
> + MB_Byte bc_id;
> + adr++;
> + do {
...
> + }
> + while (bc_id != MB_BC_enter_proc);
The } and while should be on the same line.
> + return adr;
> +} /* MB_code_get_proc_adr */
> +/* given a code address, forces it into a valid range*/
Add a space after "range".
> +/*
> +** Returns true if a given instruction pointer points to a normal
> +** address
> +*/
> +MB_Bool
> +MB_ip_normal(MB_Word* ip)
What's a "normal" address?
> +{
> + return ((ip >= code_id) && (ip < code_id+MAX_CODE_COUNT));
> +}
That test is non standard-conforming; the cases where it is intended
to return FALSE in fact have undefined behaviour according to the
C standard. It would be better to avoid that if possible.
In practice it is moderately portable, so it may be acceptable if
there's no other reasonably efficient alternative. It fails on
e.g. 8086 in "small" code model, where pointer `>=' tests only compare
the offsets, not the segments. A more portable alternative would be
to cast the pointers to `MB_Unsigned' or the like before comparing them.
> +MB_Bool
> +MB_ip_special(MB_Word* ip)
> +{
> + return ((MB_Unsigned)ip > (MB_Unsigned)MB_CODE_INVALID_ADR);
> +}
> +
> +MB_Bool
> +MB_ip_native(MB_Word* ip)
> +{
> + return !MB_ip_special(ip) && !MB_ip_normal(ip);
> +}
You should document these functions.
[...to be continued...]
--
Fergus Henderson <fjh at cs.mu.oz.au> | "I have always known that the pursuit
| of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
More information about the developers
mailing list