[m-dev.] for review: bytecode interpreter

Fergus Henderson fjh at cs.mu.OZ.AU
Sun Dec 10 23:49:12 AEDT 2000


On 08-Dec-2000, Levi Cameron <l.cameron2 at ugrad.unimelb.edu.au> wrote:
> this is the beginnings of the new bytecode interpreter

At first glance, overall it looks good.

The hard part will be integrating it with the existing implementation.

Since the existing stuff in the bytecode directory doesn't compile,
and since what you've got there looks good, you can't really do much
harm by committing things into our cvs repository as you go, even if
they are not complete.  So I think that it would be good if you could
post a cvs diff for review, following the review guidelines on the
web page, and then once you've addressed the review comments and
once the code freeze is over (which should be very soon) you can
commit it into the cvs repository.

I won't review all of the stuff that you posted now, since you didn't
post diffs for most of the files.  But I have a few preliminary
comments based on what you've posted so far.  These are all minor
comments about some of the details.

On 08-Dec-2000, Levi Cameron <l.cameron2 at ugrad.unimelb.edu.au> wrote:
bytecode/mb_machine.c:
> /*
> ** Copyright (C) 2000 The University of Melbourne.
> ** This file may only be copied under the terms of the GNU Library General
> ** Public License - see the file COPYING.LIB in the Mercury distribution.
> **
> ** $Id$
> */

Personally I think that using the cvs $Id$ keywords is not worth the
hassle.

> #define ARGSIZE(name)	(sizeof(((MB_Bytecode*)NULL)->opt.##name) + \
> 				sizeof(MB_Word)-1) \
> 				/ sizeof(MB_Word)

Use `offsetof(type, field)' rather than `sizeof(((type*)NULL)->field'.
Also macro definitions should always be full parenthesized.
Use whitespace around operators such as `-'.

> /* Translates calls from a predicate name / procedure to an actual code address */
> static MB_Bool
> translate_calls(MB_Machine_State* ms)
> {
> 	/* first run through and save all the predicate names table
> 	** XXX: should use a hash table for this: mercury_hash_table
> 	** has one but it doesn't use the same memory allocation as
> 	** in mb_mem.h - Is this a problem?
> 	*/

Please use the layout conventions specified in the Mercury C coding
guidelines: block comments should use the layout

	/*
	** Comment starts on the second line
	** ...
	*/

not

	/* Comment starts on the first line
	** ...
	*/

(This comment applies in lots of places.)

> /* -------------------------------------------------------------------------- */
> static void instr_invalid	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_enter_proc	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_endof_proc	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_enter_if	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_endof_if	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_construct	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_place		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_pickup	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_call		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_builtin_binop	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> 
> static void instr_noop		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static void instr_notdone	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> 
> 
> /* XXX: relies on the order of the definitions */
> static void (*instruction_table[])(MB_Machine_State*, const MB_Bytecode_Arg*) = {

It would be clearer to use a typedef here.

/*
** An `InterpretInstrFunc' is a C function to interpret a given bytecode instruction
*/
typedef void InterpretInstrFunc (MB_Machine_State*, const MB_Bytecode_Arg*);

static InterpretInstrFunc instr_invalid;
static InterpretInstrFunc instr_enter_proc;
...
static InterpretInstrFunc instr_notdone;

static InterpretInstrFunc *instruction_table[] = {
	...
};

> static void
> instr_place(MB_Machine_State* ms, const MB_Bytecode_Arg* bca)
> {
> 	/* copy value from var slot to reg */
> 	MB_reg_set(ms, bca->place_arg.to_reg,
> 		MB_var_get(ms, bca->place_arg.from_var));
> 
> 	/* XXX for debugging only */
> 	MB_var_set(ms, bca->place_arg.from_var, UNSETSTACK);

That looks wrong.

Why do you clobber the value of the variable there?

> static MB_Word binop_add	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_sub	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_mul	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_div	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_mod	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_lshift	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_rshift	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_and	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_or		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_xor	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_logand	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_logor	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_eq		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_ne		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_lt		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_gt		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_le		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_ge		(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> static MB_Word binop_bad	(MB_Machine_State* ms, const MB_Bytecode_Arg* bca);
> 
> /*
> **	XXX: Currently we depend on the order of elements in the table.
> */
> static MB_Word (*binop_table[])(MB_Machine_State* ms, const MB_Bytecode_Arg* bca) = {

This is another good spot to use a typedef.

> 	binop_add,
> 	binop_sub,
> 	binop_mul,
> 	binop_div,
> 	binop_mod,
> 	binop_lshift,
> 	binop_rshift,	/* XXX signed */
> 	binop_and,
> 	binop_or,
> 	binop_xor,
> 	binop_logand,
> 	binop_logor,
> 	binop_eq,
> 	binop_ne,
> 	binop_bad,	/* array_index */
> 	binop_bad,	/* str_eq */
> 	binop_bad,	/* str_ne */
> 	binop_bad,	/* str_lt */
> 	binop_bad,	/* str_gt */
> 	binop_bad,	/* str_le */
> 	binop_bad,	/* str_ge */
> 	binop_lt,
> 	binop_gt,
> 	binop_le,
> 	binop_ge,
> 	binop_bad,	/* float_plus */
> 	binop_bad,	/* float_minus */
> 	binop_bad,	/* float_times */
> 	binop_bad,	/* float_divide */
> 	binop_bad,	/* float_eq */
> 	binop_bad,	/* float_ne */
> 	binop_bad,	/* float_lt */
> 	binop_bad,	/* float_gt */
> 	binop_bad,	/* float_le */
> 	binop_bad,	/* float_ge */
> 	binop_bad	/* body */
> };
> 
> #define SIMPLEBINOP(name, op)	\
> 	static MB_Word \
> 	binop_##name(MB_Machine_State* ms, const MB_Bytecode_Arg* bca) \
> 	{ \
> 		assert(bca->builtin_binop.arg1.id == MB_ARG_VAR); \
> 		assert(bca->builtin_binop.arg2.id == MB_ARG_VAR); \
> 		return (MB_Integer)(MB_var_get(ms, bca->builtin_binop.arg1.opt.var)) \
> 			op (MB_Integer)(MB_var_get(ms, bca->builtin_binop.arg2.opt.var)); \
> 	}

I suggest s/SIMPLEBINOP/SIMPLE_BINOP/

Macros that continue over many lines are more readable if you
line up all the line continuation backslashes, somewhere near
column 80.

Which reminds me, a lot of your source code uses lines longer than 80
columns; they should be wrapped to ensure that the source fits in 80
columns.

> static MB_Word
> binop_bad(MB_Machine_State* ms, const MB_Bytecode_Arg* bca)
> {
> 	MB_fatal("Unsupported binop\n");
> 	return 0;
> }

`binop_notdone' or `binop_invalid' would be a better name than
`binop_bad'.  It's not clear for binop_bad operators whether they are
are deliberately not supported because the bytecode generator is not
supposed to generate them, or whether they have just not been
implemented yet.

> 
> /* --------------------------------------------------------------------- */
> 
> static void
> instr_builtin_binop(MB_Machine_State* ms, const MB_Bytecode_Arg* bca)
> {
> 	MB_Byte binop = bca->builtin_binop.binop;
> 	if (binop < (sizeof(binop_table)/sizeof(binop_table[0]))) {
> 		MB_var_set(ms,
> 			bca->builtin_binop.to_var,
> 			binop_table[bca->builtin_binop.binop](ms, bca));
> 	} else {
> 		MB_fatal("Invlid binop");
> 	}

s/Invlid/Invalid/

> 	/* move to the next instruction*/
> 	instr_noop(ms, bca);
> }

I think it would be better to name that `instr_next' rather than
`instr_noop'; or to have both, with `instr_noop' calling
`instr_next', but with places like this calling `instr_next'
rather than `instr_noop'.

> static void
> instr_noop(MB_Machine_State* ms, const MB_Bytecode_Arg* bca)
> {
> 	/* increment instruction pointer */
> 	MB_ip_set(ms, MB_ip_get(ms)+1);
> }
> 
> static void
> instr_notdone(MB_Machine_State* ms, const MB_Bytecode_Arg* bca)
> {
> 	/* invalid instruction */
> 	MB_fatal("That instruction is not implemened yet\n");
> 	instr_noop(ms, bca);
> }

s/invalid/unimplemented/ ?

> /* Single step execute */
> void
> MB_step(MB_Machine_State* ms)
> {
> 	MB_Word ip = MB_ip_get(ms);
> 
> 	MB_Byte bc_id = MB_code_get_id(ms, ip);
> 	if (bc_id >= sizeof(instruction_table) / sizeof(instruction_table[0])) {
> 		instr_noop(ms, NULL);

Why is the `instr_noop' needed there?
A comment here might help.

bytecode/mb_machine.h:
> void
> MB_step_over(MB_Machine_State* ms)
> {
> 	MB_Word ip = MB_ip_get(ms);
> 	MB_Byte bc_id = MB_code_get_id(ms, ip);
> 	
> 	if (ip == MB_CODE_INVALID_ADR) {
> 		MB_util_error("Attempt to execute invalid code address\n");
> 	}
> 
> 	switch (bc_id) {
> 		case MB_BC_call: {
> 			
> 			/* If we are about to step into a predicate */
> 			/* then replace the following bytecode with */
> 			/* a MB_BC_debug_trap and run until it traps */
> 			/* then put things back to what they were */
> 			MB_Byte old_id;
> 			assert(ip+1 < MB_code_size(ms));
> 			old_id = ms->code.id[ip+1];
> 
> 			ms->code.id[ip+1] = MB_BC_debug_trap;
> 			MB_run(ms);
> 
> 			ms->code.id[ip+1] = old_id;
> 			break;
> 		}
> 		default:
> 			MB_step(ms);
> 	}
> }

In general it is probably not a good idea for you to spend *too* much
time implementing debugger functionality like step_over; it would
be better to hook the bytecode interpreter up to our existing debugger,
which already supports that kind of functionality.
(However, implementing *some* such functionality may help you to debug
the bytecode interpreter itself, of course.)

> typedef struct MB_Machine_State_Tag {
> 	MB_Word ip;			/* next instruction pointer*/
> 	
> 	/* det stack */
> 	struct {
> 		MB_Word succip;		/* sucess return address */
> 		MB_Stack stack;		/* stack data */
> 	} det;

The succip abstract machine register is used for nondet calls too,
not just for det calls, so it's not really appropriate to put it
in the `det' struct here.

> 	/* nondet stack */
> 	struct {
> 		MB_Word	curfr;		/* stack frame of current procedure */
> 		MB_Word	maxfr;		/* highest frame on nondet stack */
> 		MB_Stack stack;		/* stack data */
> 	} nondet;

Note that the `maxfr' abstract machine register (`nondet.maxfr' here) should
be the same as `nondet.stack.

> 	/* MB_Bytecode is 44 bytes long - this is obviously very inefficient
> 	** for most instructions.

I suggest adding something like "/* code */" before this, and
adding "Hence we don't store the code as just an array of MB_Bytecode".
after this.

> 	** All code accesses should go through MB_get_code so the following
> 	** is abstracted away:
> 	**
> 	** code_id is an array of bytecode types
> 	** code_index is the index into code_data for the bytecode arguments
> 	** code_data is the bytecode arguments
> 	**
> 	** This way each instruction takes 5 bytes + argument size
> 	** rather than 1 byte + size of largest possible argument
> 	*/
> 	struct {
> 		MB_Word	 count;		/* number of instructions */
> 		MB_Byte* id;		/* instruction types */
> 		MB_Stack data_index;	/* index into code data for aguments */
> 		MB_Stack data;		/* argument data stack */
> 	} code;

The comments here refer to code_id, code_index, code_date
but the code below it defines code.id, code.data_index, code.data.

s/aguments/arguments/

> 	/* high-water marked */
> 	struct {
> 		MB_Stack stack;
> 	} label;

What's the label stack for?

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list