[m-dev.] For Review: Bytecode interpreter

Levi Cameron l.cameron2 at ugrad.unimelb.edu.au
Thu Jan 25 18:25:08 AEDT 2001


Fergus Henderson wrote:
> 
> On 24-Jan-2001, Levi Cameron <l.cameron2 at ugrad.unimelb.edu.au> wrote:
> >
> > bytecode/mb_bytecode.c
> > bytecode/mb_bytecode.h
> ...
> >       MB_read_cstring changed (see comments for new arguments)
> 
> The log message here is somewhat misleading, since
> the change to MB_read_cstring didn't add any new arguments.
> 

It changed how the returned string was to be deallocated; comments
changed to be a bit more accurate now

> ...
> > Index: mb_bytecode.c
> > @@ -540,10 +597,14 @@
> >                               bc_p->opt.call.pred_id =
> >                                       pred_id;
> >                               bc_p->opt.call.arity = arity;
> > +                             bc_p->opt.call.is_func = is_func;
> >                               bc_p->opt.call.proc_id = proc_id;
> > +
> > +                             bc_p->opt.call.is_native = TRUE;
> > +                             bc_p->opt.call.adr = NULL;
> >                               return TRUE;
> 
> Why do you set is_native = TRUE here?
> 
> That won't always be the case, so there should at least be a comment
> explaining why you set it to TRUE here.

It is supposed to cause an error if you ever try to execute it.
Changed to is_native = FALSE; adr = MB_CODE_INVALID_ADR
which should be a bit clearer

> > @@ -845,50 +903,55 @@
> >  }
> >
> >  /*
> > -** MB_read_cstring MB_mallocs a string each time. The caller MB_frees
> > it.
> > -** Starts assuming string won't be more than a certain length,
> > -** reallocates if it gets too long
> > +** Returned string is allocated with string routines MB_str_xxx
> > +** It is the responsibility of the caller to free it using
> > MB_str_delete
> > +**
> > +** Will read from the file a complete string, but will only store in
> > +** memory string of maximum size of buffer below
> >  static MB_Bool
> >  MB_read_cstring(FILE *fp, MB_CString *str_p)
> 
> Why does it only save up the the maximum size of the buffer?
> Why not handle arbitrary-length strings?
> 
> The old code, which handled arbitrary length strings,
> seems much nicer than the new code.
> 

The original one used a static high water marked buffer which wasn't
thread safe. I then changed it to one that reallocated the buffer from
scratch at each call, but I felt this was a bit of a hack because for
a 7 character string for example it allocated a block of memory.
Here is a new one that starts off allocating a block on the stack
and if that isn't enough resorts to a malloced block that gets
expanded as needed

/*
** Returned string is allocated with string routines MB_str_xxx
** It is the responsibility of the caller to free it using MB_str_delete
*/
static MB_Bool
MB_read_cstring(FILE *fp, MB_CString *str_p)
{
	/*
	** Initially tries to read string into buffer, but if this gets
	** full then mallocs another buffer which is doubled in size
	** whenever it gets full.
	** Returned string is allocated with MB_str_dup
	*/
	char	buffer[64];
	MB_Word	bufsize = sizeof(buffer);
	char*	str = buffer;
	
	MB_Word		i = 0;
	MB_Byte		c;
	
	do {
		/* get the next char */
		if (!MB_read_byte(fp, &c)) {
			MB_fatal("Error reading C String from file");
		}

		/*
		** If the next char is going to overflow the buffer then
		** expand the buffer
		*/
		if (i == bufsize) {
			if (str == buffer) {
				/*
				** If we are still using the stack buffer,
				** allocate a new buffer with malloc
				*/
				str = MB_malloc(bufsize*2);
				memcpy(str, buffer, bufsize);
			} else {
				/*
				** The current buffer is already malloced;
				** realloc it
				*/
				str = MB_realloc(str, bufsize*2);
				bufsize *= 2;
			}

			if (str == NULL) return FALSE;
		}

		str[i++] = c;

	} while (c != 0);

	if ((*str_p = MB_str_dup(str)) == NULL) {
		return FALSE;
	}

	/* Free the string if it isn't on the local stack */
	if (str != buffer) {
		MB_free(str);
	}

	return TRUE;
} /* end MB_read_cstring() */

> > Index: mb_bytecode.h
> > ===================================================================
> > RCS file: /home/mercury1/repository/mercury/bytecode/mb_bytecode.h,v
> > retrieving revision 1.1
> > diff -u -r1.1 mb_bytecode.h
> > --- mb_bytecode.h     2001/01/24 07:42:22     1.1
> > +++ mb_bytecode.h     2001/01/24 07:44:34
> > @@ -1,49 +1,26 @@
> >
> >  /*
> > -** Copyright (C) 1997 The University of Melbourne.
> > +** Copyright (C) 1997-2001 The University of Melbourne.
> 
> That should be
> 
>         ** Copyright (C) 1997, 2000-2001 The University of Melbourne.
> 
> since that file wasn't touched in 1998 or 1999.
> Likewise for everything else in this directly, I imagine.
>

Done

> 
> You should use "..." rather than <...> for non-system headers.
> <...> should only be used for system headers.
> 
Done

> >  typedef struct MB_Op_arg_struct {
> >       MB_Byte id;
> > @@ -128,7 +111,13 @@
> >                       MB_CString      module_id;
> >                       MB_CString      pred_id;
> >                       MB_Short        arity;
> > +                     MB_Byte         is_func;
> >                       MB_Byte         proc_id;
> > +
> > +                     /* Whether call is a native code call */
> > +                     MB_Byte         is_native;
> 
> Why is that MB_Byte rather than `bool'?

Changed to MB_Bool which is MB_Byte (and in all the other places
is_xxx is used)-- why is MR_Bool defined as the same size as an
int? 32 bits seems somewhat wasteful when 1 will do.

> 
> I suggest s/adr/addr/g
>

Done

> >               } pred_const;
> >               struct {
> >                       MB_CString      module_id;
> > @@ -159,9 +148,18 @@
> >  #define      MB_CONSID_BASE_TYPE_INFO_CONST  6
> >  #define      MB_CONSID_CHAR_CONST            7
> >
> > +/*
> > +** Internal label structure. index is read from the file and translated
> > +** into adr, which is the actual code address of a label
> > +*/
> > +typedef union {
> > +     MB_Short        index;
> > +     MB_Word*        adr;
> > +} MB_Label;
> 
> I notice that this is an untagged union.
> The documentation should make it clearer
> which of these two fields is valid when.a

New comment:

/*
** Internal label structure. At load time the index is read from the file
** and stored. translate_labels translates indexes into actual memory
** addresses. The module load and label translation functions are the only
** only functions that should access index, the rest of the program should
** only use addr.
*/

>
> Our convention is that put the `*' next to the name
> of the variable or type being declared.
> (I don't like that one much, but it's better to be consistent.)
> 
> Likewise here.  And in lots of other places.

After much typing ... Done

> 
> The layout for [multiline] comments is
> ...
>
Zoltan has already told me, I've been changing them to the proper layout
as I come across them

> > +** and calculated arguments (ie not in the bytecode file) in round
> > brackets
> 
> Your mailer seems to be wrapping long lines.
> You should turn that off when posting diffs.

It was wrapping at 72 (who set that?)

[space after typecasts]

Done.

[Print() etc -> PRINT() ]

Done

> >               case MB_CONSID_INT_CONST:
> > -                     /*
> > -                     ** (This comment is labelled "CAST COMMENT".
> > -                     ** If you remove this comment, also
> > -                     ** remove references to it in this file.
> > -                     ** Search for "CAST COMMENT".)
> > -                     **
> > -                     ** XXX: The cast to `long' in the following code
> > -                     ** is needed to remove a warning. `int_const' has
> > -                     ** type `Integer', but Integer may be typedef'ed
> > -                     ** to `int', `long', `long long' or whatever.
> > -                     ** The correct solution may be to define a
> > -                     ** format string for Integer in conf.h.
> > -                     */
> >                       Print()
> > -                             "int_const %ld",
> > -                             (long) cons_id.opt.int_const
> > +                             "int_const %x",
> > +                             (int)cons_id.opt.int_const
> 
> Why did you change the cast from int to long?
> And why did you remove the XXX comment?  It still applies, I think.
> 
> Why do you print int constants in hex rather than decimal?
> 
> If you are going to print in hex, you should definitely
> precede it with "0x" to make it clear that the output is
> in hex.  Otherwise output like "10" would be ambiguous.
> 
> If you think hex output would be useful, it may be better
> to output both decimal _and_ hex, e.g. "int_const 16 (0x10)".

I misinterpreted the comment. Changed now to:

				/*
				** XXX: int_const has type Integer which could
				** be int, long or long long. Correct solution
				** is to define a format string in conf.h, but
				** for now assume long is enough
				*/
				"int_const %ld (%lx)",
				(long) cons_id.opt.int_const,
				(long) cons_id.opt.int_const
(and duplicated at str_op_arg > MB_ARG_INT_CONST)

> >
> > @@ -609,12 +701,14 @@
> >                       break;
> >               case MB_CONSID_PRED_CONST:
> >                       Print()
> > -                             "%s %s %s %d %d",
> > -                             "pred_const ",
> > +                             "%s %s %s__%s/%d proc %d (%p)",
> 
> I suggest s/proc/mode/

Changed.
Also changed bytecode data structure names:
module_id -> module_name
pred_id -> pred_name
proc_id -> mode_num

(Except in some places in mb_bytecode.c when they are being
loaded, where initially they stay as the original names for
consistency between the generator and the interpreter)

> 
> >               case MB_CONSID_CHAR_CONST:
> > ...
> >
> 
> I suggest prefixing the hex value with 0x and preceding it with
> the decimal value, i.e. "%s %d (0x%2X)".
> 

Done

> 
> > +     if (MB_ip_special(new_ip)) {
> > +             switch ((MB_Word)new_ip) {
> > +                     case (MB_Word)MB_CODE_DO_FAIL:
> > +                             instr_do_fail(ms, NULL);
> > +                             break;
> > +                     case (MB_Word)MB_CODE_DO_REDO:
> > +                             instr_do_redo(ms, NULL);
> > +                             break;
> 
> Why do you need the casts to MB_Word here?
> 

C doesn't allow you to switch on a pointer (as I discovered)


> > +/* Check which function we are in & save pointers to temp & var slots
> > */
> > +void
> > +MB_func_type_check(MB_Machine_State* ms)
> 
> I think "function" is not the right word.
> Perhaps you mean "check what code_model the procedure are in has, ..."?
> 

Comments changed:

/*
** Check which procedure we are in & set variable stack pointer appropriately
**
** If you don't call this after the ip switches to a new function
** then MB_var_get and MB_var_set will give incorrect results
**
** If det/semidet, set the machine state variable slot pointer to the det stack
** If nondet, set the machine state variable slot pointer to the nondet stack
*/
void
MB_proc_var_init(MB_Machine_State *ms)

...

/*
** Get a variable from the appropriate mercury stack
**
** It knows which stack to use because you have of course already
** called MB_proc_var_init to set the current procedure's
** variable pointer to point to variable 0.
*/
MB_var_get(...

> Another one to grep for:
> 
>         grep ',[^ ]' *.h *.c
> 
> > +static void (*instruction_table[])(MB_Machine_State*, const
> > MB_Bytecode_Arg*) = {
> 
> It would be clearer to use a typedef.
> 
> > +     instr_builtin_binop,
> > +     instr_builtin_unop,     /**** unop */
> > +     instr_builtin_bintest,
> > +     instr_builtin_untest,   /**** unop test */
> 
> `untest' is not a very good name.
> I think it would be better to rename untest as unop_test
> and then delete those two comments.
> 

> > +     if (bca->enter_proc.det == MB_DET_SEMIDET) {
> > +             /*
> > +             ** If a semidet procedure then mark our success slot as failure
> > +             ** until we know otherwise.
> > +             */
> > +             MB_temp_set(ms, MB_SEMIDET_SUCCESS_SLOT, MB_SEMIDET_FAILURE);
> 
> Rather than checking for MB_DET_SEMIDET (determinism semidet)
> you should be checking that it has a model_semi code model.
> In other words, `MB_DET_CC_NONDET' and `MB_DET_FAILURE'
> should be treated the same as MB_DET_SEMIDET.
> See compiler/code_model.m (although you'll have to write
> your own code to do it, rather than reusing the code there).
> 

Done

Levi
l.cameron2 at ugrad.unimelb.edu.au
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list