[m-dev.] For Review: Bytecode interpreter
Levi Cameron
l.cameron2 at ugrad.unimelb.edu.au
Thu Jan 25 18:25:08 AEDT 2001
Fergus Henderson wrote:
>
> On 24-Jan-2001, Levi Cameron <l.cameron2 at ugrad.unimelb.edu.au> wrote:
> >
> > bytecode/mb_bytecode.c
> > bytecode/mb_bytecode.h
> ...
> > MB_read_cstring changed (see comments for new arguments)
>
> The log message here is somewhat misleading, since
> the change to MB_read_cstring didn't add any new arguments.
>
It changed how the returned string was to be deallocated; comments
changed to be a bit more accurate now
> ...
> > Index: mb_bytecode.c
> > @@ -540,10 +597,14 @@
> > bc_p->opt.call.pred_id =
> > pred_id;
> > bc_p->opt.call.arity = arity;
> > + bc_p->opt.call.is_func = is_func;
> > bc_p->opt.call.proc_id = proc_id;
> > +
> > + bc_p->opt.call.is_native = TRUE;
> > + bc_p->opt.call.adr = NULL;
> > return TRUE;
>
> Why do you set is_native = TRUE here?
>
> That won't always be the case, so there should at least be a comment
> explaining why you set it to TRUE here.
It is supposed to cause an error if you ever try to execute it.
Changed to is_native = FALSE; adr = MB_CODE_INVALID_ADR
which should be a bit clearer
> > @@ -845,50 +903,55 @@
> > }
> >
> > /*
> > -** MB_read_cstring MB_mallocs a string each time. The caller MB_frees
> > it.
> > -** Starts assuming string won't be more than a certain length,
> > -** reallocates if it gets too long
> > +** Returned string is allocated with string routines MB_str_xxx
> > +** It is the responsibility of the caller to free it using
> > MB_str_delete
> > +**
> > +** Will read from the file a complete string, but will only store in
> > +** memory string of maximum size of buffer below
> > static MB_Bool
> > MB_read_cstring(FILE *fp, MB_CString *str_p)
>
> Why does it only save up the the maximum size of the buffer?
> Why not handle arbitrary-length strings?
>
> The old code, which handled arbitrary length strings,
> seems much nicer than the new code.
>
The original one used a static high water marked buffer which wasn't
thread safe. I then changed it to one that reallocated the buffer from
scratch at each call, but I felt this was a bit of a hack because for
a 7 character string for example it allocated a block of memory.
Here is a new one that starts off allocating a block on the stack
and if that isn't enough resorts to a malloced block that gets
expanded as needed
/*
** Returned string is allocated with string routines MB_str_xxx
** It is the responsibility of the caller to free it using MB_str_delete
*/
static MB_Bool
MB_read_cstring(FILE *fp, MB_CString *str_p)
{
/*
** Initially tries to read string into buffer, but if this gets
** full then mallocs another buffer which is doubled in size
** whenever it gets full.
** Returned string is allocated with MB_str_dup
*/
char buffer[64];
MB_Word bufsize = sizeof(buffer);
char* str = buffer;
MB_Word i = 0;
MB_Byte c;
do {
/* get the next char */
if (!MB_read_byte(fp, &c)) {
MB_fatal("Error reading C String from file");
}
/*
** If the next char is going to overflow the buffer then
** expand the buffer
*/
if (i == bufsize) {
if (str == buffer) {
/*
** If we are still using the stack buffer,
** allocate a new buffer with malloc
*/
str = MB_malloc(bufsize*2);
memcpy(str, buffer, bufsize);
} else {
/*
** The current buffer is already malloced;
** realloc it
*/
str = MB_realloc(str, bufsize*2);
bufsize *= 2;
}
if (str == NULL) return FALSE;
}
str[i++] = c;
} while (c != 0);
if ((*str_p = MB_str_dup(str)) == NULL) {
return FALSE;
}
/* Free the string if it isn't on the local stack */
if (str != buffer) {
MB_free(str);
}
return TRUE;
} /* end MB_read_cstring() */
> > Index: mb_bytecode.h
> > ===================================================================
> > RCS file: /home/mercury1/repository/mercury/bytecode/mb_bytecode.h,v
> > retrieving revision 1.1
> > diff -u -r1.1 mb_bytecode.h
> > --- mb_bytecode.h 2001/01/24 07:42:22 1.1
> > +++ mb_bytecode.h 2001/01/24 07:44:34
> > @@ -1,49 +1,26 @@
> >
> > /*
> > -** Copyright (C) 1997 The University of Melbourne.
> > +** Copyright (C) 1997-2001 The University of Melbourne.
>
> That should be
>
> ** Copyright (C) 1997, 2000-2001 The University of Melbourne.
>
> since that file wasn't touched in 1998 or 1999.
> Likewise for everything else in this directly, I imagine.
>
Done
>
> You should use "..." rather than <...> for non-system headers.
> <...> should only be used for system headers.
>
Done
> > typedef struct MB_Op_arg_struct {
> > MB_Byte id;
> > @@ -128,7 +111,13 @@
> > MB_CString module_id;
> > MB_CString pred_id;
> > MB_Short arity;
> > + MB_Byte is_func;
> > MB_Byte proc_id;
> > +
> > + /* Whether call is a native code call */
> > + MB_Byte is_native;
>
> Why is that MB_Byte rather than `bool'?
Changed to MB_Bool which is MB_Byte (and in all the other places
is_xxx is used)-- why is MR_Bool defined as the same size as an
int? 32 bits seems somewhat wasteful when 1 will do.
>
> I suggest s/adr/addr/g
>
Done
> > } pred_const;
> > struct {
> > MB_CString module_id;
> > @@ -159,9 +148,18 @@
> > #define MB_CONSID_BASE_TYPE_INFO_CONST 6
> > #define MB_CONSID_CHAR_CONST 7
> >
> > +/*
> > +** Internal label structure. index is read from the file and translated
> > +** into adr, which is the actual code address of a label
> > +*/
> > +typedef union {
> > + MB_Short index;
> > + MB_Word* adr;
> > +} MB_Label;
>
> I notice that this is an untagged union.
> The documentation should make it clearer
> which of these two fields is valid when.a
New comment:
/*
** Internal label structure. At load time the index is read from the file
** and stored. translate_labels translates indexes into actual memory
** addresses. The module load and label translation functions are the only
** only functions that should access index, the rest of the program should
** only use addr.
*/
>
> Our convention is that put the `*' next to the name
> of the variable or type being declared.
> (I don't like that one much, but it's better to be consistent.)
>
> Likewise here. And in lots of other places.
After much typing ... Done
>
> The layout for [multiline] comments is
> ...
>
Zoltan has already told me, I've been changing them to the proper layout
as I come across them
> > +** and calculated arguments (ie not in the bytecode file) in round
> > brackets
>
> Your mailer seems to be wrapping long lines.
> You should turn that off when posting diffs.
It was wrapping at 72 (who set that?)
[space after typecasts]
Done.
[Print() etc -> PRINT() ]
Done
> > case MB_CONSID_INT_CONST:
> > - /*
> > - ** (This comment is labelled "CAST COMMENT".
> > - ** If you remove this comment, also
> > - ** remove references to it in this file.
> > - ** Search for "CAST COMMENT".)
> > - **
> > - ** XXX: The cast to `long' in the following code
> > - ** is needed to remove a warning. `int_const' has
> > - ** type `Integer', but Integer may be typedef'ed
> > - ** to `int', `long', `long long' or whatever.
> > - ** The correct solution may be to define a
> > - ** format string for Integer in conf.h.
> > - */
> > Print()
> > - "int_const %ld",
> > - (long) cons_id.opt.int_const
> > + "int_const %x",
> > + (int)cons_id.opt.int_const
>
> Why did you change the cast from int to long?
> And why did you remove the XXX comment? It still applies, I think.
>
> Why do you print int constants in hex rather than decimal?
>
> If you are going to print in hex, you should definitely
> precede it with "0x" to make it clear that the output is
> in hex. Otherwise output like "10" would be ambiguous.
>
> If you think hex output would be useful, it may be better
> to output both decimal _and_ hex, e.g. "int_const 16 (0x10)".
I misinterpreted the comment. Changed now to:
/*
** XXX: int_const has type Integer which could
** be int, long or long long. Correct solution
** is to define a format string in conf.h, but
** for now assume long is enough
*/
"int_const %ld (%lx)",
(long) cons_id.opt.int_const,
(long) cons_id.opt.int_const
(and duplicated at str_op_arg > MB_ARG_INT_CONST)
> >
> > @@ -609,12 +701,14 @@
> > break;
> > case MB_CONSID_PRED_CONST:
> > Print()
> > - "%s %s %s %d %d",
> > - "pred_const ",
> > + "%s %s %s__%s/%d proc %d (%p)",
>
> I suggest s/proc/mode/
Changed.
Also changed bytecode data structure names:
module_id -> module_name
pred_id -> pred_name
proc_id -> mode_num
(Except in some places in mb_bytecode.c when they are being
loaded, where initially they stay as the original names for
consistency between the generator and the interpreter)
>
> > case MB_CONSID_CHAR_CONST:
> > ...
> >
>
> I suggest prefixing the hex value with 0x and preceding it with
> the decimal value, i.e. "%s %d (0x%2X)".
>
Done
>
> > + if (MB_ip_special(new_ip)) {
> > + switch ((MB_Word)new_ip) {
> > + case (MB_Word)MB_CODE_DO_FAIL:
> > + instr_do_fail(ms, NULL);
> > + break;
> > + case (MB_Word)MB_CODE_DO_REDO:
> > + instr_do_redo(ms, NULL);
> > + break;
>
> Why do you need the casts to MB_Word here?
>
C doesn't allow you to switch on a pointer (as I discovered)
> > +/* Check which function we are in & save pointers to temp & var slots
> > */
> > +void
> > +MB_func_type_check(MB_Machine_State* ms)
>
> I think "function" is not the right word.
> Perhaps you mean "check what code_model the procedure are in has, ..."?
>
Comments changed:
/*
** Check which procedure we are in & set variable stack pointer appropriately
**
** If you don't call this after the ip switches to a new function
** then MB_var_get and MB_var_set will give incorrect results
**
** If det/semidet, set the machine state variable slot pointer to the det stack
** If nondet, set the machine state variable slot pointer to the nondet stack
*/
void
MB_proc_var_init(MB_Machine_State *ms)
...
/*
** Get a variable from the appropriate mercury stack
**
** It knows which stack to use because you have of course already
** called MB_proc_var_init to set the current procedure's
** variable pointer to point to variable 0.
*/
MB_var_get(...
> Another one to grep for:
>
> grep ',[^ ]' *.h *.c
>
> > +static void (*instruction_table[])(MB_Machine_State*, const
> > MB_Bytecode_Arg*) = {
>
> It would be clearer to use a typedef.
>
> > + instr_builtin_binop,
> > + instr_builtin_unop, /**** unop */
> > + instr_builtin_bintest,
> > + instr_builtin_untest, /**** unop test */
>
> `untest' is not a very good name.
> I think it would be better to rename untest as unop_test
> and then delete those two comments.
>
> > + if (bca->enter_proc.det == MB_DET_SEMIDET) {
> > + /*
> > + ** If a semidet procedure then mark our success slot as failure
> > + ** until we know otherwise.
> > + */
> > + MB_temp_set(ms, MB_SEMIDET_SUCCESS_SLOT, MB_SEMIDET_FAILURE);
>
> Rather than checking for MB_DET_SEMIDET (determinism semidet)
> you should be checking that it has a model_semi code model.
> In other words, `MB_DET_CC_NONDET' and `MB_DET_FAILURE'
> should be treated the same as MB_DET_SEMIDET.
> See compiler/code_model.m (although you'll have to write
> your own code to do it, rather than reusing the code there).
>
Done
Levi
l.cameron2 at ugrad.unimelb.edu.au
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
More information about the developers
mailing list