[m-dev.] diff: Compiler support for stack_layouts
Tyson Richard DOWD
trd at students.cs.mu.oz.au
Thu Oct 30 15:47:58 AEDT 1997
Zoltan Somogyi wrote:
>
> > This diff is the compiler support for stack layouts.
> >
> > This support is not yet complete, but is functional enough to
> > be useful.
> >
> > Could someone review this please?
>
> > Index: compiler/base_type_layout.m
> > - base_type_layout__get_next_label(LayoutInfo0, NextLabel),
> > - base_type_layout__incr_next_label(LayoutInfo0, LayoutInfo1),
> > Pseudo0 = yes(const(data_addr_const(data_addr(TypeModule,
> > base_type(info, TypeName, Arity))))),
> > + Label1 = Label0 + 1,
>
> Why is arithmetic being done here? Why don't you use an access predicate?
There isn't one. The only threaded data structure is the integer
representing the label itself. It would be possible to write a single
predicate that increments label, but I'm not sure it's going to make
the code any clearer, and if I write that, other access predicate code
should be changed to use it instead of doing the increment itself.
A couple of times in this review, this issue of these "labels" comes up.
>From llds.m:
; create(tag, list(maybe(rval)), bool, int)
% create(Tag, Arguments, IsUnique, LabelNumber):
% A `create' instruction is used during code generation
% for creating a term, either on the heap or
% (if the term is constant) as a static constant.
% After code generation, only constant term create() rvals
% should be present in the LLDS, others will get transformed
% to incr_hp(..., Tag, Size) plus assignments to the fields.
%
% The boolean should be true if the term
% must be unique (e.g. if we're doing to do
% destructive update on it). This will prevent the term
% from being used for other purposes as well; unique terms
% are always created on the heap, not as constants, and
% we must not do common term elimination on them.
%
% The label number is needed for the case when
% we can construct the term at compile-time
% and just reference the label.
That's where I got the term "label number".
However, in some other modules, it's called "CellNo" or "CNum", and
there is a code_info predicate called code_info__get_next_cell_number.
I think cell number is a better name, shall we use that? Then I can
clean up the code.
>
> > + %
> > + % Information for any procedure, includes information about the
> > + % procedure itself, and any internal labels within it.
> > + %
> > +:- type entry_layout_info
>
> The name somewhat misleading here; it should probably have "proc" somewhere
> in it.
>
Ok, I'll call it proc_layout_info, because it represents a whole
procedure. (We'll reserve "entry" for possible use with associating
information with the entry label for tracing).
> > + entry_layout_info(
> > + proc_label, % the proc label
> > + int, % number of stack slots
> > + code_model, % which stack is used
> > + maybe(int), % location of succip on stack
> > + map(code_addr, internal_layout_info)
> > + % info for each internal label
>
> I think the key type should be label, not code_addr. It does not make sense
> to me to have layout information about e.g. do_redo. Am I missing something?
No, that's fine.
>
> > +:- type internal_layout_info
> > + --->
> > + internal_layout_info(
> > + code_addr, % what label is this associated with
> > + maybe(continuation_label_info)
> > + ).
>
> Ditto. Also, why is the key repeated here?
>
Diff-rot. (A lot of redesign was done after realizing there could be multiple
calls returing to the same label. Some loose ends were left behind).
I've removed the code_addr from the layout info, and everything is
now indexed on label instead of code_addr. (The code is simpler, too).
> Ditto for continuation_info.
>
> > + %
> > + % Information for a label that is a continuation.
> > + %
> > + % Different calls can assign slightly
> > + % different liveness annotations to the labels after the call.
> > + % (Two different paths of computation can leave different
> > + % information live).
> > + % We take the intersection of these annotations. Intersecting
> > + % is easy if we represent the live values and type infos as
> > + % sets.
>
> Please give an example how two different computation paths can leave different
> information alive and yet still end up at the same label. Offhand, I can't
> think of any.
If the code that follows the label is "succeed()", then it can happen.
This happens in tree234__member/3. The continuations of the recursive calls
are all redirected to the single succeed() at the end of the code.
I'm not sure what the correct solution to this problem is at the
moment...
(This actually reminds me of another point - any label that the redoip is
set to is also a continuation point).
> > +continuation_info__add_proc_info(PredProcId, ProcLabel, StackSize, CodeMode
> l,
> > + SuccipLocation, ContInfo0, ContInfo) :-
> > +
> > + map__init(InternalMap),
> > + EntryLayout = entry_layout_info(ProcLabel, StackSize, CodeModel,
> > + SuccipLocation, InternalMap),
> > + continuation_info__add_proc_layout(PredProcId, EntryLayout,
> > + ContInfo0, ContInfo).
>
> Why is the internal map empty here?
Because you can't process the instructions to find information about
the internal labels until you have finished generating code and
optimizing the code.
There should be some comments describing the two-stages of the
continuation_info process.
> > -:- pred continuation_info__set_internal_info(list(internal_layout_info),
> > - continuation_info, continuation_info).
> > + %
> > + % Initialize the internal info.
> > + %
> > +:- pred continuation_info__initialize_internal_info(
> > + continuation_info, continuation_info).
> > +:- mode continuation_info__initialize_internal_info(in, out) is det.
> > +
> > +continuation_info__initialize_internal_info(ContInfo0, ContInfo) :-
> > + ContInfo0 = continuation_info(A, _),
> > + map__init(Internals),
> > + ContInfo = continuation_info(A, Internals).
>
> Why do you need to initialize the second field independently of the first?
The first field is map that we build up after processing each procedure.
The second field is a map that is built up during processing each procedure.
It is reset at the start of each new procedure.
Do you think this should this be documented? (Since you had to ask, I
assume the answer is yes).
>
> > --- llds.m 1997/10/12 13:32:31 1.211
> > +++ llds.m 1997/10/21 01:54:18
> > @@ -425,7 +426,8 @@
> > ; float_const(float)
> > ; string_const(string)
> > ; code_addr_const(code_addr)
> > - ; data_addr_const(data_addr).
> > + ; data_addr_const(data_addr)
> > + ; label_entry(label).
>
> The argument here provides no information that is associated with the label.
> Why?
>
I'm not sure what you mean. The label_entry is to facilitate outputting
ENTRY(mercury__some_old_predicate) references. It's a type of rval
constant.
> > :- type data_addr
> > ---> data_addr(string, data_name).
> > @@ -433,8 +435,11 @@
> >
> > :- type data_name
> > ---> common(int)
> > - ; base_type(base_data, string, arity).
> > + ; base_type(base_data, string, arity)
> > % base_data, type name, type arity
> > + ; general(string).
> > + % any sort of data, string provides a name
> > + % for it
>
> I don't like this approach; we should define a specific data_name for each
> use. In fact, in a similar case, I was thinking that there are arguments
> in favor of removing the c_code LLDS instruction altogether (e.g. the fact
> that tracing is now done via c_code means that a program compiled with
> tracing cannot benefit from value numbering).
I'm willing to change it to use a specific data name.
As for your arguments for removing c_code LLDS, I'm not entirely sure
what you mean. I am in favour of improving the methods for generating
static data structures in the compiler, however. Perhaps we should
discuss the issue in a different thread?
>
> > --- llds_out.m 1997/10/12 13:32:34 1.59
> > +++ llds_out.m 1997/10/22 02:44:56
> > @@ -2665,6 +2696,10 @@
> > % we need to cast them here to avoid type errors
> > io__write_string("(const Word *) &"),
> > output_data_addr(BaseName, VarName).
> > +output_rval_const(label_entry(Label)) -->
> > + io__write_string("ENTRY("),
> > + output_label(Label),
> > + io__write_string(")").
>
> Have you considered forwarding this case to the existing code that writes
> out labels e.g. for return addresses in calls?
It's a bit inconsistent with the existing code. The existing code
assumes that ENTRY is only used with proc_labels, not with internal
labels. But the accurate GC mechanism uses ENTRY on all labels.
I'd like to improve this somewaht, but the old design of how labels
work is pretty deeply ingrained, and seems a little fragile. I wasn't
keen to modify it (== rewrite it).
> > % Data Stucture: stack_layouts
> > %
> > % For each procedure,
> > % mercury_data__stack_layout__mercury__<proc_label>
> > % containing:
> > % code address (Code *) - address of entry
> > % number of stack slots (Integer)
> > % code_model (Integer) actually, type MR_Code_Model
> > % 0 = DET, 1 = NONDET
> > % succip stack location (Integer) actually, type MR_LIVE_LVAL
>
> Is there a special value meaning no succip in stack frame? Or is any out
> of range integer supposed to mean this?
Yes, this should be documented.
> > % For each internal label in a procedure (that is a continuation point)
>
> Not all internal labels are continuation points. In addition, for tracing
> you want info about the entry and exit points, which are in general not
> continuation points.
The comment is incorrect.
>
> > % mercury_data__stack_layout__mercury__<proc_label>_i<label number>
> > % containing:
> > % code address (Code *) - address of label
> > % procedure info (Word *) - pointer to procedure stack layout
> > % number of live vars (Integer)
> > % live data locations and (Word *) - pointer to vector of
> > % types MR_LIVE_LVAL and MR_LIVE_TYPE pairs
> > % type parameters (Word *) - pointer to vector of
> > % MR_LIVE_LVAL
> > %
> > % If the number of live vars is 0, there could be two explanations. The
> > % continuation label might actually have no live data, or (more likely)
> > % it isn't a continuation label at all.
> > %
> > % If you need to know the live variables at non-continuation labels,
> > % this code will not be sufficient.
>
> But the entry and exit point info is what Erwan needs.
Yes.
This code will need to be modified to provide entry and exit point info.
> > :- module stack_layout.
> >
> > :- interface.
> >
> > :- import_module hlds_module.
> >
> > :- pred stack_layout__generate_llds(module_info, list(c_module)).
> > :- mode stack_layout__generate_llds(in, out) is det.
> >
> > :- implementation.
> >
> > :- import_module llds, globals, options, continuation_info, llds_out.
> > :- import_module base_type_layout.
> > :- import_module assoc_list, bool, string, int, list, map, std_util, require
> .
> > :- import_module set.
> >
> > :- type stack_layout_info --->
> > stack_layout_info(
> > string, % module name
> > bool, % are 2-bit tags available?
>
> If not, does this mean no tags or 3-bit tags?
>
This is junk. I was going to use it, but then changed the data
representation altogether.
> > int, % next available label
>
> I don't think this comment can be right.
cell number, not label.
> > :- pred stack_layout__construct_entry_layout(proc_label, int, code_model,
> > maybe(int), stack_layout_info, stack_layout_info).
> > :- mode stack_layout__construct_entry_layout(in, in, in, in, in, out) is det
> .
> > stack_layout__construct_entry_layout(ProcLabel, StackSlots, CodeModel,
> > SuccipLoc) -->
> > {
> > SuccipLoc = yes(Location0)
> > ->
> > Location = Location0
> > ;
> > % XXX
> > % Assume a dummy location of -1 if there is
> > % no succip on the stack. The runtime system
> > % might be able to work around this, depending
> > % upon what it is using the stack layouts for.
> > Location = -1
> > },
>
> What is the XXX for?
That this isn't properly documented, which you picked up on before.
I'll fix it.
> > {
> > CodeModel = model_det,
> > SuccipLval = stackvar(Location)
> > ;
> > CodeModel = model_semi,
> > SuccipLval = stackvar(Location)
> > ;
> > CodeModel = model_non,
> > SuccipLval = framevar(Location)
> > },
> > { stack_layout__construct_lval(SuccipLval, SuccipRval) },
>
> I suggest stack_layout__construct_lval => stack_layout__represent_lval
>
> and similarly in other pred names.
Yep, that's a good word for it.
> > :- pred stack_layout__construct_internal_layout(internal_layout_info,
> > stack_layout_info, stack_layout_info).
> > :- mode stack_layout__construct_internal_layout(in, in, out) is det.
> > stack_layout__construct_internal_layout(Internal) -->
> > { Internal = internal_layout_info(CodeAddr, ContinuationLabelInfo) },
> > {
> > ContinuationLabelInfo = yes(continuation_label_info(
> > LiveLvalSet0, _TVars))
> > ->
> > LiveLvalSet = LiveLvalSet0
> > ;
> > % We record no live values here. This might not be
> > % true, however this label is not being used as a
> > % continuation, so it shouldn't be relied upon.
> >
> > set__init(LiveLvalSet)
> > },
> >
> > % XXX Need to do something with TVars?!?
>
> I presume this is part of what you mean by "not yet complete".
Yep.
>
> > stack_layout__construct_liveval_pairs(LiveLvals, Rval) -->
> > list__map_foldl(stack_layout__construct_liveval_pair, LiveLvals,
> > RvalsList),
> > { list__condense(RvalsList, Rvals) },
> > stack_layout__get_next_label(Label),
> > stack_layout__incr_next_label,
> > { Rval = create(0, Rvals, no, Label) }.
> >
> > % Construct a pair of (lval, live_value_type) representations.
> >
> > :- pred stack_layout__construct_liveval_pair(pair(lval, live_value_type),
> > list(maybe(rval)), stack_layout_info, stack_layout_info).
> > :- mode stack_layout__construct_liveval_pair(in, out, in, out) is det.
> >
> > stack_layout__construct_liveval_pair(Lval - LiveValueType, Rvals) -->
> > { stack_layout__construct_lval(Lval, Rval0) },
> > stack_layout__construct_live_value_type(LiveValueType, Rval1),
> > { Rvals = [yes(Rval0), yes(Rval1)] }.
>
> Don't these create rvals lead to an array of pointers, one per live value,
> to be constructed for each label, instead of a 2D array?
>
No. Here's an example:
/*
* Garbage collection livevals info
* detstackvar(8) succip
* r1 var((tree234:tree234(V_1, int)), ground) 1 - detstackvar(6)
* detstackvar(6) var((mercury_builtin:type_info(V_1)), ground) 1 - detstackvar(6)
*/
const struct
mercury_data__stack_layout__mercury__bag__intersect_2_4_0_i14_struct {
Code * f1;
const Word * f2;
Integer f3;
const Word * f4;
} mercury_data__stack_layout__mercury__bag__intersect_2_4_0_i14 = {
ENTRY(mercury__bag__intersect_2_4_0_i14),
(const Word *) &mercury_data__stack_layout__mercury__bag__intersect_2_4_
0,
(Integer) 3,
(const Word *) mkword(mktag(0), (Word) (const Word *) &mercury_data_bag_
_common_4)
};
static const struct mercury_data_bag__common_4_struct {
Integer f1;
const Word * f2;
Integer f3;
const Word * f4;
Integer f5;
Integer f6;
} mercury_data_bag__common_4 = {
(Integer) 256,
(const Word *) mkword(mktag(0), (Word) (const Word *) &mercury_data_bag_
_common_1),
(Integer) 1538,
(const Word *) mkword(mktag(0), (Word) (const Word *) &mercury_data_bag_
_common_3),
(Integer) 2050,
(Integer) 0
};
256 = r1, 1538 = detstackvar(6), 2050 = detstackvar(8)
The two common blocks (1 and 3) are (pseudo)typeinfos, and 0 represents succip.
> > stack_layout__construct_live_value_type(var(Type, _Inst), Rval) -->
> > stack_layout__get_next_label(Label0),
> > { base_type_layout__construct_pseudo_type_info(Type, Rval0, Label0,
> > Label1) },
> > stack_layout__set_next_label(Label1),
> > % XXX hack - don't yet write out insts
> > { Rval1 = const(int_const(-1)) },
> > stack_layout__get_next_label(Label2),
> > stack_layout__incr_next_label,
> > { Rval = create(0, [yes(Rval0), yes(Rval1)], no, Label2) }.
>
> Why don't you invent representations for ground and free, and require
> that every inst be one of those two for the time being?
>
> I don't like the mixing of get, set and incr, and they are not label
> numbers in any case.
There should be a get & incr at the same time, but in some cases it
is necessary to pull the cell numbers out of the threaded data
structure.
>
> > % Some things in this module are encoded using a low tag.
> > % This is not done using the normal compiler mkword, but by
> > % doing the bit shifting here.
> > %
> > % This allows us to use more than the usual 2 or 3 bits, but
> > % we have to use low tags and cannot tag pointers this way.
>
> Why do you want to use more than 2 or 3 bits, and how?
>
To represent lvals.
r(1), detstackvar(7), f(3), etc.
I just use the low 8 bits to represent what sort of thing it is (r(N),
detstackvar(N), etc), and the rest of the bits to represent the number N.
I'm fixing all the other problems brought up in your review, thanks.
--
Tyson Dowd # To fix this, edit BLAH\BlahKey\Blah\Whatever
# in the registry.
trd at cs.mu.oz.au # *WARNING* Editing the registry can DESTROY
http://www.cs.mu.oz.au/~trd # your machine forever. Do not do it.
More information about the developers
mailing list