[m-dev.] discussion about the implementation of compact type representations

Peter Wang novalazy at gmail.com
Mon Oct 30 18:16:34 AEDT 2017


On Mon, 30 Oct 2017 09:25:27 +1100 (AEDT), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> Attached is a set of proposals that I would like your opinions on.
> Which parts do you agree with and which parts do you disagree with?
> Do you have a better idea for any component?

> Some principles for the implementation of compact type representations
> 
> At the YesLogic meeting on friday 2017-10-27 we discussed several ways
> in which the Mercury compiler could compact the representation of
> several kinds of types.  I mentioned that the key challenge in
> the implementation of such compact representations was ensuring
> that all modules agree on the compact representation of e.g. a type t2,
> even when they differ in whether they see the definition of a type, e.g. t1,
> that is a component of t2. If t1 is an argument in a term of type t2,
> then all code accessing that term needs to know where the storage of
> the *next* argument after the one of type t1 starts, which requires knowing
> how big the representation of t1 is.
> 
> Here are some principles that I think can help solve that problem.
> They are not yet fully worked out.
> 
> principle 1:
>     If a module m1 defines and abtract-exports a type t1, whose
>     representation, though semantically hidden, could affect decisions
>     about the representations of other types (t2, t3 etc) defined
>     *either* in m1 or in other modules, then a description of its
>     representation SHOULD be included in module m1's interface files.
>     (I am not sure yet about *which* interface files exactly.)
> 
>     The abstract-exported type definitions affected this way are
> 
>     - type constructors whose definitions show that they are dummy
>       types (if their arity is nonzero, this means all their instances
>       are also dummy types)
> 
>     - type constructors whose definitions show that they are notag types
> 
>     - type constructors whose definitions show that they fit in a given
>       number of bits that is less than a word in size
> 
>     - type constructors that are defined to be equivalent to another type t4
>       that either
>       
>       - *are* in one of these four categories (if t4 is defined in m1), or
>       - *may be* in one of these four categories (if t4 is defined outside m1).

I think we require one further condition for types defined to be
equivalent to an atomic type with a 2-word representation:
currently only float but may soon include int64 and uint64.

> -----------------------------------------
> 
> An implementation of compact term representation along the lines above
> will need answers to two questions:
> 
> - what exactly is a dummy type, and
> - what exactly is a notag type,
> 
> beyond the fact that they both have a single function symbol of
> arity 0 and arity 1 respectively.
> 
> - Can they have user defined comparison and unification predicates?
> 
> - Can there be a reserve_tag pragma for them?
> 
> - Can there be a foreign_type pragma for them?
> 
> - Can the argument of a notag type have an existential type?
> 
> - For each of the above questions, if the answer is "no, there can't be",
>   should it be an error if "there is"? For example, if a type with
>   a single function symbol of arity 1 has a user defined comparison
>   predicate, is that an error, or is that ok, with the type simply
>   not being a notag type?
> 
> I propose that for types with a single function symbol of arity 0,
> 
> - Specifying a user-defined unification or comparison predicate
>   should be an error.

Sounds sensible.

> 
> - Specifying a reserve tag pragma for them should be an error.

Never used reserved tags myself.

> 
> - Specifying a foreign_type tag pragma for them should be an error.

A small number of types in the standard library will need to be
trivially changed (io.file_id, dir.stream, etc.).
User code should be less likely to be broken by this change.

> 
> I propose that for types with a single function symbol of arity 1,
> 
> - Specifying a user-defined unification or comparison predicate
>   should not be an error, but should make the type NOT a notag type.
> - Specifying a reserve tag pragma for them should be an error.
> 
> - Specifying a foreign_type tag pragma for them should not be an error.
>   In grades where the foreign type pragma is applicable, that will be the type;
>   in grades where no such foreign type pragma is applicable, the type
>   should be a notag type, unless one of the other considerations prevents that.

> - If the argument of the function symbol has an existential type,
>   that should not be an error, but should make the type NOT a notag type.

All seems fine.

> -----------------------------------------
> 
> An implementation detail of principle 3 is: how should we choose the extensions
> of "interface Mercury" over standard Mercury that record information about
> type representations?
> 
> At the moment, if an abstract exported type is an enum, we record that fact
> in interface files using syntax such as
> 
> :- type t6 where type_is_abstract_enum(N).
> 
> where N is the number of bits required to represent the enum. In this case,
> the extension is a kind of where clause that we wouldn't want to permit
> in user-written Mercury code.
> 
> I would prefer a syntax along the lines of
> 
> :- type_representation(t6, abstract_enum(N)).
> :- type_representation(t7, dummy_type).
> :- type_representation(t8, notag_type).
> :- type_representation(t9, equivalent_to(t10)).
> 
> Distinguishing between interface Mercury and standard Mercury at the top level
> function symbol of the declaration seems to me to be a cleaner separation.
> 
> While we currently represent "where type_is_abstract_enum(N)" declarations
> using the usual kind of type_defn items, it would be natural to represent
> type_representation declarations using a different kind of item, and
> principle 4a could be implemented as a list of this new kind of item.
> 
> The type_representation syntax should also be slightly easier to parse;
> since it does not use any operators, we won't have to worry about their
> precendences.

That's fine. Is the lack of 'pragma' to distinguish it from code
that can be written by the user?

We also have 'where type_is_abstract_noncanonical' and 'where
direct_arg' that may be replaced by this new mechanism.

Peter


More information about the developers mailing list