[m-dev.] discussion about the implementation of compact type representations

Julien Fischer jfischer at opturion.com
Tue Oct 31 11:27:16 AEDT 2017


Hi Zoltan,

On Mon, 30 Oct 2017, Zoltan Somogyi wrote:

> Attached is a set of proposals that I would like your opinions on.
> Which parts do you agree with and which parts do you disagree with?
> Do you have a better idea for any component?

...

> principle 1:
>     If a module m1 defines and abstract-exports a type t1, whose
>     representation, though semantically hidden, could affect decisions
>     about the representations of other types (t2, t3 etc) defined
>     *either* in m1 or in other modules, then a description of its
>     representation SHOULD be included in module m1's interface files.
>     (I am not sure yet about *which* interface files exactly.)
>
>     The abstract-exported type definitions affected this way are
>
>     - type constructors whose definitions show that they are dummy
>       types (if their arity is nonzero, this means all their instances
>       are also dummy types)
>
>     - type constructors whose definitions show that they are notag types
>
>     - type constructors whose definitions show that they fit in a given
>       number of bits that is less than a word in size
>
>     - type constructors that are defined to be equivalent to another type t4
>       that either
>
>       - *is* in one of these four categories (if t4 is defined in m1), or
>       - *may be* in one of these four categories (if t4 is defined outside m1).
>
>       If t4 is defined outside m1, then:
>
>       - when invoked to generate target language code, the compiler *must* know
>         whether t4 is in one of these four categories;
>
>       - when invoked to generate .int3 files, the compiler *will not* know
>         whether t4 is in one of these four categories;
>
>       - when invoked to generate .int2 and .int files, the compiler will have
>         read the .int3 file of the module that defines t4, and therefore
>         it *will* know whether t4 is in one of these four categories,
>         *unless* t4 is itself defined to be equivalent to yet another type
>         in yet another module.

I think foreign_types also need to be included in this list.  (IIRC, the
pragmas need to be visible to the importing modules, since the pragmas
contain settings that affect representation.)
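
As a rough sketch of the kind of thing I mean (the module name, type name
and C type below are made up for illustration), a foreign_type pragma can
carry assertions such as can_pass_as_mercury_type, and an importing module
that sees only the abstract declaration would know nothing about them:

```mercury
:- module c_handle.
:- interface.

    % Abstract-exported: importers see only this declaration.
:- type handle.

:- implementation.

    % Hypothetical C binding, applicable in C grades only.
    % The assertion list is part of the pragma itself, so it is
    % hidden from importers unless the pragma is made visible to them.
:- pragma foreign_type("C", handle, "void *", [can_pass_as_mercury_type]).
```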

>     I don't know whether this situation (an abstract equivalence to
>     an abstract equivalence type) occurs in practice; I can't remember
>     any examples.
>
>     I am not yet sure how we should handle this last case. One possibility
>     would be to always record equivalence types in .int3 files even if
>     the equivalence is not exported, and have the compiler follow
>     such chains of equivalences to the end, dynamically, when generating
>     .int/.int2 files and target code files. This would require principle 2.
>

...

> principle 3:
>     The compiler should be able to put into automatically generated interface
>     files code that is NOT valid code in Mercury source files.

Fine with me -- what goes into the interface files is purely an implementation
detail anyway.  (In fact, hopefully any new interface file syntax can be
designed to be more efficiently processed by the compiler.)

>     We could call
>     this extended language "interface Mercury". The constructs describing
>     the representations of abstract exported types should be in
>     interface Mercury but not in actual Mercury; their use in Mercury
>     source files should be an error.
>
>     Interface files will now be explicitly designed to contain both
>
>     - information that is supposed to be visible to semantic analysis, and
>     - information that is supposed to be invisible to semantic analysis,
>       but visible to code generation.
>
>     Interface files have long contained the second kind of information
>     as well as the first. I think we started by exporting the fact that
>     an abstract-exported type was defined to be equivalent to a float
>     (since floats did not fit into the then-standard 32 bit word size).
>     What will be new is that we should now be able to clearly DISTINGUISH
>     the two kinds of information.
>
>     I propose that we apply this principle not just to the new information
>     we want to put into interface files, but also to the information we
>     *already* put into interface files that is not supposed to be visible
>     to semantic analysis in importing modules.
> 
> principle 4a:
>     When processing interface files, the code in the make_hlds module
>     should NOT add information of the second kind to the HLDS in the usual way.
>     Instead, it should squirrel that information away, so that it can be added
>     to the HLDS later, after semantic analysis.
>
>     Actually, the squirrel hideaway *can* be in the HLDS itself, as long as
>     it is in a new field of the HLDS that no other code pays attention to.
>
>     Ironically, the semantic analysis passes of the compiler don't care
>     about the sizes of type representations, so delaying adding such
>     information to the HLDS should have no effect. However, if a type t5
>     is abstract-exported from a module m5 but m5's interface files record
>     the fact that inside m5 it is defined to be equivalent to a float,
>     then adding that equivalence to the HLDS only after semantic analysis,
>     i.e. after compiler stage 65, will allow us to reject illegal Mercury code
>     that we have accepted up to now.
> 
> principle 4b:
>     Alternatively, make_hlds could add both kinds of information to
>     (the usual parts of) the HLDS, as long as
>
>     - it adds to the second kind of information flags that say "this
>       information should not be visible to semantic analysis", and
>     - the parts of the compiler that look at that information all obey
>       such flags.
>
>     In theory, this should *also* allow us to ensure that we accept
>     only Mercury code that is valid even if we don't know the extra
>     implementation-level information in interface files. It is somewhat
>     more flexible, as it allows us to make finer distinctions (e.g. knowledge
>     of a type's function symbols may be set to be invisible to the type checker
>     but visible to the mode checker when checking bound() insts). However,
>     it requires a lot more programming effort, since it affects all lookups
>     of the affected information.
> 
> I think we should apply principle 3 (in either variant) to .opt and .trans_opt
> files as well.

I propose we get rid of .trans_opt files now.  None of the actual optimizations
use them.   Some program analyses do, but (a) the results of those program
analyses do not significantly affect optimization _in practice_ and
(b) mmc --make is the preferred user level build system and it doesn't support
them anyway.

> -----------------------------------------
> 
> An implementation of compact term representation along the lines above
> will need answers to two questions:
> 
> - what exactly is a dummy type, and
> - what exactly is a notag type,
> 
> beyond the fact that they both have a single function symbol of
> arity 0 and arity 1 respectively.
> 
> - Can they have user defined comparison and unification predicates?
> 
> - Can there be a reserve_tag pragma for them?
> 
> - Can there be a foreign_type pragma for them?
> 
> - Can the argument of a notag type have an existential type?
> 
> - For each of the above questions, if the answer is "no, there can't be",
>   should it be an error if "there is"? For example, if a type with
>   a single function symbol of arity 1 has a user defined comparison
>   predicate, is that an error, or is that ok, with the type simply
>   not being a notag type?
> 
> I propose that for types with a single function symbol of arity 0,
> 
> - Specifying a user-defined unification or comparison predicate
>   should be an error.

Ok.

> - Specifying a reserve tag pragma for them should be an error.

That's fine.

> - Specifying a foreign_type pragma for them should be an error.

I disagree with this.  (And I don't see a problem with allowing it.)
The usual use case I have for this is where there is a binding
to some foreign language library that has to compile, but not
work, in the other target languages.  For example:

     :- interface.
     :- type java_thing.

     :- implementation.

     % Defn. for Java grade.
     :- pragma foreign_type("Java", java_thing, "java.lang.Object").

     % Defn. for non-Java grades.
     :- type java_thing ---> java_thing.

     % Predicate defns for Java are foreign procs ...
     % Predicate defns for non-Java call error/1 ...

Yes, obviously the non-Java definition for java_thing could be (a non-dummy)
something else, but defining it as above is nice precisely because it conveys
no information.
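
Filled out a little, the pattern might look like the sketch below.  The
module name and the describe/2 predicate are hypothetical, chosen only to
show how the foreign_proc and the fallback clause sit side by side; this
assumes the compiler accepts a dummy Mercury definition alongside the
foreign_type pragma, which is the point under discussion.

```mercury
:- module java_binding.
:- interface.

:- type java_thing.

    % Hypothetical operation on the wrapped Java object.
:- pred describe(java_thing::in, string::out) is det.

:- implementation.

:- import_module require.

    % Defn. for Java grade.
:- pragma foreign_type("Java", java_thing, "java.lang.Object").

    % Defn. for non-Java grades: a dummy type that conveys no information.
:- type java_thing ---> java_thing.

    % Used in Java grades.
:- pragma foreign_proc("Java",
    describe(Thing::in, Desc::out),
    [will_not_call_mercury, promise_pure, thread_safe],
"
    Desc = Thing.toString();
").

    % Used in all other grades: compiles, but aborts if called.
describe(_, _) :-
    error("describe/2 is only available in Java grades").
```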

> I propose that for types with a single function symbol of arity 1,
> 
> - Specifying a user-defined unification or comparison predicate
>   should not be an error, but should make the type NOT a notag type.

Agreed.

> - Specifying a reserve tag pragma for them should be an error.

Agreed.

> 
> - Specifying a foreign_type pragma for them should not be an error.
>   In grades where the foreign type pragma is applicable, that will be the type;
>   in grades where no such foreign type pragma is applicable, the type
>   should be a notag type, unless one of the other considerations prevents that.
> 
> - If the argument of the function symbol has an existential type,
>   that should not be an error, but should make the type NOT a notag type.

Agreed.  (From a representation point of view I don't think it can work in any
other way.)
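
To make the cases concrete, here is a small sketch of how I read the
proposed rules.  The module and type names are invented for illustration,
and the comments state what the proposal would imply, not what the
compiler necessarily does today.

```mercury
:- module repr_examples.
:- interface.

    % Single function symbol of arity 0: would be a dummy type.
:- type marker ---> marker.

    % Single function symbol of arity 1, no user-defined equality,
    % no existential argument: would be a notag type.
:- type wrapped_int ---> wrapped_int(int).

    % Single function symbol of arity 1, but the argument has an
    % existential type: legal, but would NOT be a notag type, since
    % the type of T has to be recorded alongside the value.
:- type hidden ---> some [T] hidden(T).
```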

> -----------------------------------------
> 
> An implementation detail of principle 3 is: how should we choose the extensions
> of "interface Mercury" over standard Mercury that record information about
> type representations?
> 
> At the moment, if an abstract exported type is an enum, we record that fact
> in interface files using syntax such as
> 
> :- type t6 where type_is_abstract_enum(N).
> 
> where N is the number of bits required to represent the enum. In this case,
> the extension is a kind of where clause that we wouldn't want to permit
> in user-written Mercury code.
> 
> I would prefer a syntax along the lines of
> 
> :- type_representation(t6, abstract_enum(N)).
> :- type_representation(t7, dummy_type).
> :- type_representation(t8, notag_type).
> :- type_representation(t9, equivalent_to(t10)).
> 
> Distinguishing between interface Mercury and standard Mercury at the top level
> function symbol of the declaration seems to me to be a cleaner separation.
> 
> While we currently represent "where type_is_abstract_enum(N)" declarations
> using the usual kind of type_defn items, it would be natural to represent
> type_representation declarations using a different kind of item, and
> principle 4a could be implemented as a list of this new kind of item.
> 
> The type_representation syntax should also be slightly easier to parse;
> since it does not use any operators, we won't have to worry about their
> precedences.

I'm fine with that proposal in principle.  (Since interface files are supposed
to be grade independent it may need some modification to work properly with
foreign types.)

Julien.

