[m-rev.] for review: replace the item list with structured ASTs

Julien Fischer jfischer at opturion.com
Mon Jul 20 09:59:53 AEST 2015


Hi Zoltan,

On Mon, 13 Jul 2015, Zoltan Somogyi wrote:

> For review by Julien (thanks for volunteering) and anyone else
> who is interested. It passes all test cases, except a newly added one,
> for which the reason is noted in the log.
>
> The diff is too big to be posted at once, so the following message
> will contain its second part.
>
> This diff leaves a lot of further improvements as-yet unimplemented,
> in order to make this diff easier to finish, and to review.

Overall the changes is fine, and a vast improvement on the status quo.
There are some very minor issues that I have noted below.

> I would also appreciate opinions on the preferred order of making
> those further improvements.

I don't have any particular preference as to the order.

> Replace the item list with more structured ASTs.
> 
> The parts of the compiler that run before the HLDS is constructed used to use
> a raw list of items to represent source files (.m), interface files (.int0,
> .int3, .int2 and .int) and optimization files (.opt, and .trans_opt).
> These lists had structure, but this structure was implicit, not explicit,
> and its invariants were never really documented.
> 
> This diff changes that. It replaces the item list with FIVE separate types.
> 
> Three of these each represent the unprocessed content of one file:
> 
> - parse_tree_int represents the contents of one interface file;
> - parse_tree_opt represents the contents of one optimization file;
> - parse_tree_src represents the contents of one source file.
> 
> Two of these each represent the processed contents of one or more files:
> 
> - raw_compilation_unit represents the contents of one module in a source file.
>   (The source file may contain several nested modules; the compilation unit
>   represents just one.)
> - aug_compilation_unit represents the contents of one module in a source file,
>   just like raw_compilation_unit, but it is augmented with the contents of the
>   interface and optimization files of the other modules imported (directly or
>   indirectly) by the original module.
> 
> These five separate concepts all used to be represented by the same type,
> list(item), but different invariants applied to the structure of those lists.
> The most important of those invariants at least are now explicit in the types.
> I think it is entirely possible that there are other invariants I haven't
> discovered and documented (for example, .int3 files must have stricter
> invariants on what can appear in them than .int files), but discovering
> and documenting these should be MUCH easier after this change.
> 
> I have marked many further opportunities for improvements with "XXX ITEM_LIST".
> Some of these include moving code between modules, and the creation of new
> modules. However, I have left acting on those XXXs until later, in order to
> keep the the size of this diff down as much as possible, for easier reviewing.
        ^^^^^^^

Doubled-up word.

> compiler/prog_item.m:
>     Define the five new AST types described above, and utility predicates
>     that operate on them.
>
>     In the rest of this change, I tried, as much as possible, to change
>     predicates that used to take item lists as arguments to make them change
>     one of these types instead. In many cases, this required putting
>     the argument lists of those predicates into a more consistent order.
>     (Often, predicates that operated on the contents of the module
>     took the name of the module and the list of items in the module
>     not just as separate arguments, but as separate arguments that
>     weren't even next to each other.)
>
>     Define types that identify the different kinds of interface and
>     optimization files (.int, .int2 etc). These replace the string suffixes
>     we used to use to identify file types. Predicates that used to take strings
>     representing suffixes as arguments now have to specify whether they can
>     handle all these file types (source, information and optimization),

s/information/interface/

>     or just (e.g.) all interface file types.
>

...

> diff --git a/compiler/make_hlds.m b/compiler/make_hlds.m
> index fd774f0..f5cdff7 100644
> --- a/compiler/make_hlds.m
> +++ b/compiler/make_hlds.m
> @@ -36,6 +36,7 @@
>  :- import_module parse_tree.module_qual.
>  :- import_module parse_tree.prog_data.
>  :- import_module parse_tree.prog_item.
> +:- import_module parse_tree.status.
>
>  :- import_module bool.
>  :- import_module list.
> @@ -54,19 +55,21 @@
>      --->    did_not_find_invalid_inst_or_mode
>      ;       found_invalid_inst_or_mode.
> 
> -    % parse_tree_to_hlds(Globals, DumpBaseFileName, ParseTree, MQInfo, EqvMap,
> -    %   UsedModules, QualInfo, InvalidTypes, InvalidModes, HLDS, Specs):
> +    % parse_tree_to_hlds(AugCompUnit, Globals, DumpBaseFileName,
> +    %   MQInfo, TypeEqvMap, UsedModules, QualInfo, InvalidTypes, InvalidModes,
> +    %   HLDS, Specs):
>      %
> -    % Given MQInfo (returned by module_qual.m) and EqvMap and UsedModules
> -    % (both returned by equiv_type.m), converts ParseTree to HLDS.
> -    % Any errors found are returned in Specs.
> -    % Returns InvalidTypes = yes if undefined types found.
> -    % Returns InvalidModes = yes if undefined or cyclic insts or modes found.
> +    % Given MQInfo (returned by module_qual.m) and TypeEqvMap and UsedModules
> +    % (both returned by equiv_type.m), converts AugCompUnit to HLDS.
> +    % It returns messages for Any errors it finds in Specs.

s/Any/any/

> +    % Returns InvalidTypes = yes if it finds any undefined types.
> +    % Returns InvalidModes = yes if it finds any undefined or cyclic
> +    % insts or modes.
>      % QualInfo is an abstract type that is then passed back to
>      % produce_instance_method_clauses (see below).
>      %
> -:- pred parse_tree_to_hlds(globals::in, string::in, compilation_unit::in,
> -    mq_info::in, eqv_map::in, used_modules::in, make_hlds_qual_info::out,
> +:- pred parse_tree_to_hlds(aug_compilation_unit::in, globals::in, string::in,
> +    mq_info::in, type_eqv_map::in, used_modules::in, make_hlds_qual_info::out,
>      found_invalid_type::out, found_invalid_inst_or_mode::out,
>      module_info::out, list(error_spec)::out) is det.

...

> diff --git a/compiler/prog_item.m b/compiler/prog_item.m
> index b0e37d6..e67c4ec 100644
> --- a/compiler/prog_item.m
> +++ b/compiler/prog_item.m

...

> @@ -39,16 +41,164 @@
>
>  %-----------------------------------------------------------------------------%
>  %
> -% This is how programs (and parse errors) are represented.
> +% The different kinds of files the Mercury compiler deals with:
> +%
> +% - source files,
> +% - automatically generated interface files, and
> +% - automatically generated optimization files.
> +%
> +
> +:- type file_kind
> +    --->    fk_src
> +    ;       fk_int(int_file_kind)
> +    ;       fk_opt(opt_file_kind).
> +
> +:- type src_file_kind
> +    --->    sfk_src.
> +
> +:- type int_file_kind
> +    --->    ifk_int0
> +    ;       ifk_int3
> +    ;       ifk_int2
> +    ;       ifk_int.
> +
> +:- type opt_file_kind
> +    --->    ofk_opt
> +    ;       ofk_trans_opt.
> +
> +:- func file_kind_to_extension(file_kind) = string.
> +:- func int_file_kind_to_extension(int_file_kind) = string.
> +:- func opt_file_kind_to_extension(opt_file_kind) = string.
> +
> +:- pred extension_to_file_kind(string::in, file_kind::out) is semidet.
> +
> +%-----------------------------------------------------------------------------%
> +%
> +% The parse_tree_{src,int,opt} types define the ASTs we use for source files,
> +% interface files and optimization files respectively.
> +%
> +% Nested submodules may appear in source files, but not in interface files
> +% or optimization files.
> +%
> +% We use cord of items instead of lists of items where we may need to add

s/cord/cords/

> +% items to an already-existing parse tree.
> +%
> +% The contexts of module and section declarations below may be
> +% term.context_init if the actual context isn't known, but if the recorded
> +% context is not term.context_init, then it is valid.
> +

...

> diff --git a/compiler/status.m b/compiler/status.m
> index e69de29..2e403e9 100644
> --- a/compiler/status.m
> +++ b/compiler/status.m
> @@ -0,0 +1,350 @@
> +%-----------------------------------------------------------------------------%
> +% vim: ts=4 sw=4 expandtab ft=mercury
> +%-----------------------------------------------------------------------------%
> +% Copyright (C) 2015 The Mercury team.
> +% This file may only be copied under the terms of the GNU General
> +% Public License - see the file COPYING in the Mercury distribution.
> +%-----------------------------------------------------------------------------%
> +%
> +% This module defines the type that holds the status of items and
> +% of HLDS entities, such as types and predicates.
> +%
> +% XXX We probably should have different types for the status
> +% in the parse tree and the HLDS.
> +%
> +% XXX The current import_status type applies to every kind of entity in the
> +% HLDS, including types, insts, modes, user-defined functions and predicates,
> +% and compiler-generated unify, index and compare predicates, even though
> +% these have different visibility rules. (For example, types can be
> +% abstract-exported, but predicates cannot.)
> +%
> +% There is an accepted design for replacing this single status type with
> +% with a set of entity-kind-specific types, which avoids this confusion.
> +% This design is also more structured, in that it has separate fields recording
> +% the answers to separate questions, instead of flattening out all possible
> +% combinations of answers into an enum. For the details, see status_proposal
> +% in compiler/notes.

There is no such file.

...

> diff --git a/mdbcomp/sym_name.m b/mdbcomp/sym_name.m
> index 207e5c6..fcd9e79 100644
> --- a/mdbcomp/sym_name.m
> +++ b/mdbcomp/sym_name.m
> @@ -99,6 +99,13 @@
>      %
>  :- pred sym_name_get_module_name(sym_name::in, module_name::out) is semidet.
> 
> +    % sym_name_get_module_name_det(SymName) = ModName:
> +    %
> +    % Given a symbol name, return the module qualifiers(s).
> +    % Aborts if the symbol is unqualified.
> +    %
> +:- pred sym_name_get_module_name_det(sym_name::in, module_name::out) is det.
> +

The usual convention for naming such predicates is to use "det_" as a prefix,
rather than "_det" as a suffix.

Cheers,
Julien.



More information about the reviews mailing list