[m-rev.] for review: replace the item list with structured ASTs
Julien Fischer
jfischer at opturion.com
Mon Jul 20 09:59:53 AEST 2015
Hi Zoltan,
On Mon, 13 Jul 2015, Zoltan Somogyi wrote:
> For review by Julien (thanks for volunteering) and anyone else
> who is interested. It passes all test cases, except a newly added one,
> for which the reason is noted in the log.
>
> The diff is too big to be posted at once, so the following message
> will contain its second part.
>
> This diff leaves a lot of further improvements as-yet unimplemented,
> in order to make this diff easier to finish, and to review.
Overall the changes is fine, and a vast improvement on the status quo.
There are some very minor issues that I have noted below.
> I would also appreciate opinions on the preferred order of making
> those further improvements.
I don't have any particular preference as to the order.
> Replace the item list with more structured ASTs.
>
> The parts of the compiler that run before the HLDS is constructed used to use
> a raw list of items to represent source files (.m), interface files (.int0,
> .int3, .int2 and .int) and optimization files (.opt, and .trans_opt).
> These lists had structure, but this structure was implicit, not explicit,
> and its invariants were never really documented.
>
> This diff changes that. It replaces the item list with FIVE separate types.
>
> Three of these each represent the unprocessed content of one file:
>
> - parse_tree_int represents the contents of one interface file;
> - parse_tree_opt represents the contents of one optimization file;
> - parse_tree_src represents the contents of one source file.
>
> Two of these each represent the processed contents of one or more files:
>
> - raw_compilation_unit represents the contents of one module in a source file.
> (The source file may contain several nested modules; the compilation unit
> represents just one.)
> - aug_compilation_unit represents the contents of one module in a source file,
> just like raw_compilation_unit, but it is augmented with the contents of the
> interface and optimization files of the other modules imported (directly or
> indirectly) by the original module.
>
> These five separate concepts all used to be represented by the same type,
> list(item), but different invariants applied to the structure of those lists.
> The most important of those invariants at least are now explicit in the types.
> I think it is entirely possible that there are other invariants I haven't
> discovered and documented (for example, .int3 files must have stricter
> invariants on what can appear in them than .int files), but discovering
> and documenting these should be MUCH easier after this change.
>
> I have marked many further opportunities for improvements with "XXX ITEM_LIST".
> Some of these include moving code between modules, and the creation of new
> modules. However, I have left acting on those XXXs until later, in order to
> keep the the size of this diff down as much as possible, for easier reviewing.
^^^^^^^
Doubled-up word.
> compiler/prog_item.m:
> Define the five new AST types described above, and utility predicates
> that operate on them.
>
> In the rest of this change, I tried, as much as possible, to change
> predicates that used to take item lists as arguments to make them change
> one of these types instead. In many cases, this required putting
> the argument lists of those predicates into a more consistent order.
> (Often, predicates that operated on the contents of the module
> took the name of the module and the list of items in the module
> not just as separate arguments, but as separate arguments that
> weren't even next to each other.)
>
> Define types that identify the different kinds of interface and
> optimization files (.int, .int2 etc). These replace the string suffixes
> we used to use to identify file types. Predicates that used to take strings
> representing suffixes as arguments now have to specify whether they can
> handle all these file types (source, information and optimization),
s/information/interface/
> or just (e.g.) all interface file types.
>
...
> diff --git a/compiler/make_hlds.m b/compiler/make_hlds.m
> index fd774f0..f5cdff7 100644
> --- a/compiler/make_hlds.m
> +++ b/compiler/make_hlds.m
> @@ -36,6 +36,7 @@
> :- import_module parse_tree.module_qual.
> :- import_module parse_tree.prog_data.
> :- import_module parse_tree.prog_item.
> +:- import_module parse_tree.status.
>
> :- import_module bool.
> :- import_module list.
> @@ -54,19 +55,21 @@
> ---> did_not_find_invalid_inst_or_mode
> ; found_invalid_inst_or_mode.
>
> - % parse_tree_to_hlds(Globals, DumpBaseFileName, ParseTree, MQInfo, EqvMap,
> - % UsedModules, QualInfo, InvalidTypes, InvalidModes, HLDS, Specs):
> + % parse_tree_to_hlds(AugCompUnit, Globals, DumpBaseFileName,
> + % MQInfo, TypeEqvMap, UsedModules, QualInfo, InvalidTypes, InvalidModes,
> + % HLDS, Specs):
> %
> - % Given MQInfo (returned by module_qual.m) and EqvMap and UsedModules
> - % (both returned by equiv_type.m), converts ParseTree to HLDS.
> - % Any errors found are returned in Specs.
> - % Returns InvalidTypes = yes if undefined types found.
> - % Returns InvalidModes = yes if undefined or cyclic insts or modes found.
> + % Given MQInfo (returned by module_qual.m) and TypeEqvMap and UsedModules
> + % (both returned by equiv_type.m), converts AugCompUnit to HLDS.
> + % It returns messages for Any errors it finds in Specs.
s/Any/any/
> + % Returns InvalidTypes = yes if it finds any undefined types.
> + % Returns InvalidModes = yes if it finds any undefined or cyclic
> + % insts or modes.
> % QualInfo is an abstract type that is then passed back to
> % produce_instance_method_clauses (see below).
> %
> -:- pred parse_tree_to_hlds(globals::in, string::in, compilation_unit::in,
> - mq_info::in, eqv_map::in, used_modules::in, make_hlds_qual_info::out,
> +:- pred parse_tree_to_hlds(aug_compilation_unit::in, globals::in, string::in,
> + mq_info::in, type_eqv_map::in, used_modules::in, make_hlds_qual_info::out,
> found_invalid_type::out, found_invalid_inst_or_mode::out,
> module_info::out, list(error_spec)::out) is det.
...
> diff --git a/compiler/prog_item.m b/compiler/prog_item.m
> index b0e37d6..e67c4ec 100644
> --- a/compiler/prog_item.m
> +++ b/compiler/prog_item.m
...
> @@ -39,16 +41,164 @@
>
> %-----------------------------------------------------------------------------%
> %
> -% This is how programs (and parse errors) are represented.
> +% The different kinds of files the Mercury compiler deals with:
> +%
> +% - source files,
> +% - automatically generated interface files, and
> +% - automatically generated optimization files.
> +%
> +
> +:- type file_kind
> + ---> fk_src
> + ; fk_int(int_file_kind)
> + ; fk_opt(opt_file_kind).
> +
> +:- type src_file_kind
> + ---> sfk_src.
> +
> +:- type int_file_kind
> + ---> ifk_int0
> + ; ifk_int3
> + ; ifk_int2
> + ; ifk_int.
> +
> +:- type opt_file_kind
> + ---> ofk_opt
> + ; ofk_trans_opt.
> +
> +:- func file_kind_to_extension(file_kind) = string.
> +:- func int_file_kind_to_extension(int_file_kind) = string.
> +:- func opt_file_kind_to_extension(opt_file_kind) = string.
> +
> +:- pred extension_to_file_kind(string::in, file_kind::out) is semidet.
> +
> +%-----------------------------------------------------------------------------%
> +%
> +% The parse_tree_{src,int,opt} types define the ASTs we use for source files,
> +% interface files and optimization files respectively.
> +%
> +% Nested submodules may appear in source files, but not in interface files
> +% or optimization files.
> +%
> +% We use cord of items instead of lists of items where we may need to add
s/cord/cords/
> +% items to an already-existing parse tree.
> +%
> +% The contexts of module and section declarations below may be
> +% term.context_init if the actual context isn't known, but if the recorded
> +% context is not term.context_init, then it is valid.
> +
...
> diff --git a/compiler/status.m b/compiler/status.m
> index e69de29..2e403e9 100644
> --- a/compiler/status.m
> +++ b/compiler/status.m
> @@ -0,0 +1,350 @@
> +%-----------------------------------------------------------------------------%
> +% vim: ts=4 sw=4 expandtab ft=mercury
> +%-----------------------------------------------------------------------------%
> +% Copyright (C) 2015 The Mercury team.
> +% This file may only be copied under the terms of the GNU General
> +% Public License - see the file COPYING in the Mercury distribution.
> +%-----------------------------------------------------------------------------%
> +%
> +% This module defines the type that holds the status of items and
> +% of HLDS entities, such as types and predicates.
> +%
> +% XXX We probably should have different types for the status
> +% in the parse tree and the HLDS.
> +%
> +% XXX The current import_status type applies to every kind of entity in the
> +% HLDS, including types, insts, modes, user-defined functions and predicates,
> +% and compiler-generated unify, index and compare predicates, even though
> +% these have different visibility rules. (For example, types can be
> +% abstract-exported, but predicates cannot.)
> +%
> +% There is an accepted design for replacing this single status type with
> +% with a set of entity-kind-specific types, which avoids this confusion.
> +% This design is also more structured, in that it has separate fields recording
> +% the answers to separate questions, instead of flattening out all possible
> +% combinations of answers into an enum. For the details, see status_proposal
> +% in compiler/notes.
There is no such file.
...
> diff --git a/mdbcomp/sym_name.m b/mdbcomp/sym_name.m
> index 207e5c6..fcd9e79 100644
> --- a/mdbcomp/sym_name.m
> +++ b/mdbcomp/sym_name.m
> @@ -99,6 +99,13 @@
> %
> :- pred sym_name_get_module_name(sym_name::in, module_name::out) is semidet.
>
> + % sym_name_get_module_name_det(SymName) = ModName:
> + %
> + % Given a symbol name, return the module qualifiers(s).
> + % Aborts if the symbol is unqualified.
> + %
> +:- pred sym_name_get_module_name_det(sym_name::in, module_name::out) is det.
> +
The usual convention for naming such predicates is to use "det_" as a prefix,
rather than "_det" as a suffix.
Cheers,
Julien.
More information about the reviews
mailing list