[m-dev.] proposal: tracking column numbers in text streams

Zoltan Somogyi zs at unimelb.edu.au
Fri Nov 1 17:51:17 AEDT 2013


On 01-Nov-2013, Julien Fischer <jfischer at opturion.com> wrote:
> I propose that we change io.text_input_streams so that in addition to
> keeping track of the line numbers, they also keep track of the column
> number.  Amongst other things this would allow us to associate column
> numbers with compiler error messages (after suitably modifying the term
> parser).

Recording column numbers in I/O streams is a relatively simple thing;
recording more information for error messages is another, more complex thing.
While associating a single linenum/columnum (L/C) pair would seem to be enough
for a single token, what you want for a term is one L/C pair for the first
character of the first token and another L/C pair for the last character
of the last token. You want this because just identifying the first token
or the top-level token of a term is not enough in the relatively frequent case
where a term is written using an infix operator. For example, if you are
recording principal tokens, and the error message's L/C pair tells you that
the error is at the "+" in the term "x + y - z", is the error message
about the subterm "x + y", or is it about the whole term? If you
don't know the operator precedences, you won't know, and if you think you
know the precedences, but are wrong, you will think you know what term
the error message is about, but will also be wrong. If you record first tokens,
you have even more ambiguity in error messages that report a problem
at the location of the "x".
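
To make the ambiguity concrete, here is a minimal sketch in Python (not
Mercury, and not the actual term parser) of a term representation that keeps
an L/C pair for both the first and the last character of each term. The names
Pos, Token, Term, tokenize and parse, and the precedence table PREC, are all
hypothetical; the parser assumes well-formed input, treats every operator as
left associative, and does no error handling.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pos:
    line: int  # 1-based line number
    col: int   # 1-based column number


@dataclass(frozen=True)
class Token:
    text: str
    first: Pos  # position of the token's first character
    last: Pos   # position of the token's last character


@dataclass(frozen=True)
class Term:
    functor: Token  # the principal token of the term
    args: tuple     # subterms; empty for a plain identifier
    first: Pos      # first character of the first token
    last: Pos       # last character of the last token


# Hypothetical precedences, chosen only for this illustration.
PREC = {"+": 1, "-": 1, "*": 2}


def tokenize(src):
    """Split src into identifier and operator tokens with their extents."""
    tokens = []
    for line_no, line in enumerate(src.splitlines(), start=1):
        for m in re.finditer(r"[A-Za-z_]\w*|[-+*]", line):
            # m.start() is 0-based; m.end() is exclusive, so it equals the
            # 1-based column of the token's last character.
            tokens.append(Token(m.group(),
                                Pos(line_no, m.start() + 1),
                                Pos(line_no, m.end())))
    return tokens


def parse(tokens, pos=0, min_prec=1):
    """Precedence-climbing parse; returns (term, index of next token)."""
    tok = tokens[pos]
    left = Term(tok, (), tok.first, tok.last)
    pos += 1
    while pos < len(tokens) and PREC.get(tokens[pos].text, 0) >= min_prec:
        op = tokens[pos]
        right, pos = parse(tokens, pos + 1, PREC[op.text] + 1)
        # The extent of the new term runs from the start of its left
        # argument to the end of its right argument.
        left = Term(op, (left, right), left.first, right.last)
    return left, pos


whole, _ = parse(tokenize("x + y - z"))
inner = whole.args[0]
print(whole.functor.text, whole.first, whole.last)
print(inner.functor.text, inner.first, inner.last)
```

Under this table, "x + y - z" parses as "(x + y) - z". The subterm "x + y" and
the whole term both start at the "x", but their last positions differ (column
5 versus column 9), so a message that carries both ends identifies the term
without the reader having to know the operator precedences.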

Note that reporting the full extent of terms in error messages requires
recording, or reliably recomputing, the position of the last character
in each token, as well as the first. This is because an error message
that points to the first character of the last token in a term, like this:

   abc + def - ghi * jkl
   ^.................^

will confuse many people.
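
As a rough sketch of how such a marker line could be produced once both ends
of a term are known, the following Python function builds the "^....^" line
from a pair of 1-based, inclusive column numbers. The function name underline
is hypothetical, and the sketch assumes the term sits on a single line with no
tab characters.

```python
def underline(first_col, last_col):
    """Return a marker line covering columns first_col..last_col.

    Both columns are 1-based and inclusive. The result has a caret under
    each end of the span and dots in between.
    """
    if first_col < 1 or last_col < first_col:
        raise ValueError("invalid column range")
    width = last_col - first_col + 1
    if width == 1:
        marker = "^"
    else:
        marker = "^" + "." * (width - 2) + "^"
    # Pad with spaces so the marker lines up under the source text.
    return " " * (first_col - 1) + marker


source = "abc + def - ghi * jkl"
last_token_start = source.rindex("jkl") + 1  # column of the "j"
last_token_end = len(source)                 # column of the "l"

print(source)
print(underline(1, last_token_start))  # stops at the start of the last token
print(source)
print(underline(1, last_token_end))    # covers the full extent of the term
```

The first marker line corresponds to the confusing case above; the second is
only possible if the position of the last character of each token is recorded
or can be recomputed.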

Of course, if you record this extra information for terms, there will be
significant work to do on several fronts.

First, we should use this information to improve syntax error messages,
the ones you get during the term-to-HLDS transformation. Probably the one
that could most use the help is the one where you wanted to write an
if-then-else using -> ; notation and forgot the arrow or the semicolon.

Second, we will want to record the extra information in the HLDS.

Third, we would want to use the extra recorded information when printing
semantic error messages. This will require thinking about how to present
the extra information to users, and modifying error_util.m accordingly.
(The format above is nice when the term or goal being complained about
is all on one line, but not when it spans many lines.)

Were you thinking about addressing these issues as well?

> We can always add "fast" versions of the various I/O operations that do
> not maintain the column number count, if the overhead involved is
> excessive.  (Assuming there is in-principle support for this, I'll
> measure the performance impact before committing anything anyway.)

I think you don't want to replace any existing functionality in the library;
you simply want to ADD new versions of existing predicates that record
more information. Certainly, when I was designing the above scheme in my head
many years ago, I was thinking of adding a new module, a modified duplicate
of term_io.m, to the library.
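
The "add, don't replace" approach can be illustrated with a small Python
sketch (Python rather than Mercury, and not the Mercury library's API): a
separate reader wraps an existing text stream and tracks line and column
numbers, while the underlying stream and its existing operations are left
untouched. The class name ColumnTrackingReader and its read_char method are
hypothetical.

```python
import io


class ColumnTrackingReader:
    """Wrap a text stream and track the line/column of each character read.

    The wrapped stream is not modified; code that does not need column
    numbers can keep reading from it directly.
    """

    def __init__(self, stream):
        self._stream = stream
        self.line = 1  # line of the next character to be read
        self.col = 1   # column of the next character to be read

    def read_char(self):
        """Return (char, (line, col)), or None at end of input."""
        ch = self._stream.read(1)
        if ch == "":
            return None
        pos = (self.line, self.col)
        if ch == "\n":
            # A newline ends the current line; the next character starts
            # at column 1 of the following line.
            self.line += 1
            self.col = 1
        else:
            self.col += 1
        return ch, pos


reader = ColumnTrackingReader(io.StringIO("ab\nc"))
item = reader.read_char()
while item is not None:
    print(repr(item[0]), item[1])
    item = reader.read_char()
```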

Zoltan.


