[m-dev.] proposal: tracking column numbers in text streams

Julien Fischer jfischer at opturion.com
Fri Nov 1 20:11:35 AEDT 2013


Hi Zoltan,

On Fri, 1 Nov 2013, Zoltan Somogyi wrote:

> On 01-Nov-2013, Julien Fischer <jfischer at opturion.com> wrote:
>> I propose that we change io.text_input_streams so that in addition to
>> keeping track of the line numbers, they also keep track of the column
>> number.  Amongst other things this would allow us to associate column
>> numbers with compiler error messages (after suitably modifying the term
>> parser).
>
> Recording column numbers in I/O streams is a relatively simple thing;
> recording more information for error messages is another, more complex thing.
> While associating a single linenum/columnum (L/C) pair would seem to be enough
> for a single token, what you want for a term is one L/C pair for the first
> character of the first token and another L/C pair for the last character
> of the last token. You want this because just identifying the first token
> or the top-level token of a term is not enough in the relatively frequent case
> where a term is written using an infix operator. For example, if you are
> recording principal tokens, and the error message's L/C pair tells you that
> the error is at the "+" in the term "x + y - z", is the error message
> about the subterm "x + y", or is it about the whole term? If you
> don't know the operator precedences, you won't know, and if you think you
> know the precedences, but are wrong, you will think you know what term
> the error message is about, but will also be wrong. If you record first tokens,
> you have even more ambiguity in error messages that report a problem
> at the location of the "x".
>
> Note that reporting the full extent of terms in error messages requires
> recording, or reliably recomputing, the position of the last character
> in each token, as well as the first. This is because an error message
> that points to the first character of the last token in a term, like this:
>
>   abc + def - ghi * jkl
>   ^.................^
>
> will confuse many people.
>
> Of course, if you record this extra information for terms, there will be
> significant work to do on several fronts.

I agree.  Initially, I'm simply proposing to extend text file streams to
support counting columns.  Adding column numbers to contexts in the term
parser and compiler is a whole separate (and much larger) change.
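For illustration only, here is a sketch of the bookkeeping that counting columns alongside lines involves. It is Python rather than Mercury, and the names (`PositionTrackingReader`, `read_char`, `putback_char`) are hypothetical stand-ins loosely modelled on io.read_char and io.putback_char. It is not the library's implementation. It assumes that every character, including a tab, counts as one column. The one subtle case is pushing back a newline. The line count simply drops by one, but the column can only be restored if the length of the previous line was remembered. This sketch keeps one integer per completed line for that purpose.

```python
import io


class PositionTrackingReader:
    """Hypothetical wrapper that counts lines and columns as characters are read.

    Lines are numbered from 1, as in Mercury's text input streams.
    `column` is the number of characters read so far on the current line,
    which is also the 1-based column of the most recently read character.
    Every character, including a tab, counts as one column.
    """

    def __init__(self, stream):
        self._stream = stream
        self.line = 1
        self.column = 0
        self._pushback = []   # characters returned via putback_char
        self._line_ends = []  # column at the end of each completed line

    def read_char(self):
        """Return the next character, or None at end of file."""
        if self._pushback:
            ch = self._pushback.pop()
        else:
            ch = self._stream.read(1)
            if ch == "":
                return None
        if ch == "\n":
            # Remember where this line ended so a putback can restore it.
            self._line_ends.append(self.column)
            self.line += 1
            self.column = 0
        else:
            self.column += 1
        return ch

    def putback_char(self, ch):
        """Push back ch, which must be the most recently read character
        not already pushed back, and undo its effect on the position."""
        self._pushback.append(ch)
        if ch == "\n":
            self.line -= 1
            self.column = self._line_ends.pop()
        else:
            self.column -= 1
```

Reading "ab\ncd" one character at a time leaves the reader at line 2, column 2 after the final "d". Pushing back the "c" and then the newline returns it to line 1, column 2.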

> First, we should use this information to improve syntax error messages,
> the ones you get during the term to HLDS transformation. Probably the one
> that could most use the help is the one where you wanted to write an
> if-then-else using -> ; notation and forgot the arrow or the semicolon.
>
> Second, we will want to record the extra information in the HLDS.
>
> Third, we would want to use the extra recorded information when printing
> semantic error messages. This will require thinking about how to present
> the extra information to users, and modifying error_util.m accordingly.
> (The format above is nice when the term or goal being complained about
> is all on one line, but not when it spans many lines.)
>
> Were you thinking about addressing these issues as well?

I wasn't thinking of addressing any of the compiler-level stuff just
now; I have other work to do at the moment.  (Part of that is
rewriting how much of the G12 system deals with error messages, so I'm
all too aware of just how annoying and fiddly these sorts of changes can
be...)
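Zoltan's points about recording the last character of each token, and about how error_util.m might present a term's extent, can be made concrete with a small sketch. This is also Python, not Mercury. `Pos`, `token_end` and `render_span` are hypothetical names, and the sketch assumes 1-based line and column numbers and tokens that contain no newlines. The multi-line fallback is just one possible choice. It is not a proposed format for error_util.m.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pos:
    line: int    # 1-based line number
    column: int  # 1-based column number


def token_end(start, text):
    """Position of the last character of a token that begins at `start`.

    Assumes the token contains no newline, so its last character is
    len(text) - 1 columns to the right of its first.
    """
    return Pos(start.line, start.column + len(text) - 1)


def render_span(source_lines, first, last):
    """Describe the extent of a term running from `first` to `last` (inclusive).

    A span on one line is shown with a caret under its first and last
    characters and dots in between; a span over several lines is given
    as a plain "line:col-line:col" range instead.
    """
    if first.line != last.line:
        return f"{first.line}:{first.column}-{last.line}:{last.column}"
    text = source_lines[first.line - 1]
    width = last.column - first.column + 1
    if width == 1:
        marker = "^"
    else:
        marker = "^" + "." * (width - 2) + "^"
    return text + "\n" + " " * (first.column - 1) + marker
```

For "abc + def - ghi * jkl", the first token starts at 1:1 and the last token "jkl" starts at 1:19. `token_end` places its last character at 1:21, so the underline runs from the "a" to the final "l" instead of stopping at the "j".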

>> We can always add "fast" versions of the various I/O operations that do
>> not maintain the column number count, if the overhead involved is
>> excessive.  (Assuming there is in-principle support for this, I'll
>> measure the performance impact before committing anything anyway.)
>
> I think you don't want to replace any existing functionality in the library;
> you simply want to ADD new versions of existing predicates that record
> more information. Certainly, when I was designing the above scheme in my head
> many years ago, I was thinking of adding a new module, a modified duplicate
> of term_io.m, to the library.

I was referring to the I/O operations at a lower level, specifically
io.read_char/4 and its variants, more so than what's in term_io.m.
For the former, we would only add new versions if performance
warranted it.  (Since the changes for Unicode support didn't add much
overhead, I doubt we will need to.)

Cheers,
Julien.
