[m-users.] Understanding string literals and name tokens
Richard A. O'Keefe
ok at cs.otago.ac.nz
Mon Dec 15 14:50:12 AEDT 2014
On 15/12/2014, at 1:09 am, Julien Fischer <jfischer at opturion.com> wrote:
> originally written compatibility with NU-Prolog was required and NU-Prolog
> apparently had a bug that lead it to accept unterminated octal escapes.
That wasn’t a bug.
(Writing “lead” when you mean “led”, _that’s_ a bug.)
Octal escapes were copied from C.
C octal escapes are the longest match of \\[0-7]{1,3} you can make.
There never has been any kind of terminator for octal escapes in C.
The result was that Prolog systems like NU Prolog, Quintus Prolog,
SICStus Prolog, and so on that picked up octal escapes picked them
up in exactly this form with NO terminator.
So it definitely was not a bug: that is precisely the way it was
designed to be.
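
To make the longest-match rule concrete, here is a minimal Python sketch
of C-style octal decoding. The name decode_c_octal is made up for this
illustration, and the sketch handles only octal escapes (it ignores
other escapes such as \n or an escaped backslash).

```python
import re

# A backslash followed by one to three octal digits. The {1,3}
# quantifier is greedy, so the regex takes the longest match, as C does.
_C_OCTAL = re.compile(r"\\([0-7]{1,3})")


def decode_c_octal(text):
    """Replace each C-style octal escape in text with its character."""
    return _C_OCTAL.sub(lambda m: chr(int(m.group(1), 8)), text)


# \123 is consumed as one escape (octal 123 is 'S'); the 4 is left over.
print(decode_c_octal(r"\1234"))   # S4
print(decode_c_octal(r"\101BC"))  # ABC
```

With no terminator, the three-digit cap is the only thing that tells
the reader where the escape stops.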
Don’t forget that (a) NU Prolog is a lot older than ISO Prolog
and (b) compatibility with existing code or implementations
was explicitly *rejected* as a goal by the ISO Prolog committee.
The limit of 3 digits worked because these pre-ISO Prolog systems
used 8-bit character sets.
To this day, in ISO standard C, an *octal* escape sequence may
contain only 1 to 3 octal digits, whereas a hexadecimal sequence
may contain any number, and neither has a terminating backslash.
The ISO Prolog standard made a backwards-incompatible change
in section 6.4.2.1, requiring octal and hexadecimal escape sequences
to end with a backslash. The reason for this was to support wider
character sets.
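
A sketch of the terminated form, again in Python, may help show why the
closing backslash allows wider character sets: the digit run can be any
length because its end is marked explicitly. The name decode_iso_escapes
is hypothetical, and the sketch covers only the octal and hexadecimal
escapes, leaving unterminated ones untouched rather than diagnosing them.

```python
import re

# Backslash, then either x plus hex digits or a run of octal digits,
# then a closing backslash. There is no limit on the number of digits.
_ISO_ESCAPE = re.compile(r"\\(?:x([0-9a-fA-F]+)|([0-7]+))\\")


def decode_iso_escapes(text):
    """Replace each backslash-terminated escape with its character."""
    def replace(match):
        hex_digits, octal_digits = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        return chr(int(octal_digits, 8))
    return _ISO_ESCAPE.sub(replace, text)


print(decode_iso_escapes("\\101\\BC"))  # ABC
print(decode_iso_escapes("\\x41\\BC"))  # ABC
# Five octal digits, well beyond what three digits can express:
print(decode_iso_escapes("\\20254\\") == "\u20ac")  # True
```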
However, this incompatibility with C, C++, Java (3-digit octals
according to JLS8 3.10.6, no \x, \u is 4 hexits and \U is 8),
C# (\0 is the only octal escape, \x[0-9a-fA-F]{1,4} according to
the C# 5.0 spec), and ECMAScript (where \0 is the only allowed octal
escape), NONE of which has a backslash terminator for these escapes,
creates a continuing problem for programmers with much exposure
to those languages.
(While ML uses decimal instead of octal, “\1234” has two characters
in Standard ML, not 1.)
Some Prolog systems have taken the “high road” of strict ISO Prolog
compatibility. It is certainly possible to produce a clear error
message if there is no terminating backslash. Some have taken the
“pragmatic” road of backwards compatibility, which has also turned
out to be “cultural” compatibility with almost every other language
that *has* character escapes.
If compatibility with ISO Prolog is a priority for Mercury,
take the high road.
If not putting stumbling blocks in the way of programmers is
a priority for Mercury (as it wasn’t for the ISO Prolog committee),
take the pragmatic road.
If the high road is taken, the error message should be very clear
indeed.
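
As a rough idea of what "very clear" could look like, here is a Python
sketch of a checker that reports where the terminator is missing and
what to write instead. The function name and the message wording are
invented for this illustration and are not taken from any Mercury or
Prolog implementation. Only octal escapes are checked.

```python
def find_unterminated_octal(text):
    """Return one diagnostic per octal escape lacking a closing backslash."""
    problems = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "\\":
            i += 1
            continue
        # Collect the run of octal digits following the backslash.
        j = i + 1
        while j < n and text[j] in "01234567":
            j += 1
        if j == i + 1:
            # Some other escape (for example \n or \\): skip both characters.
            i += 2
            continue
        if j < n and text[j] == "\\":
            i = j + 1  # properly terminated; continue after the terminator
        else:
            digits = text[i + 1:j]
            problems.append(
                f"column {i + 1}: octal escape \\{digits} is missing its "
                f"closing backslash; write \\{digits}\\ instead"
            )
            i = j
    return problems


for line in find_unterminated_octal("\\101BC"):
    print(line)
```

The point of the message is that it names the offending escape and
shows the corrected spelling, so that someone used to C does not have
to guess what the rule is.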