[m-users.] Understanding string literals and name tokens

Julien Fischer jfischer at opturion.com
Sun Dec 14 23:09:36 AEDT 2014


Hi,

On Sat, 13 Dec 2014, Xiaofeng Yang wrote:

> I am confused by the Mercury Reference Manual (14.01.1). It seems not to be
> fully up to date with the implementation. I have some questions about
> string literals and name tokens that I need help with.
> 
> Are the only differences between a string literal and a quoted name that
>     * they are quoted by single/double quotes
>     * '' can appear in a quoted name (but not a string literal), whereas "" can appear in a string literal (but not a quoted name)
> ?

Almost, except that:

'' _can_ appear in a string literal; there it simply means two single-quote
characters, unlike in a quoted name, where it represents _one_ single-quote
character.

Likewise, "" in a quoted name represents two double-quote characters, whereas
in a string literal it represents one double-quote character.
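
A minimal sketch of the string-literal half of that rule. The module name
quote_doubling is just an illustrative choice, and the lengths in the comments
follow directly from the rule above (all characters are ASCII):

```mercury
:- module quote_doubling.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

main(!IO) :-
    % In a string literal, "" denotes ONE double-quote character,
    % so "a""b" is the three-character string a"b.
    io.write_int(string.length("a""b"), !IO), io.nl(!IO),   % 3
    % In a string literal, '' is just two single-quote characters,
    % so "a''b" is the four-character string a''b.
    io.write_int(string.length("a''b"), !IO), io.nl(!IO).   % 4
    % A quoted name reverses the roles: 'a''b' names a three-character
    % symbol and 'a""b' a four-character one.
```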

> I found some usages that are not mentioned in the refman, e.g.
> 
> I write "\\" but it is interpreted as \\; this is not mentioned.

I'm not sure what you mean by "interpreted" there.  The value given by the
string literal "\\" is a string containing a single backslash character.
It may be that you are trying to print this string out using the predicate
io.write.  If so, note what the documentation for that predicate says:

         Strings and characters are always printed out in quotes, using backslash
         escapes if necessary.

If you just want to print the value of the string literal rather than its
syntactic representation, use io.write_string or io.print instead.
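
For illustration, a small program along these lines shows the difference. The
module name backslash is made up for the example, and the comments give the
output I would expect from the behaviour described above:

```mercury
:- module backslash.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

main(!IO) :-
    % The literal "\\" denotes a string holding one backslash character.
    S = "\\",
    io.write_int(string.length(S), !IO), io.nl(!IO),   % 1
    % io.write prints the syntactic representation: quoted and escaped.
    io.write(S, !IO), io.nl(!IO),                      % "\\"
    % io.write_string and io.print print the value itself.
    io.write_string(S, !IO), io.nl(!IO),               % \
    io.print(S, !IO), io.nl(!IO).                      % \
```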

> I write "\" but it gives an error when compiling, so how does the backslash work?

That literal is missing a terminating double-quote delimiter.  The backslash
escapes the double-quote that follows it, so that double-quote does not act as
the terminator.  It should be:

     "\""

(Which is no different to the way C, C++, C#, Java and many other programming
languages handle the same thing.)

> I write "\143\" and it is interpreted as "c", which is what refman 2.3
> says. But when I write "\143" it is also interpreted as "c",

It's a (deliberate) bug in the Mercury implementation.  Unterminated octal
escapes like "\143" should not be allowed, as the reference manual says, but
the implementation currently does allow them because, when the lexer was
originally written, compatibility with NU-Prolog was required and NU-Prolog
apparently had a bug that led it to accept unterminated octal escapes.

This should have been fixed a long time ago (since compatibility with NU-Prolog
isn't an issue these days), but I guess the issue hasn't arisen until now
because octal escapes are so rarely used.  I will fix the lexer up so that
it rejects unterminated octal escapes.
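
As a sketch of the properly terminated forms (the module name octal_escapes is
only illustrative), each escape below ends with a closing backslash, so it
should keep working once unterminated escapes are rejected:

```mercury
:- module octal_escapes.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

main(!IO) :-
    % Octal escape: backslash, octal digits, closing backslash.
    io.write_string("\143\", !IO), io.nl(!IO),   % c  (octal 143 = 99)
    io.write_string("\43\", !IO), io.nl(!IO),    % #  (octal 43 = 35)
    % Hexadecimal escape: \x, hex digits, closing backslash.
    io.write_string("\x63\", !IO), io.nl(!IO).   % c  (hex 63 = 99)
```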

> whereas "\43" is interpreted as "#"; these are not mentioned in the refman.

Same issue as above.

> I found the max value for the rule "\x....\" is \xfffff\, and the max value
> for the rule "\....\" is \x777777\; this is not mentioned in the refman.

The intention is that you should get an error for any character code that lies
outside the Unicode range (0x0 - 0x10ffff).  I'll add something to the reference
manual about this.
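
A rough run-time analogue of that range check, not the lexer's own code: it
assumes that char.from_int fails for integers outside the Unicode range, which
is my understanding of the standard library but worth checking against your
version.  The module name code_point_range and the predicate report are made
up for the example.

```mercury
:- module code_point_range.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module list.
:- import_module string.

    % report(Code, !IO): say whether Code converts to a character.
    % (Hypothetical helper, for illustration only.)
:- pred report(int::in, io::di, io::uo) is det.

report(Code, !IO) :-
    % Assumption: char.from_int/2 is semidet and fails when Code is
    % not a valid character code.
    ( if char.from_int(Code, _Char) then
        Verdict = "accepted"
    else
        Verdict = "rejected"
    ),
    io.format("0x%x: %s\n", [i(Code), s(Verdict)], !IO).

main(!IO) :-
    report(0x10ffff, !IO),    % last code point in the Unicode range
    report(0x110000, !IO).    % first value past the range
```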

Cheers,
Julien.

