[m-users.] Problem with XML extra library

Zoltan Somogyi zoltan.somogyi at runbox.com
Thu Jun 23 20:34:14 AEST 2022


2022-06-23 17:11 GMT+10:00 "Volker Wysk" <post at volker-wysk.de>:
> main([File|Files]) -->
>     ...
>     	    { TextResult = ok(Text) },
> 	    pstate(mkEntity(Text), mkEncoding(utf8), init),
>             ...
> -----
> 
> The two "main" predicates are written in DCG syntax. The call to pstate/5 is
> therefore (??) an abbreviation of
> 
>    pstate(mkEntity(Text), mkEncoding(utf8), init, !IO)
> 
> But the declaration of pstate/5 in parsing.m is:
> 
>    :- pred pstate(entity, encoding, globals, io, pstate(unit)).
>    :- mode pstate(in, in, in, di, puo) is det.
> 
> So the type of the output state variable is "pstate(unit)". But it is "io",
> when main/3 is called. The types don't match. 
> ...
> So why on earth does it work in tryit.m ?

That program was written very early in Mercury's history. As you see,
it was written using DCGs, since state variable notation did not exist yet.
Later, some of its aspects were updated to use more modern Mercury style,
but it was not a complete update.

Many of the operations in xml.m take a pair of pstate() arguments as
their last two arguments; these were intended to be the arguments
threaded through their callers using DCG notation.
These pstates contain both the I/O state and the actual state
of the parser, because passing those two logically separate things
as two separate argument pairs would collide with the fact that
DCGs can thread only ONE argument pair through code.

In many of the operations that take a pair of pstate arguments,
the two pstate arguments actually have two different types,
such as pstate(T1) and pstate(list(T2)). This works, because neither
DCG notation nor (now) state variable notation requires that the
arguments in the argument pairs they thread through code
have identical types. Yes, in practice, 99.9+% of the time they
*do* have identical types, but they do not *have to have*
identical types.
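
As a minimal sketch of that point (the module and predicate names here
are hypothetical and have nothing to do with the xml library), the
following program threads one argument pair through two calls, with the
type changing at each step. It is written once with state variable
notation and once with DCG notation:

```mercury
:- module pair_types.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list, string.

    % Hypothetical helper: list(string) in, int out.
:- pred count_words(list(string)::in, int::out) is det.
count_words(Words, Count) :-
    list.length(Words, Count).

    % Hypothetical helper: int in, string out.
:- pred describe(int::in, string::out) is det.
describe(Count, Text) :-
    Text = string.format("%d words", [i(Count)]).

    % State variable version. !S stands for a list(string) before
    % count_words, an int between the two calls, and a string after
    % describe.
:- pred summarise_sv(list(string)::in, string::out) is det.
summarise_sv(!S) :-
    count_words(!S),
    describe(!S).

    % DCG version. The threaded argument pair goes through the same
    % three types, but the intermediate variables have no names at all.
:- pred summarise_dcg(list(string)::in, string::out) is det.
summarise_dcg -->
    count_words,
    describe.

main(!IO) :-
    Words = ["to", "be", "or", "not"],
    summarise_sv(Words, SummarySV),
    summarise_dcg(Words, SummaryDCG),
    io.write_string(SummarySV, !IO),
    io.nl(!IO),
    io.write_string(SummaryDCG, !IO),
    io.nl(!IO).
```

Both clauses expand to the same chain of distinct variables, each of
which gets its own type; nothing ties the type of the first variable in
the chain to the type of the last.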

The code you are talking about works because

- the pstate/5 predicate called near the start of that clause has
  <io> and <pstate(unit)> as its two DCG arguments;
- the finish/3 predicate called near the end of that clause has
  <pstate(T1)> and <io> as its two DCG arguments; and
- the code in between uses !IO to refer to variables whose type is
  pstate(T) for some T.

That last part is misleading for human readers, but the compiler
neither knows this, nor does it care. This probably happened
because the person converting the code from DCGs to state variable
notation did not notice the switch in types from io states to pstates
and then back to io states. (When the code still used DCGs,
the code was less misleading, because the variables holding the pstates
were nameless, instead of having misleading names.)
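
To make the shape of the problem concrete, here is a small sketch under
stated assumptions. The types log and parser are hypothetical stand-ins
for the I/O state and for pstate(T); start/2, step/3 and finish/2 are
invented for illustration and are not the xml library's predicates.
The first clause mirrors the misleading style; the second shows one
possible way of naming the intermediate values, and is not necessarily
what the eventual fix will look like.

```mercury
:- module naming_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int, list, string.

    % Hypothetical stand-ins: log plays the role of the I/O state,
    % parser plays the role of pstate(T).
:- type log == list(string).
:- type parser
    --->    parser(seen :: int, plog :: log).

:- pred start(log::in, parser::out) is det.
start(Log, parser(0, Log)).

:- pred step(string::in, parser::in, parser::out) is det.
step(Msg, parser(N, Log), parser(N + 1, [Msg | Log])).

:- pred finish(parser::in, log::out) is det.
finish(parser(_, Log), list.reverse(Log)).

    % Misleading: between start and finish, !Log names parser values,
    % not log values. The compiler accepts this without complaint.
:- pred run_misleading(log::in, log::out) is det.
run_misleading(!Log) :-
    start(!Log),
    step("a", !Log),
    step("b", !Log),
    finish(!Log).

    % Clearer: the parser values get their own state variable.
:- pred run_clearer(log::in, log::out) is det.
run_clearer(Log0, Log) :-
    some [!Parser] (
        start(Log0, !:Parser),
        step("a", !Parser),
        step("b", !Parser),
        finish(!.Parser, Log)
    ).

main(!IO) :-
    run_misleading([], LogA),
    run_clearer([], LogB),
    io.print(LogA, !IO),
    io.nl(!IO),
    io.print(LogB, !IO),
    io.nl(!IO).
```

The two clauses compute the same result; the only difference is which
names a human reader sees attached to the intermediate values.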

I will look into fixing this readability/style issue.

Zoltan.


