[m-rev.] for review: convert parse trees to strings

Wed Nov 1 16:35:06 AEDT 2023

On 2023-11-01 16:08 +11:00 AEDT, "Julien Fischer" <jfischer at opturion.com> wrote:
> 
> On Tue, 31 Oct 2023, Zoltan Somogyi wrote:
> 
>> The diff adds the typeclass constraint "pt_output(S, U)"
>> to many predicates in the compiler. To avoid losing efficiency,
>> we want to created versions of those predicates specialized
>> to their current use case (where S is an output stream,
>> and U is the I/O state), and their intended new use case
>> (where U is string.builder.state, and S is the corresponding handle).
>> > Unfortunately, adding two type_spec pragmas to every predicate
>> with that constraint has two problems. First, it is tedious, and
>> second, if any predicate is missed either now or later as new
>> predicates are added, we lose the benefit of specialization
>> in the affected part of the call tree. (This is already an issue
>> in files such as library/term_io.m.)
> 
> This kind of what we might call templatey specialisation is an issue
> beyond the compiler, see for example my mercury_json library.

Yes, I knew that. It is an issue already in e.g. library/term_io.m.

>> I therefore propose a new pragma that asks for the specialization
>> of all predicates and functions in a module that have a specified
>> typeclass constraint. The intended two examples would be
>>
>>     :- pragma type_spec_constrained_preds(pt_output(S, U),
>>         [S = io.text_output_stream, U = io.state]).
>>     :- pragma type_spec_constrained_preds(pt_output(S, U),
>>         [S = string.builder.handle, U = string.builder.state]).
> 
> No objection from me, other than perhaps the name. For that
> I suggest type_constraint_spec.

That gets across one of the ways in which this pragma differs
from pragma type_spec, that it applies to types in constraints,
but not the other, which is that it applies to ALL constrained preds.

I am indifferent about my initial proposed name, but I do think
any name we pick should get across both points. Does anyone
have any more proposals? My guess is that names including
the phase "template specialization" would for the subset of
our users who know how C++ is compiled, but I have no idea
whether it would work for the rest.

>> The internal representation of this pragma would record the name
>> of the module in which the pragma appears. (This would happen regardless
>> of whether it appears in a .m or in a .intN file.) The compiler would
>> then apply the type specialization described by the second arg
>> to all predicates and functions in that module that have the
>> type constraint in the first arg.
> 
> So it only applies to a single module?

What is your alternative?

I guess we *could* specify that if this pragma occurs in a module
that defines and exports

- a type class and
- an instance of that type class

then if it also has this pragma in its interface for that pair, then all modules
importing that module automatically get its effects. However,
we already have modules in the compiler for which we explicitly
disable some optimizations in Mercury.options, because the
optimization applies only to rarely executed code, and the cost
in extra code size is too much to pay for any speedup the
optimization may bring.

So yes, I think the scope of the pragma should be the preds/funcs
in the module in which the pragma appears. This gets the vast majority
of the benefits with minimal cost in typing, and avoids the need
for any extra mechanism to address the above issue.

My guess is that you are concerned about type specialized code
in one module calling a could-be-specialized-but-isnt predicate
in another module. We could address that concern by adding
either a warning or an informational message when that occurs.
This message could operate regardless of whether the type
specialized predicate was specialized by an old style pragma,
or the new form.

>> That typeclass constraint would be required to have the form of
>> a typeclass name applied to one or more type variables. There are two ways
>> we could go on how we determine which predicates and functions have that
>> type constraint. The typeclass name would obviously have to match,
>> but we could either
>> > - require that the names of the type variables match as well, or
>> - apply type specification regardless of whether the names of the
>>   type variables match.
>> > I plan to do whichever of these the current code for type_spec pragmas does,
>> because
>> > - consistency is important,
>> - whatever the current code does has obviously worked well enough, and
>> - it would be easier to implement :-)
> 
> For the sake of robustness, I suggest doing the second.  Users may have
> perfectly valid reasons for renaming the type variables on the constraints
> on predicates or functions.

That's a good idea. I will look into that, but if it looks too hard, I will still
use the existing code in the initial version.

>> > Basically, if you want to specify a substitution for more than one variable,
>> you have to put parentheses around their conjunction.
>> > I think putting all the variable substitutions in a list, regardless
>> of whether there is only one or not, is simpler, which is why the new
>> pragma examples above use that syntax. My preferred course of action
>> would be to
>> > - implement the new pragma with the syntax requiring all substitutions
>>   be in a list,
>> > - change the implementation of the old type_spec pragma to *allow*
>>   all substitutions being in a list, and then
>> > - recommend that people use the new syntax from now on.
> 
> Agreed.  The list syntax is also consistent with the vast majority of
> other pragmas in the language.

We have been moving to that style for a long time.

>> We *could* deprecate and eventually stop supporting the old syntax,
> 
> We should deprecate and eventually stop supporting it.  (But it doesn't
> have to be soon.)
> 
>> but I don' see much to be gained by that course of action.
> 
> - not having to maintain the code that supports to the old syntax.
> - potentially not confusing users, by having two syntaxes.

The old code we would have to maintain is tiny, and needs no real maintenance
anyway, which is why I wrote what I wrote. About user confusion: disabling
the old syntax will probably increase confusion during the changeover,
but reduce it in the long run, so you are right, we should eventually stop
supporting it. We could deprecate this syntax in the next release, with
a warning that can be disabled with --no-warn-obsolete, and then remove
support in the release after that. How does that sound?

Does anyone object to me committing this diff without this new pragma
in place? I would prefer to do so, to reduce the risk of conflicts, accepting
the temporary perf penalty.

Zoltan.