[mercury-users] string__first_char

Fergus Henderson fjh at cs.mu.OZ.AU
Mon Oct 16 04:57:49 AEDT 2000


On 15-Oct-2000, Michael Day <mcda at students.cs.mu.oz.au> wrote:
> 
> Why must Mercury strings be word aligned:

In general they don't have to be --
that requirement is just a property of the current implementation.
It might not hold for the new .NET back-end, for example.

In fact, even for the current implementation, I think it is just
a property that the implementation tries hard to maintain;
AFAIK no code in the current implementation actually
requires it.

> a) Tagging/data representation issue?

That was the original motivation.  But the current implementation
doesn't actually make use of it.

The idea was that for data types like

	:- type foo ---> f(string) ; g(string) ; h(int) ; c.

the compiler could put tags directly on the strings, rather than
boxing them.  But the current implementation still boxes them.
(This has been discussed before on this list -- check the list archives.)
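
To make the tagging idea concrete, here is a minimal C sketch of what
"putting tags directly on the strings" could look like for the type
above.  It is only an illustration under stated assumptions, not the
Mercury runtime's actual scheme: the tag values, the 2-bit tag width,
and every name below (foo, make_f, foo_string, and so on) are
hypothetical.  It relies on strings being at least 4-byte aligned, so
that the low two bits of the pointer are free to hold the tag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag values for f(string), g(string), h(int) and c. */
enum foo_tag { TAG_F = 0, TAG_G = 1, TAG_H = 2, TAG_C = 3 };

#define TAG_BITS 2
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

/* A foo value is one machine word: payload plus tag in the low bits. */
typedef uintptr_t foo;

/* Copy a string into malloc'd storage and check that the address has
   its low TAG_BITS bits clear.  This is the alignment property the
   tagging scheme depends on. */
static char *aligned_string(const char *s) {
    char *p = malloc(strlen(s) + 1);
    if (p == NULL) abort();
    assert(((uintptr_t)p & TAG_MASK) == 0);
    strcpy(p, s);
    return p;
}

/* Constructors: the string pointer itself carries the tag, no box. */
static foo make_f(const char *aligned) { return (uintptr_t)aligned | TAG_F; }
static foo make_g(const char *aligned) { return (uintptr_t)aligned | TAG_G; }

/* The integer is shifted up to make room for the tag. */
static foo make_h(int n) {
    return ((uintptr_t)(intptr_t)n << TAG_BITS) | TAG_H;
}

/* The constant needs no payload at all. */
static foo make_c(void) { return TAG_C; }

/* Accessors: mask the tag off to recover the payload. */
static enum foo_tag foo_get_tag(foo x) { return (enum foo_tag)(x & TAG_MASK); }
static const char *foo_string(foo x) { return (const char *)(x & ~TAG_MASK); }

/* Assumes an arithmetic right shift on signed values, as on the
   usual two's complement targets. */
static int foo_int(foo x) { return (int)((intptr_t)x >> TAG_BITS); }
```

If strings were not aligned, the low bits of the pointer would be part
of the address and there would be nowhere to put the tag, which is why
the boxed representation needs no such guarantee.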

> b) Garbage collection?

We generally run the Boehm et al. conservative garbage collector in a
mode in which it only follows pointers to bytes within the first word
of an object; pointers into the middle of an object are ignored by the
GC.  So although the GC doesn't require strings to be word aligned, it
doesn't allow arbitrary pointers into the middle of strings.

> c) risc architecture restriction?

That's not really an issue given the representation of Mercury strings
as C strings in the current implementation.

For the .NET back-end, we plan to represent Mercury strings using the
System.String class.  Hence strings will, like all object references,
normally be word-aligned.  On x86 that alignment may not be strictly
necessary, but it will certainly be important for good performance.
On RISC architectures it might be a requirement.

> If it's a) or b), will the high level code/data grades permit new freedom
> here?

Nope.

> Otherwise it seems that working with strings will be uncomfortably
> slow without dropping down to C.

That depends on how you code it.  If you process strings using
string__first_char, then yes, things will probably be inefficient.
But if you make use of string__foldl and enable intermodule optimization,
you should get C-like performance.
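
As a rough illustration of the difference, here is a C sketch of the
two styles.  It assumes that each first_char-style step has to return
the remainder as a fresh copy, because it cannot hand back a pointer
into the middle of the original string.  All the names below
(first_char, string_foldl, count_by_splitting, count_by_folding) are
hypothetical stand-ins, not the Mercury library's actual code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* first_char-style step: split off the first character and return the
   remainder as a freshly allocated copy.  Returns 0 on empty input. */
static int first_char(const char *s, char *first, char **rest) {
    if (s[0] == '\0') return 0;
    size_t n = strlen(s + 1);
    char *copy = malloc(n + 1);
    if (copy == NULL) abort();
    memcpy(copy, s + 1, n + 1);   /* copies the tail and its terminator */
    *first = s[0];
    *rest = copy;
    return 1;
}

/* Counting by repeated splitting: every step copies the whole tail,
   so the total work is quadratic in the string length. */
static size_t count_by_splitting(const char *s, char c) {
    size_t count = 0;
    char *owned = NULL;           /* the remainder we allocated, if any */
    const char *cur = s;
    char ch;
    char *rest;
    while (first_char(cur, &ch, &rest)) {
        if (ch == c) count++;
        free(owned);              /* the old remainder is no longer needed */
        owned = rest;
        cur = rest;
    }
    free(owned);
    return count;
}

/* foldl-style traversal: one pass over the string, no copying. */
typedef size_t (*fold_step)(char ch, size_t acc, const void *env);

static size_t string_foldl(const char *s, fold_step step, size_t acc,
                           const void *env) {
    for (; *s != '\0'; s++) acc = step(*s, acc, env);
    return acc;
}

/* Step function: add one when the character matches the one in env. */
static size_t count_step(char ch, size_t acc, const void *env) {
    return acc + (ch == *(const char *)env);
}

static size_t count_by_folding(const char *s, char c) {
    return string_foldl(s, count_step, 0, &c);
}
```

Both functions compute the same result; the fold version simply walks
the characters once, and a compiler that can inline the step function
is left with an ordinary loop.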

For code which makes significant use of substrings, you can use a
representation like

	:- type substring ---> substring(s::string, start::int, end::int).

That will avoid any unnecessary string copying.
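
The same idea can be sketched in C as a small view structure.  This is
only an illustration: the half-open convention (end is one past the
last character) and all the helper names are assumptions made for the
example, not part of any Mercury library.

```c
#include <stddef.h>
#include <string.h>

/* A view onto part of a string; the underlying string is shared. */
typedef struct {
    const char *s;   /* the whole underlying string, never copied */
    size_t start;    /* index of the first character in the view */
    size_t end;      /* index one past the last character (assumed) */
} substring;

/* View covering the entire string. */
static substring substring_of(const char *s) {
    substring r = { s, 0, strlen(s) };
    return r;
}

static size_t substring_length(substring sub) {
    return sub.end - sub.start;
}

/* Take the first character and return the rest as another view.
   Only an index is advanced; no characters are copied. */
static int substring_first_char(substring sub, char *first, substring *rest) {
    if (sub.start >= sub.end) return 0;
    *first = sub.s[sub.start];
    *rest = sub;
    rest->start++;
    return 1;
}

/* Sub-view from offset `from` up to offset `to`, both relative to the
   start of `sub`; out-of-range offsets are clamped. */
static substring substring_slice(substring sub, size_t from, size_t to) {
    size_t len = substring_length(sub);
    if (to > len) to = len;
    if (from > to) from = to;
    substring r = { sub.s, sub.start + from, sub.start + to };
    return r;
}

/* Compare the viewed characters with an ordinary C string. */
static int substring_equals(substring sub, const char *lit) {
    size_t n = substring_length(sub);
    return strlen(lit) == n && memcmp(sub.s + sub.start, lit, n) == 0;
}
```

Every operation here only adjusts the two indices, so taking the first
character or slicing costs constant time regardless of string length.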

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.