[m-rev.] for review: mercury implementation of string.m

Peter Ross pro at missioncriticalit.com
Wed Jun 19 01:23:32 AEST 2002


rafe wrote:
> Michael Day, Monday, 17 June 2002:
> >
> > Right. How about adding a general string buffer type, containing a
> > string and an index into that string (is anything else required?)
> > that can be used by lex and other modules that process strings in
> > this way so that first_char can be deprecated/removed?
>
> first_char was always a bad idea.  [unsafe_]index and fold[lr] are the
> right tools for the job.
>
Agreed.
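
To illustrate the point, here is a minimal sketch in C rather than
Mercury. It assumes that a first_char-style decomposition hands back the
rest of the string as a fresh copy at every step, which is where the
cost comes from. The function names are made up for this example and
are not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index-based traversal: a single pass over the string, no allocation. */
size_t count_char_by_index(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == c) {
            n++;
        }
    }
    return n;
}

/* Allocate or abort; keeps the sketch free of error-handling noise. */
static char *copy_or_abort(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p == NULL) {
        abort();
    }
    strcpy(p, s);
    return p;
}

/*
 * first_char-style traversal (assumed behaviour): each step splits the
 * string into its first character and a newly allocated copy of the
 * rest, so walking n characters copies O(n^2) bytes in total.
 */
size_t count_char_by_first_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    char *cur = copy_or_abort(s);
    while (cur[0] != '\0') {
        if (cur[0] == c) {
            n++;
        }
        char *rest = copy_or_abort(cur + 1);  /* the expensive part */
        free(cur);
        cur = rest;
    }
    free(cur);
    return n;
}
```

Both functions return the same count; only the second one allocates and
copies on every character.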

> Constructing strings is a different matter.  If you use clever
> representations, such as concatenation trees of substrings, you will
> likely also have to have user-defined equality since there will be
> multiple ways to represent each string.  This brings its own set of
> problems.  I'm inclined to the idea that a separate data structure is
> appropriate for building strings.
>
This is true.  I came across the same problem when looking at writing a
typeclass-based stream library for I/O: you really don't want to use the
string primitives for doing I/O.  You just want some abstract type that
you can get the resulting string out of when finished.  I did a similar
thing for string__format, where I tried to minimise the number of
strings allocated in order to improve its performance.
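
As a rough sketch of the kind of abstract type meant here, written in C
for concreteness: a growable buffer that is appended to and then handed
over as an ordinary string once at the end. The type and function names
(strbuf, strbuf_init, strbuf_append, strbuf_finish) are hypothetical and
do not correspond to anything in the Mercury library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string-building buffer; illustrative only. */
typedef struct {
    char  *data;  /* NUL-terminated contents */
    size_t len;   /* bytes used, excluding the terminator */
    size_t cap;   /* bytes allocated */
} strbuf;

/* Initialise an empty buffer. Returns 0 on success, -1 on failure. */
int strbuf_init(strbuf *sb)
{
    assert(sb != NULL);
    sb->len = 0;
    sb->cap = 16;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL) {
        return -1;
    }
    sb->data[0] = '\0';
    return 0;
}

/* Append s, doubling the capacity as needed. Returns 0 or -1. */
int strbuf_append(strbuf *sb, const char *s)
{
    assert(sb != NULL && sb->data != NULL && s != NULL);
    size_t n = strlen(s);
    size_t need = sb->len + n + 1;
    if (need > sb->cap) {
        size_t cap = sb->cap;
        while (cap < need) {
            cap *= 2;
        }
        char *p = realloc(sb->data, cap);
        if (p == NULL) {
            return -1;
        }
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, s, n + 1);  /* copies the terminator too */
    sb->len += n;
    return 0;
}

/*
 * Hand the finished string to the caller, who must free() it.
 * The buffer must be re-initialised before it is used again.
 */
char *strbuf_finish(strbuf *sb)
{
    assert(sb != NULL);
    char *result = sb->data;
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
    return result;
}
```

Intermediate results never exist as separate strings; the only string
the caller sees is the one returned by strbuf_finish.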

> I believe the reason for not having clever (immutable) string
> representations is that the C interface is easier to work with if you
> just use C strings, which couldn't be more basic (I've been bitten
> once or twice by the fact that you can't have NULs in Mercury strings
> if you're using a C back-end.)  I'm not 100% convinced this is the
> right approach... Pete, Tyson?
>
The reference manual states that strings are represented by the typedef
MR_String, and that this is equivalent to char *.  A clever string
representation would have to preserve this guarantee at the C code
boundary, and I think changing the constraint would break too much code.
Strings are just too fundamental a data type not to be used heavily.
However, if a representation could be found that is cheap to marshal
across this boundary, then I would be all for it.
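
To make the NUL problem at that boundary concrete, here is a small
sketch in C. The counted_string type is hypothetical and stands in for
any representation that records its own length; it is not how Mercury
represents strings. Once only the char * is passed on, C code has no way
to see past the first NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string; illustrative only. */
typedef struct {
    size_t      len;    /* true number of bytes, may include NULs */
    const char *bytes;  /* must still end with a terminating NUL */
} counted_string;

/* The length that plain C code sees when handed only the char *. */
size_t c_visible_length(counted_string s)
{
    assert(s.bytes != NULL);
    return strlen(s.bytes);
}

/* True if the string can cross to char * without losing any bytes. */
int survives_c_boundary(counted_string s)
{
    return c_visible_length(s) == s.len;
}
```

For the five bytes 'a', 'b', NUL, 'c', 'd', c_visible_length reports 2,
so everything after the embedded NUL is lost to the C side.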

Pete
