[m-rev.] for review: mercury implementation of string.m

Simon Taylor stayl at cs.mu.OZ.AU
Sat Jun 15 06:39:46 AEST 2002


On 14-Jun-2002, Peter Ross <pro at missioncriticalit.com> wrote:
> > > @@ -1116,6 +1165,20 @@
> > >  	Index = WholeString->IndexOf(SubString);
> > >  }").
> > >  
> > > +string__sub_string_search(String, SubString, Index) :-
> > > +	string__sub_string_search_2(String, SubString, 0, Index).
> > > +
> > > +:- pred sub_string_search_2(string::in, string::in,
> > > +		int::in, int::out) is semidet.
> > > +
> > > +sub_string_search_2(String, SubString, CurrentIndex, Index) :-
> > > +	( string__prefix(String, SubString) ->
> > > +		Index = CurrentIndex
> > > +	;
> > > +		string__first_char(String, _, Rest),
> > > +		sub_string_search_2(Rest, SubString, CurrentIndex + 1, Index)
> > > +	).
> > 
> > Same here.
> > 
> I haven't done this because I couldn't easily change the algorithm to
> use string__index.  In reality we should implement a proper string
> searching algorithm.
> I have added an XXX.

What's so hard about writing a brute force string matcher?
I'm not happy for this to be committed as is.

> > >  :- pragma foreign_proc("C",
> > >  	string__to_int_list(Str::in, IntList::out),
> > >  		[will_not_call_mercury, promise_pure, thread_safe], "{
> > > @@ -1727,45 +1824,6 @@
> > >  			MR_PROC_LABEL);
> > >  	}
> > >  }").
> > > -
> > > -:- pragma foreign_proc("C",
> > > -	string__to_int_list(Str::out, IntList::in),
> > > -		[will_not_call_mercury, promise_pure, thread_safe], "{
> > > -		/* mode (out, in) is det */
> > > -	MR_Word int_list_ptr;
> > > -	size_t size;
> > > -	MR_Word str_ptr;
> > > -/*
> > > -** loop to calculate list length + sizeof(MR_Word) in `size' using list in
> > > -** `int_list_ptr'
> > > -*/
> > > -	size = sizeof(MR_Word);
> > > -	int_list_ptr = IntList;
> > > -	while (! MR_list_is_empty(int_list_ptr)) {
> > > -		size++;
> > > -		int_list_ptr = MR_list_tail(int_list_ptr);
> > > -	}
> > 
> > > +string__to_int_list(String, IntList) :-
> > > +	string__to_char_list(String, CharList),
> > > +	IntList = list__map(char__to_int, CharList).
> > 
> > Why not just remove the foreign implementation altogether?
> 
> Well the foreign implementation should be faster then the Mercury
> implementation if we are using the foreign_code version of
> string__to_char_list.

Where is it used that needs the extra speed? The only place I can
see is string__hash, which is very sub-optimal anyway. The historical
reason for the foreign code verison was the lack of mode-specific
clauses.
 
> > > @@ -1853,6 +1902,13 @@
> > >  		Ch = Str->get_Chars(Index);
> > >  	}
> > >  ").
> > > +string__index(Str, Index, Char) :-
> > > +	string__first_char(Str, First, Rest),
> > > +	( Index = 0 ->
> > > +		Char = First
> > > +	;
> > > +		string__index(Rest, Index - 1, Char)
> > > +	).
> > 
> > It would be better to call `sorry' here. If string__index
> > doesn't work, string__first_char isn't likely to.
> >   
> No I don't think so.  The idea behind this change is to provide as much
> functionality in the string library as possible so as to minimize the
> amount of work required to get a working version of the library on a new
> backend.  Yes it is very inefficient, but it is one less piece of work
> that has to be done intitially.

That's fine, but you've chosen the wrong primitives to base the rest
of the operations on. Mercury strings are not lists. string__first_char
is an unnatural operation for strings, and expressing algorithms which
iterate over strings in terms of it will cause unacceptable inefficiency.

A better choice of basic operations would be string__alloc (this
would allocate memory for a string given the length), string__length,
string__unsafe_index and string__unsafe_set_char. Operations based
on these primitives should be fairly close in efficiency to foreign
code implementations (if not, we need to improve the compiler).

> > > +:- pragma promise_pure(string__length/2).
> > > +string__length(Str::in, Len::uo) :-
> > > +	string__length_2(Str, Len).
> > > +string__length(Str0::ui, Len::uo) :-
> > > +	copy(Str0, Str),
> > > +	string__length_2(Str, Len).
> > > +
> > > +:- pred string__length_2(string::in, int::uo) is det.
> > > +string__length_2(Str, Length) :-
> > > +	( string__first_char(Str, _First, Rest) ->
> > > +		string__length(Rest, Length0),
> > > +		Length = Length0 + 1
> > > +	;
> > > +		Length = 0
> > > +	).
> > 
> > It would be better to just call `sorry' here.
> > 
> No I think it is better to have a Mercury version even if it very
> inefficient.

That isn't just very inefficient, it's ridiculously inefficient.

Simon.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list