[mercury-users] Destructive list operations

Ralph Becket rafe at csse.unimelb.edu.au
Tue Jan 6 17:00:45 AEDT 2009
Previous message: [mercury-users] Destructive list operations
Next message: [mercury-users] Destructive list operations
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Michael Day, Tuesday,  6 January 2009:
> Hi Ralph,
> 
> >Yes, you will inevitably have this issue.  If you can abstract away
> >from the list type, that would be good.  Destructively updating lists
> >is likely to lead to some very hard to find bugs, IMHO.
> 
> Probably, yes; most likely bugs that will bite us several years later in 
> unrelated code that we'll have trouble tracking down. Nasty.
> 
> With that in mind, I've tried writing a pure Mercury implementation of 
> map_replace that is optimised for the situation in which no changes are 
> made to the list, in which case it will be return as is with no copying 
> or allocations. Here is the initial naive implementation:
> 
> :- pred map_replace_naive(pred(T, maybe(list(T))), list(T), list(T)).
> :- mode map_replace_naive(in(pred(in, out) is det), in, out) is det.
> 
> map_replace_naive(_Pred, [], []).
> map_replace_naive(Pred, [X|Xs0], Ys) :-
>     Pred(X, Res),
>     (
>         Res = no,
>         map_replace_naive(Pred, Xs0, Ys0),
>         Ys = [X|Ys0]
>     ;
>         Res = yes(Xs),
>         map_replace_naive(Pred, Xs0, Ys0),
>         Ys = Xs ++ Ys0
>     ).
> 
> Clear specification, but it rebuilds the list pointlessly if no changes 
> are made. The smarter implementation builds up a list of changes, then 
> applies them to the original list in a separate step:
> 
> :- pred map_replace_smart(pred(T, maybe(list(T))), list(T), list(T)).
> :- mode map_replace_smart(in(pred(in, out) is det), in, out) is det.
> 
> map_replace_smart(Pred, Xs, Ys) :-
>     find_replacements(Pred, 0, Xs, [], Rs),
>     apply_replacements(0, Rs, Xs, Ys).
> 
> :- pred find_replacements(pred(T, maybe(list(T))), int, list(T),
>     assoc_list(int, list(T)), assoc_list(int, list(T))).
> :- mode find_replacements(in(pred(in, out) is det), in, in,
>     in, out) is det.
> 
> find_replacements(_Pred, _Index, [], Rs0, Rs) :-
>     Rs = reverse(Rs0).
> 
> find_replacements(Pred, Index, [X|Xs], Rs0, Rs) :-
>     Pred(X, Res),
>     (
>         Res = no,
>         Rs1 = Rs0
>     ;
>         Res = yes(Ys),
>         Rs1 = [Index - Ys|Rs0]
>     ),
>     find_replacements(Pred, Index + 1, Xs, Rs1, Rs).
> 
> :- pred apply_replacements(int, assoc_list(int, list(T)), list(T), list(T)).
> :- mode apply_replacements(in, in, in, out) is det.
> 
> apply_replacements(_Index, [], Xs, Xs).
> apply_replacements(Index, [R|Rs], Xs, Ys) :-
>     (
>         % this should never happen
>         Xs = [],
>         Ys = []
>     ;
>         Xs = [X|Xs0],
>         ( if R = Index-NewXs then
>             apply_replacements(Index+1, Rs, Xs0, Ys0),
>             Ys = NewXs ++ Ys0
>         else
>             apply_replacements(Index+1, [R|Rs], Xs0, Ys0),
>             Ys = [X|Ys0]
>         )
>     ).
> 
> As you can see it's much longer and less clear, with index manipulation 
> which I particularly don't like. However, in theory it should be faster, 
> if the list is rarely changed, as no copying or allocations will be 
> performed in this case.

Once you go down this path, an array will be much faster than a list.

> The worst case is if every element in the list changes, in which case it 
> will build up a list of replacements and then apply them one by one to 
> rebuild the list again. Perhaps an even smarter solution would be to 
> jump out of find_replacements if Pred ever returns yes, and go back to 
> the beginning and call map_replace_naive? That would waste time though 
> by calling Pred again on the earlier items, for which we know it will 
> return no. So probably not.
> 
> Any better ideas for how to write this purely in Mercury? Note that I'm 
> assuming the replacement list returned by the predicate is fairly short, 
> so that the append in apply_replacements is unlikely to be a problem.

I'm making the following assumptions:
- you want efficient left-to-right traversal;
- you want efficient substitution of subranges;
- you're not too worried about the efficiency of random access (or you
  wouldn't be starting with lists).

My first cut would be something like this:

	% A seqence is a list of subsequences.
	%
:- type seq(T) == list(subseq(T)).

	% A subsequence is a subrange of an array, specified as an
	% offset into that array and the length of the subrange.
	%
:- type subseq(T) --->
	subseq(
		offset	:: int,
		length	:: int,
		items	:: array(T)
	).

	% Create a new sequence from an array.
	%
:- func new_seq(array(T)) = seq(T).

new_seq(A) = [subseq(0, array.size(A), A)].

	% The empty sequence.
	%
:- func empty_seq = seq(T).

empty_seq = [].

	% Fold over a sequence from left to right, supplying the current
	% index and item at each iteration.
	%
:- func seq_foldl(func(T, int, U) = U, seq(T), U) = U.

seq_foldl(F, Seq, Acc) = seq_foldl_2(F, Seq, 0, Acc).


:- func seq_foldl_2(func(T, int, U) = U, seq(T), int, U) = U.

seq_foldl_2(_, [], _, Acc) = Acc.

seq_foldl_2(F, [subseq(Off, Len, A) | Subseqs], ISeq, Acc) =
	seq_foldl_3(F, Off, Len, A, 0, ISeq, Subseqs, Acc).


:- func seq_foldl_3(func(T, U) = U, int, int, int, seq(T), U) = U.

seq_foldl_3(F, Off, Len, A, IA, ISeq, Subseqs, Acc) =
	( if I >= Len then
		seq_foldl_2(F, Subseqs, ISeq, Acc)
	  else
	  	seq_foldl_2(F, Off, Len, A, IA + 1, ISeq + 1, Subseqs,
			F(A ^ elem(Off + IA), ISeq, Acc))
	).


So now we can do linear time traversals of a seq(T) using seq_foldl.


To perform updates we do the following:


	% Collect a list of {I, NewSeq} substitions where I is the index
	% in the original sequence of the item to be replaced with
	% NewSeq.
	%
:- func update_seq(
		pred(T, seq(T))::in(pred(in, out) is semidet),
		seq(T)::in
	) = (seq(T)::out) is det.

update_seq(Replace, Seq0) = Seq :-
	seq_foldl(collect_rev_substs(Replace), Seq0, [], RevSubsts),
	Substs = list.reverse(RevSubsts),
	Seq = apply_substs(Substs, Seq0).


:- func collect_rev_substs(
		pred(T, seq(T))::in(pred(in, out) is semidet),
		int::in,
		T::in,
		list({int, seq(T)})::in
	) = (list({int, seq(T)})::out) is det.

collect_rev_substs(Replace, I, X, RevSubsts) =
	( if Replace(X, NewSeq) then
		[{I, NewSeq} | RevSubsts]
	  else
	  	RevSubsts
	).


apply_substs is left as an exercise for the student :-)  You'll want it
to
- split subranges in the original sequence at the right places according
  to the I arguments in the {I, NewSeq} substitutions,
- insert the NewSeqs in the right place,
- merge contiguous short subseqs together to avoid too much
  fragmentation.


Now, if you make no changes to a sequence, then you don't need to create
any new structures.  If you do make changes, you do an amount of work
comparable to the original traversal that identified the changes.  Using
arrays rather than lists will also save you a fair bit of memory.

If you want decent random access, you can replace the list of subseqs
with a balanced tree of some kind.


Hope this helps,
-- Ralph
--------------------------------------------------------------------------
mercury-users mailing list
Post messages to:       mercury-users at csse.unimelb.edu.au
Administrative Queries: owner-mercury-users at csse.unimelb.edu.au
Subscriptions:          mercury-users-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
Previous message: [mercury-users] Destructive list operations
Next message: [mercury-users] Destructive list operations
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the users mailing list