[m-dev.] New datatype: rope

Paul Bone pbone at csse.unimelb.edu.au
Thu Nov 11 12:27:03 AEDT 2010


I'd like to introduce a new datatype to the standard library.  I call it 'rope'
it is described in [1] by Hans Boehm et al.  It is similar to cord, which is
already part of Mercury's standard library but is different in some important
respects.

A cord is:

    :- type cord(T)
        --->    empty_cord
        ;       nonempty_cord(cord_node(T)).
        
    :- type cord_node(T)
        --->    unit_node(T)
        ;       list_node(T, list(T))
        ;       branch_node(cord_node(T), cord_node(T)).
     
That is it may be:
    empty,
    singleton,
    described by a list,
    a concatenation.

A rope should be the same except that it may not be described by a list but may
instead be described by a slice of an array.

A slice of an array would be:

    :- type slice(T)
        --->    slice(
                    s_array     :: array(T), 
                    s_offset    :: int,
                    s_len       :: int
                ).

If a programmer modifies a cord by updating a cell in the middle of a slice,
the slice can be sliced into two slices and a singleton node can be used to
represent the updated element.  This avoids copying and allocating memory.

There is a case where two slices may be merged if they are neighbours in the
original array, however this is probably rare.

Branch nodes should have an extra fields describing the size of each branch.
Size of a cord can be retrieved in constant time and a random access via an
offset can be performed in logarithmic time (linear with the depth of the
rope).

This is interesting to me to describe and manipulate sub strings of a larger
string; consider a conjunction of goals that is to be split into
sub-conjunctions (some parallel, some sequential) but the goals will usually
stay in the same order.  I hope to avoid a lot of list manipulation using such
a data-structure.

Does anyone see any problem with this idea or a simpler implementation?  In
particular, is there some (good) way to support cords and ropes without
reimplementing a lot of code for ropes that is only different for cords in the
type of data being manipulated?


    1. HJ Boehm, R Atkinson, M Plass: "Ropes: An alternative to strings"
       http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/developers/attachments/20101111/dee171e8/attachment.sig>


More information about the developers mailing list