[m-rev.] diff: string splitting routines to string.m

Ralph Becket rafe at csse.unimelb.edu.au
Fri Feb 2 12:57:41 AEDT 2007


Ondrej Bojar, Friday, 2 February 2007:
> Ralph Becket wrote:
> >In this case I'm just looking for a convincing case *for*
> >split_at_string, but I'm not arguing *against* it.
> 
> It's a generalization of split_at_char. To be honest, I have needed it
> in only one project so far, but it nicely simplifies situations where
> you want a hierarchy of delimiters:
> 
> sentence_id ||| I|pronoun am|verb sleepy|adj ||| scores
> 
> Usually, one would use a tab to delimit the main columns and a space to
> delimit words or subfields within a field. Some people are wary of
> whitespace delimiters and prefer a printable character, which then has
> to be escaped wherever it occurs in the data (whitespace needs escaping
> too, but data rarely contains it). So each level of delimiting needs
> its own character and its own escape sequence. If your fields are never
> blank, however, you can instead use a different number of copies of a
> single delimiter character to mark each level of segmentation. This is
> where split_at_string is useful.
> 
> Probably not very convincing, though....

Sounds like the computational linguistics people need to learn something
about representing data :-)
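
For reference, the delimiter hierarchy Ondrej describes can be sketched
roughly as follows. This is an illustration in Python rather than Mercury,
and it is not the code under review: the names split_at_string and
parse_record are hypothetical, and the sketch assumes that fields are never
blank and never contain the '|' character themselves.

```python
def split_at_string(separator, text):
    """Split text at every occurrence of a (possibly multi-character) separator.

    Hypothetical stand-in for the routine under discussion; it simply
    delegates to Python's built-in str.split.
    """
    if not separator:
        raise ValueError("separator must be non-empty")
    return text.split(separator)


def parse_record(line):
    """Parse one record that uses '|||' between columns and '|' inside words."""
    # Level 1: three copies of the delimiter separate the main columns.
    columns = [column.strip() for column in split_at_string("|||", line)]
    sentence_id, tagged_words, scores = columns

    # Level 2: whitespace separates the words within the middle column.
    # Level 3: a single copy of the delimiter separates a word from its tag.
    words = [tuple(split_at_string("|", word)) for word in tagged_words.split()]
    return sentence_id, words, scores


if __name__ == "__main__":
    print(parse_record("sentence_id ||| I|pronoun am|verb sleepy|adj ||| scores"))
```

The longer delimiter has to be split on first. Splitting on the single '|'
first would break every '|||' into empty fields, which is also why the
scheme only works when fields are never blank.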
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------


