[m-users.] Function to remove unicode byte order mark

Peter Wang novalazy at gmail.com
Sun Feb 26 11:22:32 AEDT 2017


On Sat, 25 Feb 2017 08:57:58 +0100, Dirk Ziegemeyer <dirk at ziegemeyer.de> wrote:
> Hi,
> 
> I’d like to share a little piece of code that removes the byte order mark (BOM) from the start of a UTF-8 file.
> 
> I didn’t even know that a byte order mark existed until I saved an Excel table as Unicode.txt in order to read it with Mercury. The BOM is preserved during conversion to UTF-8.
> 
> As Mercury doesn’t remove the BOM by itself, it’s up to the application to deal with it.
> 
> Dirk
> 
> 
> 
>     % Remove an optional Unicode byte order mark (BOM) from the beginning
>     % of a UTF-8 file.
>     %
> :- func remove_byte_order_mark(string) = string.
> 
> remove_byte_order_mark(RawFirstLine) = FirstLine :-
>     ( if
>         string.first_char(RawFirstLine, FirstChar, Rest),
>         char.to_int(FirstChar) = 0xfeff % Unicode byte order mark (BOM)
>     then FirstLine = Rest
>     else FirstLine = RawFirstLine
>     ).

Hi Dirk,

A slight improvement is to use the (in, in, uo) mode of first_char, which
avoids copying the rest of the string unless the first char is the BOM:

    first_char(S, '\xFEFF\', Rest)

Peter
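For reference, here is a minimal self-contained sketch that folds Peter's suggestion into Dirk's function. The module name `remove_bom` and the `main` test driver are hypothetical additions for illustration. Because the character argument `'\xFEFF\'` is already bound and `Rest` is free, mode analysis should select the (in, in, uo) mode of `string.first_char`. The call then simply fails when the first character is not the BOM, so, per Peter's note, the rest of the string is only copied when there is a BOM to strip.

```mercury
:- module remove_bom.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module string.

    % Remove an optional Unicode byte order mark (BOM, U+FEFF) from the
    % beginning of a string. The second argument of first_char is a known
    % character, so the (in, in, uo) mode is used: the call fails at once
    % if the first char is not the BOM, and Rest is only built on success.
    %
:- func remove_byte_order_mark(string) = string.

remove_byte_order_mark(RawFirstLine) = FirstLine :-
    ( if string.first_char(RawFirstLine, '\xFEFF\', Rest) then
        FirstLine = Rest
    else
        FirstLine = RawFirstLine
    ).

    % Hypothetical test driver: strip the BOM from a line that has one
    % and leave a line without one unchanged.
    %
main(!IO) :-
    WithBOM = string.char_to_string('\xFEFF\') ++ "abc",
    io.write_string(remove_byte_order_mark(WithBOM), !IO),
    io.nl(!IO),
    io.write_string(remove_byte_order_mark("abc"), !IO),
    io.nl(!IO).
```

Built with `mmc --make remove_bom`, the program should print `abc` on two lines: once with the BOM stripped, and once for the input that never had one.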

