[m-users.] Function to remove unicode byte order mark
Peter Wang
novalazy at gmail.com
Sun Feb 26 11:22:32 AEDT 2017
On Sat, 25 Feb 2017 08:57:58 +0100, Dirk Ziegemeyer <dirk at ziegemeyer.de> wrote:
> Hi,
>
> I’d like to share a little piece of code that removes the byte order mark (BOM) from the top of a UTF-8 file.
>
> I didn’t even know that a byte order mark existed until I saved an Excel table as Unicode.txt in order to be able to read it with Mercury. The BOM is preserved during conversion to UTF-8.
>
> As Mercury doesn’t remove the BOM by itself, it’s up to the application to deal with it.
>
> Dirk
>
>
>
> % Remove optional Unicode byte order mark (BOM) from the beginning
> % of a UTF-8 file.
> %
> :- func remove_byte_order_mark(string) = string.
>
> remove_byte_order_mark(RawFirstLine) = FirstLine :-
>     ( if
>         string.first_char(RawFirstLine, FirstChar, Rest),
>         char.to_int(FirstChar) = 0xfeff  % Unicode byte order mark (BOM)
>     then
>         FirstLine = Rest
>     else
>         FirstLine = RawFirstLine
>     ).
Hi Dirk,
A slight improvement is to use the (in, in, uo) mode of first_char to
avoid copying the rest of the string unless the first char is the BOM:

    first_char(S, '\xFEFF\', Rest)
Peter
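
For reference, here is a minimal sketch of how Dirk's function might look with that suggestion applied. The module name bom and the main predicate with its sample strings are assumptions made for illustration and are not part of the original code. Only the condition of the if-then-else changes.

```mercury
:- module bom.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

    % Remove an optional Unicode byte order mark (BOM) from the
    % beginning of a string read from a UTF-8 file.
    %
:- func remove_byte_order_mark(string) = string.

remove_byte_order_mark(RawFirstLine) = FirstLine :-
    % Passing the BOM as an input selects the (in, in, uo) mode of
    % string.first_char, so Rest is only constructed when the string
    % really does start with U+FEFF.
    ( if string.first_char(RawFirstLine, '\xFEFF\', Rest) then
        FirstLine = Rest
    else
        FirstLine = RawFirstLine
    ).

    % Hypothetical driver, for illustration only.
    %
main(!IO) :-
    WithBom = "\xFEFF\" ++ "name,value",
    WithoutBom = "name,value",
    io.write_string(remove_byte_order_mark(WithBom), !IO),
    io.nl(!IO),
    io.write_string(remove_byte_order_mark(WithoutBom), !IO),
    io.nl(!IO).
```

Saved as bom.m, this should build with "mmc bom.m". Because the BOM is written as a character literal, the comparison through char.to_int is no longer needed, and the char module does not have to be imported.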