[m-users.] Function to remove unicode byte order mark

Dirk Ziegemeyer dirk at ziegemeyer.de
Sat Feb 25 18:57:58 AEDT 2017


Hi,

I’d like to share a little piece of code that removes the byte order mark (BOM) from the start of a UTF-8 file.

I didn’t even know that byte order marks existed until I saved an Excel table as Unicode text (.txt) so that I could read it with Mercury. The BOM is preserved when the file is converted to UTF-8.
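For reference, the BOM is the code point U+FEFF. Encoded as UTF-8 it becomes the three bytes EF BB BF, which is what the converted file starts with. The short sketch below prints those bytes. The module name `bom_bytes` and the helper `write_hex` are made up for illustration, and the sketch assumes a C grade, where Mercury strings are UTF-8, so each code unit is one byte. In the Java and C# grades strings are UTF-16, and the same call would return the single code unit 0xFEFF instead.

```mercury
:- module bom_bytes.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module list.
:- import_module string.

main(!IO) :-
    % U+FEFF is the Unicode byte order mark.
    BOM = string.char_to_string(char.det_from_int(0xfeff)),
    % Assumes a C grade: code units are UTF-8 bytes.
    string.to_code_unit_list(BOM, CodeUnits),
    io.write_string("U+FEFF as UTF-8 code units: ", !IO),
    io.write_list(CodeUnits, ", ", write_hex, !IO),
    io.nl(!IO).

    % Hypothetical helper: print an int as 0x-prefixed uppercase hex.
    %
:- pred write_hex(int::in, io::di, io::uo) is det.

write_hex(N, !IO) :-
    io.format("0x%X", [i(N)], !IO).
```

In a C grade this should print `U+FEFF as UTF-8 code units: 0xEF, 0xBB, 0xBF`.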

As Mercury doesn’t remove the BOM by itself, it’s up to the application to deal with it.

Dirk



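    % Remove an optional Unicode byte order mark (BOM) from the beginning
    % of the first line of a UTF-8 file. Uses string.first_char and
    % char.to_int, so the enclosing module needs
    % `:- import_module char, string.`
    %
:- func remove_byte_order_mark(string) = string.

remove_byte_order_mark(RawFirstLine) = FirstLine :-
    ( if
        % string.first_char decodes a whole code point, so the UTF-8
        % bytes EF BB BF arrive here as the single char U+FEFF.
        string.first_char(RawFirstLine, FirstChar, Rest),
        char.to_int(FirstChar) = 0xfeff    % Unicode byte order mark (BOM)
    then
        FirstLine = Rest
    else
        FirstLine = RawFirstLine
    ).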
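To see the function in context, here is a minimal sketch of a complete filter program that copies standard input to standard output and applies `remove_byte_order_mark` to the first line only, since a BOM is only meaningful at the very start of a file (elsewhere U+FEFF is a zero-width no-break space). The module name `strip_bom` and the predicate `copy_lines` are hypothetical, chosen just for this example.

```mercury
:- module strip_bom.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module bool.
:- import_module char.
:- import_module string.

main(!IO) :-
    % Only the first line of the input may start with a BOM.
    copy_lines(yes, !IO).

    % Hypothetical helper: copy stdin to stdout line by line,
    % stripping a leading BOM from the first line only.
    %
:- pred copy_lines(bool::in, io::di, io::uo) is det.

copy_lines(IsFirstLine, !IO) :-
    io.read_line_as_string(Result, !IO),
    (
        Result = ok(RawLine),
        (
            IsFirstLine = yes,
            Line = remove_byte_order_mark(RawLine)
        ;
            IsFirstLine = no,
            Line = RawLine
        ),
        % read_line_as_string keeps the newline, so write the line as is.
        io.write_string(Line, !IO),
        copy_lines(no, !IO)
    ;
        Result = eof
    ;
        Result = error(Error),
        io.write_string(io.stderr_stream,
            io.error_message(Error) ++ "\n", !IO),
        io.set_exit_status(1, !IO)
    ).

    % Remove an optional Unicode byte order mark (BOM) from the
    % beginning of the first line of a UTF-8 file.
    %
:- func remove_byte_order_mark(string) = string.

remove_byte_order_mark(RawFirstLine) = FirstLine :-
    ( if
        string.first_char(RawFirstLine, FirstChar, Rest),
        char.to_int(FirstChar) = 0xfeff    % Unicode byte order mark (BOM)
    then
        FirstLine = Rest
    else
        FirstLine = RawFirstLine
    ).
```

Built with `mmc --make strip_bom`, it can be used as a filter, e.g. `./strip_bom < table.txt > clean.txt`. Keeping the BOM check on the first line only avoids deleting a legitimate U+FEFF that happens to start a later line.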


