[m-dev.] for review: foreign_decl multiple inclusion guards.

Fergus Henderson fjh at cs.mu.OZ.AU
Sun Jan 28 18:04:34 AEDT 2001


On 28-Jan-2001, Tyson Dowd <trd at cs.mu.OZ.AU> wrote:
> Here's another solution to the multiple header file inclusion problem.
> It puts guards around each foreign_decl in the .h files that get
> generated, and makes sure the guards are the same for each foreign_decl
> that is the same.

As discussed in previous mails, I'm not convinced that this is the
right solution.  So probably we should get together and discuss what
is the best way of handling this issue.

However, I have some review comments for the code that you posted.

> compiler/mlds_to_c.m:
> 	Output header guards based on the contents of the foreign_decl.
> 
> 	We use the md5 message digest of the contents of the foreign_decl to
> 	"uniquely" identify it.  Although md5 message digest are not
> 	unique, they are pretty close -- there is a believed to be on
> 	the order of 2 ^ 64 chance of two strings having the same md5
> 	message digest.  These are pretty good odds considering that
> 	the 

That sentence is incom

> +%-----------------------------------------------------------------------------%
> +
> +:- pragma foreign_decl("C",
> +"
> +
> +/*
> + * This code implements the MD5 message-digest algorithm.
> + * The algorithm is due to Ron Rivest.  This code was
> + * written by Colin Plumb in 1993, no copyright is claimed.
> + * This code is in the public domain; do with it what you wish.
> + *
> + * Equivalent code is available from RSA Data Security, Inc.
> + * This code has been tested against that, and is equivalent,
> + * except that you don't need to include two pages of legalese
> + * with every copy.
> + *
> + * To compute the message digest of a chunk of bytes, declare an
> + * MD5Context structure, pass it to MD5Init, call MD5Update as
> + * needed on buffers full of bytes, and then call MD5Final, which
> + * will fill a supplied 16-byte array with the digest.
> + *
> + * Changed so as no longer to depend on Colin Plumb's `usual.h' header
> + * definitions; now uses stuff from dpkg's config.h.
> + *  - Ian Jackson <ijackson at nyx.cs.du.edu>.
> + * Still in the public domain.
> + */

Hmm, the comment there about dpkg's config.h is misleading.

You should add a comment here saying (a) that you adapted it for use
in the Mercury standard library and (b) where you got the original
distribution of this from.

The comment layout should be changed to match our coding conventions.
Likewise elsewhere.

> +++ library/string.m	2001/01/28 00:09:42
> @@ -368,6 +368,21 @@
>  :- mode string__hash(in, out) is det.
>  %	Compute a hash value for a string.
>  
> +:- func string__md5(string) = string.
> +% 	string__md5(String) = Result.
> +%	Result is the MD5 message digest of String in typical hexidecimal
> +%	string notation (Such as "6f5902ac237024bdd0c176cb93063dc4").
>
> +:- func string__md5_bytes(string) = list(int).
> +% 	string__md5_bytes(String) = Result.
> +%	Result is the MD5 message digest of String represented as list of 
> +%	8-bit values (stored in integers).  There will always be
> +%	16 bytes in the list.
> +
> +:- func string__md5_bytes_to_string(list(int)::in) = (string::out) is det.
> +% 	Converts the md5 digest bytes to a typical hexidecimal string
> +% 	notation.

Hmm... would it be worth putting this in its own module `md5'?

The documentation here is misleading.  There's no such thing as "the"
MD5 message digest of a given Mercury string.  The MD5 message digest
is defined on byte strings, not on character strings.  So the result
here will depend on how the Mercury character string list is represented.
You'll get different answers depending on whether the Mercury
implementation stores characters in ASCII, EBCDIC, or Unicode.

This might cause problems for the .NET port, e.g. if you try to mix
`.h' files compiled with a native-hosted Mercury compiler with files
compiled with a .NET-hosted Mercury compiler.

> +static void ML_MD5Init(struct ML_MD5Context *context);
> +static void ML_MD5Update(struct ML_MD5Context *context,
> +	MR_UnsignedChar const *buf, unsigned len);
> +static void ML_MD5Final(unsigned char digest[16],
> +	struct ML_MD5Context *context);
> +static void ML_MD5Transform(MR_uint_least32_t buf[4],
> +	MR_uint_least32_t const in[16]);
> +
> +").

Hmm, these functions are declared `static' inside a `foreign_decl'
pragma... that's probably wrong.

> +:- pragma foreign_code("C",
>
> +#ifdef MR_BIG_ENDIAN

Generally the need to explicitly test for big-endian or little-endian 
is a sign that the code is non-portable and/or badly written.

The Mercury autoconf test for MR_BIG_ENDIAN tests the endianness of
`int', but the code below seems to be using the endianness of
`MR_uint_least32_t', I think.  The C standard does not guarantee that
all integer types have the same endianness.

> +static void
> +ML_byteSwap(MR_uint_least32_t *buf, unsigned words)
> +{
> +	MR_UnsignedChar *p = (MR_UnsignedChar *)buf;
> +
> +	do {
> +		*buf++ = (MR_uint_least32_t)((unsigned)p[3] << 8 | p[2]) << 16 |
> +			((unsigned)p[1] << 8 | p[0]);

That code will have undefined behaviour if `unsigned' is only 16 bits.

> +		p += 4;

This seems to assume that `MR_unit_least32_t' is exactly 4 bytes.

> +	} while (--words);
> +}
> +#else
> +#define ML_byteSwap(buf,words)
> +#endif

s/,/, /

> +/*
> + * Update context to reflect the concatenation of another buffer full
> + * of bytes.
> + */
> +static void
> +ML_MD5Update(struct ML_MD5Context *ctx,
> +	MR_UnsignedChar const *buf, unsigned len)
> +{
> +	MR_uint_least32_t t;
> +
> +	/* Update byte count */
> +
> +	t = ctx->bytes[0];
> +	if ((ctx->bytes[0] = t + len) < t)
> +		ctx->bytes[1]++;	/* Carry from low to high */
> +
> +	t = 64 - (t & 0x3f);	/* Space available in ctx->in (at least 1) */
> +	if (t > len) {
> +		memcpy((MR_UnsignedChar *)ctx->in + 64 - t, buf, len);
> +		return;
> +	}
> +	/* First chunk is an odd size */
> +	memcpy((MR_UnsignedChar *)ctx->in + 64 - t, buf, t);
> +	ML_byteSwap(ctx->in, 16);
> +	ML_MD5Transform(ctx->buf, ctx->in);
> +	buf += t;
> +	len -= t;
> +
> +	/* Process data in 64-byte chunks */
> +	while (len >= 64) {
> +		memcpy(ctx->in, buf, 64);
> +		ML_byteSwap(ctx->in, 16);
> +		ML_MD5Transform(ctx->buf, ctx->in);
> +		buf += 64;
> +		len -= 64;
> +	}
> +
> +	/* Handle any remaining bytes of data. */
> +	memcpy(ctx->in, buf, len);
> +}

The use of `MR_UnsignedChar' there is quite suspect.
`MR_UnsignedChar' pointers to an unsigned Mercury character.
When we support Unicode, that will be an unsigned Unicode character.
But I don't think the code above will do the right thing if
`MR_UnsignedChar' is not `unsigned char'.

Likewise in ML_MD5Final() and probably elsewhere.

> +:- pragma foreign_code("C",
> +		allocate_context(Ctxt::out),
> +		[will_not_call_mercury, thread_safe], "
> +	MR_incr_hp(Ctxt, ((sizeof(struct ML_MD5Context) 
> +		+ sizeof(MR_Word) - 1) / sizeof(MR_Word)));
> +").

You should use the MR_bytes_to_words() macro.

> +:- pragma foreign_code("C",
> +		md5update(Ctxt::in, Str::in, Length::in), 
> +			[will_not_call_mercury, thread_safe], "
> +	ML_MD5Update((struct ML_MD5Context *) Ctxt,
> +		(MR_UnsignedChar *) Str, (unsigned) Length);
> +").

Here you're calling functions that were declared `static'
from `pragma foreign_code' which could be inlined into
different compilation units.

> +:- pragma foreign_code("C",
> +		md5final(Digest::out, Ctxt::in), 
> +			[will_not_call_mercury, thread_safe], "
> +	unsigned char d[16];

Please use a more meaningful identifier than `d'.

> +:- pragma foreign_code("MC++",
> +		md5final(Digest::out, Ctxt::in), 
> +			[will_not_call_mercury, thread_safe], "
> +	unsigned char d __gc[];

Please use a more meaningful identifier than `d'.

> +	MR_list_nil(Digest);
> +	for (i = 15; i >= 0; i--) {
> +		MR_list_cons_msg(Digest, (MR_Integer) d[i], Digest);

Why is that MR_list_cons_msg() rather than just MR_list_cons()?

> +:- pragma promise_pure(string__md5_bytes/1).
> +
> +string__md5_bytes(String) = Digest :-
> +	impure allocate_context(Context),
> +	impure md5init(Context),
> +	impure md5update(Context, String, string__length(String)),
> +	impure md5final(Digest, Context).

I suggest
	s/allocate_context/md5_allocate_context/g
	s/md5init/md5_init/g
	s/md5update/md5_update/g
	s/md5final/md5_final/g

> +string__md5(String) = string__md5_bytes_to_string(string__md5_bytes(String)).
> +
> +string__md5_bytes_to_string(MD5Result) =
> +        string__to_lower(string__append_list(list__map(
> +                (func(X) =
> +                        ( X < 16 ->
> +                                "0" ++ string__int_to_base_string(X, 16)
> +                        ;
> +                                string__int_to_base_string(X, 16)
> +                        )
> +                ), MD5Result)
> +        )).

Why not just use `string__format("%02x", i(X))'?

	string__md5_bytes_to_string(MD5Result) =
		string__append_list(list__map(
			(func(X) = string__format("%02x", i(X))), MD5Result)).

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list