[From newsgroup cs.mercury] Additional config tests

Bert Thompson aet at cs.mu.oz.au
Fri Mar 28 23:06:09 AEDT 1997


This follow-up failed to post, so I'm mailing it to m-dev...

(Of late, news has been flaking out nastily often.) 

------- Forwarded Message

Newsgroups: cs.mercury
Subject: Re: For review: Additional configuration tests
References: <5hd3uu$lbk at mulga.cs.mu.OZ.AU> <5hd8u6$q16 at mulga.cs.mu.OZ.AU> <5hdg5l$3f at mulga.cs.mu.OZ.AU> <5hdk62$3po at mulga.cs.mu.OZ.AU>

Fergus,

Thanks for the follow-up. You've made a lot of excellent points.

Quick summary:
	- We do need the -fixed- format constants in the bytecode file
	  for portability. (Call them Int64 and Float64.)
	- We -must- represent Float64 as `double' (or some primitive
	  64-bit float) since the alternative is to emulate some
	  IEEE-754 operations in software.
	- We could represent Int64 as either:
		- `long long' (or some other primitive 64-bit integral type)
		  Pros: Fast, simple
		  Cons: Maybe not portable. (Works on all platforms we
			currently support, however.) Is it ANSI C?
		or
		- array of eight bytes.
		  Pros: Portable.
		  Cons: Slower, messier.
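As a minimal sketch of the two Int64 options, assuming the LONG_LONG_IS_64_BITS configure macro proposed later in this thread is available, the choice could be made at compile time. The struct fallback and its member name are hypothetical, not part of any existing Mercury header.

```c
#include <limits.h>

/*
 * Sketch only: choose a C representation for the bytecode file's Int64.
 * LONG_LONG_IS_64_BITS is the configure macro proposed in this thread;
 * the struct fallback and its member name are hypothetical.
 */
#ifdef LONG_LONG_IS_64_BITS
/* Option 1: a primitive 64-bit integral type (fast, simple). */
typedef long long Int64;
#else
/* Option 2: eight bytes, most significant first, exactly as stored in
   the bytecode file (portable, but every operation is hand-coded). */
typedef struct {
    unsigned char bytes[8];
} Int64;
#endif
```

With the struct version, even negation or comparison needs a hand-written helper, which is the "slower, messier" cost listed above.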

Gory details follow...

fjh at mundook.cs.mu.OZ.AU (Fergus Henderson) writes:

|aet at fengshui.cs.mu.oz.au (Bert Thompson) writes:

|>Here's the summary:
|>	- We want to store Mercury `int' and `float' constants in
|>	  the bytecode file.
|>	- We want bytecode files to be portable between different
|>	  architectures.
|>	- Hence we define a fixed byte-format for these constants
|>	  in the bytecode file.

|Agreed.

|>	- We want C typedefs for these fixed-format types so that
|>	  it's easier to read them in.
|>	- This requires us to know the platform-dependent bit-representations
|>	  of some C types.

|This is where I disagree.

Fair enough. Let's examine these points a bit more.

|BTW, thanks for the detailed justification of the bytecode format, that
|was very good -- please save it as a comment in the source code somewhere.

|>When reading a bytecode file, it is simpler to read these values into
|>variables of type `Int64' or `Float64'.

|Let's take it one at a time, starting with `Int64'.  First, because of
|endianness problems, in general you won't be able to read the values in
|directly as an `Int64' -- instead you will have to read them in byte by
|byte, and then convert.  

Yep. That's what I'm doing. (Recall that in the previous article I specified
the endianness of constants in the bytecode file.)

|This will not be difficult, it is just the
|inverse of what happens in bytecode_gen.m.  So I don't see any need
|for an integer type with _exactly_ 64 bits.  

The reason for -exactly- 64 bits is this: we need a portable, -fixed-
format for Mercury `int' constants in the bytecode file. We also
want a C representation of this 64-bit type to make it easier to read,
write and convert these Int64s to and from Mercury `int's.
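Decoding is the inverse of what bytecode_gen.m does when it writes a constant. A minimal sketch, assuming `long long' can hold any 64-bit two's-complement value; the helper name `int64_from_bytes' is hypothetical:

```c
/*
 * Sketch: decode an Int64 stored in the bytecode file as eight
 * big-endian, two's-complement bytes. Hypothetical helper name.
 * Assumes `long long' can hold any 64-bit two's-complement value.
 */
long long
int64_from_bytes(const unsigned char bytes[8])
{
    unsigned long long u = 0;
    int i;

    /* Assemble the bit pattern, most significant byte first. */
    for (i = 0; i < 8; i++) {
        u = (u << 8) | bytes[i];
    }

    /*
     * Treat bit 63 as the sign bit. Negative values are computed
     * arithmetically, so we never rely on the implementation-defined
     * conversion of an out-of-range unsigned value to a signed type.
     */
    if (u & 0x8000000000000000ULL) {
        return -(long long) (~u & 0x7FFFFFFFFFFFFFFFULL) - 1;
    }
    return (long long) u;
}
```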

The important point to note is this: We have to convert between
Int64s and Mercury `int's and between Float64s and Mercury `float's.
We are using Int64s and Float64s anyway in the bytecode file. The
questions really are a) do we dignify them with a name in C? and b)
how do we represent them in C?

Here's where the conversions happen:
	`int' to Int64: Mercury code --> bytecode
	Int64 to `int': bytecode --> interpretation, using the same heap
					as the Mercury runtime.
This is important. You should mull over this a little. Remember that
the bytecode interpreter will link with the runtime, libmer.so.
Hence we must convert Int64s back to `int's so they live happily
with other `int's in the heap and stack. The bytecode interpreter
may also talk to other compiled Mercury shlibs. (Of course, all this
applies to floats as well.)

In theory we could represent Int64 as an array of eight bytes,
but that carries more overhead than using `long long'.
You make an excellent point below when you ask what happens if a C
compiler doesn't have the `long long' type. I consider this a tradeoff.
Certainly I have no objection to using the array; it's just a little
more circuitous and slower. (For instance, we'll need to do some
trivial 2's-complement hacking to convert between Int64, in its 8-byte
array representation, and the Mercury `int' type.)

If you think we should wear this in preference to possibly losing 
portability, then that's fair enough. Personally, I think it's 
pretty unlikely that we'll hit a compiler thus limited on any platform 
we're interested in. Also, there are other much more significant 
system dependencies in Mercury. 8^)

I'd really prefer to use the `long long' since a more complex scheme
eats into my debugger implementation time.
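The 2's-complement hacking needed for the eight-byte representation is small. As a sketch of one direction, assuming 8-bit bytes and using a C `long' (at most 64 bits wide) as a stand-in for Mercury's `int', a value can be stored in the Int64 format without any 64-bit C type. The helper name `long_to_int64_bytes' is hypothetical.

```c
/*
 * Sketch: store a C `long' (standing in for a Mercury `int') in the
 * 8-byte, big-endian, two's-complement Int64 format without using any
 * 64-bit C type. Hypothetical helper name. Assumes CHAR_BIT == 8 and
 * that `long' is at most 64 bits wide.
 */
void
long_to_int64_bytes(long v, unsigned char bytes[8])
{
    /* Conversion to unsigned is well-defined (value modulo 2^N), so it
       yields the two's-complement bit pattern on any machine. */
    unsigned long u = (unsigned long) v;
    /* Bytes beyond the width of `long' are filled by sign extension. */
    unsigned char fill = (v < 0) ? 0xFF : 0x00;
    int value_bytes = (int) sizeof(long);
    int i;

    /* i counts bytes from the least significant end. */
    for (i = 0; i < 8; i++) {
        if (i < value_bytes) {
            bytes[7 - i] = (unsigned char) (u & 0xFF);
            u >>= 8;
        } else {
            bytes[7 - i] = fill;
        }
    }
}
```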

Note that floating point, as you point out, is not so simple.
We really do need a 64-bit fixed float format, since the alternative
is to emulate some IEEE-754 operations in software. 8^(

(One of the operations is a 64-bit to 32-bit IEEE float conversion.
This is non-trivial to do in software. (That was a euphemism, in case
you wondered! 8^))
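This is also why a primitive 64-bit float type pays off: once one is available, reading a Float64 is just a reinterpretation of bits, with no software emulation. A minimal sketch, assuming `double' is a 64-bit IEEE-754 type (which is what DOUBLE_IS_64_BITS is meant to establish), that `unsigned long long' is exactly 64 bits, and that integers and doubles share the same byte order in memory; the helper name is hypothetical:

```c
#include <string.h>

/*
 * Sketch: decode a Float64 stored as eight big-endian IEEE-754 bytes.
 * Hypothetical helper name. Assumes `double' is a 64-bit IEEE-754
 * type, `unsigned long long' is exactly 64 bits, and integers and
 * doubles use the same byte order in memory.
 */
double
float64_from_bytes(const unsigned char bytes[8])
{
    unsigned long long bits = 0;
    double d;
    int i;

    /* Rebuild the 64-bit pattern in host integer order. */
    for (i = 0; i < 8; i++) {
        bits = (bits << 8) | bytes[i];
    }
    /* Reinterpret the bits; memcpy avoids pointer-aliasing problems. */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```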

|An integer type with
|_at least_ 64 bits might make it slightly easier to check for overflow,
|but it should be easy enough to detect overflow as you go, e.g. by
|counting the number of leading zero bits, subtracting that from 64,
|and comparing the result with sizeof(Integer) * CHAR_BIT, so even
|that seems unnecessary.

We don't really need to check for overflow, since we assume the
Mercury constant put into the bytecode has already been checked
by the compiler. (Well, in fact if we interpret bytecode on
a platform where Mercury `int' is 32 bits, we need to check that
an Int64 value from the bytecode file doesn't overflow. But that's
really a different problem, and an inevitable one, since the size of
a Mercury `int' may differ from platform to platform.)
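The leading-bits check Fergus describes can run directly on the eight bytes from the bytecode file, so it works even on a 32-bit platform without a 64-bit C type. The sketch below is a variation on his description: it counts leading bits equal to the sign bit, so negative values are handled, and adds one for the sign bit itself before comparing with the width of the target type. The helper names are hypothetical, and `long' stands in for the runtime's `Integer' type.

```c
#include <limits.h>

/* Number of bits needed to hold the two's-complement value stored in
   eight big-endian bytes, including the sign bit. */
int
int64_bytes_significant_bits(const unsigned char bytes[8])
{
    int sign = (bytes[0] & 0x80) ? 1 : 0;
    int same = 0;   /* leading bits equal to the sign bit */
    int i, b;

    for (i = 0; i < 8; i++) {
        for (b = 7; b >= 0; b--) {
            if (((bytes[i] >> b) & 1) != sign) {
                /* The remaining bits, plus one sign bit. */
                return 64 - same + 1;
            }
            same++;
        }
    }
    return 1;   /* every bit equals the sign bit: the value is 0 or -1 */
}

/* Nonzero if the value fits in a C `long' (stand-in for Integer). */
int
int64_bytes_fit_in_long(const unsigned char bytes[8])
{
    return int64_bytes_significant_bits(bytes)
        <= (int) (sizeof(long) * CHAR_BIT);
}
```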

|`Float64' seems more reasonable, because the alternative is a lot more
|work.  However, you should not assume that `Float64' will necessarily
|be a C `double' -- the configure script should try `float' and `long
|double' too.

Amen. 8^)
(See my comment above.)
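Following Fergus's suggestion, configure could try each of `float', `double' and `long double'. A sketch of what each test might check, assuming that the size plus the <float.h> parameters of IEEE-754 double precision (radix 2, a 53-bit significand, maximum exponent 1024) is an adequate indicator; matching parameters strongly suggest, but do not prove, the IEEE-754 encoding. The function names are hypothetical, and in a real configure test the chosen check would be the body of main().

```c
#include <float.h>
#include <limits.h>

/* Each returns nonzero if the type is 64 bits wide and has the
   <float.h> parameters of IEEE-754 double precision. */
int
float_is_ieee64(void)
{
    return sizeof(float) * CHAR_BIT == 64 && FLT_RADIX == 2
        && FLT_MANT_DIG == 53 && FLT_MAX_EXP == 1024;
}

int
double_is_ieee64(void)
{
    return sizeof(double) * CHAR_BIT == 64 && FLT_RADIX == 2
        && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024;
}

int
long_double_is_ieee64(void)
{
    return sizeof(long double) * CHAR_BIT == 64 && FLT_RADIX == 2
        && LDBL_MANT_DIG == 53 && LDBL_MAX_EXP == 1024;
}
```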

|>|>To this end, I've added some tests to the configuration. The changed
|>|>files are configure.in and runtime/conf.h.in.
|>
|>|Where's the log message?
|>
|>--------------------------------------------------
|>Estimated hours taken: 2
|>
|>The bytecode file must store constants in Mercury programs in a
|>manner that is platform-independent. Specifically:
|>	- Mercury `int' constants are stored as 64-bit, 2's-complement,
|>	  big-endian quantities
|>	- Mercury `float' constants are stored as 64-bit, IEEE-754
|>	  big-endian floating point quantities.
|>
|>The code that reads these quantities is neater if we can create
|>typedefs such as `int64' and `float64' that represent the above
|>entities.
|>
|>To create such typedefs, we need sizeof information on C types.
|>This is the purpose of the #defines:
|>	LONG_LONG_IS_64_BITS
|>	DOUBLE_IS_64_BITS
|>	SHORT_IS_16_BITS
|>
|>configure.in
|>runtime/conf.h.in
|>--------------------------------------------------

|It is better if the log message first states what was changed, and then why.
|Your log message is good on the "why", but poor on the "what".
|Can you please revise it?

Sure. Good suggestion.

|Also, `SHORT_IS_16_BITS' is unexplained.

|>|>+ AC_MSG_CHECKING(whether long long is 64-bits)
|>|>+ AC_CACHE_VAL(mercury_cv_int64,
|>
|>|The name `int64' is misleading.
|>
|>I chose it since the 64-bit quantity is used to represent Mercury
|>`int's in the bytecode. 

|Well, that's great, but it is still misleading, so use a different one, e.g.
|`mercury_cv_long_long_is_64_bits'.  (Ditto for `float64'.)

Ok. I'll do that.

|>|You should use
|>
|>|	#include <limits.h>
|>| 	int main() {
|>| 		if (sizeof(long long) * CHAR_BIT == 64)
|>|			...
|>
|>Correct. I assumed we have no interest in any brain-damaged architecture
|>where CHAR_BIT != 8. Even so, good point.

|We're not likely to port to such architectures, but it's better to
|avoid such assumptions if you can, and anyway I think the code
|using CHAR_BIT is clearer, because the magic numbers match.

Very true. I'll make the change.
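A completed version of Fergus's fragment, as a sketch. To keep it self-contained the checks are written as functions with hypothetical names; in the configure test each would become the body of main(), with the exit status reporting the result. If the compiler lacks `long long' altogether, that test program simply fails to compile, which configure would normally treat as a negative result.

```c
#include <limits.h>

/* Check behind LONG_LONG_IS_64_BITS: `long long' is exactly 64 bits.
   Using CHAR_BIT avoids assuming 8-bit bytes. */
int
long_long_is_64_bits(void)
{
    return sizeof(long long) * CHAR_BIT == 64;
}

/* Check behind SHORT_IS_16_BITS: `short' is exactly 16 bits. */
int
short_is_16_bits(void)
{
    return sizeof(short) * CHAR_BIT == 16;
}
```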

|>|>+ #undef	LONG_LONG_IS_64_BITS
|>|>+ #undef	DOUBLE_IS_64_BITS
|>|>+ #undef	SHORT_IS_16_BITS
|>|>  
|>|>  #endif /* CONF_H */
|>
|>|These macros should be documented.
|>
|>They do exactly what they say. LONG_LONG_IS_64_BITS is defined in
|>exactly the case that `long long' is 64 bits.

|The case that `long long' is exactly 64 bits, or the
|case that `long long' is at least 64 bits?

Exactly 64 bits.

|What about if the C compiler doesn't support `long long'?

Good point. See my comment above on this.

|Also, I think it would be helpful if you document which part of the
|Mercury system uses them.

They are used only by the bytecode components of the compiler.
Most of the uses will be in the C code I'm writing; a few will be
in some pragma C code in bytecode.m.

I'll document this better in a revised log message.
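A sketch of how the conf.h.in entries might be documented, using only what this thread says about them and assuming DOUBLE_IS_64_BITS and SHORT_IS_16_BITS follow the same "exactly" convention confirmed for LONG_LONG_IS_64_BITS. The wording is illustrative, and the thread still does not say what SHORT_IS_16_BITS is needed for.

```c
/*
 * LONG_LONG_IS_64_BITS	defined iff the C compiler supports `long long'
 *			and it is exactly 64 bits
 *			(sizeof(long long) * CHAR_BIT == 64).
 * DOUBLE_IS_64_BITS	defined iff `double' is exactly 64 bits.
 * SHORT_IS_16_BITS	defined iff `short' is exactly 16 bits.
 *
 * These are used only by the bytecode components of the compiler
 * (their C code, plus some pragma C code in bytecode.m), which need
 * C types matching the fixed 64-bit formats of Mercury `int' and
 * `float' constants in bytecode files.
 */
#undef	LONG_LONG_IS_64_BITS
#undef	DOUBLE_IS_64_BITS
#undef	SHORT_IS_16_BITS
```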


Anyway, I'd really like to get this resolved real soon now, since it's
eating into my debugger implementation time.

The sooner we have a Mercury debugger, the sooner the Prolog debuggers
are just a bad memory. 8^)

Cheers,
Bert

------- End of Forwarded Message



