[mercury-users] Addendum to: need help: Mercury+C/C++?

Fergus Henderson fjh at cs.mu.OZ.AU
Wed Sep 2 13:04:24 AEST 1998


On 01-Sep-1998, tcklnbrg <at at ingenuity-sw.com> wrote:
> tcklnbrg wrote:
> > The lexer.c file contains one global var, "char * tokenStr",
> 
> Addendum:  The above is wrong:  it is "string * tokenStr"
> 
> This means that lexer.c, and presumably et.al. 
> must be compiled with a c++ compiler,
> in this case g++ rather than gcc.  The "string", above is the
> g++/std/ string.

OK, that complicates things a little.  The Mercury implementation does
not (yet) support a direct C++ interface, it only has a C interface.
So if you want to interface with C++ code from Mercury, you need to
build a C wrapper around your C++ code.


To compile the C++ file lexer.c, you need to add the following to
your Mmakefile:

	CXX = g++
	CXXFLAGS = -Wall

	lexer.o : lexer.c
		$(CXX) $(CXXFLAGS) -c lexer.c

Actually I suggest calling it `lexer.cxx' or `lexer.cc'
rather than `lexer.c' if it is C++.  Then you could add
a suffix rule to your Mmakefile

	.SUFFIXES: .cxx

	.cxx.o:
		$(CXX) $(CXXFLAGS) -c $<

and there'd be no need for the explicit rule for lexer.o.
(Suffix rules, `.SUFFIXES', and `$<' are features of standard Make.
Mmake is built on top of GNU Make, so you could alternatively use
GNU Make's pattern rules instead.)

Also, you need to build a C wrapper for the C++ code in lexer.c.
This will look something like this:

	/* lexer_interface.h */

	#ifdef __cplusplus
	extern "C" {
	#endif

	const char * C_tokenStr(void);

	char * C_tokenize(char *);
		/* maybe you should have `const' in there somewhere? */

	#ifdef __cplusplus
	}
	#endif


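	/* lexer_interface.cxx */

	#include <string>
	#include "lexer_interface.h"

	using std::string;

	// Defined in the lexer.  Per your addendum, tokenStr is a pointer.
	extern string *tokenStr;
	// Assumed signature; adjust this to match your lexer.
	extern char *tokenize(char *);

	extern "C" const char * C_tokenStr() {
		// string::c_str() converts a C++ string to a C string
		return tokenStr->c_str();
	}

	extern "C" char * C_tokenize(char *s) {
		return tokenize(s);
	}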

The Mercury interface to the C++ function `tokenize'
and the C++ global variable `tokenStr' will look something like this:

	:- pred tokenize(string::in, string::out,
			io__state::di, io__state::uo) is det.

	:- pred tokenStr(string::out, io__state::di, io__state::uo) is det.

These can be implemented using Mercury's `pragma c_code' to
interface with the C interface that you wrapped around the C++ code.

	:- pragma c_header_code("#include ""lexer_interface.h"").

	:- pragma c_code(tokenize(Stuff::in, Token::out, IO0::di, IO::uo),
		will_not_call_mercury,
	"
		Token = make_aligned_string_copy(C_tokenize(Stuff));
		update_io(IO0, IO);
	").

	:- pragma c_code(tokenStr(Result::out, IO0::di, IO::uo),
		will_not_call_mercury,
	"
		const char *c_tokenStr = C_tokenStr();
		Result = make_aligned_string_copy(c_tokenStr);
		/* no free() here: the buffer belongs to the C++ string */
		update_io(IO0, IO);
	").

Now there is one tricky part of the above code that I have not yet explained,
and that is the calls to make_aligned_string_copy().
The values returned from C_tokenStr() and C_tokenize()
are not suitable for use as Mercury strings because

	- they do not have the appropriate lifetime; that is,
	  the memory they are stored in may be deallocated
	- they are not guaranteed to remain constant
	- they are not guaranteed to be word-aligned

For C_tokenStr(), the return value is obtained from string::c_str();
according to the C++ standard, this pointer is only guaranteed to remain
valid until the next call to a non-const member function of the string
(in practice, until the string is next modified), so we need to make a
copy.  The macro make_aligned_string_copy()
creates a copy on the Mercury heap, appropriately aligned.
(The Mercury implementation requires that Mercury strings be word-aligned so
that it can use the bottom two or three bits of the pointer as tag bits.)
This macro was introduced in Mercury version 0.7.2.
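
To illustrate why the copy matters, here is a small standalone C++ sketch.
It is not how Mercury implements make_aligned_string_copy(); the names
copy_word_aligned(), is_word_aligned() and snapshot_then_modify() are
hypothetical, and malloc() merely stands in for the Mercury heap.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for make_aligned_string_copy(): copies a C string
// into freshly malloc'ed storage.  malloc() returns memory aligned for any
// fundamental type, so the result is at least word-aligned.  (The real
// macro allocates on the Mercury heap instead.)
char *copy_word_aligned(const char *s) {
    std::size_t len = std::strlen(s) + 1;   // include the terminating NUL
    char *copy = static_cast<char *>(std::malloc(len));
    assert(copy != nullptr);
    std::memcpy(copy, s, len);
    return copy;
}

// True if the pointer is a multiple of the word size, i.e. its low bits
// are zero and so are free to be used as tag bits.
bool is_word_aligned(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(void *) == 0;
}

// Takes a snapshot of the token and then modifies the original string.
// The snapshot is unaffected, whereas a pointer saved from c_str() before
// the modification would no longer be safe to use.
char *snapshot_then_modify(std::string &token) {
    char *copy = copy_word_aligned(token.c_str());
    token += " (modified)";
    return copy;
}
```

The caller owns the pointer returned by copy_word_aligned() and releases it
with free(); with the real macro the Mercury heap takes care of that.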

For C_tokenize(), it's a bit more complicated, since you did not
specify the lifetime of the return value from tokenize().  If the
return value is allocated on the C or C++ heap, then it may be the
caller's responsibility to deallocate that memory, so you may
need to add a call to free() or `delete' to avoid a memory leak there.
If you need to deallocate the memory using `delete' (or `delete []')
then you will need to add a C interface to the C++ delete, e.g.
	
	extern "C" void C_delete_char_array(char *s) {
		delete [] s;
	}

in lexer_interface.cxx.
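
As a rough sketch of that ownership pattern in plain C++: the tokenize()
below is only a hypothetical stand-in (I don't know what yours does or how
it allocates its result), and first_token() is an invented helper that plays
the role of the Mercury-side caller.  It copies the result and then releases
the original through the matching deallocator.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the real tokenize(): returns a new[]-allocated
// copy of the first whitespace-delimited token.  The caller owns the result.
char *tokenize(const char *s) {
    assert(s != nullptr);
    std::size_t n = std::strcspn(s, " \t\n");   // length of the first token
    char *tok = new char[n + 1];
    std::memcpy(tok, s, n);
    tok[n] = '\0';
    return tok;
}

// C-callable wrappers, as they would appear in lexer_interface.cxx.
extern "C" char *C_tokenize(char *s) {
    return tokenize(s);
}

extern "C" void C_delete_char_array(char *s) {
    delete [] s;    // must match the new[] used by tokenize()
}

// What the caller has to do: copy the result first, then hand the
// original back to the wrapper that knows how to deallocate it.
std::string first_token(char *input) {
    char *raw = C_tokenize(input);
    std::string copy(raw);
    C_delete_char_array(raw);
    return copy;
}
```

If your tokenize() uses malloc() instead, the wrapper should call free(),
and if it returns a pointer into a static buffer, nothing should be freed.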

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.


