[m-rev.] for review: genetic algorithm for searching the space of optimizations

Samrith UONG samuong at gmail.com
Wed Feb 8 15:48:38 AEDT 2006


On 2/6/06, Julien Fischer <juliensf at cs.mu.oz.au> wrote:
>
> On Fri, 3 Feb 2006, Samrith UONG wrote:
>
> A few things:
>
> - it would be good if there was a little more documentation, e.g a README
>   file or similar

I've put a first draft of the README file at the bottom of this email.
 Let me know which parts need more work.

> - what sort of results have you obtained thus far?

The program has now found some optimisations that work better than
-O6.  I've also found that, for the programs and inputs that I'm
using, -O6 works better than -O5.  This contradicts the benchmarking
that we did with the compiler in December.  I'm going to test the
optimisations that I get from the GA on the compiler using makebatch
and speedtest to see how they work on a larger piece of code.

> > :- pred main(io, io).
> > :- mode main(di, uo) is det.
> >
>
> You should use predmode syntax here and elsewhere.

OK.

> >         list.map_foldl(genotype.mutation(Flags), Children, NextGenotypes,
> >                 !RNG),
> >
> >         _ = !.RNG
>
> The last line is unecessary, just make the previous line:
>
>         list.map_foldl(genotype.mutation(Flags), Children,
>                 NextGenotypes, !.RNG, _)
>

Changed it.

> > %-----------------------------------------------------------------------------%
> > %
> > % Input/Ouput predicates.
> > %
> >
>
> That's not very descriptive.  "Code for reading configuration files" or
> similar would be better.
>

Changed it.

> > :- pred read_config_file(string, list(float), list(string), io, io).
> > :- mode read_config_file(in, out, out, di, uo) is det.
> >
>
> For readability purposes it's probably worth introducing (and using)
> the following equivalences somewhere:
>
>         :- type path == string.
>         :- type weighting == float.
>         :- type flag == string.

That's a good idea.  I've done it for weighting and flag, but not for
path.  The path is passed straight to io.open_input/4, which expects a
string.  So if an equivalence type for path == string were defined, it
should belong in mercury/library/io.m.  Otherwise it's a bit hard to
decide where to put it in my program (since there are four predicates
in three files that require a path).

> >     ( if
> >         set.member(Flag, !.Genotype)
> >     then
> >         set.delete(!.Genotype, Flag, !:Genotype)
> >     else
> >         set.insert(!.Genotype, Flag, !:Genotype)
> >     ).
> >
>
> The svset module contains versions of the predicates from the set module with
> the arguments in the right order for state variables.

Changed it.

> > :- pred fitness(list(float), phenotype, fitness).
> > :- mode fitness(in, in, out) is det.
> >
>
> Why can't this be a function?
>
> > :- pred fitness_to_string(fitness, string).
> > :- mode fitness_to_string(in, out) is det.
> >
>
> And this?

Changed them.  But what's wrong with predicates?

Sam.

==> README <==
                                mcflags

1 Introduction

Mercury provides a large number of compile-time options that enable or
disable optimisations to the code being generated.  It can be difficult
to determine which ones to apply, especially since different options may
work better with different programs or different inputs.  This program
attempts to find the optimal set of optimisation flags to be passed to
mmc(1).  It works using a genetic algorithm; see the following URLs for
details on how genetic algorithms work:

             http://en.wikipedia.org/wiki/Genetic_algorithm
     http://www.cs.cmu.edu/Groups/AI/html/faqs/ai/genetic/top.html

The program is invoked using the mcflags script.  It may be useful (if
something goes wrong) to turn on the verbose flag and pipe the output to
a file:

                  ./mcflags -v 2>&1 | tee mcflags.out

The program stores its output in the directory MCFLAGS.  Each generation
has its own subdirectory within MCFLAGS.  For example, information about
the first generation is stored in MCFLAGS/1.

There are a number of files in each of these directories.  The most
useful one is the file "ladder".  This file contains a table listing all
of the individuals in a population.  Each individual is given a number,
between 1 and the population size, in the first column of the table.
The second column contains the "fitness" of an individual.  The third
column contains the "genotype" of an individual, which is the list of
compiler options that are passed to the compiler.

The program is intended to be general enough to test different programs,
as well as different programming languages and compilers.  The following
sections describe how to configure mcflags to do this.

2 Configuring for your network

mcflags has been written to utilise multiple hosts on a network.  In
order to configure mcflags for your network, the mcflags.conf file needs
to contain information about all the hosts to be used.

Configuring mcflags.conf involves defining a set of variables and
assigning them values.  The syntax is the same as defining shell
variables in /bin/sh.

The following variable must be defined:

	num_hosts: the number of hosts available for benchmarking.  This
		may include the host from which mcflags is being run, if
		it is also being used to run the benchmarks.

For each host, the following variables need to be defined.

	host$i: the name of the host we are connecting to.  This is
		passed as a command-line argument to ssh(1).  Make sure
		you have a copy of the host's SSH public key in your
		cache before you run mcflags.

	workspace$i: the path to a directory containing mcflags.  Note
		that nothing is written to this directory, it is just
		used to access the dotime and evaluate.conf files from
		a remote host.  It can be the same on all hosts, as long
		as this directory is also mounted on all of the remote
		machines.

	benchmarks$i: similar to workspace$i, except it contains the
		benchmarks directory from CVS.  Note that each host must
		have its own benchmarks directory.

	path$i: the path to the directory containing the compiler.  This
		is pre-pended to $PATH in the evaluate script.

3 Configuring for your software

The evaluate.conf file allows the user to change the benchmarking
information gathered by mcflags.  This is useful if mcflags is being
used to test different compilers, and/or different programs.

The syntax is identical to that of mcflags.conf (see section 2).

The following variable must be defined:

	num_progs: the number of programs used for benchmarking.  Note
		that if the number of programs is changed, the way that
		"fitness" is evaluated will need to be changed.  This
		should be done by defining a new set of "Weightings" in
		evolve.conf (see section 4).

For each program, the following variables need to be defined.

	prog$i: the full path to the executable for the program (which
		may not yet be built).  This may or may not be contained
		under "$benchmarks" (see section 2).

	clean$i: the command used to completely clean up the source
		directory (e.g., "make realclean").

	compile$i: the command used to compile the program.  Note that
		you may assume there is a $flags shell variable which
		gives the optimization flags passed to the compiler.

	run$i: the command used to run the program.

4 Configuring parameters for the genetic operators

If the intention is to, for example, optimise for space instead of time,
then it may be necessary to modify certain parameters of the genetic
operators.  This can be achieved by modifying the evolve.conf file,
which allows the user to tweak the parameters of certain genetic
operators including the fitness operator and the mutation operator.

The syntax of evolve.conf is a bit different to that of mcflags.conf and
evaluate.conf.  Currently, evolve.conf must contain two terms that can
be read by io.read/3.

The first term contains the "Weightings" used by phenotype.fitness/2.
This parameter is coupled with the number of programs being tested (by
default 5, see section 3 on evaluate.conf).  For each program, there are
three measurements taken by the software:  the time taken to compile,
the resulting executable size and the time taken to run the executable.
Because of this, the list must be of length $num_progs * 3.

The following examples give cause mcflags to search for a set of flags
that will compile in the least amount of time,

			[ 1.0, 1.0, 1.0, 1.0, 1.0,
			  0.0, 0.0, 0.0, 0.0, 0.0,
			  0.0, 0.0, 0.0, 0.0, 0.0 ].

weight compile times and run times equally,

			[ 0.5, 0.5, 0.5, 0.5, 0.5,
			  0.0, 0.0, 0.0, 0.0, 0.0,
			  0.5, 0.5, 0.5, 0.5, 0.5 ].

or optimise for space, while ignoring the second program in the set.

			[ 0.0, 0.0, 0.0, 0.0, 0.0,
			  1.0, 0.0, 1.0, 1.0, 1.0,
			  0.0, 0.0, 0.0, 0.0, 0.0 ].

The second term contains a complete list of flags that can be passed to
the compiler.  This is used by genotype.mutation/5, which toggles a
random flag in an individual's genotype.

5 Further modifying the genetic operators

The genetic operators themselves can be modified directly.  They are
implemented in the following functions/predicates:

	phenotype.fitness/2
	phenotype.selection/6
	genotype.crossover/6
	genotype.mutation/5

--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list