[m-rev.] for review: genetic algorithm for searching the space of optimizations

Julien Fischer juliensf at cs.mu.OZ.AU
Mon Feb 6 14:12:30 AEDT 2006


On Fri, 3 Feb 2006, Samrith UONG wrote:

A few things:

- it would be good if there was a little more documentation, e.g a README
  file or similar

- what sort of results have you obtained thus far?

> %-----------------------------------------------------------------------------%
> % vim: ft=mercury ts=4 sw=4 et
> %-----------------------------------------------------------------------------%
> % Copyright (C) 2005-2006 The University of Melbourne.
> % This file may only be copied under the terms of the GNU General
> % Public License - see the file COPYING in the Mercury distribution.
> %-----------------------------------------------------------------------------%
> %
> % File: evolve.m.
> % Main author: samrith.
> %
> % This program implements part of a genetic algorithm to determine an
> % optimal set of optimisation flags to be passed to the Mercury compiler
> % for a given program.
> %
> % It expects to be given two pieces of data as input.  The first is a
> % list of genotypes, where each genotype is a set of strings representing
> % optimisation flags.  The second is a list of phenotypes, where each
> % phenotype is a list of benchmarks.  These are read from the files
> % MCFLAGS/$n/genotypes and MCFLAGS/$n/phenotypes, respectively.
> %
> % The program will then determine the next set of genotypes, which it
> % will write in the file MCFLAGS/$n+1/genotypes.  It will also create
> % the file MCFLAGS/$n+1/flags, which contains the flags passed to mmc
> % to compile a benchmark program.
> %
> % Note that this program does not perform the actual benchmarking tests,
> % nor does it control the evolution over multiple generations.  These
> % tasks are handled by the mcflags shell script.
> %
> %-----------------------------------------------------------------------------%
>
> :- module evolve.
> :- interface.
>
> :- import_module io.
>
> :- pred main(io, io).
> :- mode main(di, uo) is det.
>

You should use predmode syntax here and elsewhere.

...

>
> :- implementation.
>
> :- import_module genotype.
> :- import_module phenotype.
> :- import_module tausworthe3.
>
> :- import_module bool.
> :- import_module char.
> :- import_module float.
> :- import_module getopt.
> :- import_module int.
> :- import_module list.
> :- import_module require.
> :- import_module set.
> :- import_module std_util.
> :- import_module string.
>
> %-----------------------------------------------------------------------------%
>
> main(!IO) :-
>
>         % Process any command line arguments.
>         %
>     io.command_line_arguments(Args, !IO),
>     OptionOps = option_ops_multi(short_option, long_option, option_default),
>     getopt.process_options(OptionOps, Args, _, Result),
>     (
>         Result = ok(OptionTable),
>
>         getopt.lookup_string_option(OptionTable, config_file,
>                 PathToConfigFile),
>         getopt.lookup_string_option(OptionTable, genotypes, PathToGenotypes),
>         getopt.lookup_string_option(OptionTable, next_genotypes,
>                 PathToNextGenotypes),
>         getopt.lookup_string_option(OptionTable, ladder, PathToLadder),
>         getopt.lookup_string_option(OptionTable, phenotypes, PathToPhenotypes),
>
>         getopt.lookup_int_option(OptionTable, first_seed, FirstSeed),
>         getopt.lookup_int_option(OptionTable, second_seed, SecondSeed),
>         getopt.lookup_int_option(OptionTable, third_seed, ThirdSeed)
>     ;
>         Result = error(ErrorMessage),
>         require.error(ErrorMessage)
>     ),
>
>         % Read the input files.
>         %
>     read_config_file(PathToConfigFile, Weightings, Flags, !IO),
>     genotype.read_genotypes(PathToGenotypes, Genotypes, !IO),
>     phenotype.read_phenotypes(PathToPhenotypes, Phenotypes, !IO),
>
>         % Apply the genetic operators to the genotypes.
>         %
>     some [!RNG] (
>         Tausworthe3Seed = tausworthe3_seed(FirstSeed, SecondSeed, ThirdSeed),
>         !:RNG = init_tausworthe3,
>         seed(Tausworthe3Seed, !RNG),
>
>             % We ensure that the following condition is held:
>             %
>             %   length(Genotypes) / 2 = length(Mothers) = length(Fathers).
>             %
>             % Since the crossover/6 predicate creates two children for
>             % every two parents, the population will remain constant
>             % over generations.  This is a simple way of avoiding
>             % extinction.
>             %
>             % Note that map_2in_2out_foldl/7 will simply ignore leftover
>             % elements if the two lists (Mothers and Fathers) are of
>             % unequal lengths.
>             %
>         list.map(phenotype.fitness(Weightings), Phenotypes, Fitness),
>         list.map_foldl(phenotype.selection(Genotypes, Fitness),
>                 Genotypes, Parents, !RNG),
>         list.det_split_list(length(Parents) / 2, Parents, Mothers, Fathers),
>         map_2in_2out_foldl(genotype.crossover, Mothers, Fathers,
>                 Sons, Daughters, !RNG),
>         list.append(Sons, Daughters, Children),
>         list.map_foldl(genotype.mutation(Flags), Children, NextGenotypes,
>                 !RNG),
>
>         _ = !.RNG

The last line is unecessary, just make the previous line:

	list.map_foldl(genotype.mutation(Flags), Children,
		NextGenotypes, !.RNG, _)

...

> %-----------------------------------------------------------------------------%
> %
> % Command line argument parsing.
> %
>
> %
> % This section contains all the code for the predicates required by
> % getopt.process_options.
> %
>
> :- type option
>     --->    config_file
>     ;       genotypes
>     ;       next_genotypes
>     ;       ladder
>     ;       phenotypes
>     ;       first_seed
>     ;       second_seed
>     ;       third_seed.
>
> :- pred short_option(char, option).
> :- mode short_option(in, out) is semidet.
>
> short_option('c', config_file).
> short_option('g', genotypes).
> short_option('h', next_genotypes).
> short_option('l', ladder).
> short_option('p', phenotypes).
> short_option('s', first_seed).
> short_option('t', second_seed).
> short_option('u', third_seed).
>
> :- pred long_option(string, option).
> :- mode long_option(in, out) is semidet.
>
> long_option("config-file", config_file).
> long_option("genotypes", genotypes).
> long_option("next-genotypes", next_genotypes).
> long_option("ladder", ladder).
> long_option("phenotypes", phenotypes).
> long_option("first-seed", first_seed).
> long_option("second-seed", second_seed).
> long_option("third-seed", third_seed).
>
> :- pred option_default(option, option_data).
> :- mode option_default(out, out) is multi.
>
> option_default(config_file, string("evolve.conf")).
> option_default(genotypes, string("MCFLAGS/1/genotypes")).
> option_default(next_genotypes, string("MCFLAGS/2/genotypes")).
> option_default(ladder, string("MCFLAGS/1/ladder")).
> option_default(phenotypes, string("MCFLAGS/1/phenotypes")).
> option_default(first_seed, int(0)).
> option_default(second_seed, int(0)).
> option_default(third_seed, int(0)).
>
> %-----------------------------------------------------------------------------%
> %
> % Input/Ouput predicates.
> %
>

That's not very descriptive.  "Code for reading configuration files" or
similar would be better.

> :- pred read_config_file(string, list(float), list(string), io, io).
> :- mode read_config_file(in, out, out, di, uo) is det.
>

For readability purposes it's probably worth introducing (and using)
the following equivalences somewhere:

	:- type path == string.
	:- type weighting == float.
	:- type flag == string.

> read_config_file(Path, Weightings, Flags, !IO) :-
>     io.open_input(Path, OpenResult, !IO),
>     (
>         OpenResult = ok(Stream),
>         io.read(Stream, ReadWeightingsResult, !IO),
>         (
>             ReadWeightingsResult = ok(Weightings),
>             io.read(Stream, ReadFlagsResult, !IO),
>             (
>                 ReadFlagsResult = ok(Flags),
>                 io.close_input(Stream, !IO)
>             ;
>                 ReadFlagsResult = eof,
>                 require.error("unexpected EOF")
>             ;
>                 ReadFlagsResult = error(ErrorMsg, LineNum),
>                 string.format("%d: %s", [i(LineNum), s(ErrorMsg)], Message),
>                 require.error(Message)
>             )
>         ;
>             ReadWeightingsResult = eof,
>             require.error("unexpected EOF")
>         ;
>             ReadWeightingsResult = error(ErrorMsg, LineNum),
>             string.format("%d: %s", [i(LineNum), s(ErrorMsg)], Message),
>             require.error(Message)
>         )
>     ;
>         OpenResult = error(ErrorCode),
>         io.error_message(ErrorCode, ErrorMessage),
>         require.error(ErrorMessage)
>     ).
>
> :- pred print_ladder(string, list(fitness), list(genotype), io, io).
> :- mode print_ladder(in, in, in, di, uo) is det.
>
> print_ladder(Path, Fitness, Genotypes, !IO) :-
>     io.open_output(Path, OpenResult, !IO),
>     (
>         OpenResult = ok(Stream),
>
>         list.map(string.int_to_string, 1..list.length(Fitness), C1),
>         list.map(phenotype.fitness_to_string, Fitness, C2),
>         list.map(genotype.genotype_to_string, Genotypes, C3),
>
>         Table = string.format_table([right(C1), right(C2), left(C3)], " * "),
>         io.write_string(Stream, Table, !IO),
>
>         io.close_output(Stream, !IO)
>     ;
>         OpenResult = error(ErrorCode),
>         io.error_message(ErrorCode, ErrorMessage),
>         require.error(ErrorMessage)
>     ).
>
> %-----------------------------------------------------------------------------%
> %
> % Miscellaneous.
> %
>
>     % map_2in_2out_foldl(Pred, InList1, InList2, OutList1, OutList2, !A).
>     %
>     % This predicate is the same as list.map_foldl, except that it takes
>     % two input lists and two output lists.
>     %
> :- pred map_2in_2out_foldl(pred(L, M, N, O, A, A),
>         list(L), list(M), list(N), list(O), A, A).
> :- mode map_2in_2out_foldl(pred(in, in, out, out, in, out) is det,
>         in, in, out, out, in, out) is det.
>
> map_2in_2out_foldl(_, [],        [],        [],        [],        !A).
> map_2in_2out_foldl(_, [],        [_H | _T], [],        [],        !A).
> map_2in_2out_foldl(_, [_H | _T], [],        [],        [],        !A).
> map_2in_2out_foldl(P, [H0 | T0], [H1 | T1], [H2 | T2], [H3 | T3], !A) :-
>     P(H0, H1, H2, H3, !A),
>     map_2in_2out_foldl(P, T0, T1, T2, T3, !A).
>

...

> %-----------------------------------------------------------------------------%
> % vim: ft=mercury ts=4 sw=4 et
> %-----------------------------------------------------------------------------%
> % Copyright (C) 2005-2006 The University of Melbourne.
> % This file may only be copied under the terms of the GNU General
> % Public License - see the file COPYING in the Mercury distribution.
> %-----------------------------------------------------------------------------%
> %
> % File: genotype.m.
> % Main author: samrith.
> %
> %-----------------------------------------------------------------------------%
>
> :- module genotype.
> :- interface.
>
> :- import_module tausworthe3.
>
> :- import_module io.
> :- import_module list.
>
> :- type genotype.
>
>     % read_genotypes(Path, Genotypes, !IO):
>     %
>     % Reads in a list of genotypes from the given file.  A genotype is
>     % made up of a set of flags.  Each genotype within a file is
>     % separated by a newline character, and each flag within a genotype
>     % is separated by one or more spaces.
>     %
> :- pred read_genotypes(string, list(genotype), io, io).
> :- mode read_genotypes(in, out, di, uo) is det.
>
>     % crossover(Mother, Father, Son, Daughter, !RNG).
>     %
>     % This predicate takes two parent genotypes and randomly selects
>     % optimisation flags from each to create two new child genotypes.
>     % Note that we don't actually care whether a genotype is male or
>     % female; the terms mother and father, and son and daughter are just
>     % used to differentiate the two different parents and children.
>     %
>     % A "cut and splice" algorithm was chosen here as it is simple
>     % to implement and seems to be well suited to the data structure
>     % used.  Many common crossover methods assume the genotype is a
>     % fixed-length bit-array.
>     %
> :- pred crossover(genotype, genotype, genotype, genotype, RNG, RNG)
>         <= random(RNG, Seed).
> :- mode crossover(in, in, out, out, in, out) is det.
>
>     % mutation(Flags, Child, Mutant, !RNG).
>     %
>     % This predicate randomly mutates a genotype.
>     %
>     % This predicate is implemented by choosing a compiler flag at
>     % random and toggling that flag in the child genotype.
>     %
> :- pred mutation(list(string), genotype, genotype, RNG, RNG)
>         <= random(RNG, Seed).
> :- mode mutation(in, in, out, in, out) is det.
>
>     % print_genotypes(Path, Genotypes, !IO):
>     %
>     % Prints out a list of genotypes to the given file, in the same
>     % format as expected by read_genotypes/4.
>     %
> :- pred print_genotypes(string, list(genotype), io, io).
> :- mode print_genotypes(in, in, di, uo) is det.
>
>     % genotype_to_string(Genotype, String).
>     %
>     % Returns a string representation of the genotype.
>     %
> :- pred genotype_to_string(genotype, string).
> :- mode genotype_to_string(in, out) is det.
>
> %-----------------------------------------------------------------------------%
> %-----------------------------------------------------------------------------%
> :- implementation.
>
> :- import_module char.
> :- import_module int.
> :- import_module require.
> :- import_module set.
> :- import_module std_util.
> :- import_module string.
>
> %-----------------------------------------------------------------------------%
>
> :- type genotype == set(flag).
>
> :- type flag == string.
>
> %-----------------------------------------------------------------------------%
> %
> % Reading in a list of genotypes.
> %
>
> %
> % This section contains the implementation of the read_genotypes/4
> % predicate, which parses a file containing the flags to be passed to the
> % compiler.
> %
> % The parser is implemented using Mercury's DCG notation.  The many/4
> % predicate is a higher-order DCG-rule that takes another DCG-rule as its
> % first argument.  It is used to parse a list of things such as a list of
> % genotypes or a list of flags.
> %
>
> read_genotypes(Path, Genotypes, !IO) :-
>     io.open_input(Path, OpenResult, !IO),
>     (
>         OpenResult = ok(Stream),
>         io.read_file(Stream, ReadResult, !IO),
>         (
>             ReadResult = ok(File),
>             ( if
>                 many(genotype, Genotypes0, File, [])
>             then
>                 Genotypes = Genotypes0,
>                 io.close_input(Stream, !IO)
>             else
>                 require.error("parse error while reading genotypes")
>             )
>         ;
>             ReadResult = error(_, ErrorCode),
>             io.error_message(ErrorCode, ErrorMessage),
>             require.error(ErrorMessage)
>         )
>     ;
>         OpenResult = error(ErrorCode),
>         io.error_message(ErrorCode, ErrorMessage),
>         require.error(ErrorMessage)
>     ).
>
> :- pred many(pred(T, list(char), list(char)), list(T), list(char), list(char)).
> :- mode many(pred(out, in, out) is semidet, out, in, out) is semidet.
>
> many(P, Ps) -->
>     ( if
>         P(X)
>     then
>         many(P, Xs),
>         { Ps = [X | Xs] }
>     else
>         { Ps = [] }
>     ).
>
> :- pred genotype(genotype, list(char), list(char)).
> :- mode genotype(out, in, out) is semidet.
>
> genotype(Genotype) -->
>     many(pred(' '::out, in, out) is semidet --> [' '], _DiscardLeadingSpaces),
>     many(flag, Flags),
>     ['\n'],
>     { set.list_to_set(list.map(string.strip, Flags), Genotype) }.
>
> :- pred flag(flag, list(char), list(char)).
> :- mode flag(out, in, out) is semidet.
>
> flag(Flag) -->
>     double_dash(DoubleDash),
>     many(other, Others),
>     { Flag = string.from_char_list(DoubleDash ++ Others) }.
>
> :- pred double_dash(list(char), list(char), list(char)).
> :- mode double_dash(out, in, out) is semidet.
>
> double_dash(DoubleDash) -->
>     ['-', '-'],
>     { DoubleDash = ['-', '-'] }.
>
> :- pred other(char, list(char), list(char)).
> :- mode other(out, in, out) is semidet.
>
> other(Other) -->
>     \+ ['-', '-'],
>     \+ ['\n'],
>     [Other].
>
> %-----------------------------------------------------------------------------%
> %
> % Genetic operators that operate on the genotype.
> %
>
> %
> % This section contains the predicates for the genetic operators that
> % require access to (and knowledge of) the representation of the genotype.
> % These include the crossover (or recombination) and mutation operators.
> %
>
> crossover(Mother, Father, Son, Daughter, !RNG) :-
>     list.map2_foldl(cut, [Mother, Father], PartsOfSon, PartsOfDaughter, !RNG),
>     Son = set.union_list(PartsOfSon),
>     Daughter = set.union_list(PartsOfDaughter).
>
> :- pred cut(genotype, genotype, genotype, RNG, RNG) <= random(RNG, Seed).
> :- mode cut(in, out, out, in, out) is det.
>
> cut(Parent, PartOfSon, PartOfDaughter, !RNG) :-
>     ( if
>         set.count(Parent, NumFlags),
>         NumFlags \= 0
>     then
>         next(NextRandomInt, !RNG),
>         CrossoverPoint = NextRandomInt mod NumFlags
>     else
>         CrossoverPoint = 0
>     ),
>     set.to_sorted_list(Parent, List),
>     list.det_split_list(CrossoverPoint, List, Start, End),
>     set.list_to_set(Start, PartOfSon),
>     set.list_to_set(End, PartOfDaughter).
>
> mutation(Flags, !Genotype, !RNG) :-
>
>     list.length(Flags, NumFlags),
>     next(Next, !RNG),
>     Index = Next mod NumFlags,
>     list.index0_det(Flags, Index, Flag),
>
>     ( if
>         set.member(Flag, !.Genotype)
>     then
>         set.delete(!.Genotype, Flag, !:Genotype)
>     else
>         set.insert(!.Genotype, Flag, !:Genotype)
>     ).
>

The svset module contains versions of the predicates from the set module with
the arguments in the right order for state variables.

> %-----------------------------------------------------------------------------%
> %
> % Printing out a list of genotypes.
> %
>
> %
> % This section contains the implementation of the print_genotypes/4
> % predicate, which creates a file containing the flags to be passed to the
> % compiler.
> %
>
> print_genotypes(Path, Genotypes, !IO) :-
>     io.open_output(Path, Result, !IO),
>     (
>         Result = ok(Stream),
>         map(genotype_to_string, Genotypes, Strings),
>         io.write_list(Stream, Strings, "\n", io.write_string(Stream), !IO),
>         io.nl(Stream, !IO),
>         io.close_output(Stream, !IO)
>     ;
>         Result = error(ErrorCode),
>         io.error_message(ErrorCode, ErrorMessage),
>         require.error(ErrorMessage)
>     ).
>
> genotype_to_string(Genotype, String) :-
>     set.to_sorted_list(Genotype, List),
>     String = string.join_list(" ", List).
>
> %-----------------------------------------------------------------------------%
>
> ==> phenotype.m <==
> %-----------------------------------------------------------------------------%
> % vim: ft=mercury ts=4 sw=4 et
> %-----------------------------------------------------------------------------%
> % Copyright (C) 2005-2006 The University of Melbourne.
> % This file may only be copied under the terms of the GNU General
> % Public License - see the file COPYING in the Mercury distribution.
> %-----------------------------------------------------------------------------%
> %
> % File: phenotype.m.
> % Main author: samrith.
> %
> %-----------------------------------------------------------------------------%
>
> :- module phenotype.
> :- interface.
>
> :- import_module genotype.
> :- import_module tausworthe3.
>
> :- import_module io.
> :- import_module list.
>
> :- type phenotype.
>
> :- type fitness.
>
>     % read_phenotypes(Path, Phenotypes, !IO).
>     %
>     % Reads a list of phenotypes from the given file, and unifies
>     % the list with Phenotypes.
>     %
> :- pred read_phenotypes(string, list(phenotype), io, io).
> :- mode read_phenotypes(in, out, di, uo) is det.
>
>     % fitness(Weightings, Phenotype, Fitness).
>     %
>     % This predicate evaluates the fitness of a genotype given its
>     % phenotype.
>     %
>     % Fitness is defined here as a weighted sum of each element in the
>     % phenotype.  The weightings can be modified by changing the
>     % "Weightings" variable.
>     %
> :- pred fitness(list(float), phenotype, fitness).
> :- mode fitness(in, in, out) is det.
>

Why can't this be a function?

...

>     % fitness_to_string(Fitness, String).
>     %
>     % Gives a string representation of the fitness value.
>     %
> :- pred fitness_to_string(fitness, string).
> :- mode fitness_to_string(in, out) is det.
>

And this?

Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list