[m-rev.] for review: genetic algorithm for searching the space of optimizations
Julien Fischer
juliensf at cs.mu.OZ.AU
Mon Feb 6 14:12:30 AEDT 2006
On Fri, 3 Feb 2006, Samrith UONG wrote:
A few things:
- it would be good if there was a little more documentation, e.g a README
file or similar
- what sort of results have you obtained thus far?
> %-----------------------------------------------------------------------------%
> % vim: ft=mercury ts=4 sw=4 et
> %-----------------------------------------------------------------------------%
> % Copyright (C) 2005-2006 The University of Melbourne.
> % This file may only be copied under the terms of the GNU General
> % Public License - see the file COPYING in the Mercury distribution.
> %-----------------------------------------------------------------------------%
> %
> % File: evolve.m.
> % Main author: samrith.
> %
> % This program implements part of a genetic algorithm to determine an
> % optimal set of optimisation flags to be passed to the Mercury compiler
> % for a given program.
> %
> % It expects to be given two pieces of data as input. The first is a
> % list of genotypes, where each genotype is a set of strings representing
> % optimisation flags. The second is a list of phenotypes, where each
> % phenotype is a list of benchmarks. These are read from the files
> % MCFLAGS/$n/genotypes and MCFLAGS/$n/phenotypes, respectively.
> %
> % The program will then determine the next set of genotypes, which it
> % will write in the file MCFLAGS/$n+1/genotypes. It will also create
> % the file MCFLAGS/$n+1/flags, which contains the flags passed to mmc
> % to compile a benchmark program.
> %
> % Note that this program does not perform the actual benchmarking tests,
> % nor does it control the evolution over multiple generations. These
> % tasks are handled by the mcflags shell script.
> %
> %-----------------------------------------------------------------------------%
>
> :- module evolve.
> :- interface.
>
> :- import_module io.
>
> :- pred main(io, io).
> :- mode main(di, uo) is det.
>
You should use predmode syntax here and elsewhere.
...
>
> :- implementation.
>
> :- import_module genotype.
> :- import_module phenotype.
> :- import_module tausworthe3.
>
> :- import_module bool.
> :- import_module char.
> :- import_module float.
> :- import_module getopt.
> :- import_module int.
> :- import_module list.
> :- import_module require.
> :- import_module set.
> :- import_module std_util.
> :- import_module string.
>
> %-----------------------------------------------------------------------------%
>
> main(!IO) :-
>
> % Process any command line arguments.
> %
> io.command_line_arguments(Args, !IO),
> OptionOps = option_ops_multi(short_option, long_option, option_default),
> getopt.process_options(OptionOps, Args, _, Result),
> (
> Result = ok(OptionTable),
>
> getopt.lookup_string_option(OptionTable, config_file,
> PathToConfigFile),
> getopt.lookup_string_option(OptionTable, genotypes, PathToGenotypes),
> getopt.lookup_string_option(OptionTable, next_genotypes,
> PathToNextGenotypes),
> getopt.lookup_string_option(OptionTable, ladder, PathToLadder),
> getopt.lookup_string_option(OptionTable, phenotypes, PathToPhenotypes),
>
> getopt.lookup_int_option(OptionTable, first_seed, FirstSeed),
> getopt.lookup_int_option(OptionTable, second_seed, SecondSeed),
> getopt.lookup_int_option(OptionTable, third_seed, ThirdSeed)
> ;
> Result = error(ErrorMessage),
> require.error(ErrorMessage)
> ),
>
> % Read the input files.
> %
> read_config_file(PathToConfigFile, Weightings, Flags, !IO),
> genotype.read_genotypes(PathToGenotypes, Genotypes, !IO),
> phenotype.read_phenotypes(PathToPhenotypes, Phenotypes, !IO),
>
> % Apply the genetic operators to the genotypes.
> %
> some [!RNG] (
> Tausworthe3Seed = tausworthe3_seed(FirstSeed, SecondSeed, ThirdSeed),
> !:RNG = init_tausworthe3,
> seed(Tausworthe3Seed, !RNG),
>
> % We ensure that the following condition is held:
> %
> % length(Genotypes) / 2 = length(Mothers) = length(Fathers).
> %
> % Since the crossover/6 predicate creates two children for
> % every two parents, the population will remain constant
> % over generations. This is a simple way of avoiding
> % extinction.
> %
> % Note that map_2in_2out_foldl/7 will simply ignore leftover
> % elements if the two lists (Mothers and Fathers) are of
> % unequal lengths.
> %
> list.map(phenotype.fitness(Weightings), Phenotypes, Fitness),
> list.map_foldl(phenotype.selection(Genotypes, Fitness),
> Genotypes, Parents, !RNG),
> list.det_split_list(length(Parents) / 2, Parents, Mothers, Fathers),
> map_2in_2out_foldl(genotype.crossover, Mothers, Fathers,
> Sons, Daughters, !RNG),
> list.append(Sons, Daughters, Children),
> list.map_foldl(genotype.mutation(Flags), Children, NextGenotypes,
> !RNG),
>
> _ = !.RNG
The last line is unecessary, just make the previous line:
list.map_foldl(genotype.mutation(Flags), Children,
NextGenotypes, !.RNG, _)
...
> %-----------------------------------------------------------------------------%
> %
> % Command line argument parsing.
> %
>
> %
> % This section contains all the code for the predicates required by
> % getopt.process_options.
> %
>
> :- type option
> ---> config_file
> ; genotypes
> ; next_genotypes
> ; ladder
> ; phenotypes
> ; first_seed
> ; second_seed
> ; third_seed.
>
> :- pred short_option(char, option).
> :- mode short_option(in, out) is semidet.
>
> short_option('c', config_file).
> short_option('g', genotypes).
> short_option('h', next_genotypes).
> short_option('l', ladder).
> short_option('p', phenotypes).
> short_option('s', first_seed).
> short_option('t', second_seed).
> short_option('u', third_seed).
>
> :- pred long_option(string, option).
> :- mode long_option(in, out) is semidet.
>
> long_option("config-file", config_file).
> long_option("genotypes", genotypes).
> long_option("next-genotypes", next_genotypes).
> long_option("ladder", ladder).
> long_option("phenotypes", phenotypes).
> long_option("first-seed", first_seed).
> long_option("second-seed", second_seed).
> long_option("third-seed", third_seed).
>
> :- pred option_default(option, option_data).
> :- mode option_default(out, out) is multi.
>
> option_default(config_file, string("evolve.conf")).
> option_default(genotypes, string("MCFLAGS/1/genotypes")).
> option_default(next_genotypes, string("MCFLAGS/2/genotypes")).
> option_default(ladder, string("MCFLAGS/1/ladder")).
> option_default(phenotypes, string("MCFLAGS/1/phenotypes")).
> option_default(first_seed, int(0)).
> option_default(second_seed, int(0)).
> option_default(third_seed, int(0)).
>
> %-----------------------------------------------------------------------------%
> %
> % Input/Ouput predicates.
> %
>
That's not very descriptive. "Code for reading configuration files" or
similar would be better.
> :- pred read_config_file(string, list(float), list(string), io, io).
> :- mode read_config_file(in, out, out, di, uo) is det.
>
For readability purposes it's probably worth introducing (and using)
the following equivalences somewhere:
:- type path == string.
:- type weighting == float.
:- type flag == string.
> read_config_file(Path, Weightings, Flags, !IO) :-
> io.open_input(Path, OpenResult, !IO),
> (
> OpenResult = ok(Stream),
> io.read(Stream, ReadWeightingsResult, !IO),
> (
> ReadWeightingsResult = ok(Weightings),
> io.read(Stream, ReadFlagsResult, !IO),
> (
> ReadFlagsResult = ok(Flags),
> io.close_input(Stream, !IO)
> ;
> ReadFlagsResult = eof,
> require.error("unexpected EOF")
> ;
> ReadFlagsResult = error(ErrorMsg, LineNum),
> string.format("%d: %s", [i(LineNum), s(ErrorMsg)], Message),
> require.error(Message)
> )
> ;
> ReadWeightingsResult = eof,
> require.error("unexpected EOF")
> ;
> ReadWeightingsResult = error(ErrorMsg, LineNum),
> string.format("%d: %s", [i(LineNum), s(ErrorMsg)], Message),
> require.error(Message)
> )
> ;
> OpenResult = error(ErrorCode),
> io.error_message(ErrorCode, ErrorMessage),
> require.error(ErrorMessage)
> ).
>
> :- pred print_ladder(string, list(fitness), list(genotype), io, io).
> :- mode print_ladder(in, in, in, di, uo) is det.
>
> print_ladder(Path, Fitness, Genotypes, !IO) :-
> io.open_output(Path, OpenResult, !IO),
> (
> OpenResult = ok(Stream),
>
> list.map(string.int_to_string, 1..list.length(Fitness), C1),
> list.map(phenotype.fitness_to_string, Fitness, C2),
> list.map(genotype.genotype_to_string, Genotypes, C3),
>
> Table = string.format_table([right(C1), right(C2), left(C3)], " * "),
> io.write_string(Stream, Table, !IO),
>
> io.close_output(Stream, !IO)
> ;
> OpenResult = error(ErrorCode),
> io.error_message(ErrorCode, ErrorMessage),
> require.error(ErrorMessage)
> ).
>
> %-----------------------------------------------------------------------------%
> %
> % Miscellaneous.
> %
>
> % map_2in_2out_foldl(Pred, InList1, InList2, OutList1, OutList2, !A).
> %
> % This predicate is the same as list.map_foldl, except that it takes
> % two input lists and two output lists.
> %
> :- pred map_2in_2out_foldl(pred(L, M, N, O, A, A),
> list(L), list(M), list(N), list(O), A, A).
> :- mode map_2in_2out_foldl(pred(in, in, out, out, in, out) is det,
> in, in, out, out, in, out) is det.
>
> map_2in_2out_foldl(_, [], [], [], [], !A).
> map_2in_2out_foldl(_, [], [_H | _T], [], [], !A).
> map_2in_2out_foldl(_, [_H | _T], [], [], [], !A).
> map_2in_2out_foldl(P, [H0 | T0], [H1 | T1], [H2 | T2], [H3 | T3], !A) :-
> P(H0, H1, H2, H3, !A),
> map_2in_2out_foldl(P, T0, T1, T2, T3, !A).
>
...
> %-----------------------------------------------------------------------------%
> % vim: ft=mercury ts=4 sw=4 et
> %-----------------------------------------------------------------------------%
> % Copyright (C) 2005-2006 The University of Melbourne.
> % This file may only be copied under the terms of the GNU General
> % Public License - see the file COPYING in the Mercury distribution.
> %-----------------------------------------------------------------------------%
> %
> % File: genotype.m.
> % Main author: samrith.
> %
> %-----------------------------------------------------------------------------%
>
> :- module genotype.
> :- interface.
>
> :- import_module tausworthe3.
>
> :- import_module io.
> :- import_module list.
>
> :- type genotype.
>
> % read_genotypes(Path, Genotypes, !IO):
> %
> % Reads in a list of genotypes from the given file. A genotype is
> % made up of a set of flags. Each genotype within a file is
> % separated by a newline character, and each flag within a genotype
> % is separated by one or more spaces.
> %
> :- pred read_genotypes(string, list(genotype), io, io).
> :- mode read_genotypes(in, out, di, uo) is det.
>
> % crossover(Mother, Father, Son, Daughter, !RNG).
> %
> % This predicate takes two parent genotypes and randomly selects
> % optimisation flags from each to create two new child genotypes.
> % Note that we don't actually care whether a genotype is male or
> % female; the terms mother and father, and son and daughter are just
> % used to differentiate the two different parents and children.
> %
> % A "cut and splice" algorithm was chosen here as it is simple
> % to implement and seems to be well suited to the data structure
> % used. Many common crossover methods assume the genotype is a
> % fixed-length bit-array.
> %
> :- pred crossover(genotype, genotype, genotype, genotype, RNG, RNG)
> <= random(RNG, Seed).
> :- mode crossover(in, in, out, out, in, out) is det.
>
> % mutation(Flags, Child, Mutant, !RNG).
> %
> % This predicate randomly mutates a genotype.
> %
> % This predicate is implemented by choosing a compiler flag at
> % random and toggling that flag in the child genotype.
> %
> :- pred mutation(list(string), genotype, genotype, RNG, RNG)
> <= random(RNG, Seed).
> :- mode mutation(in, in, out, in, out) is det.
>
> % print_genotypes(Path, Genotypes, !IO):
> %
> % Prints out a list of genotypes to the given file, in the same
> % format as expected by read_genotypes/4.
> %
> :- pred print_genotypes(string, list(genotype), io, io).
> :- mode print_genotypes(in, in, di, uo) is det.
>
> % genotype_to_string(Genotype, String).
> %
> % Returns a string representation of the genotype.
> %
> :- pred genotype_to_string(genotype, string).
> :- mode genotype_to_string(in, out) is det.
>
> %-----------------------------------------------------------------------------%
> %-----------------------------------------------------------------------------%
> :- implementation.
>
> :- import_module char.
> :- import_module int.
> :- import_module require.
> :- import_module set.
> :- import_module std_util.
> :- import_module string.
>
> %-----------------------------------------------------------------------------%
>
> :- type genotype == set(flag).
>
> :- type flag == string.
>
> %-----------------------------------------------------------------------------%
> %
> % Reading in a list of genotypes.
> %
>
> %
> % This section contains the implementation of the read_genotypes/4
> % predicate, which parses a file containing the flags to be passed to the
> % compiler.
> %
> % The parser is implemented using Mercury's DCG notation. The many/4
> % predicate is a higher-order DCG-rule that takes another DCG-rule as its
> % first argument. It is used to parse a list of things such as a list of
> % genotypes or a list of flags.
> %
>
> read_genotypes(Path, Genotypes, !IO) :-
> io.open_input(Path, OpenResult, !IO),
> (
> OpenResult = ok(Stream),
> io.read_file(Stream, ReadResult, !IO),
> (
> ReadResult = ok(File),
> ( if
> many(genotype, Genotypes0, File, [])
> then
> Genotypes = Genotypes0,
> io.close_input(Stream, !IO)
> else
> require.error("parse error while reading genotypes")
> )
> ;
> ReadResult = error(_, ErrorCode),
> io.error_message(ErrorCode, ErrorMessage),
> require.error(ErrorMessage)
> )
> ;
> OpenResult = error(ErrorCode),
> io.error_message(ErrorCode, ErrorMessage),
> require.error(ErrorMessage)
> ).
>
> :- pred many(pred(T, list(char), list(char)), list(T), list(char), list(char)).
> :- mode many(pred(out, in, out) is semidet, out, in, out) is semidet.
>
> many(P, Ps) -->
> ( if
> P(X)
> then
> many(P, Xs),
> { Ps = [X | Xs] }
> else
> { Ps = [] }
> ).
>
> :- pred genotype(genotype, list(char), list(char)).
> :- mode genotype(out, in, out) is semidet.
>
> genotype(Genotype) -->
> many(pred(' '::out, in, out) is semidet --> [' '], _DiscardLeadingSpaces),
> many(flag, Flags),
> ['\n'],
> { set.list_to_set(list.map(string.strip, Flags), Genotype) }.
>
> :- pred flag(flag, list(char), list(char)).
> :- mode flag(out, in, out) is semidet.
>
> flag(Flag) -->
> double_dash(DoubleDash),
> many(other, Others),
> { Flag = string.from_char_list(DoubleDash ++ Others) }.
>
> :- pred double_dash(list(char), list(char), list(char)).
> :- mode double_dash(out, in, out) is semidet.
>
> double_dash(DoubleDash) -->
> ['-', '-'],
> { DoubleDash = ['-', '-'] }.
>
> :- pred other(char, list(char), list(char)).
> :- mode other(out, in, out) is semidet.
>
> other(Other) -->
> \+ ['-', '-'],
> \+ ['\n'],
> [Other].
>
> %-----------------------------------------------------------------------------%
> %
> % Genetic operators that operate on the genotype.
> %
>
> %
> % This section contains the predicates for the genetic operators that
> % require access to (and knowledge of) the representation of the genotype.
> % These include the crossover (or recombination) and mutation operators.
> %
>
> crossover(Mother, Father, Son, Daughter, !RNG) :-
> list.map2_foldl(cut, [Mother, Father], PartsOfSon, PartsOfDaughter, !RNG),
> Son = set.union_list(PartsOfSon),
> Daughter = set.union_list(PartsOfDaughter).
>
> :- pred cut(genotype, genotype, genotype, RNG, RNG) <= random(RNG, Seed).
> :- mode cut(in, out, out, in, out) is det.
>
> cut(Parent, PartOfSon, PartOfDaughter, !RNG) :-
> ( if
> set.count(Parent, NumFlags),
> NumFlags \= 0
> then
> next(NextRandomInt, !RNG),
> CrossoverPoint = NextRandomInt mod NumFlags
> else
> CrossoverPoint = 0
> ),
> set.to_sorted_list(Parent, List),
> list.det_split_list(CrossoverPoint, List, Start, End),
> set.list_to_set(Start, PartOfSon),
> set.list_to_set(End, PartOfDaughter).
>
> mutation(Flags, !Genotype, !RNG) :-
>
> list.length(Flags, NumFlags),
> next(Next, !RNG),
> Index = Next mod NumFlags,
> list.index0_det(Flags, Index, Flag),
>
> ( if
> set.member(Flag, !.Genotype)
> then
> set.delete(!.Genotype, Flag, !:Genotype)
> else
> set.insert(!.Genotype, Flag, !:Genotype)
> ).
>
The svset module contains versions of the predicates from the set module with
the arguments in the right order for state variables.
> %-----------------------------------------------------------------------------%
> %
> % Printing out a list of genotypes.
> %
>
> %
> % This section contains the implementation of the print_genotypes/4
> % predicate, which creates a file containing the flags to be passed to the
> % compiler.
> %
>
> print_genotypes(Path, Genotypes, !IO) :-
> io.open_output(Path, Result, !IO),
> (
> Result = ok(Stream),
> map(genotype_to_string, Genotypes, Strings),
> io.write_list(Stream, Strings, "\n", io.write_string(Stream), !IO),
> io.nl(Stream, !IO),
> io.close_output(Stream, !IO)
> ;
> Result = error(ErrorCode),
> io.error_message(ErrorCode, ErrorMessage),
> require.error(ErrorMessage)
> ).
>
> genotype_to_string(Genotype, String) :-
> set.to_sorted_list(Genotype, List),
> String = string.join_list(" ", List).
>
> %-----------------------------------------------------------------------------%
>
> ==> phenotype.m <==
> %-----------------------------------------------------------------------------%
> % vim: ft=mercury ts=4 sw=4 et
> %-----------------------------------------------------------------------------%
> % Copyright (C) 2005-2006 The University of Melbourne.
> % This file may only be copied under the terms of the GNU General
> % Public License - see the file COPYING in the Mercury distribution.
> %-----------------------------------------------------------------------------%
> %
> % File: phenotype.m.
> % Main author: samrith.
> %
> %-----------------------------------------------------------------------------%
>
> :- module phenotype.
> :- interface.
>
> :- import_module genotype.
> :- import_module tausworthe3.
>
> :- import_module io.
> :- import_module list.
>
> :- type phenotype.
>
> :- type fitness.
>
> % read_phenotypes(Path, Phenotypes, !IO).
> %
> % Reads a list of phenotypes from the given file, and unifies
> % the list with Phenotypes.
> %
> :- pred read_phenotypes(string, list(phenotype), io, io).
> :- mode read_phenotypes(in, out, di, uo) is det.
>
> % fitness(Weightings, Phenotype, Fitness).
> %
> % This predicate evaluates the fitness of a genotype given its
> % phenotype.
> %
> % Fitness is defined here as a weighted sum of each element in the
> % phenotype. The weightings can be modified by changing the
> % "Weightings" variable.
> %
> :- pred fitness(list(float), phenotype, fitness).
> :- mode fitness(in, in, out) is det.
>
Why can't this be a function?
...
> % fitness_to_string(Fitness, String).
> %
> % Gives a string representation of the fitness value.
> %
> :- pred fitness_to_string(fitness, string).
> :- mode fitness_to_string(in, out) is det.
>
And this?
Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
post: mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------
More information about the reviews
mailing list