Implementation Matters: Programming Best
Practices for Evolutionary Algorithms
J.J. Merelo, G. Romero, M.G. Arenas, P.A. Castillo,
A.M. Mora, and J.L.J. Laredo
Dpto. de Arquitectura y Tecnología de Computadores. Univ. of Granada, Spain
Abstract. While a lot of attention is usually devoted to the study of
different components of evolutionary algorithms or the creation of heuristic operators, little effort is being directed at how these algorithms are
actually implemented. However, the efficient implementation of any application is essential to obtain good performance, to the point that
performance improvements obtained by changes in implementation are
usually much bigger than those obtained by algorithmic changes, and
they also scale much better. In this paper we will present and apply
usual methodologies for performance improvement to evolutionary algorithms, and show which implementation options yield the best results
for a certain problem configuration and which ones scale better when
features such as population or chromosome size increase.
The design of evolutionary algorithms (EAs) usually includes a methodology for
making them as efficient as possible. Efficiency is measured using metrics such as
the number of evaluations to solution, implicitly seeking to reduce running times.
However, the same amount of attention is not given to designing an implementation that is as efficient as possible, even though small changes in it can have a much bigger
impact on the overall running time than any algorithmic improvement. This lack
of interest, or attention, in the actual implementation of the proposed algorithms
results in the quality of scientific programming being, on average, worse than
what is usually found in companies [1] or released software.
It can be argued that the time devoted to an efficient implementation can be
better employed pursuing scientific innovation or a precise description of the algorithm; however, the methodology for making improvements in program running
time is well established in computer science: there are several static or dynamic
analysis tools which look at memory and running time (called monitors), and
thus, it can be established how much memory and time the program takes, and
then which parts of it (variables, functions) are responsible for that, for which
profilers are used. Once this methodology has been included in the design
process of scientific software, it does not need to take much more time than,
say, running statistical tests. In the same way that these tests establish scientific accuracy, an efficient implementation makes results better and more easily
reproducible and understandable.
This work has been supported in part by the CEI BioTIC GENIL (CEB09-0010)
Programa CEI del MICINN (PYR-2010-13) project, the Junta de Andalucía TIC3903 and P08-TIC-03928 projects, and the Jaén University UJA-08-16-30 project.
J. Cabestany, I. Rojas, and G. Joya (Eds.): IWANN 2011, Part II, LNCS 6692, pp. 333–340, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Profiling the code that implements an algorithm also makes it possible to detect potential bugs, see whether code fragments are executed as many times as they
should, and detect which parts of the code can be optimized in order to
obtain the most impact on performance. After profiling, a deeper knowledge
of the structure underlying the algorithm will allow a more efficient redesign,
balancing algorithmic with computational efficiency; this deeper knowledge also
makes it possible to find computational techniques that can be leveraged in the search
for new evolutionary techniques. For instance, knowing how a sorting algorithm
scales with population size would allow the EA designer to choose the best option for a particular population size, or eliminate sorting completely using a
methodology that avoids sorting altogether, possibly finding new operators or
selection techniques for EAs.
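As an illustration of avoiding a sort altogether, the following minimal sketch picks a two-individual elite with a single pass over the population, which takes linear time instead of the time needed to sort it. The subroutine name `two_best` is hypothetical and not part of the program discussed in this paper; the sketch only assumes that individuals are hash references with a numeric `fitness` entry.

```perl
use strict;
use warnings;

# Return the two fittest individuals with one pass over the population,
# instead of sorting it. Assumes at least two individuals, each a hash
# reference with a numeric 'fitness' entry.
sub two_best {
    my @population = @_;
    my ( $best, $second ) = @population[ 0, 1 ];
    ( $best, $second ) = ( $second, $best )
        if $second->{fitness} > $best->{fitness};
    for my $individual ( @population[ 2 .. $#population ] ) {
        if ( $individual->{fitness} > $best->{fitness} ) {
            # New best found: the old best becomes the runner-up
            ( $best, $second ) = ( $individual, $best );
        }
        elsif ( $individual->{fitness} > $second->{fitness} ) {
            $second = $individual;
        }
    }
    return ( $best, $second );
}

my @example = map { +{ fitness => $_ } } ( 3, 9, 1, 7 );
my ( $first, $runner_up ) = two_best(@example);
print "$first->{fitness} $runner_up->{fitness}\n";
```

Whether this pays off depends on what else the algorithm needs from the ordering; it is only one possible way of replacing a sort.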
In this paper, we will comment on the enhancements applied to a program written
in Perl [2–4] which implements an evolutionary algorithm, and also on a methodology for its analysis, showing the impact of identifying the bottlenecks in
a program and eliminating them through common programming techniques. This
impact can reach several orders of magnitude, but of course it depends on the
complexity of the fitness function and the size of the problem it is applied to, as
has been shown in papers such as the one by Laredo et al. [5]. In principle, the
methodology and tools that have been used are language-independent, and can
be found in any programming language; however, the performance improvements
and the options for changing a program will depend on the language involved.
From a first baseline or straightforward implementation of an EA, we will show
techniques to measure the performance obtained with it, and how to derive a set
of rules that improve its efficiency. Since research papers do not usually
detail such techniques, best programming practices for EAs tend to
remain hidden and cannot benefit the rest of the community. This work is an attempt to highlight
those techniques and encourage the community to reveal how published results
are obtained.
The rest of this paper is structured as follows: Section 2 presents a comprehensive review of the approaches found in the bibliography. Section 3 briefly describes the methodology followed in this study and discusses the results obtained
using different techniques and versions of the program. Finally, conclusions and
future work are presented in Section 4.
State of the Art
EA implementation has been the subject of many works by our group [6–10]
and by others [11–14]. Much more effort has been devoted to looking for new hardware
platforms to run EAs, such as GPUs [14] or specialized hardware [15], than to trying to
maximize the potential of usual hardware.
As more powerful hardware becomes available every year, researchers have pursued the
invention of new algorithms [16–18], forgetting how important efficiency is. There
have been some attempts to calculate the complexity of EAs with the intention of
improving it: by avoiding random factors [19] or by changing the random number
generator [20].
However, even on the most modern systems, EA experimentation can be an
extremely long process because every algorithm run can last several hours (or
days), and it must be repeated several times in order to obtain accurate statistics.
And that is just when the optimal set of parameters is known. Sometimes
the experiments must be repeated with different parameters to discover the
optimal combination (systematic experimentation). So in the following sections
we pay attention to implementation details, making improvements iteratively.
Methodology, Experiments and Results
The initial version of the program is taken from [2], and it is shown in
Tables 1 and 2. A canonical EA with proportional selection, a two-individual
elite, mutation and crossover is implemented. The problem used is MaxOnes
(also called OneMax) [21], where the function to optimize is simply the number
of ones in a bit string, with chromosome lengths ranging from 16 to 512.
The initial population has 32 individuals, and the algorithm runs for 100 generations. The experiments are performed with different chromosome and population sizes, since the algorithms implemented in the program have different
complexity with respect to those two parameters. These runs have been repeated
30 times for statistical accuracy reasons.
Running time in user space (as opposed to wallclock time, which includes
time spent in other user and system processes) is measured each time a change
is made.
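A minimal sketch of how user-space time can be read from inside a Perl program follows. It uses the built-in `times` function, whose first return value is the user CPU time of the current process; the helper name `user_time_of` and the stand-in workload are hypothetical, and this is not necessarily how the measurements in this paper were taken.

```perl
use strict;
use warnings;

# Return the user CPU time, in seconds, consumed by running $code once.
# The helper name user_time_of is hypothetical.
sub user_time_of {
    my ($code) = @_;
    my ($user_before) = times;    # first value returned by times: user time
    $code->();
    my ($user_after) = times;
    return $user_after - $user_before;
}

# Hypothetical workload standing in for one run of the EA.
my $elapsed = user_time_of( sub {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
} );
printf "user time: %.3f s\n", $elapsed;
```

The same figure can be obtained externally, for instance from the `user` line reported by the shell's `time` command.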
In these experiments, the first improvement tested is to include a fitness cache
[16, 2], that is, a hash (Perl's associative array) that remembers the values already
computed for the fitness function.
This change trades memory for speed: access becomes faster, but more
memory is needed to store the precomputed values.
This is a good option if there is plenty of memory available; however, if
memory use is not checked and the program starts swapping (paging to virtual memory),
parts of the program data will be moved out to disk, resulting in a huge
performance decrease. A
quick calculation beforehand will tell us whether we should worry about this, and turn
the cache off if that is the case.
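The idea of a fitness cache can be sketched as follows. The names `%fitness_cache`, `count_ones` and `cached_fitness`, as well as the evaluation counter, are assumptions made for this illustration and are not taken from the program used in the experiments.

```perl
use strict;
use warnings;

my %fitness_cache;      # chromosome string => fitness already computed
my $evaluations = 0;    # counts real (non-cached) evaluations

# Baseline-style MaxOnes fitness: add up the bits of the string.
sub count_ones {
    my ($string) = @_;
    $evaluations++;
    my $ones = 0;
    $ones += substr( $string, $_, 1 ) for 0 .. length($string) - 1;
    return $ones;
}

# Cached wrapper: each distinct string is evaluated only once.
sub cached_fitness {
    my ($string) = @_;
    $fitness_cache{$string} = count_ones($string)
        unless exists $fitness_cache{$string};
    return $fitness_cache{$string};
}

print cached_fitness('1011'), "\n";    # first call computes the value
print cached_fitness('1011'), "\n";    # second call is a hash lookup
print "evaluations: $evaluations\n";
```

For the quick calculation mentioned above, note that the cache holds at most one entry per evaluation, so its size is bounded by roughly the population size times the number of generations, each entry storing one chromosome string plus its fitness.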
It is also convenient to look for the fastest way of computing the fitness
function, using language-specific data structures, functions and expressions¹.
¹ Changes can be examined in the code repository at
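One language-specific way of counting ones in Perl is the transliteration operator: `tr/1//` with an empty replacement list leaves the string unchanged and returns the number of matching characters. The sketch below compares it with a character-by-character loop like the one in the baseline; it is only one possibility, not necessarily the change made in the experiments, and the names `fitness_loop` and `fitness_tr` are hypothetical.

```perl
use strict;
use warnings;

# Character-by-character loop, in the style of the baseline program.
sub fitness_loop {
    my ($string) = @_;
    my $ones = 0;
    for ( my $i = 0; $i < length($string); $i++ ) {
        $ones += substr( $string, $i, 1 );
    }
    return $ones;
}

# Alternative: tr/// with an empty replacement list does not modify the
# string and returns how many characters matched.
sub fitness_tr {
    my ($string) = @_;
    return ( $string =~ tr/1// );
}

my $example = '1101001110';
print fitness_loop($example), ' ', fitness_tr($example), "\n";
```

Both subroutines return the same value for any bit string, so swapping one for the other does not change the algorithm, only the time spent in each evaluation.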
Table 1. First version of the program used in the experiments (main program). An
evolutionary algorithm is implemented.
my $chromosome_length = shift || 16;
my $population_size = shift || 32;
my $generations = shift || 100;
my @population = map( random_chromosome($chromosome_length), 1..$population_size );
map( compute_fitness($_), @population );
for ( 1..$generations ) {
    # Sort by decreasing fitness; the two best individuals form the elite
    my @sorted_population = sort { $b->{'fitness'} <=> $a->{'fitness'} } @population;
    my @best = @sorted_population[0,1];
    my @wheel = compute_wheel( \@sorted_population );
    my @slots = spin( \@wheel, $population_size );
    my @pool;
    my $index = 0;
    do {    # fill the mating pool in proportion to fitness
        my $p = $index++ % @slots;
        my $copies = $slots[$p];
        for (1..$copies) {
            push @pool, $sorted_population[$p];
        }
    } while ( @pool <= $population_size );
    @population = ();
    @pool = map( mutate($_), @pool );    # mutate returns a clone, so keep the result
    for ( my $i = 0; $i < $population_size/2 - 1; $i++ ) {
        my $first = $pool[ rand(@pool) ];    # rand(@pool) can also reach the last element
        my $second = $pool[ rand(@pool) ];
        push @population, crossover( $first, $second );
    }
    map( compute_fitness($_), @population );
    push @population, @best;
}
As can be seen in the results shown in Figure 1-left, the running time of the initial
version grows more than linearly. For big instances of the problem, the running time can
become too long to be practical. For example, for chromosomes of length 16
the running times of the baseline version and of the third version differ by a factor of two.
For chromosomes of length 256 the run time difference is an order of magnitude
greater. This leads us to think that optimizing the EA implementation is more
valuable than any other algorithmic change. It must also be remarked that the code
changes have been minimal.
In order to obtain further improvements, a profiler (one of the tools described in
the introduction to this paper) has to be used. In Perl, Devel::DProf carries out
an analysis of the different subroutines; however, a statement-by-statement analysis
is needed here. Thus, the Devel::NYTProf Perl module (developed at the New
York Times) is used. The results of applying this profiler show that operators like
crossover and mutation are called very often, and some improvements can be made
to them; however, the function which takes the most time is the one that sorts
the population. Changing the default Perl sort, which implements quicksort [22],
to another version using mergesort [23] meant a small improvement, but
Sort::Key, the best one available in the Perl language, yields the best results.
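A minimal sketch of sorting the population by fitness with Sort::Key is given below. Sort::Key is a CPAN module, not part of core Perl, so this sketch loads it only if it is installed and otherwise falls back to the built-in sort; the wrapper name `sort_by_fitness` is hypothetical, and the exact call used in the experiments may differ.

```perl
use strict;
use warnings;

# Load Sort::Key only if it is installed (it is not a core module).
my $have_sort_key = eval { require Sort::Key; 1 };

# Return the population ordered by decreasing fitness.
sub sort_by_fitness {
    my @population = @_;
    if ($have_sort_key) {
        # rnkeysort: reverse numeric sort on a key extracted from each element
        return Sort::Key::rnkeysort( sub { $_->{fitness} }, @population );
    }
    # Fallback: built-in sort with an explicit comparison block
    return sort { $b->{fitness} <=> $a->{fitness} } @population;
}

my @individuals = map { +{ fitness => $_ } } ( 3, 9, 1, 7 );
print join( ' ', map { $_->{fitness} } sort_by_fitness(@individuals) ), "\n";
```

When the module is known to be available, the more usual form is `use Sort::Key qw(rnkeysort);` followed by `rnkeysort { $_->{fitness} } @population`. Key-based sorting of this kind extracts each key instead of running a Perl comparison block for every pair of elements compared.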
Table 2. First version of the program used in the experiments (subroutines)
sub compute_wheel {
    my $population = shift;
    my $total_fitness = 0;
    map( $total_fitness += $_->{'fitness'}, @$population );
    my @wheel = map( $_->{'fitness'}/$total_fitness, @$population );
    return @wheel;
}

sub spin {
    my ( $wheel, $slots ) = @_;    # unpack the arguments before using them
    my @slots = map( $_*$slots, @$wheel );
    return @slots;
}

sub random_chromosome {
    my $length = shift;
    my $string = '';
    for (1..$length) {
        $string .= ( rand() > 0.5 ) ? 1 : 0;
    }
    return { string => $string,
             fitness => undef };
}

sub mutate {
    my $chromosome = shift;
    my $clone = { string => $chromosome->{'string'}, fitness => undef };
    my $mutation_point = rand( length( $clone->{'string'} ) );
    substr( $clone->{'string'}, $mutation_point, 1,
        ( substr( $clone->{'string'}, $mutation_point, 1 ) eq 1 ) ? 0 : 1 );
    return $clone;
}

sub crossover {
    my ( $chrom_1, $chrom_2 ) = @_;
    my $chromosome_1 = { string => $chrom_1->{'string'} };
    my $chromosome_2 = { string => $chrom_2->{'string'} };
    my $length = length( $chromosome_1->{'string'} );    # length of the string, not of the reference
    my $xover_point_1 = int rand( $length - 1 );
    my $xover_point_2 = int rand( $length - 1 );
    if ( $xover_point_2 < $xover_point_1 ) {    # keep the points in increasing order
        my $swap = $xover_point_1;
        $xover_point_1 = $xover_point_2;
        $xover_point_2 = $swap;
    }
    $xover_point_2 = $xover_point_1 + 1 if ( $xover_point_2 == $xover_point_1 );
    my $swap_string = $chromosome_1->{'string'};    # copy the string; copying the reference would alias it
    substr( $chromosome_1->{'string'}, $xover_point_1, $xover_point_2 - $xover_point_1 + 1,
        substr( $chromosome_2->{'string'}, $xover_point_1, $xover_point_2 - $xover_point_1 + 1 ) );
    substr( $chromosome_2->{'string'}, $xover_point_1, $xover_point_2 - $xover_point_1 + 1,
        substr( $swap_string, $xover_point_1, $xover_point_2 - $xover_point_1 + 1 ) );
    return ( $chromosome_1, $chromosome_2 );
}

sub compute_fitness {
    my $chromosome = shift;
    my $unos = 0;    # number of ones found so far
    for ( my $i = 0; $i < length( $chromosome->{'string'} ); $i++ ) {
        $unos += substr( $chromosome->{'string'}, $i, 1 );
    }
    $chromosome->{'fitness'} = $unos;
}
Fig. 1. Log-log plot of running time for different chromosome (left) and population
sizes (right). Solid-line corresponds to the baseline version. (Left) Dashed version uses
a cache, and dot-dashed one changes fitness calculation. (Right) Dashed version changes
fitness calculation, while dot-dashed one uses best-of-breed sorting algorithm for the
population. Values are averages for 30 runs.
Figure 1-right shows how run time grows with population size for a fixed
chromosome size of 128. The algorithm is run for 100 generations regardless of whether the
solution is found or not. The EA behaves similarly to the previous analysis.
The most efficient version, using Sort::Key, is an order of magnitude more
efficient than the first attempt, and the difference grows with the population size.
Adding up both improvements, for the same problem size, results almost two orders of
magnitude better are obtained without changing our basic algorithm.
It should be noted that, since these improvements are algorithmically neutral,
they do not have a noticeable impact on results, which are statistically indistinguishable from those obtained by the baseline program.
Conclusions and Future Work
This work shows how good programming practices and a deep knowledge of data
and control structures of a programming language can yield an improvement of
up to two orders of magnitude in an evolutionary algorithm (EA). Our tests
consider a well-known problem whose results can be easily extrapolated to others. Eliminating bottlenecks after profiling the implementation of
an evolutionary algorithm can give better results than replacing it with a
different, and likely more complex, algorithm or changing the parameters of the
existing algorithm. A cache of evaluations can be used on a wide variety of EA
problems. Moreover, a profiler can be applied to every implementation
to detect bottlenecks and concentrate efforts on solving them.
From these experiments, we conclude that applying profilers to identify the
bottlenecks of evolutionary algorithm implementations, and then careful and informed programming to optimize those fragments of code, greatly improves running time of evolutionary algorithms without degrading algorithmic performance.
Several other techniques can improve EA performance. Multithreading,
which takes advantage of symmetric multiprocessing and
multicore machines, message passing, which divides the
work for execution on clusters, and vectorization for execution on a GPU are
three of the best known and most commonly employed, but almost every best practice in programming can be applied successfully to improve EAs. In turn, these
techniques will be incorporated into the Algorithm::Evolutionary [16] Perl
library. A thorough study of the interplay between implementation and the algorithmic performance of the implemented techniques will also be carried out.
1. Merali, Z.: Computational science: Error, why scientific programming does not
compute. Nature 467(7317), 775–777 (2010)
2. Merelo-Guervós, J.J.: A Perl primer for EA practitioners. SIGEvolution 4(4), 12–19
3. Wall, L., Christiansen, T., Orwant, J.: Programming Perl, 3rd edn. O’Reilly &
Associates, Sebastopol (2000)
4. Schwartz, R.L., Phoenix, T., foy, B.D.: Learning Perl, 5th edn. O’Reilly & Associates (2008)
5. Laredo, J., Castillo, P., Mora, A., Merelo, J.: Exploring population structures for
locally concurrent and massively parallel evolutionary algorithms. In: Computational Intelligence: Research Frontiers, pp. 2610–2617. IEEE Press, Los Alamitos
6. Merelo-Guervós, J.J.: Algoritmos evolutivos en Perl. Paper presented at the V
Congreso Hispalinux, available at (November 2002),
7. Merelo-Guervós, J.J.: OPEAL, una librería de algoritmos evolutivos en Perl. In:
Alba, E., Fernández, F., Gómez, J.A., Herrera, F., Hidalgo, J.I., Merelo-Guervós,
J.J., Sánchez, J.M. (eds.) Actas primer congreso español algoritmos evolutivos,
AEB 2002, Universidad de Extremadura, pp. 54–59 (February 2002)
8. Arenas, M., Foucart, L., Merelo-Guervós, J.J., Castillo, P.A.: JEO: a framework
for Evolving Objects in Java. In: [24], pp. 185–191,
9. Castellano, J., Castillo, P., Merelo-Guervós, J.J., Romero, G.: Paralelización de
evolving objects library usando MPI. In: [24], pp. 265–270
10. Keijzer, M., Merelo, J.J., Romero, G., Schoenauer, M.: Evolving objects: A general
purpose evolutionary computation library. In: Collet, P., Fonlupt, C., Hao, J.-K.,
Lutton, E., Schoenauer, M. (eds.) EA 2001. LNCS, vol. 2310, pp. 231–244. Springer,
Heidelberg (2002)
11. Fogel, D., Bäck, T., Michalewicz, Z.: Evolutionary Computation: Advanced algorithms and operators. Taylor & Francis, Abington (2000)
12. Setzkorn, C., Paton, R.: JavaSpaces–An Affordable Technology for the Simple
Implementation of Reusable Parallel Evolutionary Algorithms. Knowledge Exploration in Life Science Informatics, 151–160
13. Rummler, A., Scarbata, G.: eaLib – A Java Framework for Implementation of
Evolutionary Algorithms. Theory and Applications Computational Intelligence,
14. Wong, M., Wong, T.: Implementation of parallel genetic algorithms on graphics
processing units. Intelligent and Evolutionary Systems, 197–216 (2009)
15. Schubert, T., Mackensen, E., Drechsler, N., Drechsler, R., Becker, B.: Specialized
hardware for implementation of evolutionary algorithms. In: Genetic and Evolutionary Computing Conference, Citeseer, p. 369 (2000)
16. Merelo-Guervós, J.J., Castillo, P.A., Alba, E.: Algorithm::Evolutionary, a
flexible Perl module for evolutionary computation. Soft Computing (2009), (to be published)
17. Ventura, S., Ortiz, D., Hervás, C.: JCLEC: Una biblioteca de clases java para
computación evolutiva. In: Primer Congreso Español de Algoritmos Evolutivos y
Bioinspirados, pp. 23–30. Mérida, Spain (2002)
18. Ventura, S., Romero, C., Zafra, A., Delgado, J., Hervás, C.: JCLEC: a Java framework for evolutionary computation. Soft Computing-A Fusion of Foundations,
Methodologies and Applications 12(4), 381–392 (2008)
19. Salomon, R.: Improving the performance of genetic algorithms through derandomization. Software - Concepts and Tools 18(4), 175 (1997)
20. Digalakis, J.G., Margaritis, K.G.: On benchmarking functions for genetic algorithms. International Journal of Computer Mathematics 77(4), 481–506 (2001)
21. Mühlenbein, H.: How genetic algorithms really work: I. Mutation and hillclimbing. In: Männer, R., Manderick, B. (eds.) Proceedings of the Second Conference
on Parallel Problem Solving from Nature (PPSN II). pp. 15–25. North-Holland,
Amsterdam (1992)
22. Hoare, C.: Quicksort. The Computer Journal 5(1), 10 (1962)
23. Cole, R.: Parallel merge sort. In: 27th Annual Symposium on Foundations of Computer Science 1985, pp. 511–516 (1986)
24. UPV. In: Actas XII Jornadas de Paralelismo, UPV, Universidad Politécnica de
Valencia (2001)