Information Sciences 471 (2019) 233–251
Contents lists available at ScienceDirect: Information Sciences. Journal homepage: www.elsevier.com/locate/ins

A novel multi-objective evolutionary algorithm with fuzzy logic based adaptive selection of operators: FAME

Alejandro Santiago a,b,c,∗, Bernabé Dorronsoro a, Antonio J. Nebro d, Juan J. Durillo e, Oscar Castillo f, Héctor J. Fraire c

a School of Engineering, University of Cádiz, Spain
b Polytechnic University of Altamira, Mexico
c Madero City Institute of Technology, Mexico
d University of Malaga, Spain
e Leibniz Supercomputing Center, Munich, Germany
f Tijuana Institute of Technology, Mexico

∗ Corresponding author at: Polytechnic University of Altamira, Mexico. E-mail addresses: [email protected] (A. Santiago), [email protected] (B. Dorronsoro), [email protected] (A.J. Nebro), [email protected] (J.J. Durillo), [email protected] (O. Castillo), [email protected] (H.J. Fraire).

Article history: Received 3 May 2018; Revised 28 August 2018; Accepted 1 September 2018; Available online 4 September 2018.

Keywords: Multi-objective optimization; Density estimation; Evolutionary algorithm; Adaptive algorithm; Fuzzy logic

Abstract

We propose a new method for multi-objective optimization, called Fuzzy Adaptive Multi-objective Evolutionary algorithm (FAME). It makes use of a smart operator controller that dynamically chooses the most promising variation operator to apply in the different stages of the search. This choice is guided by a fuzzy logic engine, according to the contributions of the different operators in the past. FAME also includes a novel effective density estimator with polynomial complexity, called Spatial Spread Deviation (SSD). Our proposal follows a steady-state selection scheme and includes an external archive implementing SSD to identify the candidate solutions to be removed when it becomes full. To assess the performance of our proposal, we compare FAME with a number of state-of-the-art algorithms (MOEA/D-DE, SMEA, SMPSOhv, SMS-EMOA, and BORG) on a set of difficult problems. The results show that FAME achieves the best overall performance.

https://doi.org/10.1016/j.ins.2018.09.005
0020-0255/© 2018 Elsevier Inc. All rights reserved.

1. Introduction

Nowadays, there is a plethora of metaheuristic techniques available for solving complex optimization problems. Some popular examples are Evolutionary Algorithms (EAs), Particle Swarm Optimization (PSO), and Differential Evolution (DE), just to mention a few. The field of multi-objective optimization, i.e., the optimization of problems involving two or more conflicting objective functions, is not an exception [5,6,21]. Some of the most popular algorithms in the field, such as NSGA-II [8], SPEA2 [49], MOEA/D [47], or SMS-EMOA [3], are EAs. We can also find examples of highly competitive multi-objective algorithms whose search engine is not an EA but a PSO (such as OMOPSO [36] and SMPSO [25]), a DE [44], or a Scatter Search (SS) algorithm (AbYSS [26]). These algorithms, although inspired by many different sources, share most of their core concepts. Most multi-objective optimization algorithms work on a set of candidate solutions (often called the population) to a target problem, which are evolved by the application of stochastic operators. The main differences between these methods are found in the properties and search capabilities of their operators. It has been reported that some operators perform better than others when dealing with different features of the search space. We can find some examples in the multi-objective optimization domain. Deb et al. evaluated the behavior of some operators in solving problems with variable linkages [9].
Their study revealed that the SBX crossover operator performs poorly for this kind of problem, especially when compared to the DE and CPX operators. Iorio and Li [19] also analyzed the suitability of a number of operators for solving rotated problems and those having epistatic interaction between parameters. The search space of real-world optimization problems is not free of variable-linkage, epistasis, rotation, and other relations between their decision variables. Furthermore, these properties could change throughout the search space. Most multi-objective metaheuristics use a static set of variation operators and parametrization. An efficient adaptive selection mechanism to choose an appropriate operator according to the search progress could improve the quality of the Pareto front approximation found. This idea has been explored by proposals such as AMALGAM [41] and Borg [15]. In addition to operator selection, a major issue influencing the quality of the computed results in multi-objective optimization is the use of density estimators to determine the solutions that must be kept (or discarded). Popular examples are the crowding distance (CD) of NSGA-II [8] and the hypervolume (HV) contribution [3]. The former has limitations when facing problems with more than two objectives, while the exponential complexity of the latter makes it unfeasible for many-objective optimization problems (i.e., those having four or more objectives). The design of a low complexity density estimator that works efficiently for multi- and many-objective problems is still an open issue. Decomposition-based algorithms, such as MOEA/D [47] and its derivatives, do not make use of density estimators. Instead, they require the use of a set of weight vectors. Nevertheless, a uniform distribution of the set of weight vectors does not guarantee an even distribution over the Pareto front [35]. Again, finding an optimal set of weight vectors is an open issue [38]. 
The method proposed in [14] computes it, but it requires the optimal Pareto front, making it unworkable for real-world problems. The main contribution of this paper is the proposal of a new multi-objective algorithm, called FAME (Fuzzy Adaptive Multi-objective Evolutionary algorithm). It implements a pioneering adaptive mechanism for the selection of the operators and an innovative density estimator that effectively promotes diversity of solutions in the Pareto front. The two methods are significant contributions to the literature in their own right. On the one hand, the new operator selection mechanism makes use of a novel fuzzy logic based strategy to choose the most promising operator to apply at each step of the search. To the best of our knowledge, this is the first time that fuzzy logic is used for the dynamic adaptation of variation operators in a multi-objective optimization algorithm. On the other hand, our proposed density estimator, called Spatial Spread Deviation (SSD), has polynomial complexity and offers highly accurate performance for the two- and three-dimensional problems studied: similar to the CD method for two-objective problems, but without its limitations when dealing with three-objective ones. FAME is compared with five representative algorithms from the state of the art, outperforming them at the 95% confidence level in most cases on the selected set of difficult benchmark problems. The structure of this paper is as follows. We first review the related literature in Section 2, including papers presenting (i) novel multi-objective optimization algorithms with adaptive selection of operators and (ii) adaptive selection mechanisms based on fuzzy logic. Sections 3 and 4 describe the fuzzy adaptive mechanism for the selection of the operators and the spatial spread density estimator, respectively. Our proposed algorithm is described in Section 5. Section 6 briefly describes the algorithms from the state of the art chosen for comparison.
Section 7 presents the experiments performed, and the results obtained are summarized in Section 8. The performance of the new SSD density estimator and the adaptive selection of operators are evaluated in Sections 9 and 10, respectively. Our paper ends with our main conclusions and lines for future research in Section 11.

2. Literature review

In this section, we discuss related work. In particular, we review in Section 2.1 the existing multi-objective optimization algorithms implementing an adaptive mechanism for the selection of the operators. Section 2.2 briefly describes the relevant literature on the use of fuzzy logic in evolutionary algorithms.

2.1. Adaptive multi-objective optimization algorithms

The existing adaptive multi-objective algorithms can be classified into two categories, which we refer to as parameter adaptation and operator adaptation. Parameter adaptation consists of modifying the control parameters of the operators used by the algorithm. Such methods are often known as self-adaptive algorithms, and they usually include these control parameters as part of the search space. In this category, Deb et al. [7] introduce a self-adaptive version of the SBX crossover operator (SA-SBX) that dynamically adjusts the distribution index of SBX. Another self-adaptive SBX operator is presented in [45], where the distribution index is dynamically adjusted using feedback information from both a diversity performance metric and the crowding distance. Both operators performed better than the original SBX within NSGA-II. The idea behind operator adaptation is to provide the algorithm with a set of different (typically, variation) operators from which it can choose, according to their expected contribution to the search process. Following this approach, Toscano and Coello propose a micro-genetic algorithm (μGA2) [39] that runs three instances of the algorithm in parallel, using a different crossover operator in each one. At some stages of the search, the worst-performing crossover operator is replaced by the best-performing one. An important limitation for the adaptive capabilities of the algorithm is that crossover operators cannot be used again after being discarded. MOSaDE [16], the multi-objective version of the state-of-the-art single-objective SaDE algorithm, also falls into this category. Four different classical DE strategies are combined through a self-adaptive mechanism that sets their application probabilities according to their success rate in the previous 50 generations (solution A is considered to be better than B if A dominates B, or if they are non-dominated and A is in a less crowded region). The performance of the algorithm is far from that of the state of the art. Indeed, an improved version of MOSaDE with object-wise learning strategies, called OW-MOSaDE [17], obtained an average rank of 9.39 (out of 13 algorithms) on the 13 problems considered in the CEC2009 MOEA competition. Vrugt and Robinson propose in [41] the so-called AMALGAM algorithm, which applies a number of different multi-objective metaheuristics to evolve the population. Each metaheuristic is adaptively used in order to favor those techniques exhibiting the best performance in the last iteration. The algorithms used are NSGA-II, a PSO, a DE, and an adaptive metropolis search (AMS) approach. AMALGAM is compared with NSGA-II on the ZDT benchmark and some classical test problems (Kursawe, Fonseca, Schaffer [6]), and it has been applied to some real-world problems too [18,43]. Finally, Borg [15] is an adaptive multi-objective algorithm specifically designed for many-objective multimodal problems. As in the previous case, it implements a set of recombination operators, and the choice of the operator to apply is made according to their participation in the search process.
To this end, it tracks the number of solutions every recombination operator has contributed to an ε-archive of non-dominated solutions.

2.2. Fuzzy evolutionary approaches

We can identify two main areas when combining fuzzy logic and EAs: fuzzy control system design (i.e., using EAs to discover fuzzy rule sets) and dynamic parameter adaptation in metaheuristics. Our proposal is related to the latter kind, and we survey the most relevant literature next. As we did not find any example of this approach applied to multi-objective optimization algorithms (our proposal seems to be unique in this respect), we focus on related papers targeting single-objective optimization. The most salient nature-inspired algorithms that use fuzzy logic for dynamic parameter adaptation are surveyed in [40]. Among them, the work of Melin et al. [22] is an interesting approach, where fuzzy logic is used to dynamically adjust the weight coefficients C1 and C2 in the velocity formula of a PSO. Three PSO versions were proposed, differing in their sets of rules, two of them outperforming the original algorithm. In [30], a DE uses a fuzzy system to adapt the weighting factor F, while the crossover constant CR is kept fixed. Experimental results show a statistically significant improvement over the fuzzy versions of two emerging metaheuristics: harmony search (inspired by musical improvisation [33]) and the bat search algorithm (inspired by the echolocation behavior of bats [34]). The benefits of implementing fuzzy logic parameter adaptation have recently been demonstrated on other emerging metaheuristics, such as the Gravitational Search Algorithm (GSA) [31], the Fireworks Algorithm (FWA) [2], and the Search Group Algorithm (SGA) [29]. The first generation of fuzzy logic deals very satisfactorily with uncertainty. However, it has been criticized for the lack of uncertainty inside its membership functions. In order to cope with this deficiency, a second generation of fuzzy logic, called interval type-2 fuzzy systems, has been proposed.
While type-1 membership functions map each input to a single membership value, type-2 membership functions map it to a range, called the "footprint of uncertainty" (FOU). An Ant Colony Optimization (ACO) algorithm using type-2 fuzzy logic was recently proposed in [32], in a very similar manner to [22], improving the results of a rank-based ACO and an ACO using type-1 fuzzy parameter adaptation. Nevertheless, the robustness of these fuzzy engines relies heavily on the design of the rule set, requiring thorough knowledge and/or optimization to correctly configure the fuzzy engine. None of these approaches is followed in the present paper. Our novel method identifies, at each search stage, which variation operators are more (or less) promising for the evolution of the population, assigning them the right application probability using type-1 fuzzy logic. In addition, our proposal is the only one that has been applied to multi-objective optimization algorithms.

3. The fuzzy adaptive mechanism for the selection of the operators

This section introduces a mechanism that allows multi-objective EAs to apply different recombination operators at different stages of the search process. In particular, the use of the different operators is dynamically adjusted according to their contribution to the search in the past. Intuitively, the idea is to favor the use of operators generating higher quality solutions over the use of the other operators. Section 3.1 details the Fuzzy Inference System (FIS) used in the proposed adaptive approach, as well as the selection mechanism for the variation operators. After that, Section 3.2 presents the pool of variation operators considered in this paper, and their strengths.

3.1. Fuzzy inference system

Our method is based on a Mamdani-type FIS [37] to compute the probability of applying the different operators. Mamdani FISs are completely linguistic fuzzy models, using linguistic variables in both the inputs and the outputs. Fuzzy sets defined by membership functions are used to represent the linguistic values used in the granulation of the input and output spaces of the fuzzy model.

Fig. 1. Mamdani fuzzy inference diagram.

Regarding the inference, we use the approach originally proposed by Mamdani, based on the "max-min" composition: the minimum operator for implication and the maximum operator for aggregation. Fig. 1 shows an example of a Mamdani FIS. In it, the consequents of the rules are aggregated into a single fuzzy set (the output), which is then defuzzified (mapped to a real value). A widely used defuzzification method, also adopted in this work, is the centroid calculation, which returns the center of the area under the curve. We use triangular membership functions for all inputs and outputs:

μ_A(x) = \begin{cases} 0 & \text{if } x \le a, \\ \frac{x-a}{b-a} & \text{if } a \le x \le b, \\ \frac{c-x}{c-b} & \text{if } b \le x \le c, \\ 0 & \text{if } c \le x. \end{cases}    (1)

The parameters a and c determine the "corners" of the triangle, and b determines its peak. A membership function μ_A maps real values of x to a degree of membership 0 ≤ μ_A(x) ≤ 1. The granularity levels used were: low (a = −0.4, b = 0.0, c = 0.4), mid (a = 0.1, b = 0.5, c = 0.9), and high (a = 0.6, b = 1.0, c = 1.4). In the following, we refer to the set of available variation operators considered in FAME as Operators. The FIS monitors the search process by controlling the Stagnation and Utilization[Op] variables (Op ∈ Operators) in a time window (of size windowSize). Depending on the values of these variables, the FIS updates the probability of applying every operator, OpProb[Op]. The value of Stagnation is shared among all operators, and represents a measure of the evolution of the search. In particular, its value is incremented by 1.0/windowSize every time a non-successful solution is generated (i.e., one that does not contribute to the archive of non-dominated solutions).
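As an illustration, the triangular membership function of Eq. (1) and the three granularity levels can be sketched as follows (a minimal Python sketch; the function and constant names are ours, not from the FAME implementation):

```python
def triangular(x, a, b, c):
    """Triangular membership function of Eq. (1): zero outside [a, c],
    rising linearly to 1 at the peak b, then falling back to zero."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Granularity levels (a, b, c) used for all inputs and outputs.
LEVELS = {"low": (-0.4, 0.0, 0.4), "mid": (0.1, 0.5, 0.9), "high": (0.6, 1.0, 1.4)}

# Fuzzification of a crisp input: its degree of membership in every level.
degrees = {name: triangular(0.3, *abc) for name, abc in LEVELS.items()}
# 0.3 belongs partially to "low" and "mid", and not at all to "high"
```

Note how the overlapping triangles let one crisp value activate several linguistic levels at once, which is what drives multiple rules of the FIS simultaneously.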
The value of Utilization[Op] is incremented by 1.0/windowSize every time the operator Op is used. The size of the time window must be carefully tuned, and depends on the number of operators considered. Using a large size may lead to an unresponsive behavior of the algorithm, while a very low value leads to highly fluctuating probabilities. In both cases, the probabilities are not properly adapted to the different stages of the search. Table 1 describes the set of AND rules we designed for the FIS to compute the likelihood of applying every operator.

Table 1
Fuzzy inference system rules for operator Op.

  Antecedents (AND)                   Consequent
  Stagnation    Utilization[Op]       OpProb[Op]
  High          High                  Mid
  High          Mid                   Low
  High          Low                   Mid
  Mid           High                  Mid
  Mid           Mid                   Low
  Mid           Low                   Mid
  Low           High                  High
  Low           Mid                   Mid
  Low           Low                   Low

Fig. 2. Output surface of the fuzzy inference system.

A Mamdani fuzzy system can be obtained with different methods, including evolutionary or metaheuristic approaches, machine learning, or design based on expert knowledge. In this paper, the proposed method was elaborated with the last approach, expert multi-objective knowledge [1,10,11,26,28]: the fuzzy rules were designed aiming at a known behavior (found by extensive previous experimentation), followed by an appropriate parametrization to obtain good performance. The other methods mentioned are outside the scope of this paper. They could be addressed in the future, generalizing the method proposed here. In particular, we believe that using evolutionary or metaheuristic approaches to design an optimal Mamdani fuzzy system, including both the rules and the parametrization, could improve the results even further. The alternative of using machine learning is interesting. However, we believe that the requirement of having input-output training data available will complicate achieving a good design of the
fuzzy system for real-world problems, where the optimal Pareto front is unknown. The fuzzy system is designed to favor smooth changes in the probabilities of applying the operators rather than abrupt ones, an idea represented by the surface of the rules in Fig. 2. The surface of a fuzzy model is a graphical representation of the outputs produced by the fuzzy model for different combinations of input data. The idea is similar to plotting a mathematical model but, instead of computing the output by applying a mathematical equation to the input variables, in a fuzzy model the input values are fuzzified, the inference process is applied, and the output values are finally calculated in the defuzzification process. The output of the model is the probability of applying each operator, while the inputs are the utilization of each operator and the stagnation value in the defined time window. Once the FIS assigns a probability to each operator (a membership degree in [0, 1]), a roulette mechanism is used to select the next operator to apply. It first aggregates all probability values into a variable Sum; then it iterates by randomly selecting one operator and subtracting its probability from Sum, until Sum ≤ 0. The last visited operator is then selected.

3.2. Variation operators pool

The pool of operators used in FAME is composed of two recombination operators (Simulated Binary Crossover (SBX) and Differential Evolution (DE)) and two mutation operators (Polynomial Mutation (PM) and Uniform Mutation (UM)). All of them are well known in the related literature. The idea behind considering the same number of recombination and mutation operators is to attempt to find an equilibrium between exploitation and exploration. Adding more operators could make the algorithm less responsive to the needs of the search process. We carefully selected the operators so that we
can cover both linear and nonlinear movements. As is commonly done in the field of continuous optimization, the selected operators perform smooth changes in the variables. The operators SBX and PM are used in NSGA-II, SPEA2, and many other algorithms, such as scatter search [26]. UM has been successfully applied in algorithms such as OMOPSO [36] and Borg [15]. In addition, DE is currently one of the most effective techniques for solving continuous optimization problems in both the single- and multi-objective domains [4,20,23,46]. We adopt here the scheme DE/rand/1/bin, widely used in the literature. The UM implementation in this paper adds a random value from within [−0.05, 0.05] to the original variable, a linear movement at the variable level, while PM uses a distribution index ηc > 1.0, producing a nonlinear movement at the variable level.

4. The spatial spread density estimator

The proposed density estimator is based on the mathematical definition of the moment (μ_n), given in Eq. (2) for discrete variables, as a measure of the shape of a set of points:

μ_n = \sum_{i=1}^{\infty} (x_i - c)^n \, p(x_i),    (2)

where x_i is one of the values that the random variable can take, p(x_i) is the probability of x_i, n is the order of the moment, and c is the reference point with respect to which the moment is defined. There are different definitions of the moment, such as the raw moment or the central moment. The raw moment, also known as the absolute moment or the moment of a function, is the moment with respect to the origin, i.e., when c = 0. The most popular moment is the central moment, used in statistics as a measure of central tendency. The central moment substitutes the value of c with the expected value or arithmetic mean x̄. For a uniform random variable, when the order is 2, it is easy to argue that the central moment for n points is the variance:

μ_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.    (3)

Note also that Eq.
(3) is widely used to compute the variance independently of the probability density function. As with the sample mean and sample central moments, where the data consist of a sample of size n without knowledge of the probability density function, the sample moments are estimators of the moments. According to the law of large numbers, the sample mean for a large number of observations is likely to be close to the expected value (a raw moment about the origin, of order 1). We make use of these concepts to develop SSD, using the absolute moment, or moment about a point. The data points used are the normalized Euclidean distances between pairs of solutions in the objective space. We choose the difference between the maximum (Dmax) and minimum (Dmin) distance in the set of points as our reference. The main idea is that the distances between neighboring solutions in the objective space should be similar: when we minimize the moment of order 2 about (Dmax − Dmin), the points tend to become equidistant. This is done for each solution i, as defined in Eq. (4):

μ_2 = \frac{1}{n-1} \sum_{j=1, j \ne i}^{n} \left( D_{i,j} - (D_{max} - D_{min}) \right)^2.    (4)

However, minimizing μ_2 may lead towards undesirable Pareto front approximations composed of clusters of similar solutions, while leaving other areas uncovered. An example is shown in Fig. 3a, where we present the Pareto front approximation found by a version of the steady-state NSGA-II [24] in which its CD density estimator was replaced by the μ_2 metric. To solve this problem, we include a penalization function in SSD with the aim of preventing solutions with similar fitness values from having a good SSD value. We define the SSD value of solution i as

SSD_i = \sqrt{ \frac{1}{n-1} \sum_{j=1, j \ne i}^{n} \left( D_{i,j} - (D_{max} - D_{min}) \right)^2 } + \sum_{j=2}^{k+1} \frac{D_{max} - D_{min}}{D_{i,j}},    (5)

where, for the penalization term, the distances D_{i,j} of solution i are sorted in ascending order (j = 1 corresponding to the solution itself), and k defines the number of neighbors to be considered in the penalization function. We set it to the number of objectives of the problem.
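Under the definitions above, the SSD value of Eq. (5) for a single solution can be sketched as follows (an illustrative Python sketch, assuming the symmetric matrix M of normalized pairwise distances has already been computed; the guard against zero distances is our addition):

```python
import math

def ssd_value(i, M, k):
    """Sketch of Eq. (5): SSD fitness of solution i, given a symmetric
    matrix M of normalized pairwise distances in objective space and k
    penalization neighbors (set to the number of objectives)."""
    n = len(M)
    dists = [M[p][q] for p in range(n) for q in range(p + 1, n)]
    ref = max(dists) - min(dists)                    # Dmax - Dmin
    # Deviation term: moment of order 2 about the reference, as in Eq. (4)
    dev = sum((M[i][j] - ref) ** 2 for j in range(n) if j != i) / (n - 1)
    # Penalization term: inverse distances to the k nearest neighbors
    nearest = sorted(M[i][j] for j in range(n) if j != i)[:k]
    penalty = sum(ref / max(d, 1e-12) for d in nearest)  # zero guard: ours
    return math.sqrt(dev) + penalty
```

Duplicated solutions (near-zero distances) receive a huge penalty, which is the intended effect of Eq. (5): clustered solutions get poor SSD values and become the first candidates for removal.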
The diversity of solutions in the Pareto front approximation greatly improves with the penalization function defined in SSD compared to using the μ_2 metric alone. The differences can be visually appreciated in the example displayed in Fig. 3.

Fig. 3. Pareto front approximations found by steady-state NSGA-II implementing μ_2 (left) or SSD (right) as density estimator. Results correspond to a representative run to solve the ZDT1 problem, after 25,000 function evaluations.

The procedure to compute SSD for a set of non-dominated solutions A is described in Algorithm 1.

Algorithm 1 Computation of the Spatial Spread Deviation.
1: function Compute_SSD(A)
2:   n = |A|
3:   for i ∈ A do
4:     A[i].SSD = 0
5:   end for
6:   for m = 1 to k do
7:     Sort(A, m)
8:     A[1].SSD = A[n].SSD = −∞
9:   end for
10:  Compute the matrix M of normalized distances
11:  Find Dmax and Dmin between all the pairs of solutions
12:  for i ∈ A do
13:    temp = 0
14:    for all j ∈ A, j ≠ i do
15:      temp = temp + (M[i][j] − (Dmax − Dmin))²
16:    end for
17:    temp = temp / (n − 1)
18:    A[i].SSD = A[i].SSD + √temp
19:  end for
20:  for i ∈ A do
21:    Sort(M[i])
22:    temp = 0
23:    for j = 2 to k + 1 do
24:      temp = temp + (Dmax − Dmin) / M[i][j]
25:    end for
26:    A[i].SSD = A[i].SSD + temp
27:  end for
28: end function

Every solution is assigned an SSD fitness value, which measures its contribution to the diversity of the solution set A. Initially, this SSD fitness value is set to 0 (lines 3–5) for each solution, except for those having the minimum and maximum values in each objective, whose SSD fitness value is set to −∞ (lines 6–9). These represent the extremes of the Pareto front, and we are always interested in keeping them. After the initialization, the procedure computes the normalized Euclidean distances D_{i,j} between each pair of solutions in the objective space. These distances are stored in a matrix M (of size n², where n = |A|).
The maximum and minimum of these distances are stored in the variables Dmax and Dmin, respectively (line 11). The matrix M and the variables Dmax and Dmin are then used to update the SSD fitness of every solution (lines 12–19), as indicated in Eq. (4). Solutions with close neighbors are penalized in lines 20–27, as defined in Eq. (5). The complexity of the SSD method is determined by the loop in lines 20–27. In every iteration of the loop, a row of M is sorted. In addition, it requires a loop over the k nearest solutions to compute the penalty (k ≤ n). The complexity of the sorting (O(n log n)) is higher than the complexity of this inner loop (O(k)). Therefore, the complexity of the whole procedure is O(n² log n).

5. Fuzzy adaptive multi-objective evolutionary algorithm with Spatial Spread Deviation (FAME)

The main features of FAME can be summarized as follows:

• FAME applies different variation operators to generate new solutions, dynamically determining, during the search, the operator to use each time, employing the mechanism described in Section 3.
• FAME makes use of the new SSD density estimator, presented in Section 4, to maintain the diversity of an external archive with the best non-dominated solutions found during the search.
• FAME follows a steady-state population update scheme.

The reasons for the first two features are discussed in Sections 3 and 4, respectively. Regarding the use of the steady-state scheme, it has been shown to enhance the performance of several state-of-the-art multi-objective evolutionary algorithms [3,12,15,24,46,47]. Although it introduces some computational overhead (the density estimators have to be recalculated after adding every new solution to the population), the algorithm can benefit from up-to-date information to make the appropriate decisions [12,24]. Algorithm 2 includes the pseudo-code of FAME.
The inputs of the algorithm are: the problem to solve, the termination criterion, the set of recombination operators (Operators), the population size (PopulationSize), and the size of the window (windowSize) to be used by the selection mechanism (as presented in Section 3). The output of FAME is the computed Pareto front approximation for the considered problem.

Algorithm 2 Pseudo-code of FAME.
Step 1 Initialization
1:  Initialize(Pop)
2:  Archive.add(Pop)
3:  SetOperatorsProb(1.0)
4:  window = Stagnation = 0
Step 2 Main loop
5:  while Stopping criterion not reached do
    Step 2.1 Parents selection
6:    for i = 1 to |Parents| do
7:      if RandomDouble(0, 1) ≤ β then
8:        Parents[i] ← TournamentSSD(Archive)
9:      else
10:       Parents[i] ← TournamentSSD(Pop)
11:     end if
12:   end for
    Step 2.2 Reproduction
13:   Operator ← Roulette(OpProb)
14:   Offspring ← Operator(Parents)
15:   Evaluate(Offspring)
16:   OpUse = UpdateUse(Operator)
17:   window++
    Step 2.3 Archive Update
18:   if !(Archive.add(Offspring)) then
19:     Stagnation = UpdateStagnation()
20:   end if
    Step 2.4 Call to the Fuzzy Inference System
21:   if window == windowSize then
22:     UpdateOpProb(OpUse, Stagnation)
23:     window = Stagnation = 0
24:   end if
    Step 2.5 Population Update
25:   NewPop ← Pop.add(Offspring)
26:   fast_non_dominated_sortSSD(NewPop)
27:   Pop ← RemoveWorstSolution(NewPop)
28: end while
Step 3 Output
29: return Archive

The code of the algorithm is composed of three main steps. Step 1 is the initialization phase. Firstly, it creates the main population of the algorithm (Pop) by inserting PopulationSize randomly generated solutions, which are evaluated and assigned an SSD fitness of 0.0. All the non-dominated solutions just generated are added to the external archive. The initial probability of applying each recombination operator is set to 1.0.

Step 2 is the main loop of FAME. It iterates until the termination condition is reached.
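The Roulette step follows the mechanism of Section 3.1: aggregate the probabilities into Sum, then repeatedly visit a random operator, subtracting its probability, until Sum ≤ 0. A minimal Python sketch (the names are ours, not from the FAME implementation):

```python
import random

def roulette(op_prob):
    """Select the next operator as described in Section 3.1: aggregate
    the FIS-assigned probabilities into Sum, repeatedly pick a random
    operator and subtract its probability until Sum <= 0, then return
    the last visited operator. Operators with larger probabilities are
    favored, since they drive Sum down faster when visited."""
    ops = list(op_prob)
    total = sum(op_prob.values())          # Sum
    chosen = ops[0]
    while total > 0:
        chosen = random.choice(ops)        # visit a random operator
        total -= op_prob[chosen]           # subtract its probability
    return chosen

# Example with the four FAME operators; the probability values are made up.
probs = {"SBX": 0.9, "DE": 0.6, "PM": 0.3, "UM": 0.3}
```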
This step is divided into smaller substeps:

• In Step 2.1, the parents are chosen. They are selected from either the external archive (Archive) or the main population (Pop), based on a randomized process controlled by an external parameter β ∈ [0, 1] (it may be tuned to give some preference to elitism over diversity, or the opposite). Once this decision has been made, a tournament method similar to the one used in NSGA-II is applied: the choice is first made according to the ranking value and, if the solutions are non-dominated, the decision is driven by the SSD fitness value.
• Step 2.2 selects and applies a recombination operator. A roulette mechanism (as explained in Section 3) is used to select the operator, which is then used to generate an offspring from the parents chosen in Step 2.1. The use counter of the chosen operator is increased by 1.0/windowSize (line 16), and the window counter is incremented.
• Step 2.3 checks whether the offspring should be included in the external archive or not. In the latter case, the Stagnation variable is increased by 1.0/windowSize.
• Step 2.4 invokes the FIS (Section 3) when window == windowSize to update the application probability of every operator, according to its use and the value of Stagnation. It also resets the values of window and Stagnation.
• Step 2.5 updates the population with the solution created in Step 2.2. Then, fast_non_dominated_sort is applied to divide the population into different fronts. The solution in the last front with the worst SSD value (considering only solutions in that front) is removed.

Finally, Step 3 returns the solutions contained in the external archive as the output computed by FAME.

6. Algorithms in the comparison

This section describes the algorithms that have been considered in our experiments.

6.1. BORG

Borg [15] is a steady-state multi-objective evolutionary algorithm for solving many-objective multimodal problems.
Borg uses an ε-box archive, based on the dominance concept, to keep a diverse representation of all the non-dominated solutions found during the search. Borg also makes use of a metric called ε-progress, based on the content of this archive, to identify when the search has stagnated. When stagnation is detected, Borg restarts the search by creating a new population including random solutions as well as solutions taken from the ε-box archive. The size of this newly generated population does not need to match the size of the previous one; instead, it is determined based on the number of solutions contained in the ε-box archive. Borg also considers several recombination operators, whose use is determined adaptively based on how they have contributed to the search so far. More specifically, their contribution is measured in terms of how many of the solutions they produced are in the ε-box archive.

6.2. SMS-EMOA

SMS-EMOA [3] is a steady-state multi-objective EA designed to compute a Pareto front approximation with maximal HV. The HV contribution is used during the search process to determine which solutions are kept or discarded. In every generation, the current population, of size N, and a newly generated solution are considered to update the population. This set of N + 1 solutions is partitioned into different subsets of non-dominated solutions, using the ranking method of NSGA-II. The solution from the last subset with the lowest HV contribution is removed. The remaining N solutions become the population of the next generation.

6.3. SMEA

SMEA [46] combines evolutionary search and machine learning. More specifically, self-organizing maps, an unsupervised machine learning method, are applied to determine neighboring relationships within the current population of each generation. These relationships are used to determine clusters within which the recombination operator can be applied. As with the previous algorithms, SMEA performs a steady-state evolution.
In each generation, one solution is created, and the population of the next generation is built using a procedure similar to the one described for the SMS-EMOA algorithm.

6.4. SMPSOhv

SMPSOhv [27] is an extension of the SMPSO [25] multi-objective particle swarm algorithm. The main feature of SMPSO is the use of a velocity constraint mechanism that prevents the particles from flying beyond the limits of the search space due to excessively high velocities. All the non-dominated solutions found during the search are inserted into an external archive of limited size. SMPSOhv uses the HV quality indicator to keep the size of this archive within the specified limit. In particular, every time a new insertion makes the archive grow beyond this size, the solution that contributes least to the HV of the Pareto front approximation is removed, following a similar procedure to that in SMS-EMOA and SMEA.

Table 2. Parameter settings.

MOEA/D-DE
  Population size: 101/210 for bi/tri-objective MOPs
  Differential evolution: pr = 1.0, F = 0.5, CR = 1.0
  Polynomial mutation: pm = 1/n, nm = 20
  Neighborhood size: 20, prob. mating: 0.9

SMEA
  Population size: 100/200 for bi/tri-objective MOPs
  Differential evolution: pr = 1.0, F = 0.9, CR = 1.0
  Polynomial mutation: pm = 1/n, nm = 20
  Neighborhood size: 5, prob. mating: 0.9

SMS-EMOA
  Population size: 100/200 for bi/tri-objective MOPs
  Polynomial mutation: pm = 1/n, nm = 20
  Recombination: SBX: pr = 0.9, ηc = 20.0

SMPSOhv
  Population size: 100/200 swarm/archive for bi/tri-objective MOPs
  Polynomial mutation: pm = 1/n, nm = 20
  Recombination: Max/Min C12 = 2.5/1.5, r12 = 1/0

BORG
  Population size: Max/Min: 10000/100
  Archive size: ∞
  Selection: tournament size Max/Min: 200/2
  Recombination: DE: pr = 1.0, CR = 1.0, F = 0.5; SBX: ηc = 20.0; SPX: ε = 3.0; PCX: wζ = 0.1, wη = 0.1; UNDX: σζ = 0.5, ση = 0.35
  Polynomial mutation: pm = 1/n, nm = 20
  Uniform mutation: pm = 1/n
  Window size: Init/Max: 200/2000

FAME
  Population size: 25/100 for bi/tri-objective MOPs
  Archive size: 100/200 for bi/tri-objective MOPs
  Selection: tournament size: 5, β = 0.9
  Recombination: DE: pr = 1.0, CR = 1.0, F = 0.5; SBX: ηc = 20.0
  Polynomial mutation: pm = 0.30, nm = 20
  Uniform mutation: pm = 0.30
  Window size: 14

6.5. MOEA/D-DE

MOEA/D [47] is a steady-state multi-objective evolutionary algorithm that employs an aggregation approach to decompose a multi-objective optimization problem into a set of single-objective optimization ones, called subproblems from now on. Each solution in the population aims at optimizing one of these subproblems. To this end, a neighboring structure is defined among all the solutions within the population. The resulting neighbors define the mating pools for the recombination operator and the subproblems the generated solution can optimize. The version used in this paper is MOEA/D-DE [20], which employs differential evolution as the recombination operator and Tchebycheff as the decomposition method.

7. Experimental setup

This section details our experimental setup. There is in the literature a large number of benchmark problems proposed to assess the performance of multi-objective algorithms. The most well-known ones are the ZDT, DTLZ, WFG, LZ [20], and GLT [46] problem families. Among these, we have adopted the two latter ones, which are the most recently proposed. They are characterized by having complex Pareto sets, which makes them harder to solve than the other problem sets.
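The Tchebycheff decomposition used by MOEA/D-DE reduces a multi-objective point to a single subproblem value, g(x | w, z*) = max_i w_i |f_i(x) − z*_i|, where w is the subproblem's weight vector and z* the ideal point. A minimal sketch (the function name is ours):

```python
def tchebycheff(f, weights, z_star):
    """Tchebycheff scalarization: the weighted worst-case deviation of the
    objective vector f from the ideal (best-so-far) point z_star."""
    return max(w * abs(fi - zi) for fi, w, zi in zip(f, weights, z_star))

# A candidate solution is compared on each neighboring subproblem by this value:
print(tchebycheff([0.4, 0.6], [0.5, 0.5], [0.0, 0.0]))  # 0.3
```

A lower value is better; MOEA/D replaces a neighbor's current solution whenever the newly generated one scores lower on that neighbor's subproblem.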
For the evaluated algorithms, we took the implementations provided by the jMetal framework [28], except for BORG and SMEA, for which the implementations of the original authors were used. All these algorithms were configured as suggested in the original papers in which they were presented. Table 2 summarizes these configurations. The parameter values for FAME were determined after a preliminary experimental phase. The sizes of the population and of the archive of non-dominated solutions were fixed to 25/100 and 100/200, respectively, for bi/tri-objective MOPs. The tournament size for selecting parents was set to 5. The solutions for the tournament were chosen from the archive with 90% probability (i.e., β = 0.9). The configuration of the recombination operators was the following: SBX uses a distribution index ηc = 20.0, and DE uses a crossover constant CR = 1.0 and a weighting factor F = 0.5. Regarding the mutation operators, both PM and UM were applied to every variable with a probability of pm = 0.3. In addition, the former was configured to use ηm = 20.0. The stopping condition was set to 45,000 and 150,000 function evaluations for the GLT and LZ problems, respectively, as is commonly done in the literature.

The Pareto front approximations produced by the different algorithms were compared by means of three well-known indicators: Additive Epsilon (Iε+) [50], Generalized Spread (IGS) [48], and Hypervolume (IHV) [42], as metrics for the convergence of the solutions, their diversity, or both, respectively. The indicators Iε+ and IGS are to be minimized, while IHV must be maximized. Each algorithm solved every problem in 100 independent runs, and the quality of the obtained fronts was computed according to the three mentioned indicators. The results are presented in this paper as the median, x̃, and interquartile range, IQR, of these values, as measures of location (or central tendency) and statistical dispersion, respectively.
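The unary additive epsilon indicator can be computed directly from its definition: the smallest translation eps such that every reference point is weakly dominated by some shifted point of the approximation front. A minimal sketch for minimization problems (names and the example fronts are ours, for illustration only):

```python
def additive_epsilon(front, reference):
    """Additive epsilon indicator I_eps+ of `front` w.r.t. `reference`
    (minimization): max over reference points of the min over front points
    of the largest per-objective excess."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in front)
        for r in reference
    )

# Toy bi-objective example: the approximation trails the reference by
# at most 0.2 in some objective.
ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
approx = [(0.1, 1.0), (0.6, 0.55), (1.0, 0.2)]
print(additive_epsilon(approx, ref))  # 0.2
```

A value of 0.0 means the approximation weakly dominates every reference point; larger values indicate a worse approximation, which is why this indicator is to be minimized.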
In the tables, the best and second-best values for every problem are emphasized with a dark and light gray background, to allow easily identifying the best performing algorithms. The Wilcoxon signed-rank test was applied to assess the statistical confidence of the pairwise comparisons between FAME and the state-of-the-art algorithms, at the 95% confidence level: '▲' is used when FAME is statistically better than the corresponding algorithm, while '▽' means the opposite. The symbol '–' is used when there is no statistical difference. Finally, the Friedman statistical test was applied to analyze the overall performance of the algorithms across all considered problems [13].

[Fig. 4. Best Pareto fronts reported by MOEA/D-DE and FAME for GLT4, according to IHV. Both panels plot F2 against F1, together with the true Pareto front.]

8. Results

We show in Table 3 the median and interquartile range of the three considered quality indicators for all the evaluated benchmark problems. Overall, it is possible to see at a glance that FAME stands out in our comparison, as it achieves the best or second-best figures in most of the cases. When focusing on IHV, MOEA/D-DE stands as the best competitor to FAME. It outperforms our approach in five problems (namely, LZ09F2, LZ09F4, LZ09F9, GLT1, and GLT4), but it is outperformed by FAME for seven other problems, with statistical significance. In the comparison with the other four algorithms, FAME is only significantly outperformed by SMPSOhv on LZ09F6. We can see how FAME is better than BORG and SMS-EMOA in all problems with statistical significance. This is also the case for SMEA, except for one problem where no statistical difference can be guaranteed. We now analyze one of the cases in which FAME is outperformed by another algorithm in terms of IHV. Fig.
4 depicts the best Pareto front approximations computed by MOEA/D-DE and FAME for the GLT4 problem, according to IHV. The figure shows that the front computed by FAME provides a clearly better distribution of points than the one computed by MOEA/D-DE, in spite of the results in Table 3. Actually, the better performance of MOEA/D-DE for this problem is due to its robustness, as evidenced by its smaller IQR: 3.5E−4 against the 2.4E−1 obtained by FAME.

According to Iε+, FAME achieves the best values in nine out of fifteen problems, and four second-best values. It outperforms SMS-EMOA and BORG in all problems with statistical significance. SMPSOhv and SMEA each outperform FAME in two problems: LZ09F4 and GLT3 in the case of SMPSOhv, and LZ09F6 and GLT5 in the case of SMEA. MOEA/D-DE and SMEA are the best algorithms after FAME, each providing two best results. If we consider the pairwise comparisons of FAME with the rest of the algorithms (15 problems and 5 algorithms, making a total of 75 pairwise comparisons), our proposal computes better results with statistical significance in 65 cases, and it is outperformed by another algorithm in only 6 cases. In the remaining four comparisons, no statistical significance was found.

Focusing on IGS, FAME also stands out as the best performing algorithm, beating the others in 60 out of the 75 pairwise comparisons. In this case, SMEA appears as its strongest competitor. In the comparison of FAME versus SMEA, the former obtains better IGS values in five problems, while it is outperformed by the latter in five cases, all at the 95% significance level. The reason for the performance of SMEA may be the fact that its neighborhood relationships are updated every time a new solution is added to the population, therefore allowing the algorithm to gather information about the distribution of the solutions at every search stage.
The MOEA/D-DE algorithm, which was the second most competitive algorithm according to IHV and Iε+, performs poorly in terms of IGS. The use of weight vectors, although providing accurate results, seems to compromise the diversity of the solutions, as demonstrated by the fact that MOEA/D-DE cannot achieve the best result for any of the evaluated problems. It attracts attention that FAME performs poorly on the GLT3 problem in terms of this indicator (it is statistically worse than all the evaluated algorithms). To further investigate this result, Fig. 5 shows the Pareto fronts with the best IGS value for FAME, compared to the three best-performing algorithms for this problem. Despite its worse IGS values, the approximations computed by FAME cover the whole Pareto front, while BORG only found nine solutions (although well distributed), the front found by MOEA/D-DE presents a highly dense region in the elbow of the front, and SMS-EMOA can only find two solutions on the right-hand side of the elbow.

If we focus on the three-objective problems studied (LZ09F6, GLT5, and GLT6), we can see that FAME also outperforms the other algorithms in terms of convergence and diversity of the computed approximations to the Pareto front. Only SMPSOhv beats FAME in terms of IHV and Iε+ on LZ09F6. An example of the fronts computed by FAME and SMEA is shown in Fig. 6. In this case, the best Pareto fronts computed by each algorithm according to the IGS metric for the GLT6 problem are plotted. The figure visually illustrates the clearly better Pareto front approximation computed by FAME.

Finally, we also analyzed the overall performance of the algorithms for all studied problems. Table 4 shows the Friedman ranks of the six algorithms for the three considered indicators, with 95% significance.

Table 3. Median and IQR of the 6 algorithms over 100 independent runs in terms of IHV, Iε+, and IGS.
Dark/light gray emphasizes the best/second-best results.

[Fig. 5. Best Pareto fronts provided by the three best performing algorithms (BORG, MOEA/D-DE, and SMS-EMOA), as well as FAME, for the GLT3 problem according to IGS. Each panel plots F2 against F1, together with the true Pareto front.]

[Fig. 6. Best Pareto fronts provided by SMEA and FAME for GLT6 according to IGS.]

Table 4. Average rankings of the algorithms.

IHV: FAME (1.78), MOEA/D-DE (2.57), SMEA (3.32), SMPSOhv (3.90), SMS-EMOA (4.63), BORG (4.79)
Iε+: FAME (1.75), MOEA/D-DE (2.85), SMEA (3.22), SMPSOhv (3.52), BORG (4.77), SMS-EMOA (4.90)
IGS: FAME (2.09), SMEA (3.02), SMPSOhv (3.21), MOEA/D-DE (4.11), BORG (4.16), SMS-EMOA (4.42)

Table 5. Median and IQR of FAME using CD vs. SSD density estimators over 100 independent runs. Dark/light gray background emphasizes the best/second-best results.

We can see that FAME is clearly the best performing algorithm in terms of all studied indicators. The second-best algorithm in terms of IHV and Iε+ is MOEA/D-DE, followed by SMEA and SMPSOhv, in that order. In terms of IGS, the ranking changes and SMEA stands as the second-best algorithm, followed by SMPSOhv, MOEA/D-DE, BORG, and SMS-EMOA.

9. Performance analysis of the SSD density estimator

Many multi-objective algorithms use density estimators for computing well-distributed approximations to the Pareto front. One commonly adopted option is based on the HV contribution of each solution to the computed approximation. Although it often provides accurate results, its exponential complexity with the number of objectives [42] compromises the scalability of the algorithm.
Another option found in the literature is the Pareto strength and raw fitness of SPEA2 [49]. It presents a lower complexity than HV, O(n³), but we still consider it high. The Crowding Distance (CD) used by NSGA-II [8] is a low-complexity alternative (O(k·n log n)) [8], but its performance decreases for problems with three or more objectives. The SSD density estimator proposed in this paper is a suitable solution that can be adopted by other algorithms in the literature. Its complexity, O(n² log n), is slightly higher than that of CD. However, our proposal is highly competitive with CD on two-dimensional problems, and more suitable for three-dimensional ones, as we show in this section.

We now compare the SSD and CD density estimators. To this end, we configure FAME in two ways: one with the SSD technique and the other with the CD technique, to manage the diversity of solutions in the archive/population. The latter configuration will be referred to as FAME-CD, while the former is called FAME (as before). The results obtained by these two configurations are summarized in Table 5, using the median and IQR values after 100 independent runs for the indicators IHV, Iε+, and IGS. As in the previous section, the symbol '▲' is used to show that our algorithm statistically outperforms the compared one (in this study, FAME-CD), with 95% significance. The opposite case is indicated with '▽', and no statistical difference is indicated with '–'.

In terms of the IHV metric, we can see that both algorithms perform similarly, each statistically outperforming the other in four problems. FAME-CD outperforms FAME in three problems from the LZ benchmark and one from the GLT benchmark, whereas FAME is superior in two problems from the LZ benchmark and two more from the GLT benchmark. No statistical differences were found in seven problems. It is important to note that FAME offers statistically better performance than FAME-CD in all three three-objective problems (LZ09F6, GLT5, GLT6).
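For reference, the crowding distance on which the FAME-CD variant relies can be sketched as follows. This is the standard NSGA-II procedure (not the SSD estimator); the function name and the toy front are ours:

```python
def crowding_distance(front):
    """NSGA-II crowding distance for a list of objective vectors.

    Boundary solutions of each objective get an infinite distance; interior
    ones accumulate the normalized side lengths of the cuboid defined by
    their nearest neighbors.  Complexity is O(k * n log n) for n solutions
    and k objectives, dominated by the per-objective sorts."""
    n = len(front)
    if n == 0:
        return []
    k = len(front[0])
    dist = [0.0] * n
    for m in range(k):
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = front[order[-1]][m] - front[order[0]][m]
        if span == 0.0:
            continue  # all values equal in this objective: no contribution
        for j in range(1, n - 1):
            dist[order[j]] += (front[order[j + 1]][m] - front[order[j - 1]][m]) / span
    return dist

front = [(0.0, 1.0), (0.4, 0.5), (0.5, 0.4), (1.0, 0.0)]
print(crowding_distance(front))
```

Solutions with a larger crowding distance lie in less populated regions and are preferred when truncating an archive or population, which is exactly the role SSD plays in FAME.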
In terms of the Iε+ metric, a similar behavior can be observed. FAME-CD is superior in three problems (LZ09F1, GLT1, and GLT2), while FAME statistically outperforms it in three problems too: LZ09F6, GLT5, and GLT6. No statistical differences were found in the other nine problems studied. Using the SSD density estimator also led to statistically better performance for all three-objective problems, according to Iε+.

Table 6. Global percentage of failed contributions per operator (one independent run).

Problem   DE       SBX      PM       UM
LZ09F1    26.93%   14.42%   12.10%   17.17%
LZ09F2     3.50%    5.69%   11.19%   20.15%
LZ09F3     3.70%    8.21%    9.68%   23.86%
LZ09F4     5.06%   10.26%    9.39%   24.84%
LZ09F5     4.30%   11.22%    7.72%   21.12%
LZ09F6     6.21%   18.09%   20.60%   25.11%
LZ09F7     6.01%    4.74%    0.92%    0.90%
LZ09F8     2.83%    4.78%    0.97%    0.79%
LZ09F9     2.83%    4.82%   12.97%   23.30%
GLT1       8.69%    4.32%    3.69%    8.46%
GLT2       5.38%    4.34%    8.48%   17.52%
GLT3      30.96%   13.63%    7.45%   12.88%
GLT4       9.34%    4.05%    4.78%    8.82%
GLT5      14.88%   16.14%   32.70%   42.86%
GLT6      16.44%   19.09%   29.48%   38.38%
Average    9.80%    9.59%   12.37%   18.81%

The tie between FAME and FAME-CD in terms of IHV and Iε+ is broken when considering IGS. In terms of the diversity of solutions in the Pareto front approximations, FAME-CD can only outperform FAME in one single bi-objective problem, namely LZ09F7, whereas our proposed algorithm was found to provide statistically better results in seven problems: four bi-objective problems from the LZ benchmark (LZ09F1, LZ09F2, LZ09F4, LZ09F5) and three from the GLT benchmark (GLT2, GLT5, GLT6). FAME statistically outperformed FAME-CD in the GLT5 and GLT6 three-objective problems, and no statistical differences were found for the other three-dimensional problem, LZ09F6.

To summarize, the experimental results show a significant improvement in the distribution of the solutions found when using SSD, as measured by the IGS metric. The results are similar regarding the other two metrics.
The meaningful improvements in IGS, together with the similar performance in accuracy and the superior performance on three-objective problems, justify our decision to design SSD and implement it in FAME. Although the complexity of SSD is greater than that of CD, it is polynomial, and therefore lower and more scalable than other common alternatives, such as those based on HV contributions. We conclude that SSD stands out as a highly competitive new player for estimating the density of the solutions in the Pareto front approximations of multi-objective algorithms. The experimental results from Section 8 include algorithms using different environmental selection schemes, such as hypervolume contributions (SMS-EMOA), weight vectors (MOEA/D-DE), and ε-box dominance (Borg), besides SSD (FAME), which proved to be a strong competitor.

10. Analysis of the adaptation of the operators in FAME

A key issue in the design of FAME is the implementation of an adaptive mechanism for the choice of the variation operators to be applied each time. In this section, we get an insight into the choices made by the algorithm during the evolution, and their impact on the overall performance. We have kept track of the contribution of the operators to the archive of non-dominated solutions in every time window when solving all the problems considered in this research. Although this obviously changes between runs due to the stochastic behavior of the algorithm, it is possible to observe a similar pattern in each problem across independent runs. We monitor the contributions (i.e., the number of generated solutions that are incorporated into the archive of non-dominated solutions) of every operator in the time window, before calling the FIS. We compute, for every operator, the percentage of failed offspring solutions as the difference between the number of solutions it generated and those that contributed to the archive. These values are shown in Table 6 for every operator in all problems, after one single run.
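The failed-contribution percentages of Table 6 can be derived from a per-offspring log. The sketch below is ours, under the assumption that each percentage is computed per operator, relative to the number of offspring that operator itself generated (the text does not state the normalization explicitly):

```python
def failure_percentages(events):
    """Percentage of failed offspring per operator.

    `events` is a sequence of (operator, contributed) pairs, one per
    generated offspring; `contributed` is True when the offspring entered
    the archive of non-dominated solutions."""
    generated, failed = {}, {}
    for op, contributed in events:
        generated[op] = generated.get(op, 0) + 1
        if not contributed:
            failed[op] = failed.get(op, 0) + 1
    return {op: 100.0 * failed.get(op, 0) / n for op, n in generated.items()}

# Toy log: DE produced two offspring, one rejected; SBX produced two, both kept.
log = [("DE", True), ("DE", False), ("SBX", True), ("SBX", True)]
print(failure_percentages(log))  # {'DE': 50.0, 'SBX': 0.0}
```

Lower percentages indicate that the operator controller is applying each operator mostly when it is actually able to improve the archive.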
The adaptive choice of operators is not an easy task: an optimal adaptation would mean that all solutions generated by every operator during the search contribute to the archive. FAME achieves a remarkable adaptation in GLT1 and GLT4, where the percentage of failed offspring solutions is less than 10% (a desirable behavior) for all operators. The average percentage of failed offspring solutions over all operators and problems is roughly between 10% and 19%, meaning that over 80% of all the solutions generated during the search contributed to the archive.

The evolution of the contribution of the operators is also graphically studied for some representative problems. In order to graphically present the adaptive behavior, only a sample of 100 time windows is considered (out of over 10,000 time windows for the LZ benchmark, and 3,000 for GLT). Therefore, we have plotted one value every 107 (32) time windows for the LZ (GLT) problems. We now graphically analyze the contribution of the operator selection mechanism to the performance of FAME. The plots show the use of the different operators and the evolution of the HV of the solutions in the archive. The y axis shows both the percentage of use of the operator (solid gray line) and the percentage of use of the operator leading to an update in the archive (dashed line). If the values of the two lines are the same, then 100% of the generated offspring contributed to the archive. The x axis presents the time step of the evolution, discretizing all the evaluations performed into 100 steps (from 1 to 101). The circles represent the IHV values, normalized by the extreme points of the optimal Pareto fronts.

[Fig. 7. Contribution of operators in FAME for problem LZ09F8 and evolution of IHV.]

Fig. 7 shows the case of problem LZ09F8.
For this problem, the number of offspring solutions that do not contribute to the archive is less than 1% for the UM and PM operators, and never higher than 5% for the others. The good predictions made for the UM and PM operators are captured in the plots at the bottom, showing an almost perfect match between the predictions and the real contributions of the two operators all along the run. In the case of DE and SBX (upper plots), there is also an accurate match between predictions and actual contributions, with only a few gaps in the early stages of the search. We would like to emphasize how the contribution of SBX is null at the beginning, until time step 5 (i.e., generation 280), when it starts contributing. FAME is able to detect the change, accurately predicting its contribution in the next steps. These accurate predictions allow the algorithm to keep improving the Pareto front approximation over the whole run.

As in the case of LZ09F8, the selection mechanism used in FAME also offers a highly accurate performance for GLT1, as depicted in Fig. 8. The percentage of generated solutions that do not contribute to the archive is less than 10% for all operators. In this case, we can draw the same conclusions as before: the hypervolume value gradually improves through the search, driven by the capacity of FAME to accurately predict the contributions of the considered operators in the different stages of the search. We would like to emphasize how FAME detects, at around time step 50, that the contribution of DE suddenly changes from almost 0% to over 60%, while at the same time the contribution of PM increases, and those of SBX and UM are significantly reduced, down to around 0%. FAME is able to predict all these simultaneous changes at that point, showing a highly adaptive behavior.

We now analyze the cases in which FAME obtains the worst percentages in Table 6. They correspond to LZ09F1, GLT3, GLT5, and GLT6.
For all of them we can observe a similar behavior: a quick convergence of the Pareto front approximation to highly accurate HV values, which then gets stuck during the rest of the run, negatively impacting the prediction errors (only a few of the solutions newly generated from that point on contribute to the archive). Although the prediction errors are high for these four problems, FAME is extremely competitive with the compared state-of-the-art algorithms, significantly outperforming them in 44 out of the 60 cases (comparisons against five algorithms, for three performance metrics and four problems), as shown in Table 3. A representative example is shown in Fig. 9 for the GLT5 problem. It can be seen how the archive converges to less than 10% error from the optimum in a few time steps, making it extremely difficult for new solutions to contribute.

[Fig. 8. Contribution of the operators in FAME for problem GLT1 and evolution of IHV.]

[Fig. 9. Contribution of the operators in FAME for problem GLT5 and evolution of IHV.]

11. Conclusions and future research

In this paper we have proposed a new multi-objective EA, called FAME (standing for Fuzzy Adaptive Multi-objective Evolutionary algorithm). It was shown to clearly outperform five state-of-the-art algorithms on the well-known LZ and GLT benchmarks, which contain both two- and three-dimensional problems. The comparison was made in terms of three quality metrics: the well-known hypervolume, additive epsilon, and generalized spread indicators. For every metric, FAME stands out as the best of the algorithms included in the comparison.
FAME implements two innovative components that notably contribute to its excellent performance on the studied benchmarks. The first one is an adaptive selection of operators based on a fuzzy logic engine that predicts, among a set of recombination and mutation operators, those that are most likely to contribute to the improvement of the current Pareto front approximation, assigning them different probabilities of being applied according to the predicted values. These predictions are based on the contributions of the different operators to the search during the previous generations. The second one is a novel, effective density estimator with polynomial complexity, called Spatial Spread Deviation (SSD), which is used to manage the non-dominated solutions of the approximate Pareto front. Thorough studies were made to analyze the contribution of these two novel components to the performance of the algorithm; both can be adopted by other multi-objective optimization algorithms with the expectation of improving their performance. This will be addressed in future research. In addition, we plan to evaluate the performance of the algorithms when solving combinatorial problems, as well as considering real-world applications.

Acknowledgments

A. Santiago and H. Fraire thank Consejo Nacional de Ciencia y Tecnología and Tecnológico Nacional de México for contracts [360199, 6002.16-P] and [280081] (Redes Temáticas). B. Dorronsoro acknowledges the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund for the support provided under contracts [TIN2014-60844-R] (SAVANT project) and [RYC-2013-13355].

References

[1] E. Alba, B. Dorronsoro, Cellular Genetic Algorithms, Springer-Verlag, 2008.
[2] J. Barraza, P. Melin, F. Valdez, C. González, Fireworks algorithm (FWA) with adaptation of parameters using fuzzy logic, in: Nature-Inspired Design of Hybrid Intelligent Systems, Springer-Verlag, 2017, pp.
313–327.
[3] N. Beume, B. Naujoks, M. Emmerich, SMS-EMOA: Multiobjective selection based on dominated hypervolume, Eur. J. Oper. Res. 181 (3) (2007) 1653–1669.
[4] B. Bošković, J. Brest, Protein folding optimization using differential evolution extended with local search and component reinitialization, Inf. Sci. 454–455 (2018) 178–199.
[5] P. Chakraborty, S. Das, G.G. Roy, A. Abraham, On convergence of the multi-objective particle swarm optimizers, Inf. Sci. 181 (8) (2011) 1411–1425.
[6] C.A.C. Coello, G.B. Lamont, D.A.V. Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems, Springer-Verlag, 2006.
[7] K. Deb, S. Karthik, T. Okabe, Self-adaptive simulated binary crossover for real-parameter optimization, in: Genetic and Evolutionary Computation Conference, ACM, 2007, pp. 1187–1194.
[8] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.
[9] K. Deb, A. Sinha, S. Kukkonen, Multi-objective test problems, linkages, and evolutionary methodologies, in: Genetic and Evolutionary Computation Conference, ACM, 2006, pp. 1141–1148.
[10] B. Dorronsoro, G. Danoy, A.J. Nebro, P. Bouvry, Achieving super-linear performance in parallel multi-objective evolutionary algorithms by means of cooperative coevolution, Comput. Oper. Res. 40 (6) (2013) 1552–1563.
[11] J. Durillo, A. Nebro, C. Coello, J. García-Nieto, F. Luna, E. Alba, A study of multiobjective metaheuristics when solving parameter scalable problems, IEEE Trans. Evol. Comput. 14 (4) (2010) 618–635.
[12] J. Durillo, A. Nebro, F. Luna, E. Alba, On the effect of the steady-state selection scheme in multi-objective genetic algorithms, in: Evolutionary Multi-Criterion Optimization, Springer-Verlag, 2009, pp. 183–197.
[13] S. García, D. Molina, M. Lozano, F.
Herrera, A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: A case study on the CEC'2005 special session on real parameter optimization, J. Heuristics 15 (6) (2008) 617–644.
[14] I. Giagkiozis, R.C. Purshouse, P.J. Fleming, Generalized decomposition, in: International Conference on Evolutionary Multi-Criterion Optimization, Springer-Verlag, 2013, pp. 428–442.
[15] D. Hadka, P. Reed, Borg: An auto-adaptive many-objective evolutionary computing framework, Evol. Comput. 21 (2) (2013) 231–259.
[16] V.L. Huang, A.K. Qin, P.N. Suganthan, M.F. Tasgetiren, Multi-objective optimization based on self-adaptive differential evolution algorithm, in: IEEE Congress on Evolutionary Computation, 2007, pp. 3601–3608.
[17] V.L. Huang, S.Z. Zhao, R. Mallipeddi, P.N. Suganthan, Multi-objective optimization using self-adaptive differential evolution algorithm, in: IEEE Congress on Evolutionary Computation, 2009, pp. 190–194.
[18] J. Huisman, J. Rings, J. Vrugt, J. Sorg, H. Vereecken, Hydraulic properties of a model dike from coupled Bayesian and multi-criteria hydrogeophysical inversion, J. Hydrol. 380 (1) (2010) 62–73.
[19] A.W. Iorio, X. Li, Solving rotated multi-objective optimization problems using differential evolution, in: Australian Conference on Artificial Intelligence, 2004, pp. 861–872.
[20] H. Li, Q. Zhang, Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II, IEEE Trans. Evol. Comput. 13 (2) (2009) 284–302.
[21] L. Martí, J. García, A. Berlanga, J.M. Molina, A stopping criterion for multi-objective optimization evolutionary algorithms, Inf. Sci. 367–368 (2016) 700–718.
[22] P. Melin, F. Olivas, O. Castillo, F. Valdez, J. Soria, M. Valdez, Optimal design of fuzzy classification systems using PSO with dynamic parameter adaptation through fuzzy logic, Expert Syst. Appl. 40 (8) (2013) 3196–3206.
[23] M. Ming, R. Wang, Y. Zha, T.
Zhang, Pareto adaptive penalty-based boundary intersection method for multi-objective optimization, Inf. Scie. 414 (2017) 158–174. [24] A. Nebro, J. Durillo, On the effect of applying a steady-state selection scheme in the multi-objective genetic algorithm NSGA-II, in: Nature-Inspired Algorithms for Optimization, Springer–Verlag, 2009, pp. 435–456. A. Santiago et al. / Information Sciences 471 (2019) 233–251 251 [25] A. Nebro, J. Durillo, J. García-Nieto, C. Coello, F. Luna, E. Alba, SMPSO: A new PSO-based metaheuristic for multi-objective optimization, in: Multi-Criteria Decision-Making, 2009, pp. 66–73. [26] A. Nebro, F. Luna, E. Alba, B. Dorronsoro, J.J. Durillo, A. Beham, AbYSS: Adapting Scatter Search to Multiobjective Optimization, IEEE Transactions on Evolutionary Computation 12 (4) (2008) 439–457. [27] A.J. Nebro, J.J. Durillo, C.A.C. Coello, Analysis of leader selection strategies in a multi-objective particle swarm optimizer, in: Congress on Evolutionary Computation, 2013, pp. 3153–3160. [28] A.J. Nebro, J.J. Durillo, M. Vergne, Redesigning the jMetal multi-objective optimization framework, in: Genetic and Evolutionary Computation Conference, ACM, 2015, pp. 1093–1100. [29] S.F.H. Noorbin, A. Alfi, Adaptive parameter control of search group algorithm using fuzzy logic applied to networked control systems, Soft Comput. (2017) 1–22. [30] P. Ochoa, O. Castillo, J. Soria, Differential evolution using fuzzy logic and a comparative study with other metaheuristics, in: Nature-Inspired Design of Hybrid Intelligent Systems, Springer-Verlag, 2017, pp. 257–268. [31] F. Olivas, F. Valdez, O. Castillo, Gravitational search algorithm with parameter adaptation through a fuzzy logic system, in: Nature-Inspired Design of Hybrid Intelligent Systems, Springer–Verlag, 2017, pp. 391–405. [32] F. Olivas, F. Valdez, O. Castillo, C.I. Gonzalez, G. Martinez, P. 
Melin, Ant colony optimization with dynamic parameter adaptation based on interval type-2 fuzzy logic systems, Applied Soft Computing 53 (2017) 74–87. [33] C. Peraza, F. Valdez, O. Castillo, An improved harmony search algorithm using fuzzy logic for the optimization of mathematical functions, in: Design of Intelligent Systems Based on Fuzzy Logic, Neural Networks and Nature-Inspired Optimization, Springer–Verlag, 2015, pp. 605–615. [34] J. Pérez, F. Valdez, O. Castillo, A new bat algorithm with fuzzy logic for dynamical parameter adaptation and its applicability to fuzzy control design, in: Fuzzy Logic Augmentation of Nature-Inspired Optimization Metaheuristics: Theory and Applications, Springer, 2015, pp. 65–79. [35] Y. Qi, X. Ma, F. Liu, L. Jiao, J. Sun, J. Wu, MOEA/D with adaptive weight adjustment, Evol. Comput. 22 (2) (2014) 231–264. [36] M. Reyes, C. Coello, Improving PSO-based multi-objective optimization using crowding, mutation and -dominance, in: Evolutionary Multi-Criterion Optimization Conference, Springer-Verlag, 2005, pp. 509–519. [37] S. Roy, U. Chakraborty, Introduction to soft computing: : Neuro-fuzzy and genetic algorithms, Dorling-Kindersley, 2013. [38] A. Santiago, H.J. Fraire Huacuja, B. Dorronsoro, J.E. Pecero, C. Gómez Santillan, J.J. González Barbosa, J.C. Soto Monterrubio, A survey of decomposition methods for multi-objective optimization, in: Recent Advances on Hybrid Approaches for Designing Intelligent Systems, in: Studies in Computational Intelligence, Vol. 547, Springer-Verlag, 2014, pp. 453–465. [39] G. Toscano Pulido, C. Coello Coello, The micro genetic algorithm 2: Towards online adaptation in evolutionary multiobjective optimization, Evolutionary Multi-Criterion Optimization, LNCS, Vol. 2632, Springer–Verlag, 2003. 75–75 [40] F. Valdez, P. Melin, O. Castillo, A survey on nature-inspired optimization algorithms with fuzzy logic for dynamic parameter adaptation, Expert Systems with Applications 41 (14) (2014) 6459–6466. [41] J. 
Vrugt, B. Robinson, Improved evolutionary optimization from genetically adaptive multimethod search, in: Proc. of the National Academy of Sciences of the USA, Vol. 104, 2007. 708–11 [42] L. While, L. Bradstreet, L. Barone, A fast way of calculating exact hypervolumes, IEEE Trans. Evol. Comput. 16 (1) (2012) 86–95. [43] T. Wöhling, J. Vrugt, Multi-response multi-layer vadose zone model calibration using Markov chain Monte carlo simulation and field water retention data, Water Resour. Res. 47 (4) (2011) 1–19. [44] X. Yu, X. Yu, Y. Lu, G.G. Yen, M. Cai, Differential evolution mutation operators for constrained multi-objective optimization, Appl. Soft Comput. 67 (2018) 452–466. [45] F. Zeng, M. Low, J. Decraene, S. Zhou, W. Cai, Self-adaptive mechanism for multi-objective evolutionary algorithms, in: Int. Conf. on Artificial Intelligence and Applications, 2010, pp. 7–12. [46] H. Zhang, A. Zhou, S. Song, Q. Zhang, X.Z. Gao, J. Zhang, A self-organizing multiobjective evolutionary algorithm, IEEE Trans. Evol. Comput. 20 (5) (2016) 792–806. [47] Q. Zhang, H. Li, MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11 (6) (2007) 712–731. [48] A. Zhou, Y. Jin, Q. Zhang, B. Sendhoff, E. Tsang, Combining model-based and genetics-based offspring generation for multi-objective optimization using a convergence criterion, in: Congress on Evolutionary Computation, IEEE, 2006, pp. 3234–3241. [49] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm, Technical Report, ETH, Switzerland, 2001. [50] E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V.G. da Fonseca, Performance assessment of multiobjective optimizers: An analysis and review, IEEE Trans. Evol. Comput. 7 (2) (2003) 117–132.