Cooperative coevolutionary hybrid NSGA-II with Linkage Measurement Minimization for Large-scale Multi-objective optimization

Rui Zhong Graduate School of Information Science and Technology
Hokkaido University
Sapporo, Japan
rui.zhong.u5@elms.hokudai.ac.jp Masaharu Munetomo Information Initiative Center
Hokkaido University
Sapporo, Japan
munetomo@iic.hokudai.ac.jp

Abstract

In this paper, we propose a variable grouping method based on cooperative coevolution for large-scale multi-objective problems (LSMOPs), named Linkage Measurement Minimization (LMM). And for the sub-problem optimization stage, a hybrid NSGA-II with a Gaussian sampling operator based on an estimated convergence point is proposed. In the variable grouping stage, according to our previous research, we treat the variable grouping problem as a combinatorial optimization problem, and the linkage measurement function is designed based on linkage identification by the nonlinearity check on real code (LINC-R). We extend this variable grouping method to LSMOPs. In the sub-problem optimization stage, we hypothesize that there is a higher probability of existing better solutions around the Pareto Front (PF). Based on this hypothesis, we estimate a convergence point at every generation of optimization and perform Gaussian sampling around the convergence point. The samples with good objective value will participate in the optimization as elites. Numerical experiments show that our variable grouping method is better than some popular variable grouping methods, and hybrid NSGA-II has broad prospects for multi-objective problem optimization.

Cooperative Co-evolution (CC), Linkage Measurement Minimization (LMM), Large-Scale Multi-Objective Problems (LSMOPs), hybrid NSGA-II

I Introduction

Multi-objective evolutionary algorithms (MOEAs) have successfully solved various multi-objective problems[1]. However, the performance of the traditional MOEAs degrades dramatically as the dimension of the problem increases. When the scale of the problem reaches a certain dimension, this type of problem is called large-scale multi-objective optimization problems (LSMOPs). Solving LSMOPs is always tricky, and mainly due to the following aspects: (1). The complexity of optimization problems tends to increase with the increase of dimensions. (2). The search space of large-scale problems increases exponentially with the increase of the dimension, which is known as the curse of dimensionality[2]. Therefore, a number of generic MOEAs have also been developed for solving LSMOPs since 2013[3]. These methods are usually not aimed at a specific type of problem but want to develop a general framework to solve LSMOPs[4]. There are mainly three strategies currently. (1). Variable grouping method based on Cooperative Coevolution (CC)[3]. (2). Methods based on dimensionality reduction or problem transformation[5]. (3). Design new search strategies based on MOEAs[6]. These strategies succeed in solving many-objective optimization problems[7], constrained multi-objective optimization problems[8], and computationally expensive multi-objective optimization problems[9].

This paper mainly adopt CC framework to solve the LSMOPs. Inspired by divide and conquer, the CC framework decomposes the original problem into multiple non-separable sub-problems and applies evolutionary algorithms (EAs) to solve each sub-problem alternately[10]. This strategy has been a huge success in solving large-scale problems. However, several studies[11, 12] have shown that the CC framework is sensitive to problem decomposition strategies. In theory, a perfect variable grouping strategy can exponentially reduce the search space without losing optimization accuracy, while a poor grouping strategy will mislead the direction of optimization. Therefore, how to design the decomposition strategy has become a popular research topic.

In addition, unlike large-scale single-objective optimization problems (LSSOPs), in LSMOPs, the same variables in different objective functions may have different interactions. Currently, there are many variable grouping methods were extended from LSSMOPs to group the variables in LSMOPs, such as Fast interdependency identification (FII) for LSMOPs[13], Differential Grouping (DG) for LSMOPs[14], MOEA-DVA[15], and other methods. Although these methods show great performance in grouping accuracy, computational cost and local interaction identification make these algorithms limited. What is more, many studies[11, 12] in LSSOPs show that high-accuracy grouping is not equivalent to high-performance optimization, and ignoring some weak interactions can accelerate the optimization, especially under the fitness evaluation times (FEs) limitation.

Besides, In order to find the global optimum among the fitness landscape, the heuristic algorithm should be equipped with two major characteristics to ensure finding the global optimum. These two main characteristics are exploration and exploitation[16]. Exploration is the ability of an algorithm to search whole parts of a problem space whereas exploitation is the convergence ability to the best solution near a good solution. However, the No Free Lunch Theorems (NFLT)[17] has proved that there is no algorithm, which can perform general enough to solve all optimization problems, and a single optimizer cannot balance the exploitation and exploration well in optimization, thus, hybrid evolutionary algorithms and memetic algorithms have become a popular research topic.

In this paper, we propose a novel variable grouping method, named Linkage Measurement Minimization (LMM) and apply hybrid Nondominated Sorting Genetic Algorithm II (hNSGA-II) as the sub-problem optimizer (CC-hNSGA-LMM). In the grouping stage, our proposal allows an automatic decomposition that treats the variable grouping problem as a combinatorial optimization problem, and we design a linkage measure function to evaluate the variable grouping performance. In the sub-problem optimization stage, we introduce a Gaussian sampling operator based on a estimated convergence point combined with NSGA-II to balance the exploitation and exploration in optimization. Specifically, the main contributions of this paper are as follows.

(1) For variable grouping, we propose a novel automatic variable grouping strategy named LLM and explain the mathmatical mechanism of LLM in detial, which reveals the relationship between our proposal and LINC-R.

(2) For optimizer, we first hypothesize that the individuals around the PF have better fitness, and the participation of elite individuals in optimization can accelerate the convergence. Based on this hypothesis, we estimate a convergence point at every generation of optimization and apply the Gaussian sampling as a local search operator to exploit the potential individuals around the estimated convergence point.

(3) Theoretical analysis and numerical experiments show that our proposal has broad prospects in solving LSMOPs and can solve larger-scale and many-objective optimization problems through simple extension.

The rest of the paper is organized as follows, Section II covers preliminaries and a brief review of grouping methods. Section III introduces our proposal in detail. Section IV shows the experiments and analysis. Section V discusses about the direction of our research in the future, Section VI concludes the paper and shows future directions.

Ii Preliminaries and related works

Many grouping methods have been proposed based on CC for LSSOPs in recent years[11, 12], and meanwhile, some works have been expanded to LSMOPs[13, 4]. In this section, we firstly introduce some preliminaries including the concept of multi-objective problems, separability of functions, estimation of a convergence point, and NSGA-II. Then, we will provide a brief review of grouping methods both in LSSOPs and LSMOPs.

Ii-a Preliminaries

Ii-A1 Multi-objective Problems

Without loss of generality, a Multi-objective Problem (MOP) can be mathematically defined as Eq (1):

	$min F (x) = (f_{1} (x), f_{2} (x), . . ., f_{M} (x)),$		(1)
	$s . t . x \in Ω,$		(1)

Where $f : Ω \to Λ \subseteq R^{M}$ consists of M objectives, $Λ$ is the objective space, $Ω \subseteq R^{D}$ is the decision space, and $x = (x_{1}, x_{2}, . . ., x_{D}) \in Ω$ is a solution consisting of $D$ decision variables. The dominance relation between two solutions can be defined as $(\forall i \in 1, . . ., M, f_{i} (x) ⩽ f_{i} (y) \land (\exists j \in 1, . . ., M, f_{i} (x) \leq f_{i} (y))$ . If this formula is satisfied, we can say that x dominates y. A Pareto optimal solution is a solution that is not dominated by any solution in $Ω$

Ii-A2 Separability of functions

The concept of separability is derived from a biological concept[18]. if a feature at the phenotype level is determined by two or more genes, then we say that these genes have interaction. Then we extend the concept to optimization problems. When $f (x_{1}, x_{2}, . . ., x_{n}) = \sum_{i = 1}^{m} f (x_{i_{1}}, . . ., x_{i_{k}})$ , We call $f (x)$ a partially separable function, and the variables in each sub-problem are called linkage sets[19]. The explicit or implicit interactions exist between every pair of variables in a linkage set, and we call these variables non-separable variables, and there is no interaction between variables in different linkage sets, and these variables are called separable variables. There are two extreme cases of the interaction, when there is no interaction between all variables, $f (x_{1}, x_{2}, . . ., x_{n}) = \sum_{i = 1}^{n} f (x_{i})$ , we call $f (x)$ is a fully separable function. On the contrary, when every pair of variables exist explicit or implicit interaction, we call $f (x)$ a fully non-separable function. Based on CC framework, we want to develop a grouping method which can find the interactions and divide the variables into sub-problems properly.

Ii-A3 Estimation of a convergence point

Estimation of a convergence point was first proposed by Murata. The paper[20] hypothesize that: in a population-based optimization algorithm, all individuals are moving towards the global optimum. Fig.1 shows how the estimation of a convergence point works. Although the estimated point is not exactly on the global optimum in practice due to incorrect directions or inaccurate population movements. However, it has higher possibility that the estimated point is close to the global optimum.

Fig. 1: Moving vector $d_{i} (= o_{i} - p_{i})$ is calculated from a parent (worse) individual $p_{i}$ and its offspring (better) $o_{i}$ . The $⋆$ is the estimated convergence point.

Let us derive how to estimate the convergence point mathematically. First, we define parent (worse) individual $p_{i}$ , offspring (better) $o_{i}$ , and moving vector $d_{i}$ as describe in Fig 1. The unit direction vector of $d_{i}$ is given as $d_{0 i} = \frac{d_{i}}{| | d_{i} | |}$ ,i.e., $d_{0 i}^{T} d_{0 i} = 1$ . $X$ denotes the estimated convergence point, and $p_{i} + t_{i} d_{i}$ represents the expansion from parent individual $p_{i}$ with the direction $d_{i}$ . $L (X, t_{i})$ in Eq (2) becomes the minimum.

min (L (X, t_{i})) = min (n \sum i = 1 | | p_{i} + t_{i} d_{i} - X | |^{2})

(2)

As the minimum line segment from the convergence point $X$ to the expansion line segments is the orthogonal projection from $X$ , we can apply the Eq (3) into Eq (2) to remote $t_{i}$ .

d_{i}^{T} (p_{i} + t_{i} d_{i} - X) = 0 (o r t h o g o n a l c o n d i t i o n)

(3)

Finally, the convergence point $X$ can be calculated by Eq (4). See detail expansion of equations in paper [20].

ˆ X = {n \sum i = 1 (I_{d} - d_{0 i} d_{0 i}^{T})}_{d}^{- 1} {n \sum i = 1 (I_{d} - d_{0 i} d_{0 i}^{T}) p_{i}}

(4)

Ii-A4 Nsga-Ii

Non-dominated Sorting Genetic Algorithm (NSGA-II) is proposed by Deb in 2002 to solve MOPs[21]. Now, NSGA-II is one of the most popular MOEAs with three special characteristics, the fast non-dominated sorting approach, the fast crowded distance estimation procedure, and the simple crowded comparison operator. Deb et al. simulated several test problems from previous studies using the NSGA-II optimization technique and showed exciting results. The pseudocode of NSGA-II is described in Algorithm 1

Input:

P o p u l a t i o n : P

Output:

P a r e t o F r o n t : P F

1 Function NSGA-II( $s$ ):

t \leftarrow 0

Q_{t} \leftarrow \emptyset

P F \leftarrow

nonDominated(

P_{t}

) while not stop criterion do

R_{t} \leftarrow P_{t} \cup Q_{t}

F \leftarrow

fastNonDominateSorting(

R_{t}

)

P_{t + 1} \leftarrow \emptyset

i \leftarrow 1

while $| P_{t + 1} | + | F_{i} | ⩽ s$ do

C_{i} \leftarrow

crowdingDistanceAssigment(

F_{i}

)

P_{t + 1} \leftarrow P_{t} \cup F_{i}

i \leftarrow i + 1

5 end while

F_{i} \leftarrow

Sort(

F_{i}, C_{i}

)

P_{t + 1} \leftarrow P_{t + 1} \cup F_{i} [1 : (N - | P_{t + 1} |)] ⊳

fill

P_{t + 1}

with the

N - | P_{t + 1} |

less crowded individuals of

F_{i}

Q_{t + 1} \leftarrow

Selection(

P_{t + 1}, s

)

Q_{t + 1} \leftarrow

Crossover(

Q_{t + 1}

)

Q_{t + 1} \leftarrow

Mutation(

Q_{t + 1}

)

t \leftarrow t + 1

P F \leftarrow

nonDominated(

A \cup Q_{t}

)

7 end while

return P F

Algorithm 1 NSGA-II

Ii-B A brief review of grouping methods

Based on the idea of divide and conquer, the CC framework was proposed in 1994[10] and has been popularized in large-scale optimization. CC requires decomposing the problem into a set of low-dimensional subproblems, each of which is optimized separately. Since the candidate solutions of each sub-component cannot form a complete solution, representative solutions of other sub-components are required to form a complete solution for evaluation. These representative solutions are known as the context vector[22]. The context vector is updated iteratively and acts as the context in which the cooperation occurs. In this part of the section, we briefly introduce the grouping principle on LSSOPs, which can be classified into three major groups: automatic, semi-automatic, and $m \times k$ -strategy.

Ii-B1 Automatic

In the automatic grouping methods, the formation of sub-components completely depends on the logic of the algorithm. The representative algorithms include LINC-R[18] and LIMD[23], and are extended to DG[24], DG2[25], ERDG[26], and CCVIL[27], etc. respectively. These methods are mainly designed based on perturbations. Eq (5) defines a sample $s$ perturbs $δ$ in $i^{t h}$ -D and $j^{t h}$ -D and both in $i^{t h}$ -D and $j^{t h}$ -D.

	$s = (x_{1}, x_{2}, . . ., x_{n})$		(5)
	$s_{i} = (x_{1}, . . ., x_{i} + δ, . . ., x_{n})$
	$s_{j} = (x_{1}, . . ., x_{j} + δ, . . ., x_{n})$
	$s_{i j} = (x_{1}, . . ., x_{i} + δ, . . ., x_{j} + δ, . . ., x_{n})$

Eq (6) is employed by LINC-R to identify the interaction between $x_{i}$ and $x_{j}$ .

	$\exists s \in P o p :$		(6)
	$Δ_{i} = f (s_{i}) - f (s)$
	$Δ_{j} = f (s_{j}) - f (s)$
	$Δ_{i j} = f (s_{i j}) - f (s)$
	$i f \| Δ_{i j} - (Δ_{i} + Δ_{j}) \| > ϵ$
	$t h e n x_{i} a n d x_{j} a r e n o n s e p a r a b l e$

$ϵ$ is the allowable error. LIMD applied Eq (7) to detect the interaction between $x_{i}$ and $x_{j}$ .

	$\exists s \in P o p :$		(7)
	$i f \neg (f (s) < f (s_{i}) < f (s_{i j}) a n d f (s) < f (s_{j}) < f (s_{i j})$
	$o r f (s) > f (s_{i}) > f (s_{i j}) a n d f (s) > f (s_{j}) > f (s_{i j}))$
	$t h e n x_{i} a n d x_{j} a r e n o n s e p a r a b l e$

When $x_{i}$ and $x_{j}$ not satisfy the simultaneous increase or decrease on at least one individual of population, LIMD identifies $x_{i}$ and $x_{j}$ as nonseparable variables. Notice that we cannot apply LINC-R and LIMD to the whole fitness landscape in practice, and only finite individuals $s$ are checked. In other words, if the condition both in LINC-R and LIMD are not satisfied, then $x_{i}$ and $x_{j}$ are identified as separable variables.

Ii-B2 Semi-automatic

Semi-automatic methods decompose the problem depending on the both algorithm logic and parameters specified by users. Some studies apply statistical detection methods to form the sub-components by a threshold or a set of intervals defined on correlation coefficients with the participation of users, such as AVP2[28] and 4CDE[29]. The fuzzy c-mean algorithm proposal by Fan et al[30]. forms the sub-components with the number of components. This kind of algorithm often consumes fewer FEs to detect the interactions, which is an advantage compared with automatic methods.

Ii-B3 $m \times k$ -strategy

This kind of algorithm requires fewer FEs to form the sub-components than automatic methods and semi-automatic methods. The number and the size of each component are necessary hyperparameters decided by users. And the representative detection principles include Random Grouping[31, 32], Delta Grouping[33], Fitness Difference Partitioning[34, 35], and so on.

CC framework is a well-studied technique for LSSOPs, and in LSMOPs, due to the multiple objective functions, and the variable interactions in different objective functions are often different, so the grouping methods developed for LSSOPs always need to be modified for LSMOPs. Currently, the popular grouping strategies on LSMOPs include Random Grouping[3], Differential Grouping[14], and Variables Analysis[15]. In paper [36], dynamic Random Grouping is applied. The variables are regrouped after each generation of optimization. As same as the Random Grouping in LSSOPs, the probability of two interacting variables being divided in the same sub-problem is quite high in multiple trial experiments. In DG for LSMOPs, the paper[13] applies Eq (6) to all objective functions, and when $x_{i}$ and $x_{j}$ satisfy Eq (6) in all objective functions, they are identified as separable variables. Both Random Grouping and DG were originally proposed to solve LSSOPs, and these methods focus on partitioning decision variables into sub-problems correctly while ignoring the population diversity in the objective space. Therefore, the variable grouping methods based on Random Grouping or DG can easily find some local or global optimal solutions but may not be able to diversify the population along the whole Pareto front. Variable Analysis (VA) is a grouping method for LSMOPs. MOEA/DVA[15] perturbs several times on a random sample. If all the perturbed samples are non-dominated with each other, this variable is considered a position variable, and if each perturbed sample is dominated or dominating the rest samples, this variable is regarded as a distance variable, otherwise, it is regarded as a mixed variable. MOEA/DVA optimizes the different types of variables in order and allocates different computational resources. Numerical experiments show that MOEA/DVA significantly outperforms many other MOEAs on LSMOPs in benchmarks.

Iii CC-hNSGA-LMM

In this section, we will introduce the details of our proposal and the techniques. The flowchart of our proposal is shown in Figure 2.

Our proposal includes the decomposition stage and the optimization stage. Next, we will introduce the flow of our proposal in detail.

In decomposition stage, our previous research[37] regards the decomposition problem as an optimization problem and designed the linkage measurement function based on LINC-R. Next, we will give a simple mathematical explanation of our proposal.

In LINC-R, we can rewrite Eq (6) to Eq (8)

	$\exists s \in P o p :$		(8)
	$i f \| (f (s_{i j}) - f (s_{i})) - (f (s_{j}) - f (s)) \| > ϵ$
	$t h e n x_{i} a n d x_{j} a r e n o n s e p a r a b l e$

And we notice that LINC-R can be understood as the additive form of vector. Eq (8) can also be written to Eq (9)

	$\exists s \in P o p :$		(9)
	$i f \| (f (s_{i j}) - f (s)) - ((f (s_{i}) - f (s)) + (f (s_{j}) - f (s))) \| > ϵ$
	$t h e n x_{i} a n d x_{j} a r e n o n s e p a r a b l e$

Figure 3 shows how original LINC-R and variant LINC-R work on separable variables.

Fig. 3: (a).The original LINC-R works on the separable variables. (b).The variant LINC-R works on the separable variables.

Next, we derive the variant LINC-R to $3$ -D or higher dimensions. In $3$ -D space, the schematic diagram is shown in Figure 4.

Fig. 4: The variant LINC-R works on 3-D space

Similarly, we define the fitness difference in $3$ -D space in Eq (10)

	$s \in P o p :$		(10)
	$Δ f_{i} = f (s_{i}) - f (s)$
	$Δ f_{j} = f (s_{j}) - f (s)$
	$Δ f_{k} = f (s_{k}) - f (s)$
	$Δ f_{i j k} = f (s_{i j k}) - f (s)$

And Eq (11) defines the variant LINC-R in $3$ -D space

	$\exists s \in P o p :$		(11)
	$i f \| Δ_{i j k} - (Δ_{i} + Δ_{j} + Δ_{k}) \| > ϵ$
	$t h e n i n t e r a c t i o n (s) e x i s t i n x_{i}, x_{j}, x_{k}$

Therefore, we can reasonably infer the variant LINC-R in $n$ -D space on Eq (12).

	$\exists s \in P o p :$		(12)
	$i f \| Δ_{1, 2, . . ., n} - (Δ_{1} + Δ_{2} + . . . + Δ_{n}) \| > ϵ$
	$t h e n i n t e r a c t i o n (s) e x i s t i n x_{1}, x_{2}, . . ., x_{n}$

Notice that we only detect the interactions based on the finite individuals, which means when $| Δ_{1, 2, . . ., n} - (Δ_{1} + Δ_{2} + . . . + Δ_{n}) | > ϵ$ is not satisfied at least once in all individuals, then we consider this function is a fully separable function by default. Although this strategy is limited especially for trap functions, it is impossible to check the interactions on the whole fitness landscape, and in LSSOPs, perturbation-based methods often identify the interactions based on 1 sample such as DG, DG2, etc. due to the FEs. Thus, Eq (13) is approximately correct in LSSOPs.

	$\forall s \in P o p :$		(13)
	$i f \| Δ_{1, 2, . . ., n} - (Δ_{1} + Δ_{2} + . . . + Δ_{n}) \| < ϵ$
	$t h e n x_{1}, x_{2}, . . ., x_{n} a r e s e p a r a b l e$

However, when Eq (13) is not satisfied, we only know that interactions exist in some variable pairs, but we cannot know the interactions exist in which pairs, so we can actively to detect the interactions between variables. Taking the $3$ -D space as an example,

	$\forall s \in P o p :$		(14)
	$i f \| Δ_{i j k} - (Δ_{i} + Δ_{j} + Δ_{k}) \| > ϵ$
	$a n d \| Δ_{i j k} - (Δ_{i j} + Δ_{k}) \| < ϵ$
	$t h e n x_{i}, x_{j} a r e n o n - s e p a r b a l e$
	$a n d x_{k} i s s e p a r a b l e f r o m x_{i}, x_{j}$

Therefore, Our target is to apply the heuristic algorithm to find the interactions between all variables as much as possible. According to the above explanation, in the $n$ -D problem, the linkage measurement function in our proposal is designed as Eq (15)

min ((Δ_{1, 2, . . ., n} - m \sum (Δ_{i, . . ., k}))^{2})

(15)

$m$ is the number of sub-problems. Eq (15) is the original linkage measurement function of our proposal[37]. Meanwhile, our further research notice that this linkage measurement function often contains multiple optima especially in separable functions and partially separable function. Therefore, we can attach a reasonable penalty to lead the direction of optimization. Eq (16) defines a linkage measurement function with a penalty term.

min (\frac{| Δ_{1, 2, . . ., n} - \sum_{i, j, . . .}^{m} Δ_{i, j, . .} |}{n u m o f g r o u p})

(16)

Then, we extend Eq (16) to LSMOPs with multiple samples. Eq (17) is our linkage measurement function in this paper.

min (\sum s \in P o p M \sum j = 1 w_{j} \frac{| Δ_{1, 2, . . ., n} - \sum_{i, j, . . .}^{m} Δ_{i, j, . .} |}{n u m o f g r o u p}), M \sum j = 1 w_{j} = 1

(17)

$w_{j}$ is the weight of the $j^{t h}$ objective function, and $M$ is the number of objective functions in LSMOPs. We apply averaging weight in Eq (17).

To optimize this linkage measurement function, we also apply Elitist Genetic Algorithm (EGA)[38]. The pseudocode of decomposition stage is shown in Algorithm 2

Input:

D i m e n s i o n : D; P o p u l a t i o n s i z e : s_{1}; S a m p l e s i z e : s_{2}; G e n e l e n g t h : L; G e n e r a t i o n : T

Output:

T h e b e s t d e c o m p o s i t i o n : E

1 Function LMM( $D, s_{1}, s_{2}, L, T$ ):

t \leftarrow 0

▶

(Initialization) for $i = 0 t o s_{1}$ do

3 for $j = 0 t o D$ do

n \leftarrow randint (0, 2^{L - 1} - 1)

P_{i, t}^{n} \leftarrow j

5 end for

7 end for

E \leftarrow initialCC (D)

S \leftarrow randSamples (s_{2})

F E \leftarrow

Evaluate(

E, S

)

▶

(Fully separable function) if $F E < 0.01$ then

return E

10 end if

F_{t} \leftarrow

Evaluate(

P_{t}^{n}, S

)

E \leftarrow

bestIndividual(

P_{t}^{n}, E

)

▶

(Optimization) while not stop criterion do

P_{t + 1} \leftarrow

Selection(

P_{t}, F_{t}, M

)

P_{t + 1} \leftarrow

Crossover(

P_{t + 1}

)

P_{t + 1} \leftarrow

Mutation(

P_{t + 1}

)

F_{t + 1} \leftarrow

Evaluate(

P_{t + 1}, S

)

P_{t + 1} \leftarrow

Replace(

P_{t + 1}, E

)

E \leftarrow

bestIndividual(

P_{t + 1}, E

)

t \leftarrow t + 1

13 end while

return E

Algorithm 2 Decomposition Stage

The elitist reservation strategy directly replicates the best individual without crossover, mutation, and selection to the next generation. This strategy can prevent the optimal individual from destroying the superior gene and chromosome structure during crossover mutation.

In optimization stage, first we hypothesize that there is much possible for better individuals around the PF individuals. This hypothesis is extended from single objective optimization problems[20]. Based on this hypothesis, we find the PF in every generation of optimization by quick search. The pseudocode of quick search is shown in Algorithm 3.

Input:

P o p u l a t i o n : P; O b j e c t i v e v a l u e : O; n u m o f o b j e c t i v e f u n c t i o n : M

Output:

P a r e t o F r o n t : P F

1 Function QS( $P, O, M$ ):

s \leftarrow size (P)

▶

(All solutions are PF in initialization)

R \leftarrow [1] * s

for $i = 0 t o s$ do

3 if $R_{i}! = 1$ then

4 continue

5 end if

6 for $j = i + 1 t o s$ do

7 if $R_{j}! = 1$ then

8 continue

9 end if

D \leftarrow Dominate (O_{i}, O_{j}, M)

if $D == 1$ then

R_{i} \leftarrow 0

break

12 end if

13 else if $D == - 1$ then

R_{j} \leftarrow 0

15 end if

16 else

17 continue

18 end if

20 end for

22 end for

t \leftarrow 0

for $i = 0 t o s$ do

24 if $R_{i} == 1$ then

P F_{t} \leftarrow P_{i}

t \leftarrow t + 1

26 end if

28 end for

return P F

Algorithm 3 Quick Search

And the function Dominate is realized in Algorithm 4.

Input:

O b j e c t i v e v a l u e o f i^{t h} i n d i v i d u a l : O_{i}; O b j e c t i v e v a l u e o f j^{t h} i n d i v i d u a l : O_{j}; n u m o f o b j e c t i v e f u n c t i o n : M

Output:

D o m i n a t i o n D : 0 (N o d o m i n a t i o n);

- 1 (i^{t h} D o m i n a t e s j^{t h});

1 (j^{t h} D o m i n a t e s i^{t h})

1 Function Dominate( $O_{i}, O_{j}, M$ ):

I J \leftarrow 0

J I \leftarrow 0

D \leftarrow 0

for $k = 0 t o M$ do

3 if $O_{i}^{k} >= O_{j}^{k}$ then

J I \leftarrow J I + 1

5 end if

6 if $O_{j}^{k} >= O_{i}^{k}$ then

I J \leftarrow I J + 1

8 end if

10 end for

11 if $I J == M$ and $J I! = M$ then

D \leftarrow - 1

13 end if

14 else if $I J! = M$ and $J I == M$ then

D \leftarrow 1

16 end if

17 else

D \leftarrow 0

19 end if

return D

Algorithm 4 Domination identification

The target of our quick search is only to find the whole PF in the current population, thus the fast non-dominated sorting in this situation is wasteful and unnecessary. From Algorithm 3, the worst and best time complexity of quick search is $O (M N^{2})$ and $O (M N)$ respectively. $M$ is the number of objective functions, and $N$ is the population size. After the PF is found, we estimate the convergence point with averaging strategy and apply the Gaussian sampling with the mean of the estimated convergence point. Figure 5 and Algorithm 5 shows how our proposal works.

Fig. 5: The averaging strategy to estimate the convergence point approximately.

Input:

D i m e n s i o n : D; P a r e t o F r o n t : P F; S a m p l i n g s i z e : s

Output:

S a m p l e s : S

1 Function EGS( $D, P F, s$ ):

p \leftarrow [0] * D

n \leftarrow size (P F)

▶

(Estimate a convergence point with averaging) for $i = 0 t o D$ do

3 for $j = 0 t o n$ do

p_{i} \leftarrow p_{i} + P F_{i, j}

5 end for

p_{i} \leftarrow p_{i} / n

7 end for

S \leftarrow GaussianSampling (p, s)

return S

Algorithm 5 Estiamte a convergence point and apply Gaussian sampling

Finally, the whole process of our proposal in optimization stage is described on Algorithm 6

Input:

D i m e n s i o n : D; L S M O P : M P; P o p u l a t i o n s i z e : s; M a x i t e r a t i o n : G; D e c o m p o s i t i o n s o l u t i o n : E; n u m o f o b j e c t i v e f u n c t i o n : M

Output:

P a r e t o F r o n t : P F

1 Function OPT( $D, M P, s, G, E$ ):

P s \leftarrow Decomposition (M P, E)

n \leftarrow size (P s)

C P \leftarrow \emptyset

// context population

P F \leftarrow \emptyset

for $i = 0 t o n$ do

P_{i} \leftarrow randPop (s)

for $j = 0 t o G$ do

Q_{i, j} \leftarrow NSGA-II (P_{i})

// Population before non-Dominate sorting

O_{i, j} \leftarrow Evaluate (Q_{i, j}, C P)

P F_{i, j} \leftarrow QS (Q_{i, j}, O_{i, j}, M)

S \leftarrow EGS (D, P F_{i, j}, s / 10)

O S_{i, j} \leftarrow Evaluate (S, C P)

Q_{i, j} \leftarrow Q_{i, j} \cup S

P_{i + 1} \leftarrow fastNonDominateSorting (Q_{i, j})

C P \leftarrow Update (P_{i + 1})

5 end for

P F \leftarrow Update (C P)

7 end for

return P F

Algorithm 6 Optimization Stage

Iv Experiment results and analysis

In this Section, we ran many experiments to evaluate our proposal, CC-hNSGA-LMM. In Section IV-A, we introduce the experiment settings, including benchmark functions, comparing methods, parameters of algorithms, and the performance indicators. In Section IV-B, we provide the experiment results. Finally, we analyze our proposal both in decomposition stage and optimization stage in Section IV-C.

Iv-a Experiment Settings

Iv-A1 Benchmark functions

We conduct our experiments on benchmark functions up to $500$ -D and $1000$ -D. The details of benchmark functions are shown in Table I

Test Suite	Benchmark	Feature of PF	Separability
ZDT[39]	ZDT1	Convex	Separable
	ZDT2	Concave	Separable
	ZDT3	Convex, disconnected	Separable
	ZDT4	Convex	Separable
	ZDT5	Convex	Separable
	ZDT6	Concave	Separable
DTLZ[40]	DTLZ1	Linear	Separable
	DTLZ2	Concave	Separable
	DTLZ3	Concave	Separable
	DTLZ4	Concave	Separable
	DTLZ5	Concave, degenerate	Separable
	DTLZ6	Concave, degenerate	Separable
	DTLZ7	Disconnected	Separable
UF[41]	UF1	Concave	Separable
UF[41]	UF2	Concave	Separable
WFG[42]	WFG1	Convex	Separable
	WFG2	Convex, disconnected	Partially separable
	WFG3	Linear, degenerate	Partially separable
	WFG4	Concave	Separable
	WFG5	Concave	Separable
	WFG7	Concave	Separable

TABLE I: The benchmark functions of our experiment

We did not apply high-dimensional WFG6, WFG8, and WFG9 as our benchmark functions because these functions are not suitable for extending to high dimensions due to high computational cost. All benchmark functions are provide by geatpy[43] and pymoo[44].

Iv-A2 Compareing methods and parameters

We combine our proposal in decomposition with NSGA-II (CC-NSGA-LMM) and comparing with Random Grouping (CC-NSGA-G)[3], Differential Grouping (CC-NSGA-DG)[14], and Monotonicity Detection (CC-NSGA-LIMD)[45] with 30 trial runs. Besides, we compare our ultimate proposal (CC-hNSGA-LMM) with CC-NSGA-LMM to verify the effect of hybrid NSGA-II. Table II shows the parameters of our proposal in the grouping stage, and Table III shows the parameters of sub-problems optimization. The NSGA-II is also provided by geatpy.

Parameter	value
Optimization direction	Minimization
Optimizer	Elitist GA
Population size	20
Max iteration	20
Gene length	6 and 7

TABLE II: The parameters of decomposition optimization

Parameter	value
Dimension	$500$ -D and $1000$ -D
FEs	750,000 and 1,500,000
Optimization direction	Minimization
Optimizer	NSGA-II
Population size	50
Crossover rate	0.9
Mutation rate	0.2

TABLE III: The parameters of sub-problems optimization

Iv-A3 Performance indicators

To evaluate the performance of our proposal, we introduced Hypervolume (HV)[46] and Inverted Generational Distance (IGD)[47] as indicators to evaluate the performance of the algorithms. HV was first presented as the size of the space covered. Given a solution set $A$ and a reference point $r$ , HV can be calculated as Eq (18)

H V (A) = λ (⋃ a \in A {x | a ≺ x ≺ r})

(18)

where $λ$ denotes the Lebesgue measure. IGD is also the most commonly used indicator. given a solution set $A$ and a reference set $R = {r_{1}, r_{2}, . . ., r_{M}}$ , IGD can be defined as Eq (19)

I G D (A, R) = \frac{1}{M} (M \sum i = 1 m i n a \in A d i s (r_{i}, a))

(19)

where $d i s (r_{i}, a)$ denotes the Euclidean distance between $r_{i}$ and $a$ , and a lower IGD value means better performance as same as HV.

Iv-B Performance of our proposal

In this section, the performance of CC-hNSGA-LMM is studied, both on our proposed decomposition method and the introduction of Gaussian sampling based on an estimated convergence point. Experiments are conducted on the benchmark functions presented in Section IV-A1 with 30 independent runs. Besides, we randomly choose one trial run result in 30 trial runs and draw the PF graph within comparing methods and reference sets. Due to space limitations, we select some representative PF graphs in Figure6. The mean of HV and IGD calculated in 30 trial runs are shown in Table IV and Table V. The best solution among CC-NSGA-G, CC-NSGA-DG, CC-NSGA-LIMD, and CC-NSGA-LMM these 4 methods is highlighted with red and orange in $500$ -D and $1000$ -D respectively to show the performance of our proposed decomposition method. Besides, we mark the better solution between CC-NSGA-LMM and CC-hNSGA-LMM with $^{+}$ to show the effect of the introduction of Gaussian sampling based on an estimated convergence point. The FEs consumed in the decomposition stage of DG, LIMD, and LMM are provided in Table VI.

Fig. 6: The representive PF graphs within 5 methods and reference sets in $500$ -D and $1000$ -D.

Func	CC-NSGA-G		CC-NSGA-DG		CC-NSGA-LIMD		CC-NSGA-LMM		CC-hNSGA-LMM
Func	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D
ZDT1	0.913	1.147	0.581	0.710	0.872	1.101	0.158	0.177	0.134 $^{+}$	0.144 $^{+}$
ZDT2	0.918	1.517	0.523	0.784	0.851	1.221	0.075	0.296	0.071 $^{+}$	0.148 $^{+}$
ZDT3	0.499	1.376	0.193	0.714	0.443	0.907	0.177	0.195	0.150 $^{+}$	0.083 $^{+}$
ZDT4	1690.193	1578.118	1006.923	773.871	1615.125	1360.387	149.293	265.902	112.492 $^{+}$	168.336 $^{+}$
ZDT5	1408.709	2175.905	963.814	1511.787	1395.165	2058.472	947.672	1506.865	885.373 $^{+}$	1262.068 $^{+}$
ZDT6	0.431	0.355	0.384	0.422	0.507	0.270	0.335	0.221	0.279 $^{+}$	0.192 $^{+}$
DTLZ1	8.261e10	6.956e11	30228.461	2.403e11	7.210e10	8.988e11	26869.899	2.111e5	1440.654 $^{+}$	9060.585 $^{+}$
DTLZ2	706.197	10266.957	0.073	4269.415	428.909	9674.223	0.077	0.067	0.063 $^{+}$	0.056 $^{+}$
DTLZ3	4.158e12	7767.971	1.780e6	8.773e12	3.251e12	4.632e13	1.545e6	9.983e6	1.271e5 $^{+}$	8.328e5 $^{+}$
DTLZ4	1011.450	14788.623	0.043	50.488	539.192	11120.653	0.059 $^{+}$	0.099	0.060	0.091 $^{+}$
DTLZ5	538.285	7767.971	0.161	4270.747	290.810	6144.751	0.175	0.251	0.148 $^{+}$	0.211 $^{+}$
DTLZ6	5.716e6	5.145e7	4.690e6	4.994e7	3.746e6	4.385e7	2.103e5	1.997e6	4.035e4 $^{+}$	3.554e5 $^{+}$
DTLZ7	1.184	1.886	0.174	0.445	0.970	1.660	0.153	0.364	0.093 $^{+}$	0.317 $^{+}$
UF1	1.232	1.580	0.101	0.388	0.741	1.274	0.095	0.155	0.067 $^{+}$	0.107 $^{+}$
UF2	0.136	1.197	0.105	0.645	0.121	0.849	0.094	0.280	0.089 $^{+}$	0.277 $^{+}$
WFG1	5.090	1.536	4.552	0.841	4.973	0.903	4.179	0.884	3.965 $^{+}$	0.520 $^{+}$
WFG2	0.335	0.625	0.317	0.657	0.269	0.542	0.309	0.604	0.289 $^{+}$	0.587 $^{+}$
WFG3	0.273	5.077	0.288	4.260	0.237	4.036	0.264	3.897 $^{+}$	0.253 $^{+}$	3.957
WFG4	3.698	4.961	2.855	3.579	3.592	3.680	3.032 $^{+}$	3.150	3.057	2.652 $^{+}$
WFG5	0.795	3.219	0.140	1.783	0.692	2.617	0.130	1.323	0.129 $^{+}$	0.981 $^{+}$
WFG7	2.785	6.560	2.233	6.264	2.733	6.283	2.136	6.114	2.010 $^{+}$	5.842 $^{+}$

TABLE IV: The mean of HV among 5 methods in 30 trial runs

Func	CC-NSGA-G		CC-NSGA-DG		CC-NSGA-LIMD		CC-NSGA-LMM		CC-hNSGA-LMM
Func	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D
ZDT1	1.171	1.474	0.627	0.760	1.097	1.379	0.025	0.016	0.011 $^{+}$	0.007 $^{+}$
ZDT2	2.492	3.012	1.533	1.800	2.306	2.842	0.280	0.273	0.205 $^{+}$	0.190 $^{+}$
ZDT3	0.816	0.981	0.448	0.511	0.731	0.899	0.014	0.015	0.008 $^{+}$	0.007 $^{+}$
ZDT4	6106.310	13626.344	3659.787	8258.761	5861.058	13467.469	539.775	1151.956	403.140 $^{+}$	845.605 $^{+}$
ZDT5	10.899	10.470	7.350	8.613	9.116	10.594	7.183 $^{+}$	7.153	7.194	6.221 $^{+}$
ZDT6	1.224	1.152	1.182	1.261	0.976	0.887	1.053	0.816	0.938 $^{+}$	0.784 $^{+}$
DTLZ1	10183.928	21115.183	77.196	3432.782	9639.649	22587.874	76.844	154.833	28.667 $^{+}$	53.545 $^{+}$
DTLZ2	19.094	45.582	0.065	7.940	16.762	45.225	0.067	0.133	0.006 $^{+}$	0.002 $^{+}$
DTLZ3	33931.424	73316.697	307.394	11939.406	31664.480	75494.883	312.767	602.455	131.318 $^{+}$	244.015 $^{+}$
DTLZ4	21.575	51.164	0.129	8.361	17.541	47.096	0.249	0.453	0.271 $^{+}$	0.346 $^{+}$
DTLZ5	19.490	45.704	0.071	7.851	16.962	45.825	0.069	0.127	0.002 $^{+}$	0.003 $^{+}$
DTLZ6	389.946	785.520	375.773	807.134	363.994	792.004	170.141	356.833	94.073 $^{+}$	197.733 $^{+}$
DTLZ7	5.528	6.859	0.146	1.235	4.308	5.981	0.125	0.133	0.036 $^{+}$	0.027 $^{+}$
UF1	1.419	1.651	0.491	0.658	1.043	1.444	0.367 $^{+}$	0.378	0.384	0.323 $^{+}$
UF2	0.445	0.557	0.217	0.278	0.332	0.454	0.076 $^{+}$	0.081 $^{+}$	0.082	0.091
WFG1	3.798	3.870	1.614	1.999	3.697	3.882	1.503	1.502	1.502 $^{+}$	1.405 $^{+}$
WFG2	0.719	0.732	0.709	0.720	0.590	0.586	0.630	0.621	0.610 $^{+}$	0.612 $^{+}$
WFG3	1.538	1.078	0.894	0.895	0.863	0.908	0.883	0.835 $^{+}$	0.871 $^{+}$	0.871
WFG4	0.561	0.496	0.009	0.089	0.739	0.747	0.012 $^{+}$	0.015	0.015	0.009 $^{+}$
WFG5	3.213	0.720	3.020	0.228	3.252	0.712	2.989	0.149	2.835 $^{+}$	0.137 $^{+}$
WFG7	0.656	0.714	0.590	0.649	0.639	0.676	0.587	0.595 $^{+}$	0.568 $^{+}$	0.605

TABLE V: The mean of IGD among 5 methods in 30 trial runs

Func	DG		LIMD		LMM
Func	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D	$500$ -D	$1000$ -D
ZDT1	63250	251500	1974	3972	1503	3003
ZDT2	63250	251500	1974	3972	1503	3003
ZDT3	63250	251500	1974	3972	1503	3003
ZDT4	63250	251500	1974	3972	1503	3003
ZDT5	250500	1001000	1974	3972	1503	3003
ZDT6	244558	989058	1974	3972	1503	3003
DTLZ1	249502	999002	1974	3972	74649	149304
DTLZ2	249502	999002	1974	3972	1503	3003
DTLZ3	249502	999002	1974	3972	74652	149274
DTLZ4	249502	999002	1974	3972	1503	3003
DTLZ5	249502	999002	1974	3972	1503	3003
DTLZ6	1976	3974	1974	3972	74629	149271
DTLZ7	250500	1001000	1974	3972	1503	3003
UF1	250500	1001000	1974	3972	1503	3003
UF2	250500	1001000	1974	3972	1503	3003
WFG1	249502	999002	1974	3972	1503	3003
WFG2	247518	999002	63250	251500	74364	149325
WFG3	247518	999002	63250	251500	73253	148245
WFG4	249502	999002	1974	3972	1503	3003
WFG5	249502	999002	63250	251500	1503	3003
WFG7	248506	997006	1974	3972	1503	3003

TABLE VI: FEs consumed on decomposition stage of three variable grouping methods

Iv-C Analysis

In this Section, we will analyze the effect of the LMM in decomposition and the Gaussian sampling in optimization.

Iv-C1 LMM in decomposition

From Table IV and Table V, we can see our proposed CC-NSGA-LMM outperforms the compared three methods in the majority of test functions. This is mainly due to the following aspects. (1). We calculate the linkage measurement function based on multiple samples. Although this is a necessary condition for the employment of LINC-R and LIMD in low-dimensional space, the linkage is only possible to check the nonlinearity around a sample point in high-dimensional space due to the FEs limitation, such as DG. Therefore, our proposal is more robust for solving large-scale optimization problems based on CC. (2). Although our proposal is based on LINC-R, the existence of the penalty allows our proposal to ignore some weak interactions between variables. This process will increase the error in the optimization stage, it can accelerate the sub-problems optimization, especially under the limitation of FEs.

From Table VI, we notice that in our proposed decomposition method, $1, 503$ and $3, 003$ evaluation times appear in the $500$ -D and $1000$ -D test functions frequently, This is because our proposal identifies these test functions as fully separable, which is consistent with the description of functions in Table I. According to the FEs consumed by DG in the decomposition stage, it can be seen that when FEs are approximately equal to $250, 000$ and $1, 000, 000$ in $500$ -D and $1000$ -D functions respectively, DG identifies the problem as a separable function. And the FEs saved in the decomposition stage allow more FEs to be allocated to optimize the sub-problems, which makes our proposal better than the compared methods.

Meanwhile, we notice that CC-NSGA-DG and CC-NSGA-LIMD outperform our proposal in some $500$ -D test functions, such as DTLZ2-5. But this competitiveness almost disappeared in the $1000$ -D test function. This is because the time complexity of DG and LIMD is $O (M N^{2})$ . $M$ is the population size, and $N$ is the dimension. DG sets $M = 1$ in high-dimensional space. So as the dimension increases, more FEs are necessary to identify the interactions between variables. This is the main reason for the rapid degeneration of DG and LIMD when the dimension approaches to $1000$ -D. And our proposal can control the time complexity by controlling some hyperparameters, which enables our proposal to have more feasibility to extend to larger-scale optimization problems.

However, we notice that in ZDT1-4, DG identifies these functions as partially separable functions, and DG identifies DTLZ6 as completely non-separable functions, which is inconsistent with the separability description in Table I. Let’s take ZDT1 as an example. The formula of ZDT1 is shown in Eq (20)

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ \begin{matrix} min f_{1} (x_{1}) = x_{1} min f_{2} (x) = g (1 - \sqrt{f_{1} / g}) g (x) = 1 + 9 m \sum i = 2 x_{i} / (m - 1) s . t . 0 \leq x_{i} \leq 1 \end{matrix}

(20)

We can see that although $f_{2}$ in Eq (20) is a separable function at the monotonicity level in the limited search space, DG cannot detect this information, which reveals the limitation of DG in identifying the separability of such variables. At the same time, our proposal LMM identifies ZDT1-4 as fully separable functions. Due to the existence of the penalty, which weakens the effect of the fitness difference term in the linkage measurement function, so the penalty term dominates the direction of optimization.

Meanwhile, LMM recognizes DTLZ1, DTLZ3, and DTL6 as partially separable functions, which is because the fitness difference term still occupies a large proportion in the linkage measurement function. In future research, we can adaptively adjust the intensity of the penalty through some methods such as machine learning and reinforcement learning.

Iv-C2 Gaussian sampling based on an estimated convergence point

From Table IV and Table V, we can say the introduction of the Gaussian sampling based on an estimated convergence point can accelerate the convergence of optimization and find better PFs, which proves the hypothesis we proposed in Section I is true. Paper[48] has already proved that only an estimated convergence point may not accelerate the convergence with numerical experiments, especially in multimodal functions. However, our proposal does not rely on the estimated convergence point in excess but considers the area centered on the estimated convergence point as the trust region. This strategy gives more possibilities for exploitation, although it needs to consume some FEs.

V Discuss

The above analysis shows our proposal has broad prospects to solve LSMOPs, however, there are still many aspects for improvement. Here, we list a few open topics for potential and future research.

V-a The design of linkage measurement function

In our proposed variable grouping method, we design a linkage measurement function to lead the direction of decomposition solution search, and there are mainly three important constituent elements of this function: fitness difference term, penalty, and weights of the objective function. To simplify the design of the linkage measurement function, we apply constant penalty and averaging weights. In practice, the intensity of the penalty and the weights of objective functions should be changed based on the feature of LSMOPs. For example, in fully separable LSMOPs, the penalty will play a decisive role to lead the direction of optimization due to the fitness difference term being infinitely close to 0, and the intensity of the penalty will decrease with the increase of interactions between variables. And for weights of objective functions, the importance of different objective functions should be different in practice. Adaptively deciding these hyperparameters based on reinforcement learning or machine learning is our future research topic.

V-B More powerful local search operator

In this experiment, we design a Gaussian sampling operator with the mean of an estimated convergence point and a constant variance as our local search operator and achieved good experimental results. The search space of the optimization problem is often different, and the search space corresponding to the concept of ”local” is also different. Given a simple example, the search space of two problems are $[- 0.5, 0.5]$ and $[- 500, 500]$ respectively, so it is unreasonable to apply a constant as the Gaussian sampling variance for all kinds of optimization problems. Therefore, an adaptive variance determination strategy is necessary. In future research, it is promising research to introduce the CMA-ES[49] as our local search operator.

V-C The scalability of our proposal

This paper mainly contributes two different aspects to accelerate the convergence of LSMOPs. (1). Designing a novel decomposition method. (2). Developing a efficient Gaussian samlping operator based on the estimated convergence point combined with NSGA-II.

For decomposition method, our proposal performed well on $500$ -D and $1000$ -D LSMOPs. In the face of higher dimensional problems, The computational cost of DG, especially on fully separable functions, is completely unacceptable[24]. However, our proposal can control the consumption of FEs by adjusting some hyperparameters, such as the length of genes (number of groups), the maximum iteration of decoposition optimization, etc. And the previous research shows that the proposed variable grouping method can identify the interactions between variables in noisy environments. Therefore, our proposed grouping method has stronger scalability than the current popular variable grouping methods theoretically. In the future, the introduction of our proposal to solve higher-dimensional LSMOPs, constraint LSMOPs, and real-world LSMOPs is our research direction.

For the Gaussian sampling operator, experimental results show that our proposal can accelerate the convergence and find better PF in solving LSMOPs. In theory, our designed Gaussian sampling operator has strong scalability to combine with GA, DE, and PSO in single-objective evolutionary algorithms (SOEAs), and MOEA/D, NSGA in MOEAs with simple modification. In the future, the combination of our proposal with novel EAs to solve more complex optimization problems is a promising research topics.

Vi Conclusion

In this paper, we extend our previous research on the variable grouping method for LSSOPs to LSMOPs and design a Gaussian sampling operator based on an estimated convergence point to accelerate the convergence of the optimizer to find PF. To evaluate our proposal, we conduct our experiments on $500$ -D and $1000$ -D test functions and compare our proposal with popular methods. Experiments show that our proposed variable grouping method is better than the three compared methods, and our proposed hNSGA can significantly accelerate the convergence in optimization. At the end of this paper, we list some interesting topics which can improve our algorithm. Finally, our proposal has broad prospects for addressing LSMOPs.

Vii Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP20K11967.

References

[1] R. T. Marler and J. S. Arora, “Survey of multi-objective optimization methods for engineering,” Structural and multidisciplinary optimization, vol. 26, no. 6, pp. 369–395, 2004.
[2] M. Köppen, “The curse of dimensionality,” in 5th online world conference on soft computing in industrial applications (WSC5), vol. 1, 2000, pp. 4–8.
[3] L. M. Antonio and C. A. C. Coello, “Use of cooperative coevolution for solving large scale multiobjective optimization problems,” in 2013 IEEE Congress on Evolutionary Computation. IEEE, 2013, pp. 2758–2765.
[4] Y. Tian, L. Si, X. Zhang, R. Cheng, C. He, K. C. Tan, and Y. Jin, “Evolutionary large-scale multi-objective optimization: A survey,” ACM Comput. Surv., vol. 54, no. 8, 2021.
[5] C. He, L. Li, Y. Tian, X. Zhang, R. Cheng, Y. Jin, and X. Yao, “Accelerating large-scale multiobjective optimization via problem reformulation,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 6, pp. 949–961, 2019.
[6] J.-H. Yi, L.-N. Xing, G.-G. Wang, J. Dong, A. V. Vasilakos, A. H. Alavi, and L. Wang, “Behavior of crossover operators in nsga-iii for large-scale optimization problems,” Inf. Sci., vol. 509, no. C, p. 470â487, jan 2020.
[7] B. Li, J. Li, K. Tang, and X. Yao, “Many-objective evolutionary algorithms: A survey,” ACM Computing Surveys (CSUR), vol. 48, no. 1, pp. 1–35, 2015.
[8] Z. Fan, Y. Fang, W. Li, J. Lu, X. Cai, and C. Wei, “A comparative study of constrained multi-objective evolutionary algorithms on constrained multi-objective optimization problems,” in 2017 IEEE Congress on Evolutionary Computation (CEC), 2017, pp. 209–216.
[9] T. Chugh, K. Sindhya, J. Hakanen, and K. Miettinen, “A survey on handling computationally expensive multiobjective optimization problems with evolutionary algorithms,” Soft Computing, vol. 23, no. 9, pp. 3137–3166, 2019.
[10] M. Potter and K. De Jong, “A cooperative coevolutionary approach to function optimization,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 866 LNCS, pp. 249–257, 1994.
[11] M. N. Omidvar, X. Li, and X. Yao, “A review of population-based metaheuristics for large-scale black-box global optimization: Part a,” IEEE Transactions on Evolutionary Computation, pp. 1–1, 2021.
[12] Omidvar, Mohammad Nabi and Li, Xiaodong and Yao, Xin, “A review of population-based metaheuristics for large-scale black-box global optimization: Part b,” IEEE Transactions on Evolutionary Computation, pp. 1–1, 2021.
[13] M. Li and J. Wei, “A cooperative co-evolutionary algorithm for large-scale multi-objective optimization problems,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2018, pp. 1716–1721.
[14] F. Sander, H. Zille, and S. Mostaghim, “Transfer strategies from single- to multi-objective grouping mechanisms,” in Proceedings of the Genetic and Evolutionary Computation Conference, ser. GECCO ’18. Association for Computing Machinery, 2018, p. 729â736.
[15] X. Ma, F. Liu, Y. Qi, X. Wang, L. Li, L. Jiao, M. Yin, and M. Gong, “A multiobjective evolutionary algorithm based on decision variable analyses for multiobjective optimization problems with large-scale variables,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 2, pp. 275–298, 2015.
[16] S. Mirjalili and S. Z. M. Hashim, “A new hybrid psogsa algorithm for function optimization,” in 2010 International Conference on Computer and Information Application, 2010, pp. 374–377.
[17] D. Wolpert and W. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.
[18] M. Tezuka, M. Munetomo, and K. Akama, “Linkage identification by nonlinearity check for real-coded genetic algorithms,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3103, pp. 222–233, 2004.
[19] H. Kargupta and D. E. Goldberg, “Blackbox optimization: Implications of search,” in In communication. Submitted in International Journal of Foundation of Computer Science, 1996, pp. 96–63.
[20] N. Murata, R. Nishii, H. Takagi, and Y. Pei, “Analytical estimation of the convergence point of populations,” in 2015 IEEE Congress on Evolutionary Computation, CEC 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., Sep. 2015, pp. 2619–2624.
[21] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: Nsga-ii,” IEEE transactions on evolutionary computation, vol. 6, no. 2, pp. 182–197, 2002.
[22] F. Van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle swarm optimization,” IEEE transactions on evolutionary computation, vol. 8, no. 3, pp. 225–239, 2004.
[23] M. Munetomo and D. E. Goldberg, “Linkage identification by non-monotonicity detection for overlapping functions,” Evolutionary computation, vol. 7, no. 4, pp. 377–398, 1999.
[24] M. Omidvar, X. Li, Y. Mei, and X. Yao, “Cooperative co-evolution with differential grouping for large scale optimization,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 378–393, 2014.
[25] M. N. Omidvar, M. Yang, Y. Mei, X. Li, and X. Yao, “Dg2: A faster and more accurate differential grouping for large-scale black-box optimization,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 6, pp. 929–942, 2017.
[26] M. Yang, A. Zhou, C. Li, and X. Yao, “An efficient recursive differential grouping for large-scale continuous problems,” IEEE Transactions on Evolutionary Computation, vol. 25, no. 1, pp. 159–171, 2021.
[27] W. Chen, T. Weise, Z. Yang, and K. Tang, “Large-scale global optimization using cooperative coevolution with variable interaction learning,” in International Conference on Parallel Problem Solving from Nature. Springer, 2010, pp. 300–309.
[28] H. K. Singh and T. Ray, Divide and Conquer in Coevolution: A Difficult Balancing Act. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 117–138.
[29] Y. Rojas and R. Landa, “Towards the use of statistical information and differential evolution for large scale global optimization,” in 2011 8th International Conference on Electrical Engineering, Computing Science and Automatic Control, 2011, pp. 1–6.
[30] J. Fan, J. Wang, and M. Han, “Cooperative coevolution for large-scale optimization based on kernel fuzzy clustering and variable trust region methods,” IEEE Transactions on Fuzzy Systems, vol. 22, no. 4, pp. 829–839, 2014.
[31] Z. Yang, K. Tang, and X. Yao, “Large scale evolutionary optimization using cooperative coevolution,” Information Sciences, vol. 178, no. 15, pp. 2985–2999, 2008.
[32] M. N. Omidvar, X. Li, Z. Yang, and X. Yao, “Cooperative co-evolution for large scale optimization through more frequent random grouping,” in IEEE Congress on Evolutionary Computation, 2010, pp. 1–8.
[33] M. Omidvar, X. Li, and X. Yao, “Cooperative co-evolution with delta grouping for large scale non-separable function optimization,” 2010, pp. 1–8.
[34] E. Sayed, D. Essam, R. Sarker, and S. Elsayed, “Decomposition-based evolutionary algorithm for large scale constrained problems,” Information Sciences, vol. 316, pp. 457–486, 2015.
[35] G. Dai, X. Chen, L. Chen, M. Wang, and L. Peng, “Cooperative coevolution with dependency identification grouping for large scale global optimization,” in 2016 IEEE Congress on Evolutionary Computation (CEC), 2016, pp. 5201–5208.
[36] A. Song, Q. Yang, W.-N. Chen, and J. Zhang, “A random-based dynamic grouping strategy for large scale multi-objective optimization,” in 2016 IEEE congress on evolutionary computation (CEC). IEEE, 2016, pp. 468–475.
[37] R. Zhong and M. Munetomo, “Random Population-based Decomposition Method by Linkage Identification with Non-linearity Minimization on Graph,” in Transactions on Computational Science & Computational Intelligence. Springer, Accepted.
[38] K. A. De Jong, An analysis of the behavior of a class of genetic adaptive systems. University of Michigan, 1975.
[39] E. Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary algorithms: Empirical results,” Evol. Comput., vol. 8, no. 2, p. 173â195, 2000.
[40] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, Scalable Test Problems for Evolutionary Multiobjective Optimization. London: Springer London, 2005, pp. 105–145.
[41] Q. Zhang, A. Zhou, S. Zhao, P. Suganthan, W. Liu, and S. Tiwari, “Multiobjective optimization test instances for the cec 2009 special session and competition,” Mechanical Engineering, 01 2008.
[42] S. Huband, P. Hingston, L. Barone, and L. While, “A review of multiobjective test problems and a scalable test problem toolkit,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 5, pp. 477–506, 2006.
[43] e. Jazzbin, “geatpy: The genetic and evolutionary algorithm toolbox with high performance in python,” 2020.
[44] J. Blank and K. Deb, “pymoo: Multi-objective optimization in python,” IEEE Access, vol. 8, pp. 89 497–89 509, 2020.
[45] K. Izumiya and M. Munetomo, “Multi-objective Evolutionary optimization based on Decomposition with Linkage Identification considering monotonicity,” in 2017 IEEE Congress on Evolutionary Computation (CEC), 2017, pp. 905–912.
[46] E. Zitzler and L. Thiele, “Multiobjective Optimization Using Evolutionary Algorithms - A Comparative Case Study,” in Proceedings of the 5th International Conference on Parallel Problem Solving from Nature, ser. PPSN V. Berlin, Heidelberg: Springer-Verlag, 1998, p. 292â304.
[47] M. Li and X. Yao, “Quality evaluation of solution sets in multiobjective optimisation: A survey,” ACM Comput. Surv., vol. 52, no. 2, mar 2019.
[48] Y. Pei, J. Yu, and H. Takagi, “Search acceleration of evolutionary multi-objective optimization using an estimated convergence point,” Mathematics, vol. 7, no. 2, 2019.
[49] N. Hansen, “The cma evolution strategy: A tutorial,” arXiv preprint arXiv:1604.00772, 2016.

Cooperative coevolutionary hybrid NSGA-II with Linkage Measurement Minimization for Large-scale Multi-objective optimization

Abstract

I Introduction

Ii Preliminaries and related works

Ii-a Preliminaries

Ii-A1 Multi-objective Problems

Ii-A2 Separability of functions

Ii-A3 Estimation of a convergence point

Ii-A4 Nsga-Ii

Ii-B A brief review of grouping methods

Ii-B1 Automatic

Ii-B2 Semi-automatic

Ii-B3 m×k-strategy

Iii CC-hNSGA-LMM

Iv Experiment results and analysis

Iv-a Experiment Settings

Iv-A1 Benchmark functions

Iv-A2 Compareing methods and parameters

Iv-A3 Performance indicators

Iv-B Performance of our proposal

Iv-C Analysis

Iv-C1 LMM in decomposition

Iv-C2 Gaussian sampling based on an estimated convergence point

V Discuss

V-a The design of linkage measurement function

V-B More powerful local search operator

V-C The scalability of our proposal

Vi Conclusion

Vii Acknowledgement

References

Ii-B3 $m \times k$ -strategy