GENETIC STOCHASTIC METHOD OF GLOBAL EXTREMUM SEARCH FOR MULTIVARIABLE FUNCTION

This article is devoted to the development of stochastic methods of global extremum search. A modification of the simulated annealing algorithm [Ermakov and Semenchikov, 2019] is combined with the covariance matrix adaptation method [Ermakov, Kulikov and Leora, 2017]. An effective computational approach [Ermakov and Mitioglova, 1977] is used for modeling the multivariate normal distribution. Special algorithms of covariance matrix adaptation are suggested to avoid obtaining a local extremum instead of the global one. The proposed methods are successfully applied to the problem of calculating nonlinear regression parameters. This problem often arises in physics and mathematics and may be reduced to global extremum search. In the particular case considered, the extremum of a ravine function of 14 variables was found.


1 Introduction
Computational and machine learning methods, among them extremum search algorithms, are widely discussed in the literature [Ermakov, Kulikov and Leora, 2017; Hansen and Ostermeier, 1996; Igel, Hansen and Roth, 2007; Sergeenko, Yakunina and Granichin, 2020; Granichin, Volkov, Petrov and Volkova, 2021]. The problems of finding the extremum of smooth convex functions with a small number of variables are well studied [Zhiglyavsky, 1985; Nesterov, 2010]. Stochastic methods are especially effective in solving problems of large dimensions.
For uniformly distributed trial points, the probability of getting into the ε-neighborhood of the extremum point is of order $\varepsilon^{s}$, where s is the dimension of the search area; for example, for ε = 0.1 and s = 14 this probability is of order $10^{-14}$. An optimization problem with a large number of variables becomes especially difficult for multi-extremal and ravine functions. Therefore, only adaptive methods, such as simulated annealing [Metropolis, Rosenbluth, Teller A. and Teller E., 1953] and related genetic algorithms, can count on success. Two examples may be presented.
A. Let us consider an objective function f(X): f(X)>0, |f(X)|≤M, X∈D⊂E^s, where M is a constant and D is a bounded domain in the Euclidean space E^s. The following sequence of probability densities may be introduced [Vladimirova and Ovsyannikov, 2019]:

$$g_T(X) = C_T\, e^{-T f(X)}, \qquad C_T = \Big(\int_D e^{-T f(X)}\, dX\Big)^{-1}, \tag{1}$$

where C_T is the normalization constant. The sequence (1) converges weakly as T→∞ to the Dirac δ-function concentrated at the global minimum point of the function f(X). The objective function is supposed to have a unique minimum; otherwise the situation gets more complicated. The extremum search algorithm uses simulation by the Metropolis method [Metropolis, Rosenbluth, Teller A. and Teller E., 1953; Vladimirova and Ovsyannikov, 2019].

B. One of the adaptive methods successfully used in recent years is the covariance matrix method, originally proposed in 1977 in [Ermakov and Mitioglova, 1977]. This method was used to find the maximum of a target function f(X), and another density of the type (1) was proposed:

$$h_\nu(X) = C_\nu\, f^{\nu}(X), \qquad C_\nu = \Big(\int_D f^{\nu}(X)\, dX\Big)^{-1}. \tag{2}$$

It is assumed that f(X)>0 and that the integral $\int_D f^{\nu}(X)\, dX$ exists and is finite for any finite integer degree ν. If the global maximum of the function f(X) is unique, then the densities (2) also converge weakly (as ν→∞) to the δ-function concentrated at the point of the global maximum of f(X). The behavior of density (2) in the general case can be complicated. To simplify the simulation of density (2), the authors of [Ermakov and Mitioglova, 1977] replaced it with the similar density

$$\bar{h}_k(X) = C_k\, N(X, X^{(k)}_{\max}, B^{(k)}), \qquad k = 1, \dots, K. \tag{3}$$

Here k is the iteration number; K is the total number of search stages; C_k is a normalization constant; N(X, X^{(k)}_{max}, B^{(k)}) is the normal distribution density; the mean X^{(k)}_{max} is the estimate of the extremum point of f(X) calculated at the (k−1)-th iteration; B^{(k)} is the covariance matrix estimate obtained at the (k−1)-th iteration. At every iteration this matrix reflects the properties of the function f(X). The disadvantage of this method is the need to decompose B^{(k)} into the product of two triangular s×s matrices in order to simulate the normal density N(X, X^{(k)}_{max}, B^{(k)}) at each stage of the search. Additional difficulties can arise when the covariance matrix is close to degenerate.
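As a concrete illustration of part A, here is a minimal Python/NumPy sketch of Metropolis sampling from a density proportional to exp(−T f(X)), as in (1). The random-walk proposal, its step size, and the rejection of proposals leaving the domain D are our illustrative assumptions, not details of the cited algorithm.

```python
import numpy as np

def metropolis_sample(f, x0, T, n_steps, step=0.1, bounds=None, rng=None):
    """Sample from the density proportional to exp(-T * f(x)) on a box domain.

    As T grows, the samples concentrate near the global minimum of f.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    chain = [x.copy()]
    for _ in range(n_steps):
        # Symmetric random-walk proposal.
        y = x + step * rng.standard_normal(x.shape)
        if bounds is not None:
            lo, hi = bounds
            if np.any(y < lo) or np.any(y > hi):
                chain.append(x.copy())
                continue  # reject proposals that leave the domain D
        fy = f(y)
        # Metropolis acceptance for the target exp(-T f(x)):
        # accept with probability min(1, exp(-T (f(y) - f(x)))).
        if np.log(rng.uniform()) < -T * (fy - fx):
            x, fx = y, fy
        chain.append(x.copy())
    return np.array(chain)
```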
In [Ermakov and Semenchikov, 2019], a method is proposed that is free from these drawbacks. It was successfully applied to a beam dynamics optimization problem reduced to quality criterion minimization in a search area of dimension 84 [Vladimirova, Zhdanova, Rubtsova and Edamenko, 2020]. In this paper we suggest essential modifications of the method mentioned, designed to avoid too rapid contraction of the scattering ellipsoid and obtaining a local extremum instead of the global one.

2 Global extremum search method
We will consider the problem of finding the extremum point of a function f(X) defined in a bounded domain D: X∈D⊂E^s. Suppose we are talking about a global minimum attained at a single point X_min of the region D:

$$f(X_{\min}) = \min_{X \in D} f(X). \tag{4}$$

The method for solving problem (4) is to construct a sequence of generations of sample points that provides a condensation of points in the neighborhood of the minimum point. Thus the algorithm consists of a number of stages. Let us present the iterative part of the algorithm.
0-th generation. If we have no a priori information on the location of the global extremum point and on the behavior of the objective function, we scan D using the uniform distribution of s-dimensional vectors and calculate f(X) at the points obtained. Let N_0 be the number of trial points composing the zero generation. After that we choose among them the m "best" points X_1, X_2, …, X_m corresponding to the smallest values of the objective function, and determine the minimum point X^{(0)}_{min}.

k-th generation. At the k-th stage we calculate N_k points of the k-th generation by the formulae

$$Z_j = \bar{X} + \frac{1}{\sqrt{m}} \sum_{i=1}^{m} \eta_j^{(i)} (X_i - \bar{X}), \qquad j = 1, \dots, N_k, \tag{6}$$

using the "best" points X_1, X_2, …, X_m of the previous generation. In formulae (6), $\bar{X} = \frac{1}{m}\sum_{i=1}^{m} X_i$ is the sample mean of the "best" points and $\eta_j^{(i)}$, i = 1,…,m, j = 1,…,N_k, are independent standard normal random variables. After that, we select the m "best" points of the k-th generation and determine the minimum point Z^{(k)}_{min}. After renaming Z to X, the next iteration can be performed.
Covariance matrix estimation. Let us estimate the parameters of the normal distribution at the k-th stage. Since the $\eta_j^{(i)}$ are independent standard normal variables, the vectors (6) have

$$\mathbb{E}\, Z_j = \bar{X}, \tag{10}$$

$$\operatorname{cov}(Z_j) = \frac{1}{m} \sum_{i=1}^{m} (X_i - \bar{X})(X_i - \bar{X})^{\mathrm{T}}, \qquad j = 1, \dots, N_k. \tag{11}$$

The results (10), (11) allow us to draw the following conclusions.
1) The vectors Z_j, j = 1,…,N_k, are normally distributed; their mean is the sample mean and their covariance matrix is the sample covariance matrix of the points X_1, X_2, …, X_m.
2) The estimates of the covariance matrix elements depend only on the "best" points X_1, X_2, …, X_m of the previous generation.
3) The presented method of simulating the random vectors Z_j, j = 1,…,N_k, makes it possible not to calculate and store the covariance matrix (which is difficult for a large dimension s of the search space). There is no need to decompose the covariance matrix, for example, into triangular factors. (If the matrix is close to degenerate, its decomposition would be difficult.) The convergence of this algorithm for a unimodal objective function is proved in [Ermakov and Semenchikov, 2019]. A sketch of this simulation is given below.
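The following minimal NumPy sketch illustrates this decomposition-free simulation, following (6) as reconstructed above; the 1/√m scaling reproduces the covariance (11). Function and variable names are ours.

```python
import numpy as np

def sample_generation(best, n_new, rng=None):
    """Draw n_new normal vectors whose mean is the sample mean of `best`
    and whose covariance is the sample covariance (11) of `best`,
    without ever forming or decomposing an s x s covariance matrix.

    best : array of shape (m, s) -- the m "best" points of the previous stage.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, s = best.shape
    x_bar = best.mean(axis=0)                # sample mean X-bar
    directions = (best - x_bar) / np.sqrt(m) # (X_i - X_bar)/sqrt(m)
    eta = rng.standard_normal((n_new, m))    # independent N(0,1) weights
    # Z_j = X_bar + sum_i eta_j^(i) (X_i - X_bar)/sqrt(m): one matrix
    # product, O(n_new * m * s) work; no matrix factorization needed.
    return x_bar + eta @ directions
```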
Covariance matrix adaptation. Consider the formulae (6) for constructing a new generation.
Remark 1. Instead of the vector X̄, it is more efficient to use X^{(k−1)}_{min}, the "best" sample point of the previous, (k−1)-th, generation.
Remark 2. As the search stage number increases, the scattering ellipsoid of the normally distributed vectors Z_j, j = 1,…,N_k, quickly contracts to a point and the search practically stops. Therefore, the following two options are proposed.
1) At each stage we multiply the standard normal random variables $\eta_j^{(i)}$, i = 1,…,m, j = 1,…,N_k, in equations (6) by some constant σ. Optimization experience shows that this constant can depend on the stage number: σ = σ(k). As a result, the scattering ellipsoid of the vectors Z_j, j = 1,…,N_k, covers a sufficient part of the domain D and the search continues. The value σ(k) has to be chosen specially for each objective function f(X).
2) When modeling, we divide the vectors Z_j, j = 1,…,N_k, into L groups consisting of N^{(1)}_k, …, N^{(L)}_k points, N^{(1)}_k + ⋯ + N^{(L)}_k = N_k, and apply different constants σ^{(l)}(k), l = 1,…,L, to different groups (l is the group number). So the formulae for sample point simulation take the form

$$Z_j^{(l)} = X^{(k-1)}_{\min} + \frac{\sigma^{(l)}(k)}{\sqrt{m}} \sum_{i=1}^{m} \eta_j^{(i)} (X_i - \bar{X}), \qquad j = 1, \dots, N_k^{(l)}, \quad l = 1, \dots, L. \tag{12}$$

Further, we select the m_l "best" sample points from the N^{(l)}_k points of the l-th group, l = 1,…,L; clearly m_1 + ⋯ + m_L = m. After renaming Z to X, we obtain the set X_1, X_2, …, X_m of selected points for the next stage; a sketch of one such stage follows.
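A minimal sketch of one stage according to (12) as reconstructed here (centering at X^{(k−1)}_{min} follows Remark 1; the function and argument names are ours):

```python
import numpy as np

def adapted_stage(best, x_min, group_sizes, sigmas, rng=None):
    """One search stage per (12): sample points in L groups, each with its
    own spread multiplier sigma^(l)(k), centered at the best point so far.

    best        : (m, s) "best" points of the previous generation
    x_min       : (s,)   best point X_min^{(k-1)} (Remark 1)
    group_sizes : [N_k^(1), ..., N_k^(L)], summing to N_k
    sigmas      : [sigma^(1)(k), ..., sigma^(L)(k)]
    Returns the N_k sample points and their group labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, s = best.shape
    directions = (best - best.mean(axis=0)) / np.sqrt(m)
    points, labels = [], []
    for l, (n_l, sig) in enumerate(zip(group_sizes, sigmas)):
        eta = rng.standard_normal((n_l, m))
        # sigma^(l)(k) scales the normal weights: the group-l scattering
        # ellipsoid is the base one inflated by the factor sigma^(l)(k).
        points.append(x_min + sig * (eta @ directions))
        labels.append(np.full(n_l, l))
    return np.vstack(points), np.concatenate(labels)
```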
Remark 3. To isolate the global extremum, the objective function is raised to some power p, and the problem under study takes the form f^p(X) → min.
The effectiveness of the proposed modifications is confirmed by numerical results; a compact sketch of the whole search loop is given below.
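Putting the pieces together, here is a sketch of the full loop using adapted_stage from the previous snippet, under the parameter choices reported in Section 3 (N = 5000, m = 100, L = 2, σ^{(1)}(k) = 1, σ^{(2)}(k) = k√k, power p from Remark 3). The equal split m_1 = m_2 = 50, the clipping of trial points to the box D, and the fixed stage budget are our simplifications, not details of the paper.

```python
import numpy as np

def genetic_search(f, lo, hi, n_stages=9, n0=5000, n_k=5000,
                   m_per_group=(50, 50),
                   sigma_fns=(lambda k: 1.0, lambda k: k * np.sqrt(k)),
                   p=1, rng=None):
    """Sketch of the full search: a uniform 0-th generation, then stages of
    group-wise simulation (12) with per-group selection of the m_l best."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    obj = lambda X: f(X) ** p                 # Remark 3: sharpen with power p
    # 0-th generation: uniform scan of the box D.
    X = rng.uniform(lo, hi, size=(n0, lo.size))
    vals = np.apply_along_axis(obj, 1, X)
    m = sum(m_per_group)
    best = X[np.argsort(vals)[:m]]            # m "best" points, sorted
    x_min = best[0]
    for k in range(1, n_stages + 1):
        sizes = [n_k // len(m_per_group)] * len(m_per_group)
        sigmas = [g(k) for g in sigma_fns]
        Z, lab = adapted_stage(best, x_min, sizes, sigmas, rng)
        Z = np.clip(Z, lo, hi)                # keep trial points inside D
        v = np.apply_along_axis(obj, 1, Z)
        # Select the m_l best points from each group.
        sel = [Z[lab == l][np.argsort(v[lab == l])[:m_l]]
               for l, m_l in enumerate(m_per_group)]
        best = np.vstack(sel)
        i = int(np.argmin(v))
        if v[i] < obj(x_min):
            x_min = Z[i]                      # track the best point so far
    return x_min, f(x_min)
```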
3 Parameters estimating problem for nonlinear parametric regression

Consider the exponential regression

$$\eta(x, \theta) = \sum_{q=1}^{n} \beta_q\, e^{-\lambda_q x}$$

with parameter vector θ = (β_1, λ_1, …, β_n, λ_n) of dimension s = 2n. Let the vector y = (y_1, …, y_I) be the result of observations of the experiment at the preassigned points x_i, i = 1,…,I, where I > 0 is a given number; the vector x = (x_1, …, x_I) is the regression plan. We estimate the regression parameter vector θ by the least squares method:

$$\hat{\theta} = \arg\min_{\theta \in P} \sum_{i=1}^{I} \big(y_i - \eta(x_i, \theta)\big)^2. \tag{14}$$

Here P is some compact set in the Euclidean space E^s. The observation values y_i, i = 1,…,I, are simulated as follows. Let θ̃ be some given vector of parameters. We introduce

$$y_i = \eta(x_i, \tilde{\theta}) + \varepsilon_i, \qquad i = 1, \dots, I, \tag{15}$$

where ε_i, i = 1,…,I, are random measurement errors. These values are independent, normally distributed and centered: E ε_i = 0. To find the solution θ̂ of the extremal problem (14), the genetic algorithm with covariance matrix adaptation is chosen. For a given plan x = (x_1, …, x_I) and result vector y = (y_1, …, y_I), the objective function $f(\theta) = \sum_{i=1}^{I} (y_i - \eta(x_i, \theta))^2$ depends only on the s-dimensional vector of parameters θ = (β_1, λ_1, …, β_n, λ_n).
Such problems arise in the analysis of experimental results in various fields of science: mathematics, physics, chemistry, biology, etc. A sketch of the simulated data and the objective function is given below.
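The sketch assumes the sum-of-exponentials model reconstructed above. The true parameter vector θ̃ and the plan x used in the paper are not recoverable from the text, so the values below are hypothetical placeholders (n = 7, s = 14, and v = 0.02 as stated in the paper).

```python
import numpy as np

def eta_model(x, theta):
    """Sum-of-exponentials model eta(x, theta) = sum_q beta_q * exp(-lambda_q x),
    with theta = (beta_1, lambda_1, ..., beta_n, lambda_n)."""
    beta, lam = theta[0::2], theta[1::2]
    return np.exp(-np.outer(x, lam)) @ beta

def make_objective(x, y):
    """Least-squares objective f(theta) of (14) for plan x and observations y."""
    return lambda theta: float(np.sum((y - eta_model(x, theta)) ** 2))

# Simulated observations per (15): y_i = eta(x_i, theta~) + eps_i, sd v = 0.02.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, 0.5, 2.0, 1.5, 0.7, 3.0, 1.2, 0.2,
                       0.9, 2.5, 1.5, 0.8, 0.6, 4.0])  # hypothetical theta~, s = 14
x_plan = np.linspace(0.0, 5.0, 50)                     # hypothetical plan, I = 50
y_obs = eta_model(x_plan, theta_true) + 0.02 * rng.standard_normal(x_plan.size)
f_obj = make_objective(x_plan, y_obs)                  # objective for the search
```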
The resulting vector y is obtained using simulation by formula (15) for a given vector of parameters θ̃. The standard deviation of the random measurement errors is v = 0.02. The parameter vector values belong to the compact set P.

Genetic algorithm parameters. The numerical experiments were performed for the following values of the method parameters. The number of individuals in each generation is N = 5000. The number of "best" vectors at every stage is m = 100. The number of groups of sample points at each iteration is L = 2. The constant multipliers σ^{(l)}(k) in formulae (12) differ between the experiments described below.

The results of the genetic algorithm in figures. A number of plots are presented below to show the work of the genetic algorithm under consideration. The first case is the method without modifications (N^{(2)}_k = 0 in formulae (12)). In this case the search area contracts to a point.
The next case is as follows. At each iteration, the set of test points was divided in half: N^{(1)}_k = N^{(2)}_k = N_k/2. For the first part σ^{(1)}(k) = 1, and for the second one σ^{(2)}(k) = k√k, where k is the iteration number. Now the scattering ellipsoids do not contract to a point, as can be seen in Fig. 3. All results obtained are gathered in the Table; the best result is marked with an asterisk.

4 Conclusion
We have presented a genetic stochastic algorithm using the normal distribution with covariance matrix adaptation and introduced some modifications. The method allows one to avoid calculating the covariance matrix, which is a great advantage. We offer a number of modifications whose effectiveness is tested on a practical problem: the method in several variants was applied to the parameter estimation problem for nonlinear regression. The worst result was obtained for the method of [Ermakov and Semenchikov, 2019] without any modifications; in this case one can observe the scattering ellipsoid contracting to a point, so that the trial points cannot leave the vicinity of a local extremum point. The first modification is the introduction of a correcting multiplier into the sample point modeling formula. The second is the use of different multipliers for different groups of sample points. (The use of such multipliers can be viewed as mutations of individual properties.) As a result, the trial points cover a sufficient part of the domain D and avoid contraction to a local extremum point. These modifications led to a significant improvement of the result. The third modification is raising the objective function to a sufficiently large power, which makes it possible to isolate the global extremum point; this is a useful technique that requires further research. When all individuals were divided into two equal parts using the factors σ^{(1)}(k) = 1, σ^{(2)}(k) = k√k, where k is the generation number, the best value of the objective function was obtained for a given stopping criterion ε_f = 0.1·10^{−4}. The number of iterations in all cases did not exceed 9.
The objective function near a minimum has a ravine-type shape; this is indicated by the elongated shape of the scattering ellipsoid. The proposed modified method successfully coped with the minimization problem. Attempts to use the gradient method in this task were unsuccessful.
The research under discussion was presented as an oral contribution at the 10th International Scientific Conference on Physics and Control (PHYSCON 2021, 4–8 October 2021).