More on "stochastic local search" definition
By nojhan on Monday 17 November 2008, 11:24 - Anecdotal - Permalink
Petr Pošík let an interesting comment on a previous post : "Metaheuristic" or "Stochastic Local Search"?. As bloggers always love worthwhile comments from insightful readers, I copy it here, as an article, along with my answer.
I would like to present here an idea that will take the "stochastic" and "local" idea even more further than Thomas and Dirk. In that view, even GAs are local search techniques! Why?
It has a lot to do with the definition of neighborhood of the current state of the algorithm. The state may be the position of an algorithm in the search space. In classical local search, the state is given by mere 1 point. In population-based techniques, the state is the whole population. New solutions are usually generated only by local modifications of the state, i.e. the search takes place only in the local neighborhood of the algorithm state.
Yes, of course, the local neighborhood of the population as a whole is very broad and such a local search is much less prone to premature convergence and exhibits a kind of global behaviour. But, IMHO, it is still a local search...
I completely understand your view: if there is a non-zero probability of generating any solution in the search space, then the search must eventually find the global optimum, and thus performs a global search.
This is the case for many stochastic local search algorithm in continuous domain which use e.g. normal distribution. But if you set the variance too high, you will get very slow convergence to that global optimum which is in practice not desirable. If you set variance too low, you will quickly find local optimum, but you will have to wait for the global one virtually infinitely long - again undesirable. Or, put it in another way, would you call a "global optimization" technique still "global" if it takes zilions of years to find that global solution?
In practice, I think, for continuous spaces we have to resort to algorithms that exhibit local behavior since we always have some hard constraints on the length of the optimization run. In my eyes, this perfectly justifies the name "stochastic local search".
In fact, as far as I comprehend them, both Dick Thierens and Thomas Stützle shares your point of view. For them, GA are a kind of stochastic local search.
I find the classification in "descent" algorithm and "population" ones quite artificial. For example, simulated annealing is implemented as a descent algorithm, but does actually perform a sampling of the objective function, which is commonly seen as a population algorithm characteristic.
More rigourously, I think that every stochastic metaheuristic does try to avoid local optima. Stochastic processes are always here to do so[1].
Maybe the problem is the definition of local search. In my humble opinion, a local search is a method that search for a local optimum. This can be opposed to global search, where one want to find a global optimum.
You are right when you point out that this is related to ergodicity, as only an ergodic[2] algorithm may converge[3], and thus be able to reliably find a global optimum. Thus, yes, I will say that a true and rigorous optimization method is global if, and only if, it is at least quasi-ergodic[4]. A pure random search is the basic global optimization method, that a good metaheuristic should at least outperforms.
Here, it is the stochastic operators that manipulates the solutions that permits the global behaviour. This is not related to the encoding of the solutions/the neighbourhood structure, even if it is a really crucial part of the algorithm.
Thus, a "stochastic local search" may be defined as a local search seeking a global optimum, which is a paradoxical definition. I'd rather prefer that we keep the implementation and the mathematical bases separated, and thus talk of "stochastic search", or "stochastic metaheuristic".
Or else, one may want to use a paradoxical definition, as a matter of fun, which is also a good reason to do so :-)
Notes
[1] There may be probabilistic choices that are not related to such tasks, but they are not linked to the iterative aspects of the search algorithms, thus not being stochastics
[2] i.e. that can evaluate any solution
[3] i.e. that have a non-null probability of finding the global optimum in a finite time, or to say it differently, that can find the global optimum after a time that may tends towards infinity (for continuous problems, for discrete ones it is bounded by the size of the instance)
[4] i.e. that can reach any solution in a finite number of iterations
Comments
Hi, Johan!
Thanks for "publishing" my comment! :-) But let me comment one more time.
You wrote:
"Here, it is the stochastic operators that manipulates the solutions that permits the global behaviour."
Yes, I completely agree with that. But in that light I do not understand your opinion that
"This is not related to the encoding of the solutions/the neighbourhood structure, even if it is a really crucial part of the algorithm."
The chosen stochastic operators are IMHO directly related with the encoding (since they are defined for one particular representation) and they are directly related to the neighborhood structure (since they themselves define the neighborhood structure!). Having said this, I must also question your following statement:
"Maybe the problem is the definition of local search. In my humble opinion, a local search is a method that search for a local optimum. This can be opposed to global search, where one want to find a global optimum."
Any reasonable algorithm searches for a local optimum. Moreover, before you select the representation and the operators, you cannot say if a point is or is not a local optimum, can you?
The issue of selecting the right representation and operators for a given objective function means that we choose such a neighbborhood structure in which those "local optima" actually are not local optima. We simply always want to have a way to generate new better solution. The ideal state would be if we were able to select such representation and operators so that the problem would have only 1 optimum - the global one.
This just to clarify my view, I do not want to start a flame war here.
Thanks for your postings!
Petr
What I wanted to say is that the _stochastic_ aspects of the algorithms are not linked to the representation, but to the behaviour of the method. This one of the reason why I prefer the term "METAheuristics", which clearly emphasize these aspects. For example, on static optimization problems, the probabilistic aspects of the operators are only here to avoid being stucked in local optima, they are not related to how you define the neighbourhood. Its a matter of choosing rather than representing.
But I completely agree that the representation choice is the first problem to solve carrefully. I also agree with you on the fact that all metaheuristics does not guarantee optimality, thus only giving an approximation. But, they do search for _global_ optimum, even if they cannot find it and find local optimums most of the times, they search for it.
I also agree that the locality of optima is partly defined by the representation (obviously). But there is no contradiction here, metaheuristics are the part of the algorithms that does not change when you change your representation. The probabilistic aspects of the operators are still here for each representation.
For example, you may implement a mutation operator differently, depending on your representation, but there will always be a probabilistic process in it. This probabilistic part is the one that permits to add diversification to an evolution strategies. Selection alone defines a kind of local search, selection with probabilistic processes does not.
A local search have a very simple stopping criterion: stop if you cannot improve the solution using the neighbourhood. The only obvious stopping criterion that a global search have is: stop when you are sure that all ther solutions are less good. For stochastic metaheuristics, we can't even use such criterion, we must rely upon empirical criteria.
To say it simply, a local search cannot be stochastic, or else it is a _global_ search. What matters is the intent.
Anyway, using different terms must not change the fact that, yes, it is generally more rigorous to start a study on a new problem with local search, and then to increase complexity by adding stochastic processes: global searches.
Hmm, Johan, I partly agree with you, but again - I must react to one of your thoughts: "...a local search cannot be stochastic, or else it is a _global_ search."
What is _stochastic_ optimization algorithm? An algorithm that evolves in a different way each time you run it from the same initial conditions, am I right?
Now, suppose a stupid optimization algorithm. Its state (the present best-so-far solution) is x. In each step, the algorithm can generate one of 2 candidate points x-1 and x+1. Which one? That is decided randomly. If one of these points is better, the algorithm accepts it as its current BSF. And iterates over and over until both of the two candidate solutions are worse than the current BSF.
Is this algorithm stochastic? Yes if we agree on the previous definition of what _stochastic_ means. Is this algorithm local? In my opinion yes, even it is stochastic. So here you have an example of stochastic local search.
The implication "stochastic -> global search" is IMHO not right. The neighborhood is finite here (or in better words, the neighborhood does not cover the whole search space) and thus it cannot perform global search.
Hmmm, it seems we have developed pretty philosophycal dialog here... :-) But I like it!
Regards, Petr
I have discussed a lot these issue with one of my colleague, and it appears that he have a point of vue similar to you. We end by consulting the books "Local Search in Combinatorial Optimization", by Aarts and Lenstra, and "Constraint-based Local Search", by Van Hentenryck and Michel.
It appears that, unlike in my own "culture", a local search is defined by the fact that it use a neighbourhood, and performs choices iteratively in this structure.
In my own field, thats what one may call "descent" algorithms, by opposition to "population" ones. What differ between a local and a global search in this domain, is the fact that a local search seek a local optimum, and will stop if it cannot improve the solution using its neighbourhood. On the contrary, a global search will try to find the global optimum, avoiding local ones, with more "smart" strategies.
With the first definition, almost every approximation algorithms are local search, with the second, most of them are global ones.
I think the difference is just a matter of agreeing on definition. Considering the first one, I agree with you on most points.
Concerning your definition of stochastic, I must disagree. IMHO, an algorithm that run differently each times it is ran is probabilistic. If it is iterative and that the probabilistic processes varies according to the iterations (i.e. the preceding state), then it is stochastic. Thus because the stochastic term have a temporal meaning that probabilistic or random may not have.
Anyway, considering the first definition (insisting on the neighbourhood) I then agree with you, a local search may be stochastic.
What bother me with this one is that it does cover too much algorithms to be usefull. Indeed, what are global search then? Enumeration methods? You say that a finite neighbourhood cannot be used to performs global search, why? Aarts and Lenstra distringuishes local search from constructive algorithms, but I wonder how this articulates when considering Ant Colony Optimization... which may be considered as a constructive local search!
One of the advantage I see with the definition I used to employ is that it does not gather most of the methods in one single category. Moreover, I find it more simple: local search = local optimum, descent algorithms = neighbourhood based. Here, you can perform a global search with a finite neighbourhood: you just need to have a quasi-ergodic one.
It ends as a matter of definition and to be franck, consulting the litterature, it seems that you are right, most people seems to use the first definition. I think I need to find who put the less used one in my mind :-)