6

I've recently been exploring Reinforcement Learning (RL) methods in my work and the exploration-exploitation dilemma is always mentioned. It almost feels like the exploration-exploitation dilemma stemmed from RL research. However, when we think about Simulated Annealing (SA) and Genetic Algorithms (GA), isn't there also the notion of exploration and exploitation? Specifically, in SA, you take a random step (explore) with some probability and an improving step otherwise (exploit). In GA, you mutate (explore) with some probability and select good parents for crossover (exploit).

My question: which paper first mentioned this? It feels as though https://pubsonline.informs.org/doi/abs/10.1287/orsc.2.1.71 is when it was first mentioned but I'd like expert opinions on this.

RobPratt
  • 32,006
  • 1
  • 44
  • 84
jkschin
  • 365
  • 2
  • 5
  • Exploration vs. Exploitation is a big thing in Bayesian optimization. Bayesian optimization publications go back to the early 70s. I don't know when exploration vs. exploitation started being considered in that literature, and when it was called as such. – Mark L. Stone Jun 08 '22 at 00:46
  • Well, perhaps "Efficient Global Optimization of Expensive Black-Box Functions" https://www.researchgate.net/profile/Donald-Jones-5/publication/235709802_Efficient_Global_Optimization_of_Expensive_Black-Box_Functions/links/56539a8f08ae4988a7afb723/Efficient-Global-Optimization-of-Expensive-Black-Box-Functions.pdf?origin=publication_detail published in 1998 might have been the 1st publication in which the "modern" version of Bayesian Optimization was considered, which formulated trade off between exploration and exploitation, although not quite called as such. That is after your 1991 paper.. – Mark L. Stone Jun 08 '22 at 01:07
  • You might also want to consider adaptive experimental design for bandit problems going back to the 70s, although you might not necessarily consider that they qualify. – Mark L. Stone Jun 08 '22 at 01:07
  • An earlier paper is Axelrod, R. (1987). The evolution of strategies in the iterated Prisoner's Dilemma. In Genetic algorithms and simulated annealing - https://books.google.ca/books?hl=en&lr=&id=ddc9AAAAIAAJ&oi=fnd&pg=PA1&dq=Axelrod,+R.+(1987).+The+evolution+of+strategies+in+the+iterated+Prisoner%27s+Dilemma.+In+Genetic+algorithms+and+simulated+annealing&ots=0yftOrCGV0&sig=_hVUQVLAKRseTqphRzq0yYBnRMQ#v=onepage&q&f=false – Rob Jun 08 '22 at 01:41
  • 1
    In optimization, one often uses the words "intensification" and "diversification" – fontanf Jun 08 '22 at 08:47
  • It's essentially what is called dual control (simultaneously disturbing a system to understand its dynamics and at the same time controlling the system using the model that is built, and doing this jointly in an optimal fashion). Goes back to the early 60s and Feldbaum. – Johan Löfberg Jun 09 '22 at 19:49

1 Answers1

1

With reference to the paper in the original post "Exploration and Exploitation in Organizational Learning" (E&E) by James G. March, the paper "The Lost Experiment in Exploration and Exploitation" by Donncha Kavanagh provides some interesting historical context preceeding the E&E paper.

2D1C
  • 131
  • 2