Machines Regret Their Actions Too: A Brief Tutorial on Multi-armed Bandits
Talk · University of Cambridge
Abstract. Multi-armed bandits are a class of sequential decision problems under uncertainty. One of their defining characteristics is the explore-exploit tradeoff: to make optimal decisions, one must balance exploiting what is already known against trying different options to learn more. In this tutorial, we introduce the problem setting and basic techniques of analysis. We conclude by discussing how explore-exploit tradeoffs appear in more general settings, and how the ideas presented can aid in understanding areas such as reinforcement learning.