The multi-armed bandit is a statistical learning model whose aim is to make a sequential choice between several actions based on the rewards they generate. An example of its application (which gave the model its name) is the choice between several slot machines, a slot machine also being called a “one-armed bandit” in English.

The context is as follows: a player has a choice of several slot machines (also called arms) whose average profitability is not known in advance. It is this ignorance of profitability that makes it a learning problem. The player has a certain budget, for example N coins of 1 dollar, and has to play these coins one by one on the different slot machines. The goal is to collect as much money as possible after N draws. The reward given by the chosen slot machine is a random variable drawn from a certain probability law. One can consider that in each round rewards are also drawn from the slot machines that were not chosen, but these are neither observed nor collected by the player; see the figure below, where the observed rewards are shown in color. In this example, the player has to allocate part of the budget to finding the most profitable slot machines (this is called exploration) and the rest of the budget to playing those machines (this is called exploitation).
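The split between exploration and exploitation can be illustrated with a small simulation. The sketch below is only one simple strategy (often called explore-then-commit), not the only way to play. It assumes Bernoulli rewards (each coin returns 1 dollar or nothing), and the function name `explore_then_commit`, the parameter `explore_per_arm`, and the arm probabilities in the usage line are hypothetical values chosen for illustration.

```python
import random


def explore_then_commit(arm_means, n_coins, explore_per_arm, seed=0):
    """Play n_coins one by one; return (total_reward, pulls_per_arm).

    arm_means holds the true win probability of each slot machine.
    The player never reads it directly: it is only used to draw rewards.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    if explore_per_arm < 1:
        raise ValueError("each arm must be tried at least once")
    if explore_per_arm * k > n_coins:
        raise ValueError("exploration budget exceeds the total budget")

    pulls = [0] * k      # number of coins played on each arm
    gains = [0.0] * k    # sum of observed rewards for each arm
    total = 0.0

    def play(arm):
        # Assumption: Bernoulli reward, 1 dollar with probability arm_means[arm].
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        gains[arm] += reward
        return reward

    # Exploration: spend the same number of coins on every arm.
    for arm in range(k):
        for _ in range(explore_per_arm):
            total += play(arm)

    # Exploitation: put all remaining coins on the arm with the best observed average.
    best = max(range(k), key=lambda a: gains[a] / pulls[a])
    for _ in range(n_coins - explore_per_arm * k):
        total += play(best)

    return total, pulls


# Hypothetical example: three machines, 1000 coins, 50 exploration coins per machine.
total, pulls = explore_then_commit([0.2, 0.5, 0.7], n_coins=1000, explore_per_arm=50)
print(total, pulls)
```

Only the reward of the arm actually played is recorded, which matches the setting described above: rewards of the other machines are neither observed nor collected. Raising `explore_per_arm` makes the choice of the best machine more reliable but leaves fewer coins to exploit it.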