The room of our random experiments
by 308
“Reinforcement Learning is learning what to do - how to map situations to actions - so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.” (Sutton & Barto, Reinforcement Learning: An Introduction)
2 main characteristics:
there is a tradeoff between exploration and exploitation: the agent has to prefer actions it already knows to yield high reward (exploitation), but to discover the best actions it also has to try actions it has not selected before (exploration). Neither exploration nor exploitation can be pursued exclusively without failing at the task (see the ε-greedy sketch after this list).
Reinforcement learning relies heavily on the concept of state, both as input to the policy and to the value function. The state is a signal of “how the environment is at a particular time”. Most of the RL methods considered in the book are structured around estimating value functions, but RL problems can be solved in other ways as well. For example, evolutionary algorithms or simulated annealing can find good policies without ever estimating value functions. These methods can be advantageous when the search space of policies is sufficiently small, or can be structured so that good policies are easy to find. They can also be advantageous when the agent cannot sense the complete state of the environment, or when the state is misperceived (see the policy-search sketch below).
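To make the exploration/exploitation tradeoff concrete, here is a minimal ε-greedy sketch on a k-armed bandit. This is my own toy illustration, not code from the book: the arm means, ε = 0.1, and the Gaussian reward noise are all arbitrary assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Sample-average action-value estimates with epsilon-greedy selection."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each action
    n = [0] * k     # how many times each action was taken
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: random action
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: greedy action
        reward = rng.gauss(true_means[a], 1.0)      # noisy reward from the chosen arm
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental sample average
        total_reward += reward
    return q, total_reward

if __name__ == "__main__":
    q, total = epsilon_greedy_bandit([0.2, 0.8, 0.5])
    print("estimated values:", [round(v, 2) for v in q])
    print("total reward:", round(total, 1))
```

With ε = 0 the agent exploits only and can get stuck on whichever arm happened to pay first; with ε = 1 it explores only and never cashes in on what it has learned. Intermediate values trade the two off.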
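And here is a sketch of solving an RL problem without any value function, in the spirit of the evolutionary methods mentioned above: simple (1+1)-style hill climbing over a tabular policy. The five-state corridor environment and the mutation scheme are invented here for illustration; they are not from the book.

```python
import random

# A tiny deterministic corridor: states 0..4, actions 0=left, 1=right.
# Episodes start at state 2; reaching state 4 yields reward +1.
def episode_return(policy, max_steps=10):
    state = 2
    for _ in range(max_steps):
        action = policy[state]
        state = max(0, min(4, state + (1 if action == 1 else -1)))
        if state == 4:
            return 1.0
    return 0.0

def evolve_policy(generations=50, seed=0):
    rng = random.Random(seed)
    best = [rng.randrange(2) for _ in range(5)]     # random initial tabular policy
    best_ret = episode_return(best)
    for _ in range(generations):
        child = best[:]
        child[rng.randrange(5)] = rng.randrange(2)  # mutate one state's action
        ret = episode_return(child)
        if ret >= best_ret:                         # keep the child if it does no worse
            best, best_ret = child, ret
    return best, best_ret

if __name__ == "__main__":
    policy, ret = evolve_policy()
    print("policy (action per state):", policy, "return:", ret)
```

The search evaluates whole policies by their episode return and keeps the better one; no state values are ever estimated, which is exactly why such methods can tolerate an incompletely sensed or misperceived state.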