The learning environment we will consider is a system composed of two subjects:
the learning agent (or simply the learner) and a dynamic process. At successive time
steps, the agent observes the state of the process, selects an action and applies it back
to the process, thereby modifying the state. The goal of the agent is to find adequate
actions for controlling this process. To do so autonomously, it uses a technique known
as Reinforcement Learning.
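The interaction just described can be sketched as a simple loop. In the sketch below, the toy process dynamics, the goal state, and the cost signal are illustrative assumptions, not part of the text:

```python
# Sketch of the agent-process loop described above (illustrative only):
# the process dynamics, goal state, and cost signal are assumptions.

def process_step(state, action):
    """Toy dynamic process: action 1 moves one step toward goal state 0."""
    next_state = max(0, state - action)
    cost = 0 if next_state == 0 else 1  # reinforcement: 0 at the goal, 1 otherwise
    return next_state, cost

state, total_cost = 5, 0
for t in range(10):
    action = 1                                 # fixed stand-in for the learner's choice
    state, cost = process_step(state, action)  # the process state changes
    total_cost += cost                         # the signal the learner would optimize
```

Here the "agent" is just a fixed rule; the point is only the shape of the loop: observe, act, receive a signal.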
Reinforcement Learning (RL for short) is learning through direct experimentation.
It does not assume the existence of a teacher that provides ‘training examples’. Instead,
in RL, experience is the only teacher. The learner acts on the process and receives signals
(reinforcements) from it, indications of how well it is performing the required task.
These signals are usually associated with some significant condition, e.g. the accomplishment of a subtask (reward) or a complete failure (punishment), and the learner's goal is
to optimize its behavior based on some performance measure (usually minimization of
a cost function¹). The crucial point is that, to do this within the RL framework,
the learning agent must learn the conditions (associations between observed states and
chosen actions) that lead to rewards or punishments. In other words, it must learn how
to assign credit to past actions and states by correctly estimating the costs associated
with these events. This is in contrast with supervised learning (Haykin, 1999), where the
credits are implicitly given beforehand as part of the training procedure. RL agents are
thus characterized by their autonomy.