Preface
Recently, Prof Song asked me to do some work related to meta-learning plus reinforcement learning. I have hardly done any work related to AI, so let me use this topic as a starting point for reviewing and learning the principles of meta-learning and reinforcement learning! First of all, I will use Sutton's book as our first reference and introduce the computational approaches from this book. I will also take the course fundamentals-of-reinforcement-learning by Martha and Adam at the University of Alberta.
I will only write up the more complex and meaningful formulas rather than the classical ones. For example, epsilon-greedy will not be introduced here; instead I will introduce UCB, SARSA, and contextual bandits.
Also, for the RL code: at the beginning, while reviewing RL principles, I will use Coursera's assignment code. After that, I will use the open-source code from the theory or papers I introduce.
Reinforcement Learning Day 1 (Introduction)
Let me review some basic ideas in Sutton and Barto's book. The first chapter is the introduction, so I will only record some important and meaningful sentences here.
“Markov decision processes are intended to include just these three aspects—sensation, action, and goal”
trade-off between exploration and exploitation
Reinforcement learning has a fixed problem paradigm.
Methods based on general principles, such as search or learning, were characterized as “weak methods,” whereas those based on specific knowledge were called “strong methods.”
Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced. Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners
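To make this distinction concrete for myself, here is a minimal toy sketch (my own illustration, not code from the book): a model-free, Q-learning-style update that learns action values from sampled transitions, versus a model-based planner that decides by imagining futures with a known model. The chain environment, the discount 0.9, and the step size 0.1 are all made up for illustration.

```python
import random

# Toy deterministic chain: states 0..4, reward 1.0 for reaching state 4.
N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)  # actions: move left / move right

def step(s, a):
    """The environment dynamics (also usable as a model for planning)."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, float(s2 == GOAL)

# Model-free: trial-and-error learning of action values from experience.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(500):
    s = random.randrange(N_STATES)
    a = random.choice(ACTIONS)
    s2, r = step(s, a)  # a real (sampled) transition
    target = r + 0.9 * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += 0.1 * (target - Q[(s, a)])

# Model-based: decide by considering possible futures with the model,
# before any action is actually experienced.
def plan(s, depth=4):
    if depth == 0 or s == GOAL:
        return 0.0
    values = []
    for a in ACTIONS:
        s2, r = step(s, a)  # an imagined transition from the model
        values.append(r + 0.9 * plan(s2, depth - 1))
    return max(values)

print(max(ACTIONS, key=lambda a: Q[(0, a)]), plan(0))
```

Both approaches prefer moving right from state 0; the difference is that the model-free learner needed 500 sampled transitions, while the planner only needed the model.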
The use of value functions distinguishes reinforcement learning methods from evolutionary methods that search directly in policy space guided by evaluations of entire policies.
reinforcement as the strengthening of a pattern of behavior due to an animal receiving a stimulus—a reinforcer—in an appropriate temporal relationship
When a configuration is reached for which the action is undetermined, a random choice for the missing data is made and the appropriate entry is made in the description, tentatively, and is applied. When a pain stimulus occurs all tentative entries are cancelled, and when a pleasure stimulus occurs they are all made permanent. (Turing, 1948)
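Reading this, Turing's scheme already looks like a tiny learning algorithm, so below is a speculative sketch of my own reading of the quote (not Turing's actual design): an undetermined configuration gets a random tentative action; a pain stimulus cancels all tentative entries, while a pleasure stimulus makes them permanent.

```python
import random

class PleasurePainMachine:
    """A toy rendering of the mechanism in Turing (1948), as I read it."""

    def __init__(self, actions):
        self.actions = actions
        self.permanent = {}   # configuration -> committed action
        self.tentative = {}   # configuration -> provisional action

    def act(self, config):
        if config in self.permanent:
            return self.permanent[config]
        if config not in self.tentative:
            # "a random choice for the missing data is made ... tentatively"
            self.tentative[config] = random.choice(self.actions)
        return self.tentative[config]

    def pain(self):
        # "When a pain stimulus occurs all tentative entries are cancelled"
        self.tentative.clear()

    def pleasure(self):
        # "when a pleasure stimulus occurs they are all made permanent"
        self.permanent.update(self.tentative)
        self.tentative.clear()

m = PleasurePainMachine(actions=["left", "right"])
print(m.act("config-A"))  # a random tentative choice
m.pleasure()              # commit all tentative entries
print(m.act("config-A"))  # the same action, now permanent
```

Seen this way, the pleasure/pain stimuli play the role of a (very coarse) reward signal, which is why this 1948 passage reads as an early ancestor of reinforcement learning.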
Overall, after reading the first chapter (I had already read lots of reinforcement-learning papers), the history of reinforcement learning impressed me, especially how RL originated from the study of animals and their behavior. Also, many papers recast RL as supervised learning and consider RL to be a form of supervised learning. In Sutton's book, we can see he has made a lot of progress in tracing the history of RL and establishing it as a distinct paradigm of artificial intelligence.