Preface
Recently, Prof. Song asked me to do some work on meta-learning combined with reinforcement learning. I have rarely done AI-related work, so let me use this topic as a starting point for reviewing and learning some foundational meta-learning and reinforcement learning principles! First, I will use Sutton's book as the primary reference and introduce the computational approaches from it. I will also take the course fundamentals-of-reinforcement-learning by Martha and Adam at the University of Alberta.
I will only write down the more complex and meaningful formulas, rather than the classical ones. For example, epsilon-greedy will not be introduced here; instead I will introduce UCB, SARSA, and contextual bandits.
Also, for the RL code: at the beginning, while reviewing RL principles, I will use Coursera's assignment code. After that, I will use the open-source code from the theories or papers I introduce.
Reinforcement Learning Day 1 (Introduction)
Let me review some basic ideas in Sutton and Barto's book. The first chapter is the introduction, so I will only record some important and meaningful sentences here.
- "Markov decision processes are intended to include just these three aspects—sensation, action, and goal"
- The trade-off between exploration and exploitation.
- RL has a fixed problem paradigm.
- "Methods based on general principles, such as search or learning, were characterized as 'weak methods,' whereas those based on specific knowledge were called 'strong methods.'"
- "Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced. Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners."
- "The use of value functions distinguishes reinforcement learning methods from evolutionary methods that search directly in policy space guided by evaluations of entire policies."
- Reinforcement as the strengthening of a pattern of behavior due to an animal receiving a stimulus—a reinforcer—in an appropriate temporal relationship.
- "When a configuration is reached for which the action is undetermined, a random choice for the missing data is made and the appropriate entry is made in the description, tentatively, and is applied. When a pain stimulus occurs all tentative entries are cancelled, and when a pleasure stimulus occurs they are all made permanent." (Turing, 1948)
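Turing's pleasure-pain scheme reads like a tiny algorithm, so here is a minimal sketch of one way it could be implemented. All class and method names are hypothetical illustrations of the quoted idea, not Turing's actual design:

```python
import random

class PleasurePainMachine:
    """Illustrative sketch of Turing's (1948) pleasure-pain idea:
    when a configuration's action is undetermined, a random choice is
    recorded tentatively; a pain stimulus cancels all tentative entries,
    a pleasure stimulus makes them all permanent."""

    def __init__(self, actions):
        self.actions = actions
        self.permanent = {}   # configuration -> permanently fixed action
        self.tentative = {}   # configuration -> provisional action

    def act(self, config):
        if config in self.permanent:
            return self.permanent[config]
        if config not in self.tentative:
            # Action undetermined: fill the missing entry with a random choice
            self.tentative[config] = random.choice(self.actions)
        return self.tentative[config]

    def pain(self):
        # Pain stimulus: cancel all tentative entries
        self.tentative.clear()

    def pleasure(self):
        # Pleasure stimulus: make all tentative entries permanent
        self.permanent.update(self.tentative)
        self.tentative.clear()
```

For example, after `m.act("s0")` picks a random action for configuration `"s0"`, calling `m.pleasure()` fixes that choice forever, while `m.pain()` would have erased it and allowed a fresh random choice next time.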
Overall, after reading the first chapter (I have already read many reinforcement learning papers), the history of reinforcement learning impressed me, especially that RL originated from the study of animals and their behavior. Also, many papers transfer RL into supervised learning and regard RL as a form of supervised learning. In Sutton's book, we can see that he made great efforts to extend the history of RL and establish it as a distinct paradigm of artificial intelligence.