
ReinforcementLearning-Principle-Day1

Preface

Recently, Prof. Song asked me to do some work on meta-learning combined with reinforcement learning. I have done hardly any work related to AI, so let me use this topic to begin reviewing and learning some meta-learning and reinforcement learning principles! First, I will use Sutton's book as the primary reference and introduce the computational approaches from it. I will also take the course Fundamentals of Reinforcement Learning by Martha and Adam at the University of Alberta.

I will only write up the more complex and meaningful formulas rather than the classical ones. For example, epsilon-greedy will not be introduced here; instead I will introduce UCB, SARSA, and contextual bandits.
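
As a quick preview of one of these topics, here is a minimal sketch of UCB action selection for a simple bandit, following the form of the UCB rule in Chapter 2 of Sutton and Barto; the function name ucb_select and its arguments are placeholders of my own, not code from the Coursera assignments.

```python
import numpy as np

def ucb_select(q_estimates, action_counts, t, c=2.0):
    """Upper-Confidence-Bound action selection (sketch).

    q_estimates   -- current value estimate for each action
    action_counts -- number of times each action has been taken
    t             -- current time step (1-indexed)
    c             -- exploration strength
    """
    # Any action never tried yet is treated as maximally uncertain.
    untried = np.where(action_counts == 0)[0]
    if untried.size > 0:
        return int(untried[0])
    # Q_t(a) + c * sqrt(ln t / N_t(a)): the bonus shrinks as an action
    # is selected more often, so exploration fades over time.
    ucb_values = q_estimates + c * np.sqrt(np.log(t) / action_counts)
    return int(np.argmax(ucb_values))
```

Compared with epsilon-greedy, the exploration here is directed: actions with fewer pulls get a larger bonus instead of being picked uniformly at random.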

Also, for the RL code, at the beginning of this review of RL principles I will use the Coursera assignment code. After that, I will use the open-source code from the theory or papers I introduce.

Reinforcement Learning Day 1 (Introduction)

Let me review some basic ideas in Sutton and Barto's book. The first chapter is the introduction, so I only record some important and meaningful sentences here.

  • “Markov decision processes are intended to include just these three aspects—sensation, action, and goal”

  • trade-off between exploration and exploitation

  • Reinforcement learning has a fixed problem paradigm.

  • Methods based on general principles, such as search or learning, were characterized as “weak methods,” whereas those based on specific knowledge were called “strong methods.”

  • Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced. Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners

  • The use of value functions distinguishes reinforcement learning methods from evolutionary methods that search directly in policy space guided by evaluations of entire policies.

  • reinforcement as the strengthening of a pattern of behavior due to an animal receiving a stimulus—a reinforcer—in an appropriate temporal relationship

  • When a configuration is reached for which the action is undetermined, a random choice for the missing data is made and the appropriate entry is made in the description, tentatively, and is applied. When a pain stimulus occurs all tentative entries are cancelled, and when a pleasure stimulus occurs they are all made permanent. (Turing, 1948)

Overall, after reading the first chapter (I have already read a lot of reinforcement-learning-related papers), the history of reinforcement learning impressed me, especially how RL originated from studies of animals and their behavior. Also, many papers convert RL into supervised learning and regard RL as a kind of supervised learning. In Sutton's book, we can see he has made a lot of progress in extending the history of RL and establishing it as a distinct branch of artificial intelligence.