Preface
From here on we begin the Coursera course Sample-based Learning Methods. During this period I will still excerpt sentences from Sutton's book, but this time I will mark my own commentary in red.
ReinforcementLearning-Principle-Day6: Monte-Carlo
Introduction
Monte Carlo methods are ways of solving the reinforcement learning problem based on averaging sample returns. The term “Monte Carlo” is often used more broadly for any estimation method whose operation involves a significant random component.
Monte Carlo methods sample and average returns for each state–action pair much like the bandit methods we explored in Chapter 2 sample and average rewards for each action. The main difference is that now there are multiple states, each acting like a different bandit problem (like an associative-search or contextual bandit) and the different bandit problems are interrelated. That is, the return after taking an action in one state depends on the actions taken in later states in the same episode. Because all the action selections are undergoing learning, the problem becomes nonstationary from the point of view of the earlier state.
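To make "sample and average returns for each state–action pair" concrete, here is a minimal sketch in Python. It is my own illustration, not code from the book: the function name mc_action_values, the episode format (a list of (state, action, reward) steps), and the discount parameter gamma are all assumptions, and I chose the first-visit variant of averaging, which the excerpt above does not specify.

```python
from collections import defaultdict


def mc_action_values(episodes, gamma=1.0):
    """Estimate Q(s, a) by averaging sampled returns (first-visit variant).

    Assumed format: each episode is a list of (state, action, reward) steps,
    where `reward` is the reward received after taking `action` in `state`.
    """
    return_sums = defaultdict(float)
    visit_counts = defaultdict(int)

    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards so g accumulates the discounted return from each step.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            # Overwriting means the earliest occurrence in the episode wins.
            first_visit_return[(state, action)] = g
        for pair, g_first in first_visit_return.items():
            return_sums[pair] += g_first
            visit_counts[pair] += 1

    return {pair: return_sums[pair] / visit_counts[pair] for pair in return_sums}


# Two hypothetical episodes that differ only in the action taken in "s1".
episodes = [
    [("s0", "a", 0.0), ("s1", "a", 1.0)],
    [("s0", "a", 0.0), ("s1", "b", 0.0)],
]
q = mc_action_values(episodes)
print(q[("s0", "a")])  # 0.5
```

In this toy data the estimate for ("s0", "a") is 0.5 only because the later choice in "s1" was split between a rewarding and a non-rewarding action. If the policy in "s1" changed, the average return seen from "s0" would change too, which is the nonstationarity the passage describes.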