
  • Sarsa: On-policy TD Control
  • Q-learning: Off-policy TD Control
  • Maximization Bias and Double Learning
  • Games, Afterstates, and Other Special Cases
  • Summary
  • What is Monte Carlo
  • Using Monte Carlo for Prediction
  • Using Monte Carlo for Action Values
  • Using Monte Carlo methods for generalized policy iteration
  • Solving the Blackjack Example
When doing systems research, I have found it worthwhile to understand the real implementation of every detail of every component (file system, memory management, etc.). So I want to start a new chapter here to record my notes and experience from reading Understanding the Linux Kernel, Third Edition, by Daniel P. Bovet. I hope that after reading this book I can understand every paper at OSDI and come up with more useful, novel ideas, rather than thinking in the abstract without considering any real problems or architecture in the operating system.

We now start the Coursera course Sample-based Learning Methods. During this period, I will still excerpt some sentences from Sutton’s book, but this time I will mark my own comments in red.

Reinforcement Learning Day 4 (Policy Evaluation)

  • Abstract
  • Policy Evaluation
  • Policy Improvement
  • Policy Iteration
  • Value Iteration
MetaLearning Learning Note - 4

  • optimization-based meta learning
  • non-parametric few-shot learning
  • properties of meta learning algorithms
Reinforcement Learning Day 4 (Coursera Video Notes on Finite Markov Decision Processes)

  • Specifying Policies
  • Value Functions
  • Action-value function
  • Bellman Equation Derivation
  • Intuition - Bellman Equation
