
  • Sarsa: On-policy TD Control
  • Q-learning: Off-policy TD Control
  • Maximization Bias and Double Learning
  • Games, Afterstates, and Other Special Cases
  • Summary
Read more »

  • What is Monte Carlo
  • Using Monte Carlo for Prediction
  • Using Monte Carlo for Action Values
  • Using Monte Carlo methods for generalized policy iteration
  • Solving the Blackjack Example
Read more »

Preface

While doing systems research, I have found it worthwhile to understand how every component (file system, memory management, etc.) is actually implemented, down to the details. So I want to start a new chapter here to record my notes and experiences from reading the book Understanding the Linux Kernel, Third Edition, by Daniel P. Bovet. I hope that after reading this book, I will be able to understand every paper at OSDI and come up with more useful, novel ideas, instead of thinking without considering the real problems or architecture of the operating system.

Read more »

Preface

From now on, we start the Coursera course Sample-based Learning Methods. During this period, I will still excerpt some sentences from Sutton's book, but this time I will mark my own comprehension in red.

Read more »

Reinforcement Learning Day 4 (Policy Evaluation)

  • Abstract
  • Policy Evaluation
  • Policy Improvement
  • Policy Iteration
  • Value Iteration
Read more »

Meta-Learning Learning Note - 4

  • optimization-based meta-learning
  • non-parametric few-shot learning
  • properties of meta-learning algorithms
Read more »

Reinforcement Learning Day 4 (Finite Markov Decision Processes: Coursera Video Notes)

  • Specifying Policies
  • Value Functions
  • Action-value function
  • Bellman Equation Derivation
  • Intuition - Bellman Equation
Read more »