
Introduction

In particular, we focus on four topics. First, we present a taxonomy of instruction set alternatives and give some qualitative assessment of the advantages and disadvantages of various approaches. Second, we present and analyze some instruction set measurements that are largely independent of a specific instruction set. Third, we address the issue of languages and compilers and their bearing on instruction set architecture. Finally, the “Putting It All Together” section shows how these ideas are reflected in the RISC-V instruction set, which is typical of RISC architectures.

Read more »

Preface

I have earned two certificates from the University of Alberta's Reinforcement Learning courses, and I feel I understand more RL concepts now. Keep going! The third course, as far as I can see, is about gradient descent (GD) and function approximation.

  • Introduction
  • Value-function Approximation
  • The Prediction Objective (VE)

Introduction

The control problem is the task of improving a policy. So if we only need to evaluate states (estimate the value function of a given policy), that is a prediction problem, not a control problem.

Can we represent the value function with a table? => Not for large state spaces; the tools here are GD and the average-reward setting.

The novelty in this chapter is that the approximate value function is represented not as a table but as a parameterized functional form with weight vector $\mathbf{w} \in \mathbb{R}^d$.
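To make "parameterized functional form" concrete, here is a minimal sketch of the linear case $\hat v(s,\mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$. It assumes a hypothetical two-feature map `features` (a bias plus the normalized state index) and feeds synthetic (state, return) pairs instead of real episodes. The update is the gradient Monte Carlo rule $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,[G_t - \hat v(S_t,\mathbf{w})]\,\nabla \hat v(S_t,\mathbf{w})$; for a linear $\hat v$ the gradient is simply $\mathbf{x}(S_t)$.

```python
N_STATES = 10  # hypothetical state space: states 0..9


def features(s):
    """Hypothetical feature vector x(s): a bias term and the normalized state index."""
    return [1.0, s / N_STATES]


def v_hat(s, w):
    """Linear value estimate v_hat(s, w) = w . x(s)."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def gradient_mc_update(w, s, G, alpha):
    """One gradient MC step: w <- w + alpha * [G - v_hat(s, w)] * grad v_hat(s, w).

    For a linear v_hat, grad v_hat(s, w) is just x(s).
    """
    error = G - v_hat(s, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, features(s))]


# Synthetic (state, return) pairs whose returns are exactly linear in x(s),
# so the weights should approach [1.0, 2.0].
w = [0.0, 0.0]
for _ in range(2000):
    for s in range(N_STATES):
        G = 1.0 + 2.0 * s / N_STATES
        w = gradient_mc_update(w, s, G, alpha=0.1)
print(w)
```

Only $d = 2$ numbers are stored here, yet every state gets a value; that is the trade-off compared with a table of one entry per state.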

Read more »

Preface

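Recently I have a lot of research work to do, but I just don't feel like doing it and want to tinker with random things instead. So today I played around with my router: a HiWiFi 1S (极路由1S), model 5661A. It was quite fun.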

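Looking at the HiWiFi 1S I bought in 2016 as an undergraduate: although the company has gone out of business, the router has a developer edition, and after carefully reading the forums I think it is one of the routers with the most ways to crack and flash it today. Honestly, though, 2.4 GHz is still a bit slow. When I have time, I'd like to switch to a new dual-band (2.4 GHz and 5 GHz) Wi-Fi 6 router that is easy to flash. But when I looked around the market, surprisingly few Wi-Fi 6 routers run OpenWrt. GL.iNet has one, but it is very expensive and not on sale yet. I'm itching to learn how and write one myself, but I'm really too busy lately, so it will have to wait. Once I'm in Australia and have some free time, I'll definitely give it a try.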

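For now, I plan to have a separate router under my full control when I set up my network later, so I'll use this one as a relay router for getting around internet restrictions through a proxy. Here is a brief record of my painful flashing experience. Later on, using many membership services will probably require first updating the DNS on this intermediate proxy router.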

Read more »

Preface

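Since reading Understanding the Linux Kernel, I have been interested in certain parts of the operating system, such as the file system and system calls, and wanted to know how each module fits together into such a large and complex OS. But when I tried to implement an operating system myself, the assembly code was too difficult, so I never found the time to read and carefully analyze the details. This time, through Professor Haibo Chen's operating systems textbook, I hope to fill in the gaps I have overlooked, learn most of the assembly code involved, and write my first mini operating system.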

Overview

  1. Manage and abstract the hardware
  2. Provide services to applications and manage them: serving applications, managing applications
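As a small illustration of both roles, assuming a POSIX-like system and using Python's `os` module as a thin wrapper over the system calls: the application never drives the storage hardware itself. It asks the OS for a file descriptor and goes through `open`/`write`/`read`/`close`, and the kernel decides how that maps onto the disk. The file name is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file; the kernel hides where and how it is stored.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # open(2)
os.write(fd, b"hello, kernel\n")                                   # write(2)
os.close(fd)                                                        # close(2)

fd = os.open(path, os.O_RDONLY)                                     # open(2)
data = os.read(fd, 100)                                             # read(2)
os.close(fd)
print(data)  # b'hello, kernel\n'
```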
Read more »

Recently, because of its limited memory, my MacBook runs slowly when I have several applications open while coding and doing paperwork at the same time. So I bought a new 2021 MacBook Pro with an M1 chip. However, I have a lot of data to migrate to the new MacBook, and there are some differences between the M1 and Intel chips. This post gives some tips for migrating between the two computers.
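Since part of the migration pain comes from the M1 (Apple silicon, arm64) vs. Intel (x86_64) difference, here is a quick way to check which architecture a tool actually runs as, sketched with the Python standard library only. It assumes the `sysctl.proc_translated` key that Apple documents for detecting Rosetta 2 (it reports 1 for a translated process); on Intel Macs the key does not exist, and the `describe_cpu` name is just for illustration.

```python
import platform
import subprocess


def describe_cpu() -> str:
    """Describe the CPU architecture this Python process runs as."""
    machine = platform.machine()  # e.g. "arm64" or "x86_64"
    if platform.system() != "Darwin":
        return f"not macOS ({machine})"
    if machine == "arm64":
        return "Apple silicon, native arm64"
    # An x86_64 process on Apple silicon runs under Rosetta 2, in which case
    # sysctl.proc_translated is 1; on Intel Macs the key is missing.
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        ).stdout.strip()
    except FileNotFoundError:
        out = ""
    if out == "1":
        return "Apple silicon, x86_64 under Rosetta 2"
    return "Intel (x86_64)"


print(describe_cpu())
```

If an old Intel build of a tool (for example a Python installed from an x86_64 installer) reports "under Rosetta 2", it still works, but installing a native arm64 build is usually faster.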

1. Erase the old computer

  1. Back up using Time Machine

  2. Sign out of iCloud

    If using macOS Catalina or later, choose Apple menu > System Preferences, then click Apple ID. Select Overview in the sidebar, then click Sign Out.

    If using an earlier version of macOS, choose Apple menu > System Preferences, click iCloud, then click Sign Out.

Read more »

  • 8.4. Prioritized Sweeping
  • 8.5 Expected vs. Sample Updates
  • 8.6 Trajectory Sampling
  • 8.7 Real-time Dynamic Programming
  • 8.8 Planning at Decision Time
  • 8.9 Heuristic Search

8.4. Prioritized Sweeping

In general, we want to work back not just from goal states but from any state whose value has changed.

In this way one can work backward from arbitrary states that have changed in value, either performing useful updates or terminating the propagation. This general idea might be termed backward focusing of planning computations.

This algorithm adds the following process:
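A minimal sketch of the planning part of prioritized sweeping, assuming a deterministic tabular model and following the structure of Sutton & Barto's pseudocode: pairs enter a priority queue when their update magnitude exceeds a threshold θ, and after each update every predecessor of the updated state is re-checked. All function and parameter names are hypothetical; unlike the book, this sketch allows duplicate queue entries instead of keeping only the highest priority per pair.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping_planning(Q, model, predecessors, actions, start_pairs,
                                  alpha=0.5, gamma=0.95, theta=1e-4, n_updates=100):
    """Planning loop of prioritized sweeping for a deterministic tabular model.

    Q            : defaultdict(float) keyed by (state, action)
    model        : dict (state, action) -> (reward, next_state)
    predecessors : dict state -> set of (state, action) pairs predicted to lead to it
    actions      : function state -> list of actions (empty for terminal states)
    start_pairs  : (state, action) pairs whose values may have changed
    """
    def target(s, a):
        r, s2 = model[(s, a)]
        best_next = max((Q[(s2, b)] for b in actions(s2)), default=0.0)
        return r + gamma * best_next

    pqueue = []  # heapq is a min-heap, so priorities are stored negated
    for s, a in start_pairs:
        p = abs(target(s, a) - Q[(s, a)])
        if p > theta:
            heapq.heappush(pqueue, (-p, (s, a)))

    for _ in range(n_updates):
        if not pqueue:
            break
        _, (s, a) = heapq.heappop(pqueue)  # largest expected change first
        Q[(s, a)] += alpha * (target(s, a) - Q[(s, a)])
        # Work backward: pairs predicted to lead into s may now need updating.
        for s_bar, a_bar in predecessors.get(s, ()):
            p = abs(target(s_bar, a_bar) - Q[(s_bar, a_bar)])
            if p > theta:
                heapq.heappush(pqueue, (-p, (s_bar, a_bar)))
    return Q


# Hypothetical chain 0 -> 1 -> 2 (terminal), reward 1 on reaching state 2.
def chain_actions(s):
    return ["right"] if s < 2 else []


model = {(0, "right"): (0.0, 1), (1, "right"): (1.0, 2)}
predecessors = {1: {(0, "right")}, 2: {(1, "right")}}
Q = prioritized_sweeping_planning(defaultdict(float), model, predecessors, chain_actions,
                                  start_pairs=[(1, "right")], alpha=1.0, gamma=0.9)
print(Q[(0, "right")], Q[(1, "right")])  # 0.9 1.0
```

Seeding the queue with only the pair next to the goal is enough here: the change propagates backward through `predecessors`, and propagation stops once the remaining priorities fall below θ.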

Read more »

  • Preface
  • 8.1 Models and Planning
  • 8.2 Dyna: Integrated Planning, Acting, and Learning
  • 8.3 When the Model Is Wrong
  • What is a Model?
  • Comparing Sample and Distribution Models
  • Random Tabular Q-planning
  • The Dyna Architecture
  • The Dyna Algorithm
  • Dyna & Q-learning in a Simple Maze
  • What if the model is inaccurate?
  • In-depth with changing environments
  • Drew Bagnell: self-driving, robotics, and Model Based RL
  • Week 4 Summary
  • Programming Assignment: Dyna-Q and Dyna-Q+
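A compact sketch of tabular Dyna-Q as described in Section 8.2 of Sutton & Barto, assuming a deterministic environment: after each real transition the agent does a direct Q-learning update, records the outcome in its model, and then performs `n_planning` Q-learning updates on transitions replayed from the model. The toy chain environment and all function names are hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken at random)."""
    acts = list(actions(s))
    if rng.random() < epsilon:
        return rng.choice(acts)
    best = max(Q[(s, a)] for a in acts)
    return rng.choice([a for a in acts if Q[(s, a)] == best])


def q_update(Q, s, a, r, s2, actions, alpha, gamma):
    """One-step tabular Q-learning update; a terminal S' has no actions, so its value is 0."""
    best_next = max((Q[(s2, b)] for b in actions(s2)), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def dyna_q_step(Q, model, s, a, r, s2, actions, n_planning, alpha, gamma, rng):
    """Direct RL, model learning and n planning updates after one real transition."""
    q_update(Q, s, a, r, s2, actions, alpha, gamma)  # direct RL from real experience
    model.setdefault(s, {})[a] = (r, s2)            # deterministic model: remember the outcome
    for _ in range(n_planning):                      # planning from simulated experience
        ps = rng.choice(list(model))                 # a previously observed state
        pa = rng.choice(list(model[ps]))             # an action previously taken in it
        pr, ps2 = model[ps][pa]
        q_update(Q, ps, pa, pr, ps2, actions, alpha, gamma)


# Hypothetical deterministic chain 0 -> 1 -> 2 (terminal); reward 1 on reaching state 2.
def chain_actions(s):
    return ["right"] if s < 2 else []


def chain_step(s, a):
    s2 = s + 1
    return (1.0 if s2 == 2 else 0.0), s2


rng = random.Random(0)
Q, model = defaultdict(float), {}
s = 0
while s != 2:  # a single episode
    a = epsilon_greedy(Q, s, chain_actions, epsilon=0.1, rng=rng)
    r, s2 = chain_step(s, a)
    dyna_q_step(Q, model, s, a, r, s2, chain_actions,
                n_planning=50, alpha=1.0, gamma=0.9, rng=rng)
    s = s2
print(Q[(0, "right")], Q[(1, "right")])
```

With planning, the goal reward reaches state 0 within the first episode; one-step Q-learning alone would need a second episode for Q(0, right) to become nonzero.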
Read more »

Meta-Learning Study Notes - 5

  • Recap

Recap

Read more »

  • Sarsa: On-policy TD Control
  • Q-learning: Off-policy TD Control
  • Maximization Bias and Double Learning
  • Games, Afterstates, and Other Special Cases
  • Summary
  • Sarsa: GPI with TD
  • Sarsa in the Windy Grid World
  • What is Q-learning
  • Q-learning in the Windy Grid World
  • How is Q-learning off-policy
  • Expected Sarsa
  • Expected Sarsa in the Cliff World
  • Generality of Expected Sarsa
  • Week 3 Summary
  • Programming Assignment
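Sarsa, Q-learning and Expected Sarsa share the update $Q(S,A) \leftarrow Q(S,A) + \alpha\,[\text{target} - Q(S,A)]$ and differ only in the bootstrap target. The sketch below puts the three targets side by side, assuming an ε-greedy behaviour policy; all names are hypothetical and the Q-values are made up for illustration.

```python
from collections import defaultdict


def epsilon_greedy_probs(Q, s, actions, epsilon):
    """pi(a|s) for an epsilon-greedy policy w.r.t. Q (ties go to the first maximal action)."""
    greedy = max(actions, key=lambda a: Q[(s, a)])
    n = len(actions)
    return {a: epsilon / n + (1.0 - epsilon if a == greedy else 0.0) for a in actions}


def sarsa_target(Q, r, s2, a2, gamma):
    """On-policy: bootstrap from the action A' the agent actually takes in S'."""
    return r + gamma * Q[(s2, a2)]


def q_learning_target(Q, r, s2, actions, gamma):
    """Off-policy: bootstrap from the greedy action in S', whatever the agent does next."""
    return r + gamma * max(Q[(s2, b)] for b in actions)


def expected_sarsa_target(Q, r, s2, actions, gamma, epsilon):
    """Bootstrap from the expected Q(S', .) under the epsilon-greedy policy."""
    probs = epsilon_greedy_probs(Q, s2, actions, epsilon)
    return r + gamma * sum(probs[b] * Q[(s2, b)] for b in actions)


def td_control_update(Q, s, a, target, alpha):
    """Shared update: Q(S,A) <- Q(S,A) + alpha * [target - Q(S,A)]."""
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# In S' the two actions have made-up values left = 1.0 and right = 3.0.
Q = defaultdict(float, {("s2", "left"): 1.0, ("s2", "right"): 3.0})
acts = ["left", "right"]
print(sarsa_target(Q, 0.0, "s2", "left", gamma=1.0))                      # 1.0 (A' was "left")
print(q_learning_target(Q, 0.0, "s2", acts, gamma=1.0))                  # 3.0
print(expected_sarsa_target(Q, 0.0, "s2", acts, gamma=1.0, epsilon=0.1))  # about 2.9

td_control_update(Q, "s", "right", target=3.0, alpha=0.5)  # Q(s, right): 0.0 -> 1.5
```

Expected Sarsa removes the variance from sampling A', which is one reason it does well in the Cliff World; with ε = 0 its target coincides with Q-learning's.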
Read more »

  • Abstract
  • TD Prediction
  • Advantages of TD Prediction Methods
  • Optimality of TD(0)

Abstract

TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas. Like Monte Carlo methods, TD methods can learn directly from raw experience without a model of the environment’s dynamics. Like DP, TD methods update estimates based in part on other learned estimates, without waiting for a final outcome (they bootstrap).
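A minimal sketch of tabular TD(0) prediction, the simplest instance of this idea: after every transition, $V(S) \leftarrow V(S) + \alpha\,[R + \gamma V(S') - V(S)]$, so the estimate of $S'$ is used as part of the target (bootstrapping) and nothing waits for the episode to end. The `step` interface and the two-state chain are hypothetical.

```python
from collections import defaultdict


def td0_prediction(step, start_state, n_episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0) estimate of v_pi.

    step(s) -> (reward, next_state, done) samples one transition under the
    fixed policy being evaluated (a hypothetical interface for illustration).
    """
    V = defaultdict(float)
    for _ in range(n_episodes):
        s, done = start_state, False
        while not done:
            r, s2, done = step(s)
            target = r if done else r + gamma * V[s2]  # terminal states have value 0
            V[s] += alpha * (target - V[s])            # update right after the transition
            s = s2
    return V


# Hypothetical deterministic chain A -> B -> terminal, reward 1 on the last transition.
def chain_step(s):
    if s == "A":
        return 0.0, "B", False
    return 1.0, "T", True


V = td0_prediction(chain_step, "A", n_episodes=500)
print(round(V["A"], 3), round(V["B"], 3))  # 1.0 1.0
```

Note how V(A) improves only after V(B) has: its target is built from the current estimate of B, which is exactly the DP-like part of TD.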

Read more »

  • What is Monte Carlo
  • Using Monte Carlo for Prediction
  • Using Monte Carlo for Action Values
  • Using Monte Carlo methods for generalized policy iteration
  • Solving the Blackjack Example

What is Monte Carlo

DP needs to know the transition probabilities (a full model of the environment), and it is computationally expensive, since each sweep updates every state.

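I notice that MC still uses the discount factor $\gamma$. Working backward through an episode, the return is computed by multiplying the running return by $\gamma$ and adding the next reward, $G \leftarrow \gamma G + R_{t+1}$; the returns observed after visits to a state are then averaged to estimate that state's value.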
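The backward return computation can be sketched as first-visit MC prediction (the variant Sutton & Barto present first in Chapter 5), assuming each episode is given as a list of $(S_t, R_{t+1})$ pairs. No transition probabilities are needed, only sampled episodes; the episode data below is made up.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=0.9):
    """Average the returns that follow the first visit to each state.

    Each episode is a list of (S_t, R_{t+1}) pairs: a state and the reward
    received on the transition out of it.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)  # time step of the first visit to s
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r             # multiply by gamma, then add R_{t+1}
            if first_visit[s] == t:       # count only the return from the first visit
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episodes = [
    [("A", 0.0), ("B", 1.0)],  # A -> B -> terminal, reward 1 at the end
    [("B", 0.0)],              # B -> terminal, reward 0
]
V = first_visit_mc_prediction(episodes, gamma=0.9)
print(V["A"], V["B"])  # 0.9 0.5
```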

Read more »

Preface

When doing systems research, I found it worthwhile to understand the real implementation details of every component (file system, memory management, etc.). So I am starting a new series here to record my notes and experience from reading Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet. I hope that after reading this book I can understand every paper at OSDI and come up with more useful, novel ideas, instead of only thinking in the abstract without considering real problems or the actual architecture of the operating system.

Read more »