
Tensorflow-Day1-DNN Explain

Posted on 2019-11-26 Edited on 2026-03-23

Deep learning fundamentals with TensorFlow — covering DNN architecture, forward/backward propagation, activation functions, and gradient descent.

Preface

I am studying cloud computing, software engineering, and operating systems. Artificial intelligence is an interesting field with a lot of practical value. I'd like to either bring my own major into artificial intelligence or use artificial intelligence to solve problems in my field.

Reinforcement Learning_WatermelonBook_Summary

Posted on 2019-11-24 Edited on 2026-03-23

Summary of reinforcement learning concepts from Zhou Zhihua's Machine Learning textbook (Watermelon Book), covering core RL theory and algorithms.

Preface

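Lately Teacher Wu has needed to get started with reinforcement learning. No slacking off: starting from zero and studying it seriously!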

ReinforcementLearning_WatermelonBook_Summary

Summary of Zhou Zhihua's Watermelon Book

A brief introduction to reinforcement learning

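Reinforcement learning is described by a four-tuple <S, A, P, R>: the state space S, the action space A, the state transition probability P, and the reward R. A learning process described this way is called an MDP, a Markov decision process. The machine's goal is to learn a policy function π, usually written π(s, a), which gives the probability of choosing action a from the action space A when in state s of the state space S. This is the general form of a policy.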
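To make the notation π(s, a) concrete, here is a minimal Python sketch of a stochastic policy stored as a table. The state names, action names, and probabilities are invented purely for illustration; they do not come from the book.

```python
import random

# Hypothetical toy state and action spaces, named only for illustration.
states = ["s0", "s1"]
actions = ["left", "right"]

# pi[s][a] plays the role of π(s, a): the probability of choosing a in state s.
pi = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.1, "right": 0.9},
}

def sample_action(policy, state, rng):
    """Draw one action for `state` according to the probabilities policy[state]."""
    names = list(policy[state])
    weights = [policy[state][a] for a in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
action = sample_action(pi, "s0", rng)
```

For each state, the probabilities over all actions sum to 1, which is what makes π(s, ·) a distribution over the action space.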

Reinforcement learning example 1: the k-armed bandit

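Consider the following problem. There is a slot machine with k arms, and you do not know how much money any arm will pay out when pulled. With only 100 pulls available, how should they be used to maximize the total payout?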

Modeling

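This is in fact a typical Markov decision process. The environment is the k-armed bandit. The state S is what I know so far: which arms I have pulled and how much each one paid out. The action A is which arm to pull next. The transition probability P is the probability that, from the current state, pulling a given arm leads to another money state. The reward R is the money paid out, and the quantity to maximize is the running total of those payouts.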
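The modeling above can be sketched in Python as follows. The class name `KArmedBandit`, the Gaussian payout with standard deviation 1, and the hidden mean values are all assumptions made for illustration; the text does not say how the machine generates payouts.

```python
import random

class KArmedBandit:
    """Hypothetical k-armed bandit: each arm pays a noisy reward around a hidden mean."""

    def __init__(self, hidden_means, seed=0):
        self.hidden_means = list(hidden_means)  # unknown to the player
        self.k = len(self.hidden_means)
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Action: pull arm `arm`; returns the payout of this single pull."""
        return self.rng.gauss(self.hidden_means[arm], 1.0)

# State: what the player has observed so far, tracked per arm.
bandit = KArmedBandit([1.0, 2.0, 0.5])
pull_counts = [0] * bandit.k   # how many times each arm was pulled
reward_sums = [0.0] * bandit.k # total money each arm has paid out

# One step: choose an action, observe a reward, update the state.
arm = 1
reward = bandit.pull(arm)
pull_counts[arm] += 1
reward_sums[arm] += reward
```

The player only ever sees `pull_counts` and `reward_sums`; the hidden means stay inside the environment, which is what makes the choice of the next arm non-trivial.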

Algorithm: Exploration, Exploitation, and Epsilon-Greedy

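In this example there are clearly two options at each step: pull the arm with the highest known average reward (exploit), or pull an arm chosen at random (explore). We set a value epsilon and draw a random number uniformly from [0, 1). When the number is less than epsilon we explore; otherwise, with probability 1 - epsilon, we exploit.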
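A minimal sketch of epsilon-greedy for the 100-pull problem is shown below. The function name `epsilon_greedy`, the Gaussian payouts, the example means, and the default epsilon of 0.1 are illustrative assumptions, not values from the book. Averages are updated incrementally so no reward history needs to be stored.

```python
import random

def epsilon_greedy(true_means, n_pulls=100, epsilon=0.1, seed=0):
    """Play a simulated k-armed bandit with epsilon-greedy arm selection.

    true_means are hypothetical hidden payout means (assumed Gaussian, std 1).
    Returns (total reward, pulls per arm, estimated average reward per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k       # number of pulls per arm
    averages = [0.0] * k   # running average reward per arm
    total = 0.0

    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the best average so far.
            arm = max(range(k), key=lambda i: averages[i])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean: new_avg = old_avg + (reward - old_avg) / n
        averages[arm] += (reward - averages[arm]) / counts[arm]
        total += reward

    return total, counts, averages

total, counts, averages = epsilon_greedy([1.0, 2.0, 0.5])
```

A larger epsilon spends more pulls gathering information, while a smaller one commits earlier to the arm that currently looks best; with only 100 pulls the trade-off between the two matters.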