
Slurm Source Code Install | Cluster Deployment - Day 2: Slurm's Dependencies & Proxy

  • dependencies and proxy installation

At first, my bare-metal machines were in my lab's inner network and could not reach the Internet. So I installed Ubuntu 18.04 Desktop on my own computer, used apt to download the .deb packages, moved the packages to the bare-metal machines, and installed them there. (My computer runs Windows, so I used Apache as a proxy.)

But I found it really hard to track down all the dependencies of a package, so I wrote a bash one-liner for this problem:

sudo apt-get download `apt-cache depends gcc-5 | grep "Depends:" | cut -d: -f2 | tr -d "<>"`

Even after running this script for all the dependencies, I still could not install gcc (the dependency tree is recursive), so I gave up on that approach. Instead, I chose to use my own computer as a proxy server for all the inner servers. After that, the servers could access the Internet and I installed gcc successfully. I also installed munge for Slurm.
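Pointing apt at the proxy takes only a small config file on each inner server. A minimal sketch, assuming the proxy machine is reachable at 192.168.1.100 on port 8080 (both hypothetical values; substitute your own proxy's address and port):

```
# /etc/apt/apt.conf.d/01proxy  (hypothetical address and port)
Acquire::http::Proxy "http://192.168.1.100:8080/";
Acquire::https::Proxy "http://192.168.1.100:8080/";
```

With this in place, `apt-get update` and `apt-get install` on the inner servers tunnel through the proxy, so no manual .deb copying is needed.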

Read more »

Preface

Recently I got 14 bare-metal machines from my lab. I need to deploy my own modified Slurm on this cluster. It is hard to build a cluster from bare metal with nothing pre-installed, so I decided to record the problems I faced and how I solved them.

Slurm Source Code Install | Cluster Deployment - Day 1: Cluster Power-Up

Bare-metal

The 14 bare-metal servers are AMAX servers with the following models:

G204-H2

G404-H2

Powering up the bare metal offline

First we need to power up the AMAX servers: plug in the power cables and turn the servers on. As they boot, we can press "Del" or "F11" to enter the BIOS.

The core work of my first step concerns the BMC. First I need to know what a BMC is; I will just cite from [https://www.servethehome.com/explaining-the-baseboard-management-controller-or-bmc-in-servers/]

Read more »

Preface

Recently, Prof. Song asked me to do some work related to meta-learning plus reinforcement learning. I have hardly done any AI-related work before, so let me use this topic to begin reviewing and learning some principal work in meta-learning and reinforcement learning! First of all, I will use Sutton's book as the primary reference and introduce all the computational approaches from it. I will also take the course Fundamentals of Reinforcement Learning by Martha and Adam at the University of Alberta.

I will only write up the more complex and meaningful formulas rather than the classical ones. For example, epsilon-greedy will not be introduced here; instead I will introduce UCB, SARSA, and contextual bandits.

Read more »

Preface

I am studying cloud computing, software engineering, and operating systems. Artificial intelligence is an interesting field, full of practical applications. I would like to apply my major to problems in artificial intelligence, or use artificial intelligence to solve problems in my own field.

Read more »

Preface

Recently Prof. Wu needs to get into reinforcement learning. Starting from scratch, studying it in earnest!

ReinforcementLearning_WatermelonBook_Summary

Summary of Zhou Zhihua's Watermelon Book

A brief introduction: what is reinforcement learning?

Reinforcement learning is described by a four-tuple <S, A, P, R>: the state space S, the action space A, the state-transition probability P, and the reward R. A learning process described this way is called a Markov Decision Process (MDP). The machine's goal is to learn a policy function π, usually written π(s, a): for a state s in the state space S, π(s, a) gives the probability of choosing action a from the action space A. This is the general representation of a policy.
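As a concrete illustration of π(s, a), a stochastic policy over a small discrete MDP can be stored as a table of per-state action probabilities. A minimal sketch (the states, actions, and probabilities below are made-up toy values, not from the book):

```python
import random

# Hypothetical toy MDP: two states, two actions.
# pi[s][a] = probability of choosing action a in state s, i.e. pi(s, a).
pi = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"left": 0.2, "right": 0.8},
}

def sample_action(pi, s):
    """Draw an action a with probability pi(s, a)."""
    actions = list(pi[s].keys())
    weights = [pi[s][a] for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

# Each row of the table is a distribution over actions, so it sums to 1.
for s in pi:
    assert abs(sum(pi[s].values()) - 1.0) < 1e-9

action = sample_action(pi, "s0")
```

A deterministic policy is just the special case where one action per state has probability 1.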

Reinforcement Learning Example 1: The k-Armed Bandit

Consider the following problem: there is a bandit machine with k arms, and you do not know how much money pulling an arm will pay out. The question is: given only 100 pulls, how should I use them to maximize my total money?

Modeling

This is also a typical Markov process. The environment is the k-armed bandit. The state S is what I currently know: which arms I have pulled and how much money each paid out. The action A is which arm I should pull next. The state-transition probability P is the probability of moving from the current state to another money state after pulling a given arm. The reward R is clearly the running total of money won.

Algorithm: Exploration, Exploitation & Epsilon-Greedy

Clearly, in this example we have two choices: pick the arm with the highest known average reward, or pick an arm at random. So we set a value epsilon: draw a uniform random number, explore when it is below epsilon, and exploit with the remaining probability 1 - epsilon.
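The epsilon-greedy rule above can be sketched in a few lines of Python. A minimal sketch, assuming Bernoulli payouts; the arm means, epsilon value, and pull budget below are made-up assumptions for illustration:

```python
import random

random.seed(0)

K = 3                           # number of arms (hypothetical)
true_means = [0.2, 0.5, 0.8]    # hypothetical payout probability of each arm
epsilon = 0.1
Q = [0.0] * K                   # running average reward per arm
N = [0] * K                     # pull counts per arm

for t in range(1000):
    # Draw a uniform random number: explore with probability epsilon,
    # otherwise exploit the arm with the highest average reward so far.
    if random.random() < epsilon:
        a = random.randrange(K)
    else:
        a = max(range(K), key=lambda i: Q[i])
    r = 1.0 if random.random() < true_means[a] else 0.0  # Bernoulli payout
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]   # incremental average update

best = max(range(K), key=lambda i: Q[i])
```

The incremental update keeps Q[a] equal to the average of all rewards seen from arm a without storing the full history; with more pulls, the greedy choice concentrates on the arm with the highest estimated mean while the epsilon fraction of pulls keeps every arm's estimate fresh.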

Read more »