Tech
Multi-Agent Reinforcement Learning | Information Design in Multi-Agent Reinforcement Learning (Chinese version)
Resources Information Design in Multi-Agent Reinforcement Learning. Yue Lin, Wenhao Li, Hongyuan Zha, Baoxiang Wang. Neural Information Processing Systems (NeurIPS) 2023. Poster. [Paper]...
Multi-Agent Reinforcement Learning | Information Design in Multi-Agent Reinforcement Learning
Resources Information Design in Multi-Agent Reinforcement Learning. Yue Lin, Wenhao Li, Hongyuan Zha, Baoxiang Wang. Neural Information Processing Systems (NeurIPS) 2023. Poster. ...
Economics & Game Theory | Information Design in 10 Minutes
This note provides a brief introduction to the basic concepts of information design. More details can be found in my other note on this topic. “Sometimes, the truth is not good enough.” —...
Economics & Game Theory | Information Design
It is not from the benevolence of the butcher, the brewer, or the baker, that we expect our dinner, but from their regard to their own interest. We address ourselves, not to their humanity but t...
Mathematics | Mathematica Memos
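I've heard that sympy in Python can also handle some derivation and simplification; I'll look into it later, since Mathematica takes up too much disk space. Basics Symbols such as $\epsilon$ are entered as Epsilon, with a capital first letter. Enter inserts a line break; Shift + Enter evaluates. Names are case-sensitive: two names that differ only in case are two different quantities. In function calls, arguments are enclosed in square brackets. Ending an expression with a semicolon ; suppresses its output. * is element-wise multiplication, and the period . is linear...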
Misc Toolbox | Building My Own PC
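Resources Assembly: [[PC Build Tutorial] The best PC build tutorial on the web, bar none] Compatibility: [[Worth bookmarking] What you must know before building a DIY PC! A step-by-step walkthrough for checking hardware compatibility in your parts list! A hardware compatibility guide for DIY PCs! Required reading for first-time builders!] OS installation: [[PC Build Tutorial] A very detailed Windows 10 installation tutorial: direct install from the official ISO and the PE method, UEFI+GUID partitioning and Legacy+MBR partitioning] CPU...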
Reinforcement Learning | TRPO Details
The original paper: Schulman, John, et al. “Trust region policy optimization.” International conference on machine learning. PMLR, 2015. Overview This derivation comes from the Appendix A.1 ...
Multi-Agent Reinforcement Learning | MetaGrad in LIO
Economics & Game Theory | Fairness Versus Reason in the Ultimatum Game
The Game The experimenter assigns a certain sum, and the Proposer can offer a share of it to the Responder. If the Responder (who knows the sum) accepts, the sum is split accordingly between the...
Economics & Game Theory | Evolutionary Game Theory
Basic Symmetric Model with Stochastic Strategies This section is a summary of Chapter 29 of the book “Algorithmic Game Theory”1. Agents (organisms) The number of agents is infini...
Misc Toolbox | Health Tips
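Everyday health notes Sleep No staying up late; keep a fixed sleep schedule. Avoid stimulation for half an hour before bed; stretch and relax. Get sunlight during the day, walk more, and don't nap to catch up on sleep. Close the curtains at night. Buy a comfortable pillow. The bed should not be hard, but also not so soft that it sags and wraps around you completely. Keep a suitable temperature, slightly on the cool side, and tuck the blanket in well. Let yourself feel sunk into the bedding. Mentally let go and stop thinking about things. Play a breathing corpse. Half a tablet of Stilnox, every other day. Breathing Cleaning the tongue coating helps relieve allergic nasal...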
Code Utils | Code Visualization
Function Call Graph Not working: pyan3 pycallgraph pycallgraph2 Inheritance Visualization Example 1: See my blog. Example 2: pyreverse -o png -p outputed_diagram main.py Agent.p...
Code Utils | LyPythonToolbox
Resources Github Repo My Full Code Toolbox Install Install: pip install LyPythonToolbox Update: pip install --upgrade LyPythonToolbox Print Tricks lyprint_separator from LyPythonToolb...
Code Utils | Github Memo
Create a Repo Click the green New button on the GitHub repo website. Do not check the Add a README file option. Copy the link with the .git extension. Create a directory locally and enter it in a...
Code Utils | Python Project Template
How to Use Download LyPythonProjectTemplate Decompress it. Create a new Github project. See my blog. Copy the contents of LyPythonProjectTemplate into the root folder of your new project....
Misc Toolbox | Research Tips
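Skills Literature Read at least one paper a day to stay in practice. Read widely and organize what you read into a framework; keep the threads clear and think about how each work relates to your own. Fill knowledge gaps regularly; read a little now and then regardless, on whatever interests you. There is still a lot I don't know, and I never will if I don't read. But don't get addicted to absorbing knowledge. Code Read other people's code more; I tend to dislike reading other people's code and benchmarks ...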
Misc Toolbox | My Website
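I can't be bothered to put much effort into this, so these are casual notes (originally written in Chinese). Jekyll Pronounced "jiē kǒu" (街口) Tutorials [Repost - Jekyll - Static Site Generator Tutorial, bilingual subtitles] Local deployment Deploy This defaults to 127.0.0.1:4000, which is reachable only from the local machine: bundle exec jekyll serve --livereload To make it reachable from other devices on the LAN: bundle exec jekyll...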
Misc Toolbox | MacOS
Desktop Wallpaper: A Seascape, Shipping by Moonlight - Monet Recommended Tools Hidden Bar Magnet Window Arrangement Control + Option + Enter: Maximize Control + Option + Backspace: Re...
Code Utils | PyTorch Toolbox
Nets Linear / MLP PyTorch Document - Linear Initialization Parameters in_features out_features bias=True input.shape: (*, in_features) output.shape: (*, out_...
Code Utils | Python Toolbox
This post was completed with the assistance of ChatGPT-4. Inheritance Inheritance allows a class (known as a child class) to inherit attributes and methods from another class (known as a pare...
Reinforcement Learning | Stable Baselines3
Getting Started Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. — Stable Baseline3 Docs. Resources [Stable Baseline3 Docs] [...
Multi-Agent Reinforcement Learning | Overcooked: A MARL Task
A Brief Intro This MARL environment is a remake of the Overcooked game on Steam. Some features (game rules) are cut to simplify the situation. This game requires two agents to coordinate to cook. If they...
Misc Toolbox | Tools of Visual Studio Code
This note will be consistently updated. Shortcuts Command + k Command + s: Keyboard Shortcuts GOTO Command + P: Go to file Command + Shift + O: Go to symbol in editor Command + T: Go ...
Misc Toolbox | Interesting Facts
This note will be consistently updated. Society The Rules for Rulers | CGP Grey (YouTube) Animals Octopus vs Underwater Maze | Mark Rober (YouTube) or [Mark Rober experiment [Chinese dub] | How smart are octopuses, really?...
Reinforcement Learning | Policy Distillation
Introduction [Paper]: Policy Distillation The following statements from the paper are key to understanding this technique: Distillation is a method to transfer knowledge from a teacher model $T$ t...
Machine Learning Basics | HyperNetworks
Introduction [Paper]: HyperNetworks The following part has not been finished yet. Application in QMIX Illustration from the corresponding paper. The following statements from the paper are...
Machine Learning Basics | Decision Transformers
Decision Transformer Paper: Decision Transformer: Reinforcement Learning via Sequence Modeling - NeurIPS 2021 [Website] [Code] Illustration from the corresponding paper. Illustration fro...
Mathematics | Set
This note will be consistently updated. Related fields: Real Analysis, General Topology, Geometry. Supremum & Infimum The supremum of a nonempty set $X \subset \mathbb{R}$ is the smalle...
Mathematics | Convergence Analysis of Gradient Descent
The following part has not been finished yet. Gradient Descent The goal We want to solve this unconstrained minimization problem $\min_x f(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n$ ...
Mathematics | Contraction Mapping Theorem
Metric Space Definition of metric space Definition. A metric space is an ordered pair $(M, d)$ where $M$ is a set and $d$ is a metric on $M$, i.e., a function $d: M\times M \to \mathbb{R}$ sa...
Mathematics | A Note on Stochastic Processes
This note partially uses the materials from the notes of MATH2750. Transition Matrix The transition kernel $\mathbf{M}$ is a square matrix of size $\vert S\vert \times \vert S\vert$. $\m...
Multi-Agent Reinforcement Learning | Sequential Social Dilemma
What is a Social Dilemma? Definition A social dilemma refers to a situation in which individual actions that seem to be rational and in self-interest can lead to collective outcomes that are undesi...
Economics & Game Theory | Zero-Determinant Strategy
This note aims to summarize the essence of this paper: Press, William H., and Freeman J. Dyson. “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent.” Proceed...
Economics & Game Theory | Classic Games
This note will be consistently updated. Prisoner’s Dilemma Two members of a criminal organization are arrested and imprisoned. Each prisoner is in solitary confinement with no means of com...
Economics & Game Theory | A Memo on Game Theory
This note will be consistently updated. Rationality A rational player is one who chooses his action, to maximize his payoff consistent with his beliefs about what is going on in the game.1 ...
Multi-Agent Reinforcement Learning | Fictitious Self-Play and Zero-Shot Coordination
Fictitious Play Fictitious play is a learning rule. In it, each player presumes that the opponents are playing stationary (possibly mixed) strategies. At each round, each player thus best ...
Reinforcement Learning | Policy Gradient Details
The only way to make sense out of change is to plunge into it, move with it, and join the dance. — Alan Watts. Bellman Equations [V(s_t) = \mathbb{E}\left[ r_t + \gamma\cdot V(s_{t+1}) \ri...
Machine Learning Basics | Sequence-to-Sequence Models
NLP Terms NLP = Natural Language Processing Embedding In a general sense, “embedding” refers to the process of representing one kind of object or data in another space or format. It involves m...
Multi-Agent Reinforcement Learning | MARL Basics
This note has not been finished yet. One may check my writing schedule. Markov Models MDP Markov decision process $(S, A, \mathcal{P}, R, \gamma)$ Single-agent, full...
Code Utils | Computation Graph Visualization
PyTorchviz Basics Install brew install graphviz (or here) pip install torchviz Documentation: Github Official examples: Colab If a node represents a backward fu...
Mathematics | Dynamic Epistemic Logic
Three logicians walk into a bar. The bartender asks: “Do you all want a drink?” The first logician says: “I don’t know.” The second logician says: “I don’t know.” The third logician says: “Yes.”...
Multi-Agent Reinforcement Learning | Theory of Mind and Markov Models
We do not see things as they are, we see them as we are. — Anaïs Nin. What is Theory of Mind? In psychology, theory of mind refers to the capacity to understand other people by ascribing me...
Mathematics | Theoretical Computer Science (TCS)
This note will be consistently updated. What is TCS? (Wikipedia) Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspect...
Mathematics | Principal Component Analysis
Notes from long ago. Introduction PCA is a widely used dimensionality reduction technique that projects high-dimensional data into a lower-dimensional space, while retaining as much of the data’s variance as possib...
Multi-Agent Reinforcement Learning | MARL Tasks
This note will be consistently updated. List StarCraft II SMAC (StarCraft Multi-Agent Challenge). SMAC is WhiRL’s environment for research in the field of collaborative multi-age...
Reinforcement Learning | RL Toolbox
This note will be consistently updated. PPO Tricks There are a total of 37 tricks, among which 13 are relatively core. PPO paper The 37 Implementation Details of Proximal Polic...
Code Utils | Misc Code Toolbox
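This note will be consistently updated. Tmux It has been so long since I connected to a server that I have nearly forgotten how to use this... There is no need for complicated operations; I use it for only two reasons. First, after I start a Python file on the server with it and then disconnect from the server, the job keeps running in the background. Second, with a single ssh connection I can use tmux to work with multiple shells, for example to run two Python files at once; this should...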
Misc Toolbox | Paper Toolbox
This note will be consistently updated. Frequently Referenced Papers Classic RL milestones Atari Go Poker video games bioinformatics economics MARL Expressions Cool The c...
Mathematics | Math Toolbox
This note will be consistently updated. Optimization Basics The standard form for an optimization problem (the primal problem) is the following: [\begin{aligned} &\min\limits_{x} \quad...
Misc Toolbox | English Toolbox
This note will be consistently updated. 5 Principles and 7 Actions This part is summarized from this talk. Principles 1: Focus on language content that is relevant to you Informat...
Robotics | Swinging Search and Crawling Control
Please be aware that the video accompanying this article may take some time to load, depending on the speed of your internet connection to GitHub. A snake-inspired path planning algorithm base...
Robotics | RHex-T3: A Mobile Robot, with Hybrid Leg Design
Please be aware that the videos accompanying this article may take some time to load, depending on the speed of your internet connection to GitHub. Innovative design and simulation of a transf...
Misc Toolbox | Markdown Syntax
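Adapted from this post Text styles Prevent translation <span translate="no"></span> Underline This is <u>underlined text</u> Collapse Click to expand/collapse This is the collapsed content. Strikethrough Deleted ~~Deleted~~ Spoiler Prevention Have a goo...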