Technology
Economics & Game Theory | Information Design in 10 Minutes
A brief introduction from the perspective of BCE (Bayes correlated equilibrium), with some examples.
Economics & Game Theory | Information Design
A sender with an informational advantage wants to send messages that steer a receiver's action policy. The two may have different objectives. Information design optimizes the sender's signaling scheme.
Code Utils | Llama Memo
Sources Ollama. It is easy to use. Hugging Face. It provides a tokenizer in its Python API. Ollama Installation Download the app from the website. Install the app. Run ollama run llama3 in...
Multi-Agent Reinforcement Learning | PSRO: Policy-Space Response Oracles
Lanctot, Marc, et al. "A unified game-theoretic approach to multiagent reinforcement learning." Advances in neural information processing systems 30 (2017).
Machine Learning Basics | Prompt Optimization
The following part has not been finished yet. Verbalized Machine Learning Verbalized Machine Learning: Revisiting Machine Learning with Language Models Two large models: one large model performs the task, and the other large model outputs the former large model's...
Machine Learning Basics | Scaling Laws & GPT-3
Machine Learning Basics | GPT-1 & GPT-2
The following part has not been finished yet. GPT-1 Resources Radford, Alec, et al. (OpenAI) “Improving language understanding by generative pre-training.” (2018). GPT-1 = Decoder (in Tran...
Machine Learning Basics | Transformer
Encoder-Decoder Variable-length inputs Truncation and Padding Dive Into Deep Learning 10.5.3 Relation Network A blog ICLR 2017 Embedding. Enco...
Machine Learning Basics | A Quick Guide to LLMs
LLM = Large Language Model
Economics & Game Theory | Bargaining
An Extensive-Form Game Model Ultimatum Two people use the following procedure to split \(1\) object: Player 1 offers Player 2 some amount \(0 \leq x \leq 1\) If Player 2 accepts the outcome ...
Economics & Game Theory | Extensive-Form Games and Subgame Perfect Equilibrium
Resources This post uses the material from the following works: MIT 6.254 2010: Game Theory With Engineering Applications MIT 14.12 2012: Economic Applications Of Game Theory Tadelis, Steve...
Math Miscs | Linear Algebra
Learning resources Linear Algebra Done Right (Book). Videos by Gilbert Strang (reposted on bilibili). The Matrix Cookbook (Book). Inverse A square matrix $A$ is invertible = $A$ has...
Math Miscs | Mathematica Memos
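I heard that sympy in Python can also handle some derivation and simplification; I will look into it later, since Mathematica takes up too much disk space. Basics Symbols such as $\epsilon$ are entered as Epsilon, with a capital first letter. Enter inserts a line break; Shift + Enter evaluates. Names are case-sensitive: names that differ only in case are different quantities. In function calls, arguments are enclosed in square brackets. Ending an expression with a semicolon ; suppresses its output. * is element-wise multiplication, while the period . is the linear-algebra...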
Misc Toolbox | Building My Own PC
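Resources Building: [[PC Build Tutorial] The best PC build tutorial on the internet, bar none] Compatibility: [[Worth Bookmarking] What you must know before building a DIY PC! A step-by-step guide to checking hardware compatibility in your build list! A hardware compatibility checklist for DIY PCs! A must-read for beginners before building!] Installing the OS: [[PC Build Tutorial] A very detailed Windows 10 installation tutorial, covering both the official ISO direct install and the PE method, with UEFI+GUID and Legacy+MBR partitioning] CPU...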
Reinforcement Learning | TRPO Details
The original paper: Schulman, John, et al. “Trust region policy optimization.” International conference on machine learning. PMLR, 2015. Overview This derivation comes from the Appendix A.1 of ...
Multi-Agent Reinforcement Learning | MetaGrad in LIO
Yang, Jiachen, et al. “Adaptive incentive design with multi-agent meta-gradient reinforcement learning.” arXiv preprint arXiv:2112.10859 (2021). A follow-up to LIO by my senior labmate Jiachen. We call it LIO2 or Adaptive LIO; they call it ince...
Economics & Game Theory | Fairness Versus Reason in the Ultimatum Game
The Game The experimenter assigns a certain sum, and the Proposer can offer a share of it to the Responder. If the Responder (who knows the sum) accepts, the sum is split accordingly between the...
Economics & Game Theory | Evolutionary Game Theory
Basic Symmetric Model with Stochastic Strategies This section is a summary of Chapter 29 of the book “Algorithmic Game Theory”1. Agents (organisms) The number of agents is infini...
Code Utils | Code Visualization
Function Call Graph Not working: pyan3 pycallgraph pycallgraph2 Inheritance Visualization Example 1: See my blog. Example 2: pyreverse -o png -p outputed_diagram main.py Agent.p...
Code Utils | LyPythonToolbox
Resources Github Repo My Full Code Toolbox Install Install: pip install LyPythonToolbox Update: pip install --upgrade LyPythonToolbox Print Tricks lyprint_separator from LyPythonToolb...
Code Utils | Github Memo
Create a Repo Click the green New button on the GitHub repo website. Do not check the Add a README file option. Copy the link with the .git extension. Create a directory locally and enter it in a...
Code Utils | Python Project Template
How to Use Download LyPythonProjectTemplate Decompress it. Create a new Github project. See my blog. Copy the contents of LyPythonProjectTemplate into the root folder of your new project....
Misc Toolbox | MacOS Workspace
Desktop Wallpaper: A Seascape, Shipping by Moonlight - Monet False Knees - Joshua Basic Tools Hidden Bar Wins and Magnet Window Arrangement Wins: https://wins.cool/html/index.html Reco...
Code Utils | PyTorch Toolbox
Nets Linear / MLP PyTorch Document - Linear Initialization Parameters in_features out_features bias=True input.shape: (*, in_features) output.shape: (*, out_...
Code Utils | Python Toolbox
This post was completed with the assistance of ChatGPT-4. Inheritance Inheritance allows a class (known as a child class) to inherit attributes and methods from another class (known as a parent ...
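As a minimal sketch of the inheritance idea described in this excerpt: the class names `Animal` and `Dog` and their methods are hypothetical, chosen only for illustration, and are not taken from the post itself.

```python
class Animal:
    """Hypothetical parent class."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

    def describe(self):
        return f"I am {self.name}"


class Dog(Animal):
    """Hypothetical child class: inherits __init__ and describe, overrides speak."""

    def speak(self):
        return f"{self.name} barks"


dog = Dog("Rex")
inherited = dog.describe()   # method defined only on the parent
overridden = dog.speak()     # method redefined on the child
```

The child class reuses the parent's constructor and `describe` without redefining them, and replaces only the behavior it needs to change.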
Reinforcement Learning | Stable Baseline 3
Getting Started Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. — Stable Baselines3 Docs. Resources [Stable Baselines3 Docs] [...
Multi-Agent Reinforcement Learning | Overcooked: A MARL Task
A Brief Intro This MARL environment remade the Overcooked game on Steam. Some features (game rules) are cut to simplify the situation. This game requires two agents to coordinate to cook. If they...
Misc Toolbox | Tools of Visual Studio Code
This note will be consistently updated. Shortcuts Command + k Command + s: Keyboard Shortcuts GOTO Command + P: Go to file Command + Shift + O: Go to symbol in editor Command + T: Go to ...
Reinforcement Learning | Policy Distillation
Introduction [Paper]: Policy Distillation The following statements from the paper are key to understanding this technique: Distillation is a method to transfer knowledge from a teacher model $T$ t...
Machine Learning Basics | HyperNetworks
Introduction [Paper]: HyperNetworks The following part has not been finished yet. Application in QMIX Illustration from the corresponding paper. The following statements from the paper are...
Machine Learning Basics | Decision Transformers
Decision Transformer Paper: Decision Transformer: Reinforcement Learning via Sequence Modeling - NeurIPS 2021 [Website] [Code] Illustration from the corresponding paper. Illustration fro...
Math Miscs | Set
This note will be consistently updated. Related fields: Real Analysis, General Topology, Geometry. Supremum & Infimum The supremum of a nonempty set $X \subset \mathbb{R}$ is the smallest ...
Math Miscs | Convergence Analysis of Gradient Descent
The following part has not been finished yet. Gradient Descent The goal We want to solve this unconstrained minimization problem \[\min_x f(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n.\]
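For the unconstrained problem in this excerpt, a plain gradient descent loop might look like the sketch below. The fixed step size, the number of steps, and the quadratic test function are all assumptions made for illustration; the post's own analysis may use different conditions.

```python
def gradient_descent(grad, x0, step_size=0.1, num_steps=100):
    """Iterate x <- x - step_size * grad(x) for a fixed number of steps."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def grad_f(x):
    """Gradient of the assumed example f(x) = (x[0] - 1)^2 + (x[1] + 2)^2."""
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


# The example function is minimized at (1, -2).
x_star = gradient_descent(grad_f, [0.0, 0.0])
```

With this step size each coordinate's distance to the minimizer shrinks by a constant factor per step, so the iterate ends up very close to (1, -2).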
Math Miscs | Contraction Mapping Theorem
Metric Space Definition of metric space Definition. A metric space is an ordered pair $(M, d)$ where $M$ is a set and $d$ is a metric on $M$, i.e., a function $d: M\times M \to \mathbb{R}$ sa...
Math Miscs | A Note on Stochastic Processes
This note partially uses the materials from the notes of MATH2750. Transition Matrix The transition kernel $\mathbf{M}$ is a square matrix of size $\vert S\vert \times \vert S\vert$. $\math...
Multi-Agent Reinforcement Learning | Sequential Social Dilemma
What is Social Dilemma? Definition A social dilemma refers to a situation in which individual actions that seem to be rational and in self-interest can lead to collective outcomes that are undesi...
Economics & Game Theory | Zero-Determinant Strategy
This note aims to explain the parts omitted in this paper: Press, William H., and Freeman J. Dyson. “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent.” Procee...
Economics & Game Theory | Classic Games
This note will be consistently updated. Prisoner’s Dilemma Two members of a criminal organization are arrested and imprisoned. Each prisoner is in solitary confinement with no means of commun...
Economics & Game Theory | A Memo on Game Theory
This note will be consistently updated. Rationality A rational player is one who chooses his action, to maximize his payoff consistent with his beliefs about what is going on in the game.1 ...
Multi-Agent Reinforcement Learning | Fictitious Self-Play and Zero-Shot Coordination
Fictitious Play Fictitious play is a learning rule. In it, each player presumes that the opponents are playing stationary (possibly mixed) strategies. At each round, each player thus best ...
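The learning rule in this excerpt can be sketched on a small example. The game (matching pennies), the initial counts, and the tie-breaking rule (lowest action index) are assumptions made for illustration, not details from the post.

```python
# Row player's payoffs in matching pennies; the column player receives the negative.
PAYOFF = [[1, -1], [-1, 1]]


def best_response_row(col_counts):
    """Row action with the highest payoff against the column player's empirical counts."""
    values = [sum(PAYOFF[a][b] * col_counts[b] for b in range(2)) for a in range(2)]
    return max(range(2), key=lambda a: values[a])


def best_response_col(row_counts):
    """Column action with the highest payoff against the row player's empirical counts."""
    values = [sum(-PAYOFF[a][b] * row_counts[a] for a in range(2)) for b in range(2)]
    return max(range(2), key=lambda b: values[b])


def fictitious_play(num_rounds=10000):
    """Run simultaneous fictitious play and return both players' empirical frequencies."""
    row_counts = [1, 0]  # arbitrary initial beliefs about past play
    col_counts = [0, 1]
    for _ in range(num_rounds):
        a = best_response_row(col_counts)
        b = best_response_col(row_counts)
        row_counts[a] += 1
        col_counts[b] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


row_freq, col_freq = fictitious_play()
```

Actual play keeps cycling, but in this zero-sum game the empirical frequencies drift toward the mixed equilibrium in which each action is played half of the time.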
Reinforcement Learning | Policy Gradient Details
The only way to make sense out of change is to plunge into it, move with it, and join the dance. — Alan Watts. Bellman Equations [V(s_t) = \mathbb{E}\left[ r_t + \gamma\cdot V(s_{t+1}) \right...
Machine Learning Basics | RNNs
NLP Terms NLP = Natural Language Processing Embedding In a general sense, “embedding” refers to the process of representing one kind of object or data in another space or format. It involves m...
Multi-Agent Reinforcement Learning | MARL Basics
This note has not been finished yet. Markov Models MDP Markov decision process $(S, A, \mathcal{P}, R, \gamma)$ Single-agent, fully observable, and dynamic. P...
Code Utils | Computation Graph Visualization
PyTorchviz Basics Install brew install graphviz (or here) pip install torchviz Documentation: Github Official examples: Colab If a node represents a backward fu...
Math Miscs | Dynamic Epistemic Logic
Three logicians walk into a bar. The bartender asks: “Do you all want a drink?” The first logician says: “I don’t know.” The second logician says: “I don’t know.” The third logician says: “Yes.” ...
Multi-Agent Reinforcement Learning | Theory of Mind and Markov Models
We do not see things as they are, we see them as we are. — Anaïs Nin. What is Theory of Mind? In psychology, theory of mind refers to the capacity to understand other people by ascribing menta...
Math Miscs | Theoretical Computer Science (TCS)
This note will be consistently updated. What is TCS? (Wikipedia) Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects o...
Math Miscs | Principal Component Analysis
Notes from long ago. Introduction PCA is a widely used dimensionality reduction technique that projects high-dimensional data into a lower-dimensional space, while retaining as much of the data’s variance as possible....
Multi-Agent Reinforcement Learning | MARL Tasks
This note will be consistently updated. List StarCraft II SMAC (StarCraft Multi-Agent Challenge). SMAC is WhiRL’s environment for research in the field of collaborative multi-agent ...
Reinforcement Learning | RL Toolbox
This note will be consistently updated. PPO Tricks There are a total of 37 tricks, among which 13 are relatively core. PPO paper The 37 Implementation Details of Proximal Policy O...
Code Utils | Misc Code Toolbox
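This note will be consistently updated. Tmux It has been so long since I connected to a server that I have almost forgotten how to use this… No need for complicated operations; I use it for only two reasons. First, after I start a Python file on the server with it and then disconnect from the server, it keeps running in the background. Second, I can connect to the server over ssh just once and use tmux to work with multiple shells, for example to run two Python files at the same time; this is probably the...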
Math Miscs | Math Toolbox
This note will be consistently updated. Optimization Basics The standard form for an optimization problem (the primal problem) is the following: [\begin{aligned} &\min\limits_{x} \quad f_...