A Memo on Game Theory
This note will be consistently updated.
Rationality
A rational player is one who chooses his action, to maximize his payoff consistent with his beliefs about what is going on in the game.1
- “self-interested” refers to the agent being concerned only about its own expected payoff, and
- “rational” means that when it believes one action’s payoff is greater than another’s, the agent will choose the action with higher expected payoff.
Bounded rationality is the idea that rationality is limited when individuals make decisions, and under these limitations, rational individuals will select a decision that is satisfactory rather than optimal.2
Normal-form game
Definition 3.31 A normal-form game includes three components as follows:
- A finite set of players, $N = \set{1,2,\ldots,n}$.
- A collection of sets of pure strategies, $\set{S_1, S_2, \ldots, s_n}$.
- A set of payoff functions, $\set{v_1,v_2,\ldots, v_n}$, each assigning a payoff value to each combination of chosen strategies, that is, a set of functions $v_i: S_1\times S_2, \times \dots \times S_n \to \mathbb{R}$ for each $i\in N$.
Each of the players $i\in N$ must simultaneously choose a possible strategy $s_i\in S_i$.1
In my understanding, it’s like $(I,\set{A^i}_{i\in I}, \set{R^i}_{i\in I})$ in the notation of MARL.
Game Tree
Definition 3.13 A tree is a set of nodes and directed edges connecting these nodes such that
- there is an initial node, for which there is no incoming edge;
- for every other node, there is exactly one incoming edge;
- for any two nodes, there is a unique path that connect these two nodes.
And:
- The edges are actions.
- Each non-terminal node has been defined with who is to perform the action at this moment.
- Payoffs for each player are defined at each terminal node.
Information Set
Definition 3.43 An information set is a collection of nodes such that
- the same player $i$ is to move at each of these nodes;
- the same moves are available at each of these nodes.
In my understanding, the nodes in an information set share the same parent node, and they are at the same depth of the tree. The player taking actions at the current depth cannot figure out which node it is at. It cannot see the opponent’s previous move.
Definition 3.53 An information partition is an allocation of each non-terminal node of the tree to an information set; the starting node must be “alone”.
In game theory, an information set represents the information available to a player at a particular point in the game. Specifically, it denotes every decision point the player could be at, given what they know so far. An information set is used in games of imperfect information, where players do not have complete information about the actions previously taken (for example, in card games where some cards are hidden).
Here’s a more formal definition:
An information set for a player in a game is a set of decision points such that:
- The player must make a decision at each point in the set.
- When the play of the game reaches an information set for a player, that player must choose an action without knowing which particular decision point within the set they are at.
This means that if two decision points are in the same information set, the player must have the same set of actions available at both, and cannot distinguish between these decision points based on past play.
In simpler terms, an information set groups together all the situations a player might be in which are indistinguishable from each other from that player’s perspective. This concept is crucial in designing strategies for games of imperfect information.
— Generated by ChatGPT-4
Extensive-Form Games
Often credit this paper:
H. W. Kuhn. Extensive games and the problem of information. Contributions to the Theory of Games, 2:193–216, 1953.
The extensive-form representation of a game contains all the information about the game explicitly, by defining who moves when, what each player knows when he moves, what moves are available to him, and where each move leads to, etc. This is done by use of a game tree and information sets–as well as more basic information such as players and the payoffs.3
Definition 3.33 (Extensive form) A Game consists of
- a set of players,
- a tree,
- an allocation of non-terminal nodes of the tree to the players,
- an informational partition of the non-terminal nodes, and
- payoffs for each player at each terminal node.
In my understanding, the nodes in extensive-form games are not equivalent to the states in MDP. Because the nodes are not Markovian.
I’m so confused. What should the corresponding quantity of “state” be? This could be a gap between the MARL and Game Theory frameworks.
Information: Complete v.s. Perfect
I’ve looked up these two concepts so many times already, but every time I read about them, I forget shortly afterward. 😅
Complete Information
A game of complete information requires that the following four components be common knowledge among all the players of the game:1
- all the possible actions of all the players,
- all the possible outcomes,
- how each combination of actions of all players affects which outcome will materialize, and
- the preferences of each and every player over outcomes.
Preferences describe how the player ranks the set of possible outcomes, from most desired to least desired.1
In my understanding, a game is with complete information if the utility functions (with their domain) are common knowledge. Generally, rationality is a fundamental assumption, so the fourth point in the definition is usually satisfied.
Each player has full information about others. Common knowledge includes4
- utility functions (including risk aversion),
- payoffs,
- strategies, and
- “types” of players (“private” information).
I’m not sure what strategies being common knowledge looks like. And it reads that Chess is with incomplete information because of this. I think there is a conflict with the definition in the book.
But4
- players may not know the others’ actions (e.g. the initial placement of ships in Battleship), and
- the game may has chance element (card games).
Perfect Information
Definition 7.31 A game of complete information in which every information set is a singleton and there are no moves of Nature is called a game of perfect information.
Each player is perfectly informed of all the events that have previously occurred.5 There is no hidden information.
Examples4 5
- Perfect and complete: Chess, Tic-Tac-Toe, Go.
- Perfect but incomplete: Bayesian game.
- Complte but imperfect: Card games, where each player’s cards are hidden from other players but objectives are known. The dealing event is not public (imperfect).
Social Choice Function
It aims to align the interests of all agents.
- Utilitarian function
- Egalitarian function
Price of Anarchy v.s. Price of Stability
- PoA = Price of Anarchy
- Anarchy = 无政府主义、混乱
- 如果不要中心化的authority来组织,那要付出多大的代价(相比性能下降多少)(我的理解)
- notation
- game $G = (N,S,u)$
- 玩家有$N$个,单个某个玩家记为$i$
- 单个某个玩家的策略集合为$S_i$(这里只假定做的策略是确定性的,只选择某一个动作)
- 单个某个玩家的收益为$u_i:S\to \mathbb{R}$;其中$S=S_1\times \ldots\times S_N$
- PoA衡量“玩家的自私行为”恶化“系统表现”的“效率”
- “系统”指的是所有玩家构成的那个抽象的整体(我的理解)
- “系统表现”根据我们的目的来定义,记为$\text{Welf}:S\to\mathbb{R}$,比如
- social welfare(utilitarian objective):$\text{Welf}=\sum\limits_{i\in N}u_i(s)$;所有玩家收益的和
- fairness(egalitarian objective):$\text{Welf}=\min\limits_{i\in N}u_i(s)$
- 系统表现是我们想控制得到的最大化的目标
- 如果是想最小化某个目标,那就应该是$\text{Cost}$
- 社会困境:个体只注重自己收益期望的优化不一定能让系统表现优化
- 如果有中心化的authority:把所有人作为一个抽象的整体来分析,可以让某些人牺牲之类的,这样比较好分析得到一个$\text{Welf}$很高的$s_{\text{centralized}}$(或多个)
- 如果没有中心化的authority,每个人只优化自己,最终的结果会到达equilibrium,也就是某个$s_{\text{equilibrium}}$(比如常见的Nash equilibria)。equilibria集合记为$\text{Equil}$。
- $\text{PoA} = \frac{\max\limits_{s\in S}\text{Welf}(s)} {\min\limits_{s\in \text{Equil}}\text{Welf}(s)}$,分母是fully decentralized设置下的最坏情况的收益。(也有分子分母反过来定义的说法)
- $\text{PoA} = \frac{\max\limits_{s\in \text{Equil}}\text{Cost}(s)} {\min\limits_{s\in S}\text{Cost}(s)}$
- 另一个概念是PoS,the Price of Stability,
- $\text{PoS} = \frac{\max\limits_{s\in S}\text{Welf}(s)} {\max\limits_{s\in \text{Equil}}\text{Welf}(s)}$,分子和PoA一样,分母是fully decentralized设置下的最好情况的收益。
- $\text{PoS} = \frac{\min\limits_{s\in \text{Equil}}\text{Cost}(s)} {\min\limits_{s\in S}\text{Cost}(s)}$
- 根据定义,$\text{PoA}\ge\text{PoS}\ge1$,wiki里是这么说的,是负数我不知道咋办,比如:分母<0<分子<-分母,这样加绝对值也不对。
- 例子:Prisoner’s Dilemma
- 希望最大化的东西是social walfare,即,我们想优化$\text{Welf}=u_1(s_1,s_2)+u_2(s_1,s_2)$
- 如果有中心化的authority,则最优的情况肯定是$(s_1=\text{Cooperate},s_2=\text{Cooperate})$,此时$\text{Welf}=2b$。
- 如果没有中心化的authority,每个人只优化自己,那么此时对于两个玩家来说,$\text{Defect}$都是占优策略(别人选择合作,那我背叛收益高;别人选择背叛,那我背叛收益高;不论别人怎么选择,我选背叛收益高)。会收敛到$(s_1=\text{Defect},s_2=\text{Defect})$的Nash equilibrium。$\text{Welf}$相比有中心化的情况,降低了。
- $\text{PoA} = \frac{2b}{2c}=\frac{b}{c}$,其跟表中的数值有关。
Shapley Value
In my understanding, the Shapley value reflects how significant an individual’s effective contribution is to the total value of a certain coupling in a group. A real-world application is to determine how to distribute money: those who work more and contribute more effectively to the group should receive more money. It is a fair distribution method.
Given a cooperative game with a set $N$ of players and a value function $v: 2^N \to \mathbb{R}$, which assigns a value to each coalition of players, the Shapley value of player $i$, denoted as $\phi_i(v)$, is defined as:
\[\begin{aligned} \phi_i(v) &= \mathbb{E}_{s\sim \mathcal{P}(N\setminus \{i\})}\left[v(S \cup \{i\}) - v(S) \right] \\ &= \sum_{S \subseteq N \setminus \{i\}} \frac{\vert S\vert! (\vert N\vert - \vert S\vert - 1)!}{\vert N\vert!} \left[v(S \cup \{i\}) - v(S)\right] \end{aligned}\]Where:
- $N$ is the set of all players.
- $S$ is a subset of players not including player $i$.
- $\vert S\vert$ is the number of players in subset $S$.
- $\vert N\vert$ is the total number of players.
- $v(S)$ is the value function, i.e., the value of coalition $S$.
- The term $\left[v(S \cup {i}) - v(S)\right]$ represents the marginal contribution of player $i$ to coalition $S$.
- $\mathcal{P}(N \setminus {i})$ is the power set of $N \setminus {i}$, which includes all possible subsets of it, ranging from the empty set to $N \setminus {i}$ itself. And $S \sim \mathcal{P}(N \setminus {i})$ means that the subset $S$ is uniformly randomly chosen from $\mathcal{P}(N \setminus {i})$.
In words, the Shapley value of player $i$ is the average of its marginal contributions over all possible coalitions.
Regret
Potential Game
Backward Induction
Forward Induction
Solution Concept
Nash equilibrium
Correlated equilibrium
Coarse correlated equilibrium
Subgame perfect equilibrium
Perfect Bayesian equilibrium
Efficiency
Perhaps the central theme of economic theory, efficiency is concerned with the optimal allocation of scarce resources. Efficiency is achieved when some specific criterion is maximized and no allocation of resources could yield a higher value according to that criterion. Usually, efficiency measures, such as Pareto optimality and Hicks optimality, attempt to maximize either individual or communal payoffs.
— From this webpage.
Hicks Optimal
Named after John Hicks, Hicks optimality is a measure of efficiency. An outcome of a game is Hicks optimal if there is no other outcome that results in greater total payoffs for the players. Thus, a Hicks optimal outcome is always the point at which total payoffs across all players is maximized. A Hicks optimal outcome is always Pareto optimal. — From this webpage.
Pareto Optimal
Named after Vilfredo Pareto, Pareto optimality is a measure of efficiency. An outcome of a game is Pareto optimal if there is no other outcome that makes every player at least as well off and at least one player strictly better off. That is, a Pareto Optimal outcome cannot be improved upon without hurting at least one player. Often, a Nash Equilibrium is not Pareto Optimal implying that the players’ payoffs can all be increased.
— From this webpage.
Risk
In game theory, risk refers to the uncertainty about the outcomes of a decision. It’s the possibility that the actual outcome of a decision or an action might differ from the expected outcome. This uncertainty can arise from a variety of sources, such as incomplete information, unpredictable behavior of other players, or random events. — ChatGPT-4.
Risk-Averse Players
These players prefer a certain outcome over a gamble with a higher expected value. They tend to avoid risks and might settle for a lower but guaranteed benefit.
Risk-Seeking Players
In contrast, risk-seeking players prefer taking risks even when the expected value of the gamble is less than the certain outcome. They are attracted to the potential of higher gains.
Risk-Neutral Players
These players are indifferent to risk. They make decisions based solely on expected values, without regard to the variability of outcomes.
The canonical RL agents are risk-neutral.
Von Neumann–Morgenstern Utility Theorem
References
Disclaimer: The above content is summarized from Wikipedia and other sources. Corresponding links or references have been provided.