MARL Tasks
This note will be continuously updated.
List
- StarCraft II
- SMAC (StarCraft Multi-Agent Challenge). SMAC is WhiRL’s environment for research in the field of collaborative multi-agent reinforcement learning (MARL) based on Blizzard’s StarCraft II RTS game.
- PySC2 (StarCraft II Learning Environment). PySC2 is DeepMind’s Python component of the StarCraft II Learning Environment (SC2LE).
- gym-starcraft. An OpenAI Gym interface to StarCraft.
- MAgent
- pursuit: predators pursue prey
- gather: agents rush to gather food
- battle: battle between two armies
- arrange: agents arrange themselves to form characters
- A demo video
- Overcooked
- MiniRTS
- Cards & Chess
- Communication
- Melting Pot (DeepMind)
- Sequential Social Dilemma (SSD)
- Harvest
- Clean Up
- Level-Based Foraging
Overcooked
This environment is designed to test human-AI coordination and is also used as a zero-shot coordination task.
Resources
Hanabi
Objective
Players work collaboratively to put on a firework show by placing a series of cards in the correct order. The objective is to complete a series of five cards of the same color in ascending numerical order (1 to 5) for each color.
- Cards Facing Outwards: One distinctive rule of Hanabi is that players hold their cards facing outwards, so they can see everyone else’s cards but not their own. This forces cooperation: players rely on each other’s hints to figure out their own cards.
- Communication: Communication is restricted to the formal hint-giving process to maintain the game’s difficulty and collaborative spirit.
Components
- Deck: Consists of 50 cards, distributed into 5 different colors (red, yellow, green, blue, white), with each color having numbers from 1 to 5 (1s x3, 2s x2, 3s x2, 4s x2, and 5s x1).
- Hint tokens: 8 in total, used to give hints to other players.
- Fuse tokens: 3 in total, representing the players’ “lives”.
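The deck composition above can be checked with a short sketch. The names `COLORS`, `COPIES`, and `build_deck` are illustrative and do not come from any Hanabi library.

```python
from collections import Counter

COLORS = ["red", "yellow", "green", "blue", "white"]
# Number of copies of each rank within a single color.
COPIES = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}


def build_deck():
    """Return the full deck as a list of (color, rank) tuples."""
    return [
        (color, rank)
        for color in COLORS
        for rank, copies in COPIES.items()
        for _ in range(copies)
    ]


deck = build_deck()
rank_counts = Counter(rank for _, rank in deck)
print(len(deck))       # 50
print(rank_counts[1])  # 15 (three 1s in each of the five colors)
print(rank_counts[5])  # 5 (a single 5 per color)
```

Since each 5 exists only once per color, discarding one makes a perfect score impossible.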
Setup
- Number of players: 2-5 players.
- Hand size: Depending on the number of players, each player starts with a hand of cards (5 cards for 2-3 players, 4 cards for 4-5 players).
- Initial layout: Players hold their cards facing outwards so they can’t see their cards but can see others’.
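A minimal sketch of the setup rules, assuming the deck is any list of 50 card objects. `hand_size` and `deal` are hypothetical helper names, and plain integers stand in for cards to keep the example short.

```python
import random


def hand_size(num_players):
    """Starting hand size: 5 cards for 2-3 players, 4 cards for 4-5 players."""
    if not 2 <= num_players <= 5:
        raise ValueError("Hanabi is played with 2 to 5 players")
    return 5 if num_players <= 3 else 4


def deal(deck, num_players, rng):
    """Shuffle a copy of the deck and deal hands; return (hands, draw_pile)."""
    pile = list(deck)
    rng.shuffle(pile)
    size = hand_size(num_players)
    hands = [[pile.pop() for _ in range(size)] for _ in range(num_players)]
    return hands, pile


# Placeholder deck: integers 0..49 stand in for the 50 cards.
hands, draw_pile = deal(list(range(50)), num_players=4, rng=random.Random(0))
print(len(hands), len(hands[0]), len(draw_pile))  # 4 4 34
```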
Gameplay
Players take turns clockwise and can choose to perform one of the following actions on their turn:
- Give a hint: A player may spend one hint token to tell another player about their hand. The hint names either one color or one number, not both. It must apply to at least one card in that player’s hand, and it must identify every card in the hand that has the chosen color or number.
- Discard a card: A player may discard a card from their hand to regain a hint token. The discarded card goes to the discard pile, and the player draws a new card from the draw pile.
- Play a card: A player may play a card from their hand onto the table. If the card is the next number in sequence for the firework of its color, it stays on the table. Otherwise, it goes to the discard pile and a fuse token is removed. In either case the player draws a replacement card from the draw pile.
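The three actions can be sketched as functions on a small table state. This is an illustrative model, not the API of any existing Hanabi environment: `Table`, `give_hint`, `discard`, and `play` are hypothetical names, cards are `(color, rank)` tuples, drawing a replacement card is left out, and capping hint tokens at 8 on discard is an assumption of this sketch.

```python
from dataclasses import dataclass, field

MAX_HINTS = 8


@dataclass
class Table:
    fireworks: dict = field(default_factory=dict)  # color -> highest rank played
    hint_tokens: int = MAX_HINTS
    fuse_tokens: int = 3
    discard_pile: list = field(default_factory=list)


def give_hint(table, hand, color=None, rank=None):
    """Spend a hint token; return the indices of ALL cards matching the hint."""
    if (color is None) == (rank is None):
        raise ValueError("hint exactly one of color or rank")
    if table.hint_tokens == 0:
        raise ValueError("no hint tokens left")
    matches = [i for i, (c, r) in enumerate(hand) if c == color or r == rank]
    if not matches:
        raise ValueError("a hint must apply to at least one card")
    table.hint_tokens -= 1
    return matches


def discard(table, card):
    """Discard a card and regain a hint token (capped at MAX_HINTS, an assumption)."""
    table.discard_pile.append(card)
    table.hint_tokens = min(MAX_HINTS, table.hint_tokens + 1)


def play(table, card):
    """Play a card; return True if it extends its firework, else lose a fuse."""
    color, rank = card
    if table.fireworks.get(color, 0) + 1 == rank:
        table.fireworks[color] = rank
        return True
    table.discard_pile.append(card)
    table.fuse_tokens -= 1
    return False


table = Table()
hand = [("red", 1), ("blue", 3), ("red", 4)]
print(give_hint(table, hand, color="red"))   # [0, 2]
print(play(table, ("red", 1)))               # True
print(play(table, ("blue", 3)))              # False, a fuse token is lost
print(table.hint_tokens, table.fuse_tokens)  # 7 2
```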
Game Ending Conditions
The game can end in the following ways:
- Successful Completion: Players successfully play all five cards in each color.
- Fuse Tokens Exhausted: Players make three mistakes and lose all fuse tokens.
- Deck Exhaustion: The draw pile is exhausted. Each player then takes one final turn before the game ends.
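A rough sketch of how the three ending conditions might be checked after each turn. `game_over` and its parameters are hypothetical names, and the final round is interpreted here as one turn per player once the draw pile is empty.

```python
COLORS = ["red", "yellow", "green", "blue", "white"]


def game_over(fireworks, fuse_tokens, deck_size, turns_since_deck_empty, num_players):
    """Return the reason the game has ended, or None if play continues.

    fireworks maps color -> highest rank played so far.
    turns_since_deck_empty counts turns taken after the last card was drawn.
    """
    if all(fireworks.get(color, 0) == 5 for color in COLORS):
        return "completed"
    if fuse_tokens == 0:
        return "fuses exhausted"
    # Assumption: the final round is exactly one turn per player.
    if deck_size == 0 and turns_since_deck_empty >= num_players:
        return "deck exhausted"
    return None


print(game_over({"red": 2}, 3, 10, 0, 4))  # None
print(game_over({"red": 2}, 0, 10, 0, 4))  # fuses exhausted
print(game_over({"red": 2}, 1, 0, 4, 4))   # deck exhausted
```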
Scoring
At the end of the game, the score is the sum, over the five colors, of the highest card successfully played in that color. This equals the total number of cards played onto the fireworks. The highest possible score is 25, reached when all five fireworks are completed.
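As a small worked example, the scoring rule can be written as a one-line sum. The `fireworks` mapping (color to highest rank played) and the `score` function are illustrative names.

```python
COLORS = ["red", "yellow", "green", "blue", "white"]


def score(fireworks):
    """Sum the highest rank played in each color; missing colors count as 0."""
    return sum(fireworks.get(color, 0) for color in COLORS)


partial = {"red": 5, "blue": 3, "green": 1}
perfect = {color: 5 for color in COLORS}
print(score(partial))  # 9
print(score(perfect))  # 25
```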
Strategy and Tips
- Memory and Deduction: Players must rely heavily on their memory and deduction skills to remember the hints given and to figure out which cards they hold.
- Non-verbal cues: Players are not allowed to give extra hints through non-verbal cues or suggestive comments. Every hint must be given on the player’s own turn by spending a hint token.
- Hint Efficiency: Given the limited number of hint tokens, players should strive to give hints that convey the maximum amount of information.