Implementation of Multi-Agent Exploitation Models with Social Learning

This repository includes four models based on the copan:EXPLOIT model that aim to develop more complex learning processes. The models simulate agent behavior in a resource-harvesting scenario. The agents' decision processes vary across the models, incorporating probabilistic imitation, Q-learning, and more sophisticated reinforcement learning techniques. The original Python/Cython implementation can be found here: https://github.com/wbarfuss/cyexploit.

Two implementations are provided: one with explicit modules for the different parts of the model (e.g. resource pools, agents, trainer), and another that focuses on improving performance.

Instructions

  • For development:

    1. Clone the project repository: git clone git@github.com:DezrannCAS/Exploit_RL.git && cd Exploit_RL.
    2. Create the virtual environment: hatch env create; optionally, specify the Python version with HATCH_PYTHON=python3.12 hatch env create. Note: for Python 3.13 support, use the numba 0.61.0rc1 pre-release; for Intel GPU support, use the latest PyTorch release (PyTorch 2.5).
    3. Activate the virtual environment: hatch env shell.
  • For installation:

    1. Clone the project repository: git clone git@github.com:DezrannCAS/Exploit_RL.git && cd Exploit_RL.
    2. Build the package: pip install build && python -m build.
    3. Install the package: pip install dist/Exploit_RL-<version>-py3-none-any.whl.

Models

1. Probabilistic Imitation Model

This model is a restructuring of the original COPAN model without rewiring processes, using Numba's just-in-time compilation for specific functions instead of Cython. Agents imitate their neighbors' harvesting decisions based on observed differences in harvest outcomes.
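
As a rough illustration (not the repository's code), an imitation sweep jitted with Numba might look like the sketch below; the tanh-shaped imitation probability, the fixed-degree neighbor array, and the per-agent strategy/harvest arrays are assumptions made for the example.

```python
import numpy as np
from numba import njit

@njit
def imitation_step(strategies, harvests, neighbors):
    """One imitation sweep; `neighbors` is an (n_agents, k) array of neighbor indices."""
    new_strategies = strategies.copy()
    for i in range(strategies.shape[0]):
        # Compare against one randomly chosen neighbor.
        j = neighbors[i, np.random.randint(neighbors.shape[1])]
        # Imitation probability grows with the observed harvest difference.
        p_imitate = 0.5 * (1.0 + np.tanh(harvests[j] - harvests[i]))
        if np.random.random() < p_imitate:
            new_strategies[i] = strategies[j]
    return new_strategies
```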

2. Q-Learning Imitation Model

Building on M-1, this model integrates Q-learning into the agents' decision-making processes. Agents take the majority action of their neighborhood as the state, choose to imitate or not as the action, and receive rewards based on the average difference in harvest.
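
A minimal tabular sketch of this setup (not the repository's code) could look as follows; the 2x2 Q-table, the epsilon-greedy action selection, and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

class ImitationQLearner:
    """Q-learning over state = majority neighbor action, action = keep (0) or imitate (1)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = np.zeros((2, 2))  # q[state, action]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy choice between keeping the current strategy and imitating.
        if np.random.random() < self.epsilon:
            return np.random.randint(2)
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update; the reward would be the
        # average harvest difference with respect to the neighborhood.
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])
```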

3. Observation-Augmented Exploration Model

M-3 shifts focus from neighborhood actions to individual stock levels. Agents use their current stock levels as the state and decide on the effort level for harvesting as the action. This model combines regular RL algorithms for exploitation policies with neighbor imitation for exploration.
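
A hedged sketch of the exploration idea (not the repository's code): a standard tabular Q-learner over discretized stock levels, where the exploration branch copies an observed neighbor effort instead of drawing a uniformly random action. The bin counts, effort levels, and hyperparameters are assumptions.

```python
import numpy as np

N_STOCK_BINS, N_EFFORT_LEVELS = 10, 5

class ObservationAugmentedAgent:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = np.zeros((N_STOCK_BINS, N_EFFORT_LEVELS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, stock_bin, neighbor_effort):
        # Explore by imitating an observed neighbor's effort level,
        # exploit by taking the greedy effort for the current stock bin.
        if np.random.random() < self.epsilon:
            return neighbor_effort
        return int(np.argmax(self.q[stock_bin]))

    def update(self, stock_bin, effort, reward, next_stock_bin):
        target = reward + self.gamma * np.max(self.q[next_stock_bin])
        self.q[stock_bin, effort] += self.alpha * (target - self.q[stock_bin, effort])
```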

4. Observational Learning Model

The M-4 model also uses agents' current stock levels as the state and effort levels as the action. However, it employs a deep learning framework combining LSTM networks, actor-critic methods, and predictors. Importantly, in this model all agents share a single resource pool, unlike the other models, in which agents learn to harvest private resources.

The implementation is inspired by CleanRL's ppo_continuous_action and ppo_atari_lstm.
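
For orientation, here is a minimal PyTorch sketch of an LSTM actor-critic in the spirit of CleanRL's ppo_atari_lstm (not the repository's network); the layer sizes, the Gaussian policy over continuous effort, and the observation layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden)
        self.actor_mean = nn.Linear(hidden, action_dim)     # mean of a Gaussian over effort
        self.actor_logstd = nn.Parameter(torch.zeros(action_dim))
        self.critic = nn.Linear(hidden, 1)                  # state-value estimate

    def forward(self, obs_seq, lstm_state):
        # obs_seq: (seq_len, batch, obs_dim); lstm_state: (hidden, cell) tuple
        x = self.encoder(obs_seq)
        x, lstm_state = self.lstm(x, lstm_state)
        dist = torch.distributions.Normal(self.actor_mean(x), self.actor_logstd.exp())
        return dist, self.critic(x), lstm_state
```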
