Implementation of Multi-Agent Exploitation Models with Social Learning

This repository includes four models based on the copan:EXPLOIT model that aim to develop more complex learning processes. The models simulate agent behavior in a resource-harvesting scenario. The agents' decision processes vary across the models, incorporating probabilistic imitation, Q-learning, and more sophisticated reinforcement learning techniques. The original Python/Cython implementation can be found here: https://github.com/wbarfuss/cyexploit.

Two implementations are provided: one with explicit modules for the different parts of the model (e.g. resource pools, agents, trainer), and another that focuses on improving performance.

Instructions

  • For development:

    1. Clone the project repository: git clone git@github.com:DezrannCAS/Exploit_RL.git && cd Exploit_RL.
    2. Create the virtual environment: hatch env create; optionally, specify the Python version with HATCH_PYTHON=python3.12 hatch env create. Note: for Python 3.13 support, use the numba 0.61.0rc1 pre-release; for Intel GPU support, use the latest PyTorch release (PyTorch 2.5).
    3. Activate the virtual environment: hatch env shell.
  • For installation:

    1. Clone the project repository: git clone git@github.com:DezrannCAS/Exploit_RL.git && cd Exploit_RL.
    2. Build the package: pip install build && python -m build.
    3. Install the package: pip install dist/Exploit_RL-<version>-py3-none-any.whl.

Models

1. Probabilistic Imitation Model

This model is a restructuring of the original COPAN model without rewiring processes, using Numba's just-in-time compilation for specific functions instead of Cython. Agents imitate their neighbors' harvesting decisions based on observed differences in harvest outcomes.
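
As a rough illustration (not the repository's code), an imitation sweep jitted with Numba might look like the sketch below; the tanh-shaped imitation probability, the fixed-degree neighbor array, and the per-agent strategy/harvest arrays are assumptions made for the example.

```python
import numpy as np
from numba import njit

@njit
def imitation_step(strategies, harvests, neighbors):
    """One imitation sweep; `neighbors` is an (n_agents, k) array of neighbor indices."""
    new_strategies = strategies.copy()
    for i in range(strategies.shape[0]):
        # Compare against one randomly chosen neighbor.
        j = neighbors[i, np.random.randint(neighbors.shape[1])]
        # Imitation probability grows with the observed harvest difference.
        p_imitate = 0.5 * (1.0 + np.tanh(harvests[j] - harvests[i]))
        if np.random.random() < p_imitate:
            new_strategies[i] = strategies[j]
    return new_strategies
```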

2. Q-Learning Imitation Model

Building on M-1, this model integrates Q-learning into the agents' decision-making processes. Agents take the majority action of their neighborhood as the state, choose to imitate or not as the action, and receive rewards based on the average difference in harvest.
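
A minimal tabular sketch of this setup (not the repository's code) could look as follows; the 2x2 Q-table, the epsilon-greedy action selection, and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

class ImitationQLearner:
    """Q-learning over state = majority neighbor action, action = keep (0) or imitate (1)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = np.zeros((2, 2))  # q[state, action]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy choice between keeping the current strategy and imitating.
        if np.random.random() < self.epsilon:
            return np.random.randint(2)
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update; the reward would be the
        # average harvest difference with respect to the neighborhood.
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])
```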

3. Observation-Augmented Exploration Model

M-3 shifts focus from neighborhood actions to individual stock levels. Agents use their current stock levels as the state and decide on the effort level for harvesting as the action. This model combines regular RL algorithms for exploitation policies with neighbor imitation for exploration.
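
A hedged sketch of the exploration idea (not the repository's code): a standard tabular Q-learner over discretized stock levels, where the exploration branch copies an observed neighbor effort instead of drawing a uniformly random action. The bin counts, effort levels, and hyperparameters are assumptions.

```python
import numpy as np

N_STOCK_BINS, N_EFFORT_LEVELS = 10, 5

class ObservationAugmentedAgent:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = np.zeros((N_STOCK_BINS, N_EFFORT_LEVELS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, stock_bin, neighbor_effort):
        # Explore by imitating an observed neighbor's effort level,
        # exploit by taking the greedy effort for the current stock bin.
        if np.random.random() < self.epsilon:
            return neighbor_effort
        return int(np.argmax(self.q[stock_bin]))

    def update(self, stock_bin, effort, reward, next_stock_bin):
        target = reward + self.gamma * np.max(self.q[next_stock_bin])
        self.q[stock_bin, effort] += self.alpha * (target - self.q[stock_bin, effort])
```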

4. Observational Learning Model

The M-4 model also uses agents' current stock levels as the state and effort levels as the action. However, it employs a deep learning framework combining LSTM networks, actor-critic methods, and predictors. Importantly, in this model all agents share a single resource pool, unlike the other models, in which agents learn to harvest private resources.

The implementation is inspired by CleanRL's ppo_continuous_action and ppo_atari_lstm.
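
For orientation, here is a minimal PyTorch sketch of an LSTM actor-critic in the spirit of CleanRL's ppo_atari_lstm (not the repository's network); the layer sizes, the Gaussian policy over continuous effort, and the observation layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden)
        self.actor_mean = nn.Linear(hidden, action_dim)     # mean of a Gaussian over effort
        self.actor_logstd = nn.Parameter(torch.zeros(action_dim))
        self.critic = nn.Linear(hidden, 1)                  # state-value estimate

    def forward(self, obs_seq, lstm_state):
        # obs_seq: (seq_len, batch, obs_dim); lstm_state: (hidden, cell) tuple
        x = self.encoder(obs_seq)
        x, lstm_state = self.lstm(x, lstm_state)
        dist = torch.distributions.Normal(self.actor_mean(x), self.actor_logstd.exp())
        return dist, self.critic(x), lstm_state
```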
