Adding user guide (#81)
* completing user-guide

* adding benchmarks

* renaming the correct file

* update images to raw

* adding sequential processing caveat
adamamer20 authored Aug 28, 2024
1 parent 87935b7 commit 329eb16
Showing 10 changed files with 537 additions and 29 deletions.
3 changes: 2 additions & 1 deletion .markdownlint.json
@@ -1,3 +1,4 @@
{
"MD013": false
"MD013": false,
"MD046": false
}
20 changes: 7 additions & 13 deletions README.md
@@ -1,8 +1,8 @@
# mesa-frames
# mesa-frames 🚀

mesa-frames is an extension of the [mesa](https://github.com/projectmesa/mesa) framework, designed for complex simulations with thousands of agents. By storing agents in a DataFrame, mesa-frames significantly enhances the performance and scalability of mesa, while maintaining a similar syntax. mesa-frames allows for the use of [vectorized functions](https://vegibit.com/what-is-a-vectorized-operation-in-pandas/) whenever simultaneous activation of agents is possible.
mesa-frames is an extension of the [mesa](https://github.com/projectmesa/mesa) framework, designed for complex simulations with thousands of agents. By storing agents in a DataFrame, mesa-frames significantly enhances the performance and scalability of mesa, while maintaining a similar syntax. mesa-frames allows for the use of [vectorized functions](https://stackoverflow.com/a/1422198) which significantly speeds up operations whenever simultaneous activation of agents is possible.

## Why DataFrames?
## Why DataFrames? 📊

DataFrames are optimized for simultaneous operations through [SIMD processing](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data). At the moment, mesa-frames supports the use of two main libraries: pandas and Polars.

@@ -45,10 +45,7 @@ conda activate myenv
Then, to install mesa-frames itself:

```bash
# For pandas backend
pip install -e .[pandas]
# Alternatively, for Polars backend
pip install -e .[polars]
pip install -e .
```

### Installing in a Python Virtual Environment
@@ -69,10 +66,7 @@ source myenv/bin/activate # On Windows, use `myenv\Scripts\activate`
Then, to install mesa-frames itself:

```bash
# For pandas backend
pip install -e .[pandas]
# Alternatively, for Polars backend
pip install -e .[polars]
pip install -e .
```

## Usage
@@ -141,11 +135,11 @@ class MoneyModelDF(ModelDF):
self.step()
```

## What's Next?
## What's Next? 🔮

- Refine the API to make it more understandable for someone who is already familiar with the mesa package. The goal is to provide a seamless experience for users transitioning to or incorporating mesa-frames.
- Adding support for default mesa functions to ensure that the standard mesa functionality is preserved.
- Adding GPU functionality (cuDF and Rapids).
- Adding GPU functionality (cuDF and Dask-cuDF).
- Creating a decorator that will automatically vectorize an existing mesa model. This feature will allow users to easily tap into the performance enhancements that mesa-frames offers without significant code alterations.
- Creating a unique class for AgentSet, independent of the backend implementation.

4 changes: 4 additions & 0 deletions docs/general/development/index.md
@@ -0,0 +1,4 @@
# Development Guidelines

!!! warning
    This page is under construction; check back soon.
27 changes: 19 additions & 8 deletions docs/general/index.md
@@ -2,6 +2,8 @@

mesa-frames is an extension of the [mesa](https://github.com/projectmesa/mesa) framework, designed for complex simulations with thousands of agents. By storing agents in a DataFrame, mesa-frames significantly enhances the performance and scalability of mesa, while maintaining a similar syntax.

Depending on the number of agents, a mesa-frames model can be multiple orders of magnitude faster than its mesa equivalent: the more agents, the larger the relative speedup.

## Why DataFrames? 📊

DataFrames are optimized for simultaneous operations through [SIMD processing](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data). Currently, mesa-frames supports two main libraries:
@@ -21,20 +23,27 @@ Check out our performance graphs comparing mesa and mesa-frames for the [Boltzma

### Installation

#### Installing from PyPI

```bash
pip install mesa-frames
```

#### Installing from Source

```bash
git clone https://github.com/adamamer20/mesa_frames.git
cd mesa_frames
pip install -e .[pandas] # For pandas backend
# or
pip install -e .[polars] # For Polars backend
pip install -e .
```

### Basic Usage

Here's a quick example of how to create an agent set using mesa-frames:
Here's a quick example of how to create a model using mesa-frames:

```python
from mesa-frames import AgentSetPolars, ModelDF
from mesa_frames import AgentSetPolars, ModelDF
import polars as pl

class MoneyAgentPolars(AgentSetPolars):
def __init__(self, n: int, model: ModelDF):
@@ -65,10 +74,12 @@ class MoneyModelDF(ModelDF):
## What's Next? 🔮

- API refinement for seamless transition from mesa
- Support for default mesa functions
- GPU functionality (cuDF and Rapids)
- Support for mesa functions
- Multiple other spaces: GeoGrid, ContinuousSpace, Network...
- Additional backends: Dask, cuDF (GPU), Dask-cuDF (GPU)...
- More examples: Schelling model, ...
- Automatic vectorization of existing mesa models
- Backend-independent AgentSet class
- Backend-agnostic AgentSet class

## Get Involved! 🤝

183 changes: 183 additions & 0 deletions docs/general/user-guide/0_getting-started.md
@@ -0,0 +1,183 @@
# Getting Started 🚀

## Main Concepts 🧠

### DataFrame-Based Object-Oriented Framework 📊

Unlike traditional mesa models where each agent is an individual Python object, mesa-frames stores all agents of a particular type in a single DataFrame. We operate only at the AgentSet level.

This approach allows for:

- Efficient memory usage
- Improved performance through vectorized operations on agent attributes (this is what makes `mesa-frames` fast)

Objects can be easily subclassed to respect mesa's object-oriented philosophy.

### Vectorized Operations ⚡

mesa-frames leverages the power of vectorized operations provided by DataFrame libraries:

- Operations are performed on entire columns of data at once
- This approach is significantly faster than iterating over individual agents
- Complex behaviors can be expressed in fewer lines of code

You should never use loops to iterate through your agents. Instead, use vectorized operations and implemented methods. If you need to loop, loop through vectorized operations (see the advanced tutorial SugarScape IG for more information).

It's important to note that in traditional `mesa` models, the order in which agents are activated can significantly impact the results of the model (see [Comer, 2014](http://mars.gmu.edu/bitstream/handle/1920/9070/Comer_gmu_0883E_10539.pdf)). `mesa-frames`, by default, doesn't have this issue as all agents are processed simultaneously. However, this comes with the trade-off of needing to carefully implement conflict resolution mechanisms when sequential processing is required. We'll discuss how to handle these situations later in this guide.
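
A two-agent toy example in plain Python shows why activation order matters. It is a hypothetical illustration, not code from mesa or mesa-frames: each agent gives one unit to the other if it has any. Applied sequentially, the outcome depends on who goes first; decided from a snapshot of the state at the start of the step, as in a simultaneous update, it does not.

```python
def sequential(wealth, order):
    """Agents act one after another, each seeing the changes made before it."""
    wealth = list(wealth)
    for i in order:
        if wealth[i] > 0:
            wealth[i] -= 1
            wealth[1 - i] += 1
    return wealth


def simultaneous(wealth):
    """Who gives is decided from the state at the start of the step."""
    givers = [int(w > 0) for w in wealth]
    return [wealth[i] - givers[i] + givers[1 - i] for i in range(2)]


print(sequential([1, 0], order=[0, 1]))  # [1, 0]
print(sequential([1, 0], order=[1, 0]))  # [0, 1]
print(simultaneous([1, 0]))              # [0, 1]
```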

Check out these resources to understand vectorization and why it speeds up the code:

- [What is vectorization?](https://stackoverflow.com/a/1422181)
- [Vectorization Explained, Step by Step](https://machinelearningcompass.com/machine_learning_math/vectorization/)

Here's a comparison between mesa-frames and mesa:

=== "mesa-frames"
    ```python
    from mesa_frames import AgentSetPolars


    class MoneyAgentPolarsConcise(AgentSetPolars):
        # initialization...

        def give_money(self):
            # Select the wealthy agents as the active agents
            self.select(self.wealth > 0)

            # Receiving agents are sampled (only native expressions currently supported)
            other_agents = self.agents.sample(
                n=len(self.active_agents), with_replacement=True
            )

            # Each active (wealthy) agent gives away 1 unit
            self["active", "wealth"] -= 1

            # Compute the income of the other agents (only native expressions currently supported)
            new_wealth = other_agents.group_by("unique_id").len()

            # Add the income to the other agents
            self[new_wealth, "wealth"] += new_wealth["len"]
    ```

=== "mesa"
    ```python
    import mesa


    class MoneyAgent(mesa.Agent):
        # initialization...

        def give_money(self):
            # Verify agent has some wealth
            if self.wealth > 0:
                other_agent = self.random.choice(self.model.agents)
                if other_agent is not None:
                    other_agent.wealth += 1
                    self.wealth -= 1
    ```

As you can see, in mesa the model has to call each agent's method individually (typically through a scheduler), whereas in mesa-frames the method is executed once for the entire AgentSet.

### Backend Flexibility 🔄

mesa-frames aims to support multiple DataFrame backends. The backends currently supported are:

- **pandas**: A widely-used data manipulation library
- **Polars**: A high-performance DataFrame library written in Rust

Users can choose the backend that best suits their needs:

```python
from mesa_frames import AgentSetPandas # or AgentSetPolars
```

Currently, AgentSetDF and GridDF each have two implementations, one per backend: AgentSetPandas and AgentSetPolars, and GridPandas and GridPolars.
We encourage you to use the Polars implementations for better performance. We are working on a single, backend-agnostic interface [here](https://github.com/adamamer20/mesa-frames/discussions/12). Let us know what you think!

Soon we will also have multiple other backends like Dask, cuDF, and Dask-cuDF!

## Coming from mesa 🔀

If you're familiar with mesa, this guide will help you understand the key differences in code structure between mesa and mesa-frames.

### Agent Representation 👥

- mesa: Each agent is an individual object instance. Methods are defined for individual agents and called on each agent.
- mesa-frames: Agents are rows in a DataFrame, grouped into AgentSets. Methods are defined for AgentSets and operate on all agents simultaneously.

=== "mesa-frames"
    ```python
    import polars as pl
    from mesa_frames import AgentSetPolars


    class MoneyAgentSet(AgentSetPolars):
        def __init__(self, n, model):
            super().__init__(model)
            self += pl.DataFrame({
                "unique_id": pl.arange(n, eager=True),
                "wealth": pl.ones(n, eager=True),
            })

        def step(self):
            # Boolean mask of agents that have something to give
            givers = self.wealth > 0
            # One receiver per giver, drawn with replacement
            receivers = self.agents.sample(n=givers.sum(), with_replacement=True)
            self[givers, "wealth"] -= 1
            # Count how many units each receiver gets
            new_wealth = receivers.group_by("unique_id").len()
            self[new_wealth["unique_id"], "wealth"] += new_wealth["len"]
    ```

=== "mesa"
    ```python
    from mesa import Agent


    class MoneyAgent(Agent):
        def __init__(self, unique_id, model):
            super().__init__(unique_id, model)
            self.wealth = 1

        def step(self):
            if self.wealth > 0:
                other_agent = self.random.choice(self.model.schedule.agents)
                other_agent.wealth += 1
                self.wealth -= 1
    ```

### Model Structure 🏗️

- mesa: Models manage individual agents and use a scheduler.
- mesa-frames: Models manage AgentSets and directly control the simulation flow.

=== "mesa-frames"
    ```python
    from mesa_frames import ModelDF


    class MoneyModel(ModelDF):
        def __init__(self, N):
            super().__init__()
            self.agents += MoneyAgentSet(N, self)

        def step(self):
            # Calls MoneyAgentSet.step once for the whole set
            self.agents.do("step")
    ```

=== "mesa"
    ```python
    from mesa import Model
    from mesa.time import RandomActivation


    class MoneyModel(Model):
        def __init__(self, N):
            super().__init__()
            self.num_agents = N
            self.schedule = RandomActivation(self)
            for i in range(self.num_agents):
                a = MoneyAgent(i, self)
                self.schedule.add(a)

        def step(self):
            self.schedule.step()
    ```

### Transition Tips 💡

1. **Think in Sets 🎭**: Instead of individual agents, think about operations on groups of agents.
2. **Leverage DataFrame Operations 🛠️**: Familiarize yourself with pandas or Polars operations for efficient agent manipulation.
3. **Vectorize Logic 🚅**: Convert loops and conditionals to vectorized operations where possible.
4. **Use AgentSets 📦**: Group similar agents into AgentSets instead of creating many individual agent classes.

### Handling Race Conditions 🏁

When simultaneous activation is not possible, you need to handle race conditions carefully. There are two main approaches:

1. **Custom UDF with Numba 🔧**: Use a custom User Defined Function (UDF) with Numba for efficient sequential processing.

    - [Polars UDF Guide](https://docs.pola.rs/user-guide/expressions/user-defined-functions/)
    - [pandas Numba Engine](https://pandas.pydata.org/pandas-docs/stable/user_guide/window.html#numba-engine)

2. **Looping Mechanism 🔁**: Implement a looping mechanism on vectorized operations.

For a more detailed implementation of handling race conditions, please refer to the `examples/sugarscape-ig` in the mesa-frames repository. This example demonstrates how to implement the Sugarscape model with instantaneous growback, which requires careful handling of sequential agent actions.
67 changes: 67 additions & 0 deletions docs/general/user-guide/1_classes.md
@@ -0,0 +1,67 @@
# Classes 📚

## AgentSetDF 👥

To create your own AgentSetDF class, you need to subclass the AgentSetPolars or AgentSetPandas class and make sure to call `super().__init__(model)`.

Typically, the next step is to populate the class with your agents by adding a DataFrame to the AgentSetDF. You can do `self += agents` or `self.add(agents)`, where `agents` is a DataFrame or anything that can be passed to a DataFrame constructor, such as a dictionary or a list of lists. The DataFrame must have a `unique_id` column whose ids are unique across the whole model; otherwise an error is raised. It should also contain every agent attribute you use.

How should you decide which agents belong in the same AgentSet? Group agents that share (almost) the same attributes, so that the DataFrame has as few missing values as possible, and that mostly perform the same actions.

Example:

```python
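import polars as pl
from mesa_frames import AgentSetPolars, ModelDF


class MoneyAgent(AgentSetPolars):
    def __init__(self, n: int, model: ModelDF):
        super().__init__(model)
        self.initial_wealth = pl.ones(n, eager=True)
        self += pl.DataFrame({
            "unique_id": pl.arange(n, eager=True),
            "wealth": self.initial_wealth,
        })

    def step(self):
        # Add one random integer, drawn from [0, number of agents), to every agent's wealth
        self["wealth"] = self["wealth"] + self.random.integers(len(self.agents))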
```

You can access the underlying DataFrame where agents are stored with `self.agents`. This allows you to use DataFrame methods like `self.agents.sample` or `self.agents.group_by("wealth")` and more.

## ModelDF 🏗️

To add an AgentSetDF to your ModelDF, add it to the model's agents with `+=` or `add`.

NOTE: `ModelDF.agents` is stored in AgentsDF, a class that closely mirrors AgentSetDF: the two share the same API. Accessing `AgentsDF.agents` returns a dictionary mapping each AgentSetDF to its DataFrame (`dict[AgentSetDF, DataFrame]`).

Example:

```python
from mesa_frames import ModelDF


class EcosystemModel(ModelDF):
    def __init__(self, n_prey, n_predators):
        super().__init__()
        # Keep a reference to the prey set so it can be addressed on its own
        self.prey = Preys(n_prey, self)
        self.agents += self.prey
        self.agents += Predators(n_predators, self)

    def step(self):
        self.agents.do("move")
        self.agents.do("hunt")
        self.prey.do("reproduce")
```

## Space: GridDF 🌐

mesa-frames provides efficient implementations of spatial environments:

- Spatial operations (like moving agents) are vectorized for performance

Example:

```python
from mesa_frames import GridPolars, ModelDF


class GridWorld(ModelDF):
    def __init__(self, width, height):
        super().__init__()
        self.space = GridPolars(self, (width, height))
        # AgentSet stands for your own AgentSetDF subclass
        self.agents += AgentSet(100, self)
        self.space.place_to_empty(self.agents)
```

A continuous GeoSpace, NetworkSpace, and a collection to have multiple spaces in the models are in the works! 🚧