Adding user guide (#81)

* completing user-guide
* adding benchmarks
* renaming the correct file
* update images to raw
* adding sequential processing caveat
projectmesa · Aug 28, 2024 · 329eb16
1 parent 87935b7
commit 329eb16
Showing 10 changed files with 537 additions and 29 deletions.
diff --git a/.markdownlint.json b/.markdownlint.json
@@ -1,3 +1,4 @@
 {
-    "MD013": false
+    "MD013": false,
+    "MD046": false
 }
diff --git a/README.md b/README.md
@@ -1,8 +1,8 @@
-# mesa-frames
+# mesa-frames 🚀
 
-mesa-frames is an extension of the [mesa](https://github.com/projectmesa/mesa) framework, designed for complex simulations with thousands of agents. By storing agents in a DataFrame, mesa-frames significantly enhances the performance and scalability of mesa, while maintaining a similar syntax. mesa-frames allows for the use of [vectorized functions](https://vegibit.com/what-is-a-vectorized-operation-in-pandas/) whenever simultaneous activation of agents is possible.
+mesa-frames is an extension of the [mesa](https://github.com/projectmesa/mesa) framework, designed for complex simulations with thousands of agents. By storing agents in a DataFrame, mesa-frames significantly enhances the performance and scalability of mesa, while maintaining a similar syntax. mesa-frames allows for the use of [vectorized functions](https://stackoverflow.com/a/1422198), which significantly speed up operations whenever simultaneous activation of agents is possible.
 
-## Why DataFrames?
+## Why DataFrames? 📊
 
 DataFrames are optimized for simultaneous operations through [SIMD processing](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data). At the moment, mesa-frames supports the use of two main libraries: pandas and Polars.
 
@@ -45,10 +45,7 @@ conda activate myenv
 Then, to install mesa-frames itself:
 
 ```bash
-# For pandas backend
-pip install -e .[pandas]
-# Alternatively, for Polars backend
-pip install -e .[polars]
+pip install -e .
 ```
 
 ### Installing in a Python Virtual Environment
@@ -69,10 +66,7 @@ source myenv/bin/activate  # On Windows, use `myenv\Scripts\activate`
 Then, to install mesa-frames itself:
 
 ```bash
-# For pandas backend
-pip install -e .[pandas]
-# Alternatively, for Polars backend
-pip install -e .[polars]
+pip install -e .
 ```
 
 ## Usage
@@ -141,11 +135,11 @@ class MoneyModelDF(ModelDF):
             self.step()
 ```
 
-## What's Next?
+## What's Next? 🔮
 
 - Refine the API to make it more understandable for someone who is already familiar with the mesa package. The goal is to provide a seamless experience for users transitioning to or incorporating mesa-frames.
 - Adding support for default mesa functions to ensure that the standard mesa functionality is preserved.
-- Adding GPU functionality (cuDF and Rapids).
+- Adding GPU functionality (cuDF and Dask-cuDF).
 - Creating a decorator that will automatically vectorize an existing mesa model. This feature will allow users to easily tap into the performance enhancements that mesa-frames offers without significant code alterations.
 - Creating a unique class for AgentSet, independent of the backend implementation.
 

diff --git a/docs/general/development/index.md b/docs/general/development/index.md
@@ -0,0 +1,4 @@
+# Development Guidelines
+
+!!! warning
+    This page is under construction; check back soon.
diff --git a/docs/general/index.md b/docs/general/index.md
@@ -2,6 +2,8 @@
 
 mesa-frames is an extension of the [mesa](https://github.com/projectmesa/mesa) framework, designed for complex simulations with thousands of agents. By storing agents in a DataFrame, mesa-frames significantly enhances the performance and scalability of mesa, while maintaining a similar syntax.
 
+Depending on the number of agents, a mesa-frames model can run multiple orders of magnitude faster than its mesa equivalent: the more agents, the larger the relative speedup.
+
 ## Why DataFrames? 📊
 
 DataFrames are optimized for simultaneous operations through [SIMD processing](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data). Currently, mesa-frames supports two main libraries:
@@ -21,20 +23,27 @@ Check out our performance graphs comparing mesa and mesa-frames for the [Boltzma
 
 ### Installation
 
+#### Installing from PyPI
+
+```bash
+pip install mesa-frames
+```
+
+#### Installing from Source
+
 ```bash
 git clone https://github.com/adamamer20/mesa_frames.git
 cd mesa_frames
-pip install -e .[pandas]  # For pandas backend
-# or
-pip install -e .[polars]  # For Polars backend
+pip install -e .
 ```
 
 ### Basic Usage
 
-Here's a quick example of how to create an agent set using mesa-frames:
+Here's a quick example of how to create a model using mesa-frames:
 
 ```python
-from mesa-frames import AgentSetPolars, ModelDF
+from mesa_frames import AgentSetPolars, ModelDF
+import polars as pl
 
 class MoneyAgentPolars(AgentSetPolars):
     def __init__(self, n: int, model: ModelDF):
@@ -65,10 +74,12 @@ class MoneyModelDF(ModelDF):
 ## What's Next? 🔮
 
 - API refinement for seamless transition from mesa
-- Support for default mesa functions
-- GPU functionality (cuDF and Rapids)
+- Support for mesa functions
+- Multiple other spaces: GeoGrid, ContinuousSpace, Network...
+- Additional backends: Dask, cuDF (GPU), Dask-cuDF (GPU)...
+- More examples: Schelling model, ...
 - Automatic vectorization of existing mesa models
-- Backend-independent AgentSet class
+- Backend-agnostic AgentSet class
 
 ## Get Involved! 🤝
 

diff --git a/docs/general/user-guide/0_getting-started.md b/docs/general/user-guide/0_getting-started.md
@@ -0,0 +1,183 @@
+# Getting Started 🚀
+
+## Main Concepts 🧠
+
+### DataFrame-Based Object-Oriented Framework 📊
+
+Unlike traditional mesa models where each agent is an individual Python object, mesa-frames stores all agents of a particular type in a single DataFrame. All operations are defined at the AgentSet level rather than on individual agents.
+
+This approach allows for:
+
+- Efficient memory usage
+- Improved performance through vectorized operations on agent attributes (This is what makes `mesa-frames` fast)
+
+Objects can be easily subclassed to respect mesa's object-oriented philosophy.
+
+### Vectorized Operations ⚡
+
+mesa-frames leverages the power of vectorized operations provided by DataFrame libraries:
+
+- Operations are performed on entire columns of data at once
+- This approach is significantly faster than iterating over individual agents
+- Complex behaviors can be expressed in fewer lines of code
+
+Avoid Python loops that iterate over individual agents. Use vectorized operations and the methods the library already provides instead. If a loop is unavoidable, loop over vectorized operations rather than over agents (see the advanced SugarScape IG tutorial for more information).
+
+It's important to note that in traditional `mesa` models, the order in which agents are activated can significantly impact the results of the model (see [Comer, 2014](http://mars.gmu.edu/bitstream/handle/1920/9070/Comer_gmu_0883E_10539.pdf)). `mesa-frames`, by default, doesn't have this issue as all agents are processed simultaneously. However, this comes with the trade-off of needing to carefully implement conflict resolution mechanisms when sequential processing is required. We'll discuss how to handle these situations later in this guide.
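As a toy illustration of that order dependence (plain Python with hypothetical names, not mesa or mesa-frames code), consider two agents competing for a single unit of sugar under sequential activation:

```python
def sequential_harvest(order, sugar_available):
    """Activate agents one at a time; each takes one unit if any is left."""
    harvest = {agent: 0 for agent in order}
    for agent in order:
        if sugar_available > 0:
            harvest[agent] += 1
            sugar_available -= 1
    return harvest


# Same agents, same resource, different activation order:
print(sequential_harvest(["a", "b"], sugar_available=1))  # {'a': 1, 'b': 0}
print(sequential_harvest(["b", "a"], sugar_available=1))  # {'b': 1, 'a': 0}
```

Whoever is activated first gets the sugar. Under simultaneous activation both agents would claim the same unit at once, which is why an explicit conflict-resolution rule becomes necessary.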
+
+Check out these resources to understand vectorization and why it speeds up the code:
+
+- [What is vectorization?](https://stackoverflow.com/a/1422181)
+- [Vectorization Explained, Step by Step](https://machinelearningcompass.com/machine_learning_math/vectorization/)
+
+Here's a comparison between mesa-frames and mesa:
+
+=== "mesa-frames"
+    ```python
+    class MoneyAgentPolarsConcise(AgentSetPolars):
+        # initialization...
+
+        def give_money(self):
+            # Active agents are changed to wealthy agents
+            self.select(self.wealth > 0)
+
+            # Receiving agents are sampled (only native expressions currently supported)
+            other_agents = self.agents.sample(
+                n=len(self.active_agents), with_replacement=True
+            )
+
+            # Wealth of wealthy is decreased by 1
+            self["active", "wealth"] -= 1
+
+            # Compute the income of the other agents (only native expressions currently supported)
+            new_wealth = other_agents.group_by("unique_id").len()
+
+            # Add the income to the other agents
+            self[new_wealth, "wealth"] += new_wealth["len"]
+    ```
+
+=== "mesa"
+    ```python
+    class MoneyAgent(mesa.Agent):
+        # initialization...
+
+        def give_money(self):
+            # Verify agent has some wealth
+            if self.wealth > 0:
+                other_agent = self.random.choice(self.model.agents)
+                if other_agent is not None:
+                    other_agent.wealth += 1
+                    self.wealth -= 1
+    ```
+
+As you can see, in mesa the model class must iterate over every agent and call its step, whereas in mesa-frames the method is executed once for all agents.
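To make the contrast concrete without depending on mesa-frames itself, here is a minimal NumPy sketch of the same money-exchange step written both ways. The function names and the use of `np.bincount` are illustrative assumptions, not part of the mesa-frames API. The receivers are drawn up front so that both versions can be compared on identical input.

```python
import numpy as np


def give_money_loop(wealth, receivers):
    """Per-agent version: one Python-level iteration per giving agent."""
    wealth = wealth.copy()
    givers = np.flatnonzero(wealth > 0)  # indices of agents that can give
    for giver, receiver in zip(givers, receivers):
        wealth[giver] -= 1
        wealth[receiver] += 1
    return wealth


def give_money_vectorized(wealth, receivers):
    """Set-level version: the same update expressed as whole-array operations."""
    wealth = wealth.copy()
    givers = wealth > 0                  # boolean mask of agents that can give
    wealth[givers] -= 1                  # every giver loses one unit at once
    # Count how many units each agent receives and add them in one step
    wealth += np.bincount(receivers, minlength=len(wealth))
    return wealth


rng = np.random.default_rng(42)
wealth = np.ones(1_000, dtype=np.int64)
# One receiver per giver, sampled with replacement
receivers = rng.integers(0, len(wealth), size=int((wealth > 0).sum()))

same = np.array_equal(
    give_money_loop(wealth, receivers), give_money_vectorized(wealth, receivers)
)
```

Both functions return the same wealth array for the same `receivers`; the vectorized one replaces the per-agent Python loop with a handful of array operations.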
+
+### Backend Flexibility 🔄
+
+mesa-frames aims to support multiple DataFrame backends. The currently supported backends are:
+
+- **pandas**: A widely-used data manipulation library
+- **Polars**: A high-performance DataFrame library written in Rust
+
+Users can choose the backend that best suits their needs:
+
+```python
+from mesa_frames import AgentSetPandas  # or AgentSetPolars
+```
+
+Currently, AgentSetDF and GridDF each have two implementations, one per backend: AgentSetPandas and AgentSetPolars, and GridPandas and GridPolars.
+We encourage you to use the Polars implementation for increased performance. We are working on a single, backend-independent interface; the discussion is [here](https://github.com/adamamer20/mesa-frames/discussions/12). Let us know what you think!
+
+Soon we will also have multiple other backends like Dask, cuDF, and Dask-cuDF!
+
+## Coming from mesa 🔀
+
+If you're familiar with mesa, this guide will help you understand the key differences in code structure between mesa and mesa-frames.
+
+### Agent Representation 👥
+
+- mesa: Each agent is an individual object instance. Methods are defined for individual agents and called on each agent.
+- mesa-frames: Agents are rows in a DataFrame, grouped into AgentSets. Methods are defined for AgentSets and operate on all agents simultaneously.
+
+=== "mesa-frames"
+    ```python
+    class MoneyAgentSet(AgentSetPolars):
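+        def __init__(self, n, model):
+            super().__init__(model)
+            self += pl.DataFrame({
+                "unique_id": pl.arange(n, eager=True),
+                "wealth": pl.ones(n, eager=True)
+            })
+
+        def step(self):
+            # Only agents with positive wealth give money
+            self.select(self.wealth > 0)
+            # Sample one receiver per giver (an agent may be drawn more than once)
+            receivers = self.agents.sample(
+                n=len(self.active_agents), with_replacement=True
+            )
+            self["active", "wealth"] -= 1
+            # Count how many units each receiver gets, then add them
+            new_wealth = receivers.group_by("unique_id").len()
+            self[new_wealth, "wealth"] += new_wealth["len"]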
+    ```
+
+=== "mesa"
+    ```python
+    class MoneyAgent(Agent):
+        def __init__(self, unique_id, model):
+            super().__init__(unique_id, model)
+            self.wealth = 1
+
+        def step(self):
+            if self.wealth > 0:
+                other_agent = self.random.choice(self.model.schedule.agents)
+                other_agent.wealth += 1
+                self.wealth -= 1
+    ```
+
+### Model Structure 🏗️
+
+- mesa: Models manage individual agents and use a scheduler.
+- mesa-frames: Models manage AgentSets and directly control the simulation flow.
+
+=== "mesa-frames"
+    ```python
+    class MoneyModel(ModelDF):
+        def __init__(self, N):
+            super().__init__()
+            self.agents += MoneyAgentSet(N, self)
+
+        def step(self):
+            self.agents.do("step")
+    ```
+
+=== "mesa"
+    ```python
+    class MoneyModel(Model):
+        def __init__(self, N):
+            super().__init__()
+            self.num_agents = N
+            self.schedule = RandomActivation(self)
+            for i in range(self.num_agents):
+                a = MoneyAgent(i, self)
+                self.schedule.add(a)
+
+        def step(self):
+            self.schedule.step()
+    ```
+
+### Transition Tips 💡
+
+1. **Think in Sets 🎭**: Instead of individual agents, think about operations on groups of agents.
+2. **Leverage DataFrame Operations 🛠️**: Familiarize yourself with pandas or Polars operations for efficient agent manipulation.
+3. **Vectorize Logic 🚅**: Convert loops and conditionals to vectorized operations where possible.
+4. **Use AgentSets 📦**: Group similar agents into AgentSets instead of creating many individual agent classes.
+
+### Handling Race Conditions 🏁
+
+When simultaneous activation is not possible, you need to handle race conditions carefully. There are two main approaches:
+
+1. **Custom UDF with Numba 🔧**: Use a custom User Defined Function (UDF) with Numba for efficient sequential processing.
+
+   - [Polars UDF Guide](https://docs.pola.rs/user-guide/expressions/user-defined-functions/)
+   - [pandas Numba Engine](https://pandas.pydata.org/pandas-docs/stable/user_guide/window.html#numba-engine)
+
+2. **Looping Mechanism 🔁**: Implement a looping mechanism on vectorized operations.
+
+For a more detailed implementation of handling race conditions, please refer to the `examples/sugarscape-ig` in the mesa-frames repository. This example demonstrates how to implement the Sugarscape model with instantaneous growback, which requires careful handling of sequential agent actions.
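
The "looping mechanism on vectorized operations" can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the code from `examples/sugarscape-ig`: the function `assign_cells`, the `scores` matrix (how much each agent wants each cell), and the `priority` array (lower value wins a conflict) are all hypothetical. Each round, every unplaced agent proposes its best free cell in one vectorized step, the highest-priority proposer wins each contested cell, and the losers try again in the next round.

```python
import numpy as np


def assign_cells(scores, priority):
    """Assign at most one agent per cell, resolving conflicts round by round.

    scores:   float array (n_agents, n_cells), desirability of each cell per agent
    priority: array (n_agents,), lower value wins a contested cell
    Returns an int array (n_agents,) with the assigned cell, or -1 if none.
    """
    n_agents, n_cells = scores.shape
    assignment = np.full(n_agents, -1)
    cell_free = np.ones(n_cells, dtype=bool)
    unplaced = np.ones(n_agents, dtype=bool)

    # The loop runs over rounds of vectorized work, not over individual agents
    while unplaced.any() and cell_free.any():
        agents = np.flatnonzero(unplaced)
        # Hide occupied cells, then let all unplaced agents propose at once
        masked = np.where(cell_free, scores[agents], -np.inf)
        proposals = masked.argmax(axis=1)
        # Order proposers by priority; np.unique returns the first occurrence
        # of each cell, i.e. its highest-priority proposer
        order = np.argsort(priority[agents], kind="stable")
        cells, first = np.unique(proposals[order], return_index=True)
        winners = agents[order][first]
        assignment[winners] = cells
        cell_free[cells] = False
        unplaced[winners] = False
    return assignment


scores = np.array([[3.0, 1.0, 2.0], [3.0, 2.0, 1.0], [3.0, 1.0, 2.0]])
priority = np.array([0, 1, 2])
result = assign_cells(scores, priority)
```

Every round places at least one agent, so the loop ends after at most `n_agents` rounds. Note that this rule resolves conflicts but does not necessarily reproduce the outcome of strictly sequential activation in priority order, because a lower-priority agent can claim an uncontested cell before a higher-priority agent falls back to it.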
diff --git a/docs/general/user-guide/1_classes.md b/docs/general/user-guide/1_classes.md
@@ -0,0 +1,67 @@
+# Classes 📚
+
+## AgentSetDF 👥
+
+To create your own AgentSetDF class, you need to subclass the AgentSetPolars or AgentSetPandas class and make sure to call `super().__init__(model)`.
+
+Typically, the next step is to populate the class with your agents by adding a DataFrame to the AgentSetDF. You can use `self += agents` or `self.add(agents)`, where `agents` is a DataFrame or anything that can be passed to a DataFrame constructor, such as a dictionary or a list of lists. The DataFrame must have a `unique_id` column whose ids are unique across the model; otherwise an error is raised. The DataFrame should also contain every agent attribute you use.
+
+How do you decide which agents belong in the same AgentSet? Aim to minimize missing values in the DataFrame: agents in a set should share the same (or very similar) attributes, and most of them should perform the same actions.
+
+Example:
+
+```python
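+import polars as pl
+from mesa_frames import AgentSetPolars, ModelDF
+
+
+class MoneyAgent(AgentSetPolars):
+    def __init__(self, n: int, model: ModelDF):
+        super().__init__(model)
+        self.initial_wealth = pl.ones(n, eager=True)
+        self += pl.DataFrame({
+            "unique_id": pl.arange(n, eager=True),
+            "wealth": self.initial_wealth
+        })
+
+    def step(self):
+        # `n` is not in scope here, so use the current number of agents as the upper bound
+        self["wealth"] = self["wealth"] + self.random.integers(len(self.agents))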
+```
+
+You can access the underlying DataFrame where agents are stored with `self.agents`. This allows you to use DataFrame methods like `self.agents.sample` or `self.agents.group_by("wealth")` and more.
+
+## ModelDF 🏗️
+
+To add your AgentSetDF to your ModelDF, add it to the model's `agents` with `+=` or `add`.
+
+NOTE: `ModelDF.agents` is stored in a class called AgentsDF, which closely mirrors AgentSetDF and exposes the same API. Accessing `AgentsDF.agents` returns a dictionary of `[AgentSetDF, DataFrame]` pairs.
+
+Example:
+
+```python
+class EcosystemModel(ModelDF):
+    def __init__(self, n_prey, n_predators):
+        super().__init__()
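+        # Keep a reference to the prey set so `step` can address it on its own
+        # (assumes the model keeps the same object after `+=`)
+        self.prey = Preys(n_prey, self)
+        self.agents += self.prey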
+        self.agents += Predators(n_predators, self)
+
+    def step(self):
+        self.agents.do("move")
+        self.agents.do("hunt")
+        self.prey.do("reproduce")
+```
+
+## Space: GridDF 🌐
+
+mesa-frames provides efficient implementations of spatial environments:
+
+- Spatial operations (like moving agents) are vectorized for performance
+
+Example:
+
+```python
+class GridWorld(ModelDF):
+    def __init__(self, width, height):
+        super().__init__()
+        self.space = GridPolars(self, (width, height))
+        self.agents += AgentSet(100, self)
+        self.space.place_to_empty(self.agents)
+```
+
+A continuous GeoSpace, a NetworkSpace, and a collection for holding multiple spaces in a model are in the works! 🚧