Reinforcement Learning Model Training and Prediction for Neuromatch group project

This repository contains Python scripts and utilities for training a custom Reinforcement Learning (RL) model using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The trained model can be used to predict actions in a custom Gym environment.
The problem is goal-oriented spatial navigation with multi-modal sensory inputs. The agent has two sensory inputs, each of which can be knocked out (by adding noise) to investigate the importance of that sense: how do different modalities contribute to goal-oriented spatial navigation?
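A knock-out of this kind can be sketched as follows. This is a minimal illustration, not the repository's implementation: the column indices, the `knock_out` helper, and the Gaussian noise model are all assumptions, since the actual layout of the observation and the noise distribution are not documented here.

```python
import numpy as np

# Hypothetical column layout of one 6-value observation frame.
# The real ordering used by the environment is not documented here.
VISION_COLS = [0, 1]
SMELL_COLS = [2]


def knock_out(obs, cols, rng, noise_std=1.0):
    """Return a copy of `obs` with Gaussian noise added to the given columns."""
    noisy = np.array(obs, dtype=np.float32, copy=True)
    noise = rng.normal(0.0, noise_std, size=noisy[..., cols].shape)
    noisy[..., cols] = noisy[..., cols] + noise
    return noisy


rng = np.random.default_rng(0)
obs = np.ones((5, 6), dtype=np.float32)  # 5 stacked frames, 6 inputs each
no_smell = knock_out(obs, SMELL_COLS, rng)
```

Only the selected columns change; the original observation is left untouched, so the same episode can be replayed under several sensory conditions.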
Agent in complex environment:
complex.mp4
Environment Design:
- 2D env (Gymnasium) with continuous action space
- action space:
  - acceleration
  - angular acceleration
- rewards:
  - 1000 points for finding the goal
  - negative reward for energy expenditure
  - negative reward for distance from the goal
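The reward terms listed above could be combined roughly as in the sketch below. Only the 1000-point goal reward comes from this README; the two penalty weights, the `step_reward` name, the use of absolute action magnitude as "energy", and the choice to skip penalties on the goal step are assumptions made for illustration.

```python
import math

GOAL_REWARD = 1000.0   # stated in the list above
ENERGY_COST = 0.1      # hypothetical weight
DISTANCE_COST = 0.01   # hypothetical weight


def step_reward(agent_xy, goal_xy, action, goal_radius):
    """Reward for one step; `action` is (acceleration, angular acceleration)."""
    distance = math.dist(agent_xy, goal_xy)
    if distance <= goal_radius:
        # Goal found: assumed to end the episode with the full bonus.
        return GOAL_REWARD
    # Energy is approximated here by the magnitude of both action components.
    energy = abs(action[0]) + abs(action[1])
    return -ENERGY_COST * energy - DISTANCE_COST * distance
```

Keeping the penalty weights small relative to the goal reward means the agent is still strongly pulled toward the goal, while the distance term gives a learning signal before the goal is ever reached.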
Observation space (6 inputs):
- sensory inputs:
  - vision - one-hot encoding (0/1) modeled with a limited-angle cone (wall vision included, based on ray casting)
  - smell - modeled as Euclidean distance from the goal
- time elapsed
- velocity, angular velocity
- at each time step the agent receives a (5, 6) matrix containing the current time frame and a memory of the previous 4 frames
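The (5, 6) observation with a memory of previous frames can be built with a fixed-length buffer. The sketch below is one way to do it, not necessarily how bee.py does: the `FrameMemory` class, the oldest-first row order, and padding the buffer with copies of the first frame at reset are assumptions.

```python
from collections import deque

import numpy as np

N_FRAMES, N_INPUTS = 5, 6  # current frame + 4 remembered frames, 6 inputs each


class FrameMemory:
    """Keeps the last N_FRAMES observation frames (hypothetical helper)."""

    def __init__(self):
        self.frames = deque(maxlen=N_FRAMES)

    def reset(self, first_frame):
        # Assumption: fill the memory with copies of the first frame.
        self.frames.clear()
        for _ in range(N_FRAMES):
            self.frames.append(np.asarray(first_frame, dtype=np.float32))
        return self.observation()

    def push(self, frame):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(np.asarray(frame, dtype=np.float32))
        return self.observation()

    def observation(self):
        # Rows are ordered oldest first; the last row is the current frame.
        return np.stack(list(self.frames))


memory = FrameMemory()
obs = memory.reset(np.zeros(N_INPUTS))
obs = memory.push(np.ones(N_INPUTS))
```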
Training specific:
- multiple-round training integration with model management and custom logging
- early stopping if no training progress
- goal size can be changed, custom walls/obstacles
For our sample training, we gradually decreased the size of the goal and added walls.
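Early stopping on lack of progress usually comes down to a patience counter over periodic evaluations. The sketch below shows that logic in plain Python; the `NoProgressStopper` class and its parameters are hypothetical and do not describe the repository's own code. Stable Baselines3 also provides a `StopTrainingOnNoModelImprovement` callback for this purpose; whether this project uses it is not stated here.

```python
class NoProgressStopper:
    """Signals a stop after `patience` evaluations without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale = 0  # evaluations since the last improvement

    def update(self, mean_reward):
        """Record one evaluation result; return True if training should stop."""
        if mean_reward > self.best + self.min_delta:
            self.best = mean_reward
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


stopper = NoProgressStopper(patience=3)
eval_rewards = [10, 20, 20, 19, 21, 21, 20, 21]  # made-up evaluation results
stopped_at = None
for i, reward in enumerate(eval_rewards):
    if stopper.update(reward):
        stopped_at = i
        break
```

In a multi-round setup, a fresh stopper per round keeps a plateau in one round from cutting the next round short.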
Training example:
train.mp4
The repository contains the following files:
- bee.py: Environment (gym) setup - agent's actions, rewards and space definition for the RL model.
- model.py: A Python module containing utility functions for initializing and loading the RL model.
- utils.py: A Python module containing utility functions for creating directories, saving the configuration, and more.
- train_model.py: The main script to train the RL model.
- evaluate_model.py: Evaluation script for sensory input knockout.
- render_model.py: A script to generate and display a video of the RL model's predictions.
- gym_run.py: Demonstration file for testing gym changes.
- configs: contains all of the sample testing and training configurations.
- notebooks: notebooks used for training and analysis visualization.
- figures

Usage:
- Configure the config.yaml file with the desired parameters for training the RL model.
- Run the training script:
    python train_model.py --config_path config.yaml
  You can run a set of consecutive training rounds; make sure to specify each round in a separate config.yaml and to set the proper alias setting. This is demonstrated in:
    bash run-multiround-training.sh
- Sensory knock-out experiments can be performed and saved:
    python evaluate_model.py --config_path configs/test-config.yaml
- After training, you can generate a video of the model's predictions using the render_model.py script, or evaluate the model and save the episodes into a test log:
    python render_model.py --config_path configs/config.yaml
- Analysis of the model's performance:
  - sense-knockout-analysis.ipynb - plot failure rate and episode lengths under each sensory condition
  - training-visualization.ipynb - plot multi-round metrics
- You can also use the convenience Jupyter notebooks, which are useful for working in Google Colab.
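
The consecutive training rounds mentioned above can also be driven from Python instead of the shell script. The sketch below only reuses the `python train_model.py --config_path ...` command shown in this README; the config file names and the `build_commands` / `run_rounds` helpers are hypothetical, and the contents of run-multiround-training.sh may differ.

```python
import subprocess
import sys


def build_commands(config_paths):
    """Build one training command per round, in the order given."""
    return [
        [sys.executable, "train_model.py", "--config_path", path]
        for path in config_paths
    ]


def run_rounds(config_paths):
    """Run the rounds one after another, stopping at the first failure."""
    for cmd in build_commands(config_paths):
        subprocess.run(cmd, check=True)


# Hypothetical per-round config files, one per training round.
ROUND_CONFIGS = ["configs/round-1.yaml", "configs/round-2.yaml"]
commands = build_commands(ROUND_CONFIGS)
```

Calling `run_rounds(ROUND_CONFIGS)` from the repository root would then execute the rounds in sequence; `check=True` makes a failed round raise instead of silently continuing to the next one.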
- Stable Baselines3: https://github.com/DLR-RM/stable-baselines3
- Gymnasium: https://github.com/openai/gym
This project is licensed under the MIT License - see the LICENSE file for details.