Developing Autonomous Drone Swarms with Multi-Agent Reinforcement Learning for Scalable Post-Disaster Damage Assessment
| Efficient Case | Redundant Case | Unsuccessful Case |
|---|---|---|
In recent years, drones have been used to support post-disaster damage assessment of buildings, roads, and infrastructure. Given that human resources are limited after a disaster, autonomous drone swarms, rather than a single drone, can enable a more rapid and thorough assessment. Multi-Agent Reinforcement Learning is a promising approach because it can adapt to dynamic environments and reduce the computational complexity of optimization. This research project applies Multi-Agent Q-learning to autonomous navigation for mapping and to multi-objective drone swarm exploration of a disaster area, examining the tradeoffs between coverage and scalability. We compare two state spaces by assessing coverage of the environment within a given limited time. We find that using drones' observations of the mapping status within their Field of View as the state outperforms using their positions, in terms of both coverage and scalability. We also empirically specify parametric thresholds for the performance of a multi-objective RL algorithm as a point of reference for future work incorporating deep learning approaches, e.g., Graph Neural Networks (GNNs).
Daisuke Nakanishi, Gurpreet Singh, Kshitij Chandna
```
.
├── experiment   # Jupyter notebooks to run the algorithms
├── vis          # Jupyter notebooks to create animations
├── single       # RL environment for a single agent
├── multi        # RL environment for multiple agents
├── QL           # Tabular Q-learning (sketched below)
├── QL_NN        # (Preliminary) Function-approximation Q-learning
├── DQN          # (Preliminary) Deep Q-Networks
├── GNN          # (Preliminary) Graph Neural Networks
├── .gitignore
└── README.md
```
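The `QL` directory implements tabular Q-learning. As a point of reference, here is a minimal sketch of the update rule it is built around; the hyperparameter values (`alpha`, `gamma`, `epsilon`) are illustrative, not the repository's settings:

```python
from collections import defaultdict
import random

# Q-table keyed by (state, action); unseen entries default to 0.0.
Q = defaultdict(float)

def epsilon_greedy(state, epsilon=0.1, n_actions=4):
    # Explore with probability epsilon, otherwise act greedily w.r.t. Q.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.95, n_actions=4):
    # Standard tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    best_next = max(Q[(next_state, a)] for a in range(n_actions))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```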
We discretize the 2-dimensional mission environment (the disaster area) into a grid of m × m square cells. The side length of each cell is sufficiently larger than the size of a drone, so two or more drones can occupy a single cell. A cell visited by at least one drone is considered mapped.
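As a concrete sketch of this grid representation (the class and method names below are ours, not the repository's):

```python
import numpy as np

class GridMap:
    """m x m grid; a cell is considered mapped once any drone has visited it."""

    def __init__(self, m: int):
        self.m = m
        # Mapping status matrix M: 0 = unmapped, 1 = mapped.
        self.M = np.zeros((m, m), dtype=np.int8)

    def mark_visited(self, row: int, col: int) -> None:
        # Two or more drones may occupy the same cell; marking is idempotent.
        self.M[row, col] = 1

    def coverage(self) -> float:
        # Fraction of cells mapped so far (the metric compared in this project).
        return float(self.M.mean())
```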
At each time step, the n drones act sequentially; each drone's mapping result is reflected in the mapping status matrix M before the next drone chooses its action. A drone can therefore indirectly and partially observe the behavior of the drones that acted before it. The order in which the drones act is randomized at each time step.
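A minimal sketch of one time step under this sequential protocol, assuming illustrative drone objects with `move` and `position` attributes and a `policy` callable (none of these names are from the repository):

```python
import random

def step(drones, grid_map, policy):
    """One time step: drones act in a random order, and each drone's
    mapping result is written to M before the next drone acts."""
    order = list(range(len(drones)))
    random.shuffle(order)  # the action order is re-randomized every time step
    for i in order:
        drone = drones[i]
        # The mapping status already reflects updates made by preceding drones.
        action = policy(drone, grid_map)
        drone.move(action)
        grid_map.mark_visited(*drone.position)
```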
We examine two state spaces to handle a range of possible disaster areas and scenarios. The action space consists of four movement directions, A = {up, down, right, left}.
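To make the two state encodings concrete, the following hedged sketch contrasts a position state with a Field-of-View state; the FoV radius `r`, the border padding, and all helper names are our assumptions:

```python
import numpy as np

ACTIONS = {0: (-1, 0),  # up
           1: (1, 0),   # down
           2: (0, 1),   # right
           3: (0, -1)}  # left

def position_state(drone):
    # State space 1: the drone's own grid cell.
    return drone.position  # (row, col)

def fov_state(drone, M, r=1):
    # State space 2: mapping status of the cells within the drone's
    # Field of View, here a (2r+1) x (2r+1) window, padded at the borders.
    row, col = drone.position
    padded = np.pad(M, r, constant_values=1)  # treat out-of-bounds as mapped
    window = padded[row:row + 2 * r + 1, col:col + 2 * r + 1]
    return tuple(window.flatten())  # hashable key for a tabular Q-table
```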
Our Webpage: https://dn2153.wixsite.com/drone
- OpenAI Gym==0.21.0 or newer
- Python==3.7.12 or newer
- (Optional) Stable Baselines3==1.3.0 or newer
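A minimal import check for these dependencies (package names only; Stable Baselines3 is optional, so its absence is tolerated):

```python
import sys
import gym

assert sys.version_info >= (3, 7), "Python 3.7.12 or newer is required"
print("gym", gym.__version__)

try:
    import stable_baselines3 as sb3
    print("stable-baselines3", sb3.__version__)
except ImportError:
    print("stable-baselines3 not installed (optional)")
```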
The authors would like to thank Dr. David K.A. Mordecai at RiskEcon® Lab @ Courant Institute of Mathematical Sciences NYU and Dr. Giuseppe Loianno at Agile Robotics and Perception Lab @ NYU Tandon School of Engineering for supervising the research.