Skip to content

driving_swarm_evolution

treegeist edited this page May 23, 2021 · 25 revisions

Description

Link to the repository

This ROS2 package provides a general framework for working with evolutionary algorithms / robotics.

Controller model implementations:

  • Reference finite-state machines

This code / package can:

  • Work on ROS2 and with python3 (yay!)
  • Use ROS2-nodes for simulated evolution with a distributed evolution manager
  • Do basic empirical evaluation, as re-evaluation is based on fitness and genome age
  • Use custom fitness-functions, representation models etc. by swapping out the respective nodes (fit_fun_*, fsm_eval, etc.)
  • Backup stats and a hall of fame of individuals

It can not (yet):

  • Do multi-objective fitness (lacking dist_ea)
  • Do real / hybrid evaluation (real and simulated robots)

TODO screenshot plot + behaviour with arrows

Setup

  • Setup ROS2 and Gazebo
  • Install additional python dependencies with sudo apt install python3-deap or pip3 install deap
  • For stat_plot.py install dependencies with pip3 install seaborn pandas
  • git clone <this repository>, colcon build and source install/setup.bash / setup-workspace.sh for docker

With the docker development image

For easy setup and development with docker use this repository

  1. Clone the driving_swarm_evolution repository

  2. Start the driving_swarm_docker development image:

    docker run --name ros-development -d -h rosdev \
      --device=/dev/dri \
      -p 127.0.0.1:1800:1800 \
      -p 127.0.0.1:1900:1900 \
      -p 127.0.0.1:5900:5900 \
      -v "$PWD:/home/docker/workspace/src/driving_swarm_evolution" \
      ovgudrivingswarm/development
    
  3. Go to the Theia IDE (http://localhost:1900)

If you're developing locally (without Theia), you can run the evolution stack headlessly (i.e. without GUI) with the following command:

docker run --rm --name dsevolution -h dsevolution --device=/dev/dri \
     -v "$PWD/workspace/src/driving_swarm_evolution:/home/docker/workspace" \
     ovgudrivingswarm/turtlebot \

Usage

  • Run ros2 launch driving_swarm_evolution multi_bots.launch.py num_worlds:=X num_bots:=Y file_world:=arena_empty.world
    • This spawns X worlds with the arena_empty.world-world (from this package)
    • Also spawns Y turtlebot3-models in each world, by default in the namespace /<hostname>_X/bookturtle_Y
  • Run ros2 launch driving_swarm_evolution multi_eval.launch.py num_worlds:=X num_bots:=Y file_eval:=fsm_eval.launch.py
    • This creates X times Y sets of evaluation nodes (with the same namespaces as from multi_bots) with fsm_eval.launch.py
    • This also starts the file defined with file_exec_launch:=<> for executing the model, by default basic_4.launch.py
  • Finally run ros2 launch driving_swarm_evolution fsm_ea.launch.py for launching the distributed evolutionary algorithm manager
    • Evaluation targets are defined in config/eval_targets.json, all available are used upon launching
    • Make sure that the fitness function in config/fsm_template.json is correct and already started by multi_eval with file_fit_launch:=<>
  • Look for logs in rqt
  • Observe the results and generate fitness graphs with stats/stat_plot.py by calling it with python3 stat_plot.py <input>.csv <output>.png

Defaults and Runtime

  • Default settings for Gazebo-worlds in the <>.world-file are a step-size (step_size) of 0.001 and an amount of iterations (iters) of 150
  • The provided worlds set step_size to 0.005 and let iters set to 150 - in our testing this seems rather stable and about 4x the default speed
  • General default values for the launch-files can be viewed with ros2 launch driving_swarm_evolution <>.launch.py --show-args
  • A full run of a default fsm_ea.launch.py (100 generations, population size of 30, 15 offspring) on an Intel Core i7-8550U-Notebook with an evaluation time of 60s takes about 2.5h
  • A full run on a more powerful server (Ryzen 3600X) takes about 30 minutes

Configuration

Update the config.toml file in the current working directory that will be used by the evolver, fitness function and evaluator, so the file contents depend fully on those components.

The configuration file can be imported to python with from driving_swarm_evolution.config import config.

Quick Launch and Repeated Evaluations

To set the launch parameters (and run multiple evaluations without interaction) in an easier way, you can use runrepeat.py from the tools/-folder. Create a rr.toml-file similar to this:

[parameters]
time_poll_procs = 5.0
time_sleep_runs = 5.0
time_wait_cmd = 5.0
time_wait_kill = 3.0


[parameters.wildcards]
input = { card = "[INPUT]" }
output = { card = "[OUTPUT]" }
options = { card = "[OPTIONS]", pre = "", op=":=", post = ""}

[[runs]]

cmds = [
	["ros2", "launch", "driving_swarm_evolution", "multi_bots.launch.py", "num_worlds:=2", "[OPTIONS]"],
	["ros2", "launch", "driving_swarm_evolution", "multi_eval.launch.py", "num_worlds:=2", "[OPTIONS]"],
	["ros2", "launch", "driving_swarm_evolution", "fsm_ea.launch.py", "[OPTIONS]"],
]

variations = [

	[
		{ options = { file_world = "arena_gate.world" } },
		{ options = { file_fit_exec = "fit_fun_gate.py" } },
		{ options = { num_population = 60 } },
	],
	[
		{ options = { file_world = "arena_gate.world" } },
		{ options = { file_fit_exec = "fit_fun_gate.py" } },
		{ options = { num_population = 60 } },
	],
	[
		{ options = { file_world = "arena_gate.world" } },
		{ options = { file_fit_exec = "fit_fun_gate.py" } },
		{ options = { num_population = 60 } },
	],

]

You can run commands (in general) but also explicitly ros2 commands with python3 runrepeat.py <file>.toml. For introspection it is advisable to create a log-dir and use it with --dir_logs <logs-dir>. For further information run python3 runrepeat.py --help and look at tools/runrepeat_template.toml for default values.

Running ros2-launch-file-commands only works if a node/launch-file actually terminates, for that you potentially have to set on_exit=Shutdown() on the respective Node in the launch-file, so that it shuts down the entire launch-file with it.

Development and Extending

This package can be extended by adding your own evolution classes / functions (which mutate & recombine genomes), fitness function nodes (which rate the fitness of individuals) and evaluation nodes (which control everything from execution to calling the fitness etc. for evaluation)

Using your own evolution functions

The evolver is a simple Python class, by default in a file named evolver.py in the current working directory (config by default). That class must be named Evolver and contain the methods start_genome() and reproduce(self, *individuals):

Code example: own evolution function class
from driving_swarm_evolution.config import config

class Evolver:
	def start_genome(self):
		return config.start_genome

	def reproduce(self, genomes):
		return genomes[0]

Using your own fitness function / node

The fitness function will need to subscribe to ROS topics from the robot's sensors, so it needs to be a node by default, the fit_fun_generic launch file will use the script named fitness.py in the current working directory (config by default). Information can be accessed through services or topics, in the case of fsm_eval it is reset and accessed via services (reset for restarting the evaluation).

Code example: own fitness function node
#!/usr/bin/env python3

import sys
import json

import rclpy
from rclpy.qos import QoSPresetProfiles

from driving_swarm_evolution.fitness import FitnessNode

from sensor_msgs.msg import LaserScan
from nav_msgs.msg import Odometry

# Calculates the fitness of an individual from start to stop
class MyFitnessFunction(FitnessNode):
	def __init__(self):
		super().__init__("my_fitness_function")

		# TODO: declare additional parameters
		self.params["time_cycle"] = self.declare_parameter("time_cycle", 0.1).value
		self.reset()
		
		# TODO: subscribe to the required topics
		sensor_qos = QoSPresetProfiles.SENSOR_DATA.value
		self.scan_subscription = self.create_subscription(LaserScan, "scan", self.scan_handler, sensor_qos)
		self.odom_subscription = self.create_subscription(Odometry, "odom", self.odom_handler, sensor_qos)
		
	def reset(self):
		# TODO: reset values
		self.fitness = 0.0
		self.collision = False

	def fitness_updater(self):
		# TODO: use collected values to calculate fitness
		if not self.collision:
			self.fitness += 1

	# TODO: handle messages from subscriptions

	def scan_handler(self, msg):
		if not self.running:
			return
		# TODO: use message data to update values
		for val in msg.ranges:
			if val != 0.0 and val <= 0.2:
				self.collision = True
				return

	def odom_handler(self, msg):
		if not self.running:
			return
		# TODO: use message data to update values

def main(args=sys.argv):
	rclpy.init(args=args)
	ff = MyFitnessFunction()
	rclpy.spin(ff)

if __name__ == "__main__":
	main()

Using your own evaluation nodes

The evaluation node sets up the environment for fitness evaluation and controls the behaviour of the targets based on the genome.

For writing your own evaluation node (and using it with dist_ea), you have to implement the three services fit_eval (start evaluation), fit_status (get the current status and fitness) and fit_reset (reset both evaluation and bot) with their respective service-types (check the wiki or the fsm_eval source). Expected here is that after calling fit_eval the evaluation gets started, then does not exceed time_timeout in dist_ea (by default 60s) and the implemented status responses are:

  • "WAIT" - Waiting for first instruction (unoccupied)
  • "PROG" - In progress, evaluating
  • "DONE" - Evaluation was successful
  • "FAIL" - Evaluation was not successful
  • "REST" - Evaluation has been reset, but not fully (still occupied)

Using hardware bots / turtlebots instead of simulated ones

This is still completely unsupported, but planned for the future. eval_targets.json contains information about the type of target (simulated or not), fsm_eval currently exits with "FAIL" if a real bot is the target.

Launch-Files, Nodes and Tools

TODO remove fsm_evolver etc. when deprecated

Launch-Files

launchfiles-graph

Launch-files in ROS2 pipe through parameters to the included launch-files even if they are not explicitly declared. Unlisted parameters for included launch-files can therefore still be changed on the top-level CLI with ros2 launch <> <>.

Additionally all launch-files have the parameter namespace:=/<hostname> for setting the namespace of launched nodes / included launch-files.

multi_bots

Launches multiple Gazebo instances (with a world) with multiple spawned bots / sdf-models each:

  • port_gazebo:=11346 - port for first gazebo instances
  • port_increase:=10 - port increase for each gazebo / world spawned
  • num_worlds:=1 - number of gazebo / worlds to be spawned
  • file_arena_launch:=arena.launch.py - path to arena launch-file
  • file_spawn_launch:=multi_spawn.launch.py - path to spawn launch-file
  • all multi_spawn parameters

Instances are spawned in /<name>_X/<name_bots_base>_Y where X is the number of the instance and Y the number of the bot.

arena

Launches Gazebo with a specified world in a specified namespace (with a specified port):

  • port_gazebo:=11346 - port for launching gazebo-instances
  • gui:=false - whether or not gui is supposed to be launched
  • use_sim_time:=false - whether gazbo should use simulated time
  • file_world:=arena_empty.world - file name of world for gazebo launch

multi_spawn

Launches spawn_with_remap and robot_state_publisher for spawning multiple bots:

  • port_gazebo:=11346 - port for the gazebo-instance
  • num_bots:=1 - declare number of launched bots
  • name_bots_base:=bookturtle - base name for turtlebot-namespaces
  • name_spawn_pkg:=driving_swarm_evolution - package for spawning node (soon deprecated)
  • file_spawn_exec:=spawn_with_remap.py - file for spawning node executable
  • file_model:=turtlebot3 - name for model in gazebos database
  • list_pos_x:=-1.0 - list of x positions (multiples for multiple models), separated by spaces
  • list_pos_y:=1.0 - list of positions (multiples for multiple models), separated by spaces
  • list_pos_z:=0.0 - list of positions (multiples for multiple models), separated by spaces

multi_eval

Launches multiple sets of evaluation launch files:

  • time_cycle:=0.01 - time inbetween controlling cycles in s
  • num_worlds:=1 - number of worlds to be spawned
  • num_bots:=1 - number of bots to be spawned
  • name_bots_base:=bookturtle - base robot name for naformespaces
  • file_eval_launch:=fsm_eval.launch.py - path to evaluation launch-file
  • file_behav_launch:=basic_4.launch.py - path to behaviour launch-file
  • file_fit_launch:=fit_fun_generic.launch.py - path to fitness launch-file
  • val_penalty:=0.1 - set base penalty (for fit_fun_generic)

fsm_eval

Launches a finite-state machine evaluator based on a behaviour definition and a fitness function, that can then be used by dist_ea:

  • time_cycle:=0.01 - time inbetween controlling cycles (for calling status functions)
  • file_behav_launch:=basic_4.launch.py - path to behaviour launch-file
  • file_fit_launch:=fit_fun_generic.launch.py - path to fitness launch-file

It launches the nodes fsm_eval, the launch-file fit_fun_generic (by default), fsm_vars and the launch-file basic_4 (by default)

basic_4

Launches a group of behaviour-launch-files (for fsm-eval) based on 4 basic_drive-launches (forward, backward, left, right):

  • time_cycle:=0.01 - time inbetween controlling cycles (looking for obstacles, publishing velocity)
  • vel_linear_front:=0.2 - sets front velocity for basic_drive_front
  • vel_linear_back:=-0.2 - sets back velocity for basic_drive_back
  • vel_angular_left:=-0.4 - sets angular velocity for basic_drive_left
  • vel_angular_right:=-0.4 - sets angular velocity for basic_drive_right

The launch-file basic_drive launches a basic_drive node with:

  • name:=basic_stop
  • time_cycle:=0.01 - time inbetween controlling cycles (looking for obstacles, publishing velocity) -vel_linear:=0.0 - linear velocity
  • vel_angular:=0.0 - angular velocity (for turtlebot3 positive angular = turn left)
  • index_obstacle:=0 - index for the obstacle value in the scan-array
  • dist_obstacle:=0.25 - distance for which an obstacle is registered
  • num_obstacle_med:=3 - number of scan-values to median over for obstacle detection

fit_fun_generic

A generic fitness function launch file (by default fit_fun_driver):

  • time_cycle:=0.01 - time inbetween controlling cycles in s
  • file_fit_exec:=fit_fun_driver.py - path to fitness executable / node
  • val_penalty:=0.1 - set base penalty (currently unused by fit_fun_driver and fit_fun_gate

fsm_ea - TODO remove when replaced

Run & manage the evolutionary algorithm for finite-state machines by using fsm_evolver (soon deprecated) with dist_ea:

  • file_ea_launch:=dist_ea.launch.py - path to ea launch-file
  • all dist_ea parameters (c.f. below)
  • enable_long_bool:=false - enables complex expressions for transitions (not implemented yet)
  • num_transitions_creation:=4 - number of transitions during creation
  • max_transitions:=10 - maximum number of transitions for FSM
  • max_transitions_same:=10 - maximum number of same node-to-node transitions
  • prob_crossover:=0.25 - probability for crossover
  • prob_mutation:=0.25 - probability for mutation
  • ratio_mutation_atomic:=0.5 - ratio of atomic to altering mutations (0.5 = 50/50 to do atomic / altering mutations)
  • mode_genome_repair:=random - mode for genome repairs (can be random and levels)

dist_ea

Launches dist_ea, the evolutionary algorithm manager:

  • time_update:=5 - update time interval for the evaluation node in s
  • time_service_timeout:=20 - timeout for service-calls
  • time_timeout:=90 - timeout for evaluations
  • num_generations:=100 - number of generations
  • num_population:=30 - number of individuals in population
  • num_tournament:=2 - number of selection tournament participants
  • ratio_offspring:=0.5 - ratio of offspring to population (0.5 = half population size)
  • num_re_eval:=3 - number of re-evaluations per generation

test_eval

Launches test_eval for testing a genome by calling fitness evaluation services: Has file_test for the genome file, file_targets for the evaluation targets file and uses it's namespace as the target namespace.

test_fit

Launches test_fit for testing a fitness function (node) by calling it's services: Has time_limit and time_factor (for RTF > 1) for determining how long to evaluate fitness. name_fit_fun determines the name of the node to use.

Nodes

TODO update to newer work

spawn_with_remap

Node for spawning a sdf-model with tf-remaps to the specified namespace (can be called with --help for more info).

fsm_eval

Node for starting, controlling and resetting the evaluation process for finite-state machines, also executes the controller model.

Services:

  • <space_gazebo>/fit_eval (FitEval): Start evaluation with a supplied mode (e.g. "fsm") and the genome as stringified json - returns a triple (fitness, status, message)
  • <space_gazebo>/fit_status (FitStatus): Return current triple (fitness, status, message)
  • <space_gazebo>/fit_reset (DataCall): Call reset of the robot with data string from eval_targets.json (for now only simulated works)

The controller model is executed via the driving_swarm_evolution.fsm module, evaluation of transition expressions for now with eval() (soon deprecated).

fit_fun_*

Nodes for evaluating fitness based on sensor data / other topics, provide start, stop and status service functions.

Services:

  • /start (SetBool): Start fitness evaluation (bool for resetting)
  • /status (FitFunStatus): Return fitness float
  • /stop (Empty): Stop evaluation (soon deprecated)

Implemented:

  • fit_fun_driver: Fitness based on laser scan and odometry data with f = distance - t_collision
  • fit_fun_gate: Fitness for a gate-passing situation based on laser scan, odometry data and gazebo position data with f = distance - t_collision + gate_bonus
    • gate_bonus is calculated as a linearly decreasing bonus, resetting on gate-passing

fsm_vars

Node that publishes variables for state machine transitions for fsm_eval, like aggregated distance values (like generic "front distance").

Publishers:

  • /fsm_vars (String): String with all information in json

Services:

  • /reset (Empty): Reset all variables to 0

basic_drive

Node that executes a behaviour definition for a combination of linear / angular velocities of a bot (publishes on /cmd_vel)

Services:

  • /start (DataCall): Start publishing on /cmd_vel and scanning for obstacles
  • /stop (DataCall): Stop publishing and scanning

Stops publishing if it detects an obstacle (on the set index for the scan-msg array).

dist_ea

Node that executes a distributed evolutionary algorithm, the genome is based on a template file, evaluation targets are read from a config file.

Services:

  • Planned for calling to run an evaluation

For now only script-style usage is supported (launch and execute automatically).

fsm_evolver

Node that provides service calls for creation, recombination and mutation of genomes, soon deprecated.

Parameters:

  • file_template - file path for genome template
  • file_expression - file path for expression template (what variables with boundaries)
  • Parameters for creation / reproduction (c.f. launch-files fsm_ea)

Services:

  • /gen_create (DataReqInt): Create int genomes, based on template and expressions
  • /gen_repro (DataExchange): Reproduce (recombine and mutate) the supplied, serialised json genomes and return

test_eval

Node for testing a backed-up genome, calling the respective fit_eval etc. services. Uses a file named test_candidate.json in driving_swarm_evolution/config/.

test_fit

Node for testing a fitness-function-node, calling the respective start etc. services. Based on a time_limit (with a time_factor for RTF > 1).

Worlds

  • arena_empty.world - an empty 4 x 3 m arena
  • arena_gate.world - a 4 x 3 m arena with a passable gate in the middle
  • *_test - "test"-versions of the former worlds, with higher accuracy
  • arena_box.world - a world with a floating box (actor-collision-showcase-world)
  • arena-door.world - a gate world with an opening and closing sliding door

Actor-Collisions-Plugin

For using the worlds with a moving actor you will have to build and install the ActorCollisions-Plugin (from here). This can be easily done using tools/build_and_copy_actorcollisionsplugin.sh.

Tools

  • plot_stats.py: Plots a fitness over generations graph for the generated <timestamp>.csv-files (by dist_ea) with python3 plot_stats.py --input <input>.csv --output <output>.png
  • plot_fsm.py: Plots a finite-state machine genome with the graphviz-language and the dot-tool with python3 plot_fsm.py --input <input>.json --output <output>.png
  • runrepeat.py: Repeatedly run commands from a formatted .toml-file (by default rr.toml) - for usage cf. here

Additional Reading

The bachelor's thesis about this package is a little outdated (and based on ROS1), but can be found here (or ask in the swarmlab).

Bugs

Performance-Based

  • Under too high system loads several bugs can occur:
    • Nodes never return service calls
    • rclpy-Executors have problems running consistently
    • Others.
    • Symptom: Targets for dist_ea get lost one by one
    • Solution: Run fewer num_worlds / reduce system load (up to 80% seems to work stable)