Autonomous Driving Data Service (ADDS)

Overview

ADDS is a ROS-based data service for replaying selected drive scenes from multimodal driving datasets. The multimodal dataset used with ADDS is typically gathered during the development of advanced driver assistance systems (ADAS) or autonomous driving systems (ADS), and comprises 2D image, 3D point cloud, and vehicle bus data.

One common reason for replaying drive data is to visualize it. ADDS supports visualization of replayed data using any ROS visualization tool, for example, Foxglove Studio.

ADDS is supported on ROS 1 Noetic and ROS 2 Humble. ADDS is pre-configured to use the Audi Autonomous Driving Dataset (A2D2) and the Ford Multi-AV Seasonal Dataset, and can be extended to other autonomous driving datasets.

In the following sections, we describe the ADDS logical dataset design and runtime data services. This is followed by a step-by-step tutorial for building and using ADDS with the A2D2 dataset. Finally, we discuss extending ADDS to other datasets.

Logical dataset design

Any multimodal dataset used with ADDS is assumed to contain drive data gathered from a homogeneous vehicle fleet. By homogeneous, we mean that all the vehicles in the fleet have the same vehicle sensor array configuration and vehicle bus data attributes. Each vehicle in the fleet can have distinct calibration data for the sensor array configuration.

Each ADDS runtime instance serves one multimodal dataset. To serve multiple datasets, you need a corresponding number of ADDS runtime instances.

Each multimodal dataset comprises multimodal frame data and tabular data.

Frame data

The serialized multimodal data acquired in the vehicle in some file format, for example MDF4, MCAP, or ROS bag, must be decomposed into discrete timestamped 2D image and 3D point cloud data frames, and the frames must be stored in an Amazon S3 bucket under some bucket prefix.

The workflow for decomposing the serialized data and storing it in S3 is not prescribed by ADDS. For many public datasets, for example the A2D2: Audi Autonomous Driving Dataset and the Ford Multi-AV Seasonal Dataset, the serialized data has already been decomposed into discrete data frames. However, for these types of public datasets, one may still need to extract the discrete data frames from compressed archives (e.g. Zip or Tar files), and upload them to the ADDS S3 bucket.

For the discrete data frames stored in the S3 bucket, we need a fast retrieval mechanism so we can replay the data frames on demand. For that purpose, we build a data frame manifest and store it in an Amazon Redshift database table; this is described in detail in drive data. The manifest contains pointers to the data frames stored in the S3 bucket. For the A2D2 dataset, the extraction and loading of the drive data is done automatically during the step-by-step tutorial.

The vehicle bus data is stored in an Amazon Redshift table and is described in vehicle bus data.

Tabular data

Each logical multimodal dataset must use a distinct Amazon Redshift named schema. Below, we describe the data definition language (DDL) for creating the required Amazon Redshift tables within a given logical dataset's schema_name.

For the A2D2 dataset, we use a2d2 as the Redshift schema name, and all the required tables are created automatically during the step-by-step tutorial.

Amazon Redshift does not enforce primary and foreign key constraints; they are informational only. This is especially important to understand so you can avoid duplicating data in Redshift tables when you extend ADDS to other datasets.

Vehicle data

Vehicle data is stored in the schema_name.vehicle table. The DDL for the table is shown below:

CREATE TABLE IF NOT EXISTS schema_name.vehicle
(
	vehicleid VARCHAR(255) NOT NULL ENCODE lzo,
	description VARCHAR(255) ENCODE lzo,
	PRIMARY KEY (vehicleid)
)
DISTSTYLE ALL;

The vehicleid is the required unique vehicle identifier. The description is optional.

For the A2D2 dataset, the vehicle data is automatically loaded into the a2d2.vehicle table from vehicle.csv during the step-by-step tutorial.

Sensor data

Sensor data is stored in the schema_name.sensor table. The DDL for the table is shown below:

CREATE TABLE IF NOT EXISTS schema_name.sensor
(
	sensorid VARCHAR(255) NOT NULL ENCODE lzo,
	description VARCHAR(255) ENCODE lzo,
	PRIMARY KEY (sensorid)
)
DISTSTYLE ALL;

Each sensorid must be unique in a dataset and must refer to a sensor in the homogeneous vehicle sensor array configuration. The description is optional.

The implicit sensorid value Bus is reserved and denotes the vehicle bus; this implicit value is not stored in the schema_name.sensor table.

For the A2D2 dataset, the sensor data is automatically loaded into the a2d2.sensor table from sensors.csv during the step-by-step tutorial.

Drive data

Image and point cloud data frames must be stored in an Amazon S3 bucket. Pointers to the S3 data frames must be stored in the schema_name.drive_data table. The DDL for the table is shown below:

create table
schema_name.drive_data
(
	vehicle_id varchar(255) encode Text255 not NULL,
	scene_id varchar(255) encode Text255 not NULL,
	sensor_id varchar(255) encode Text255 not NULL,
	data_ts BIGINT not NULL sortkey,
	s3_bucket VARCHAR(255) encode lzo NOT NULL,
	s3_key varchar(255) encode lzo NOT NULL,
	primary key(vehicle_id, scene_id, sensor_id, data_ts),
	FOREIGN KEY(vehicle_id) references schema_name.vehicle(vehicleid),
	FOREIGN KEY(sensor_id) references schema_name.sensor(sensorid)
)
DISTSTYLE AUTO;

The scene_id is an arbitrary identifier for a unique drive scene.

The data_ts is the acquisition timestamp for a discrete data frame and is typically measured in International Atomic Time (TAI).

The s3_bucket is the name of the Amazon S3 bucket, and the s3_key is the key for the data frame stored in the S3 bucket.

For the A2D2 dataset, the drive data is automatically loaded into the a2d2.drive_data table during the step-by-step tutorial.
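To make the manifest concrete, the following minimal sketch queries the drive data manifest for one sensor over a time range and fetches the referenced frames from S3. It is not part of ADDS; the connection values are placeholders, and it assumes the redshift_connector and boto3 Python packages.

# Minimal sketch of manifest-driven frame retrieval (illustration only, not ADDS code).
# The Redshift Serverless endpoint and password are placeholders.
import boto3
import redshift_connector

conn = redshift_connector.connect(
    host="<redshift-serverless-workgroup-endpoint>",
    database="mozart",
    user="admin",
    password="<password>",
)
cursor = conn.cursor()
cursor.execute(
    """
    SELECT s3_bucket, s3_key, data_ts
    FROM a2d2.drive_data
    WHERE vehicle_id = %s AND scene_id = %s AND sensor_id = %s
      AND data_ts BETWEEN %s AND %s
    ORDER BY data_ts
    """,
    ("a2d2", "20190401145936", "camera/front_left", 1554121593909500, 1554122334971448),
)

s3 = boto3.client("s3")
for s3_bucket, s3_key, data_ts in cursor.fetchall():
    # Each manifest row points at one discrete, timestamped data frame in S3.
    frame = s3.get_object(Bucket=s3_bucket, Key=s3_key)["Body"].read()
    print(data_ts, s3_key, len(frame))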

Vehicle bus data

To allow for variation in the vehicle bus data across datasets, the schema_name.bus_data table may define different columns for each dataset, but it must have the same composite primary key, as detailed below. The DDL for the table is shown below, where the ellipsis (...) indicates dataset-specific columns for storing the vehicle bus data.

create table schema_name.bus_data
(
	vehicle_id varchar(255) encode Text255 not NULL,
	scene_id varchar(255) encode Text255 not NULL,
	data_ts BIGINT not NULL sortkey,
	
	... 

	primary key(vehicle_id, scene_id, data_ts),
	FOREIGN KEY(vehicle_id) references schema_name.vehicle(vehicleid)
)
DISTSTYLE AUTO;

For example, the DDL for the a2d2.bus_data table is shown below:

CREATE TABLE IF NOT EXISTS a2d2.bus_data 
( 
	vehicle_id varchar(255) encode Text255 not NULL, 
	scene_id varchar(255) encode Text255 not NULL, 
	data_ts BIGINT not NULL sortkey, 
	acceleration_x FLOAT4 not NULL, 
	acceleration_y FLOAT4 not NULL, 
	acceleration_z FLOAT4 not NULL, 
	accelerator_pedal FLOAT4 not NULL, 
	accelerator_pedal_gradient_sign SMALLINT not NULL, 
	angular_velocity_omega_x FLOAT4 not NULL, 
	angular_velocity_omega_y FLOAT4 not NULL, 
	angular_velocity_omega_z FLOAT4 not NULL, 
	brake_pressure FLOAT4 not NULL, 
	distance_pulse_front_left FLOAT4 not NULL, 
	distance_pulse_front_right FLOAT4 not NULL, 
	distance_pulse_rear_left FLOAT4 not NULL, 
	distance_pulse_rear_right FLOAT4 not NULL, 
	latitude_degree FLOAT4 not NULL, 
	latitude_direction SMALLINT not NULL, 
	longitude_degree FLOAT4 not NULL, 
	longitude_direction SMALLINT not NULL, 
	pitch_angle FLOAT4 not NULL, 
	roll_angle FLOAT4 not NULL, 
	steering_angle_calculated FLOAT4 not NULL, 
	steering_angle_calculated_sign SMALLINT not NULL, 
	vehicle_speed FLOAT4 not NULL, 
	primary key(vehicle_id, scene_id, data_ts), 
	FOREIGN KEY(vehicle_id) references a2d2.vehicle(vehicleid) 
) DISTSTYLE AUTO;

For the A2D2 dataset, the vehicle bus data is automatically loaded into the a2d2.bus_data table during the step-by-step tutorial.
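The tutorial's automated workflow loads this data for A2D2. Purely as an illustration of how tabular bus data can be bulk-loaded into the table above, the sketch below issues a Redshift COPY from CSV files staged in S3 via redshift_connector; the S3 prefix and IAM role ARN are placeholders, and the CSV column order is assumed to match the table definition.

# Illustration only: bulk-load staged vehicle bus CSV files into a2d2.bus_data with a Redshift COPY.
# The S3 prefix and IAM role ARN are placeholders, not values from this repository.
import redshift_connector

conn = redshift_connector.connect(
    host="<redshift-serverless-workgroup-endpoint>",
    database="mozart",
    user="admin",
    password="<password>",
)
conn.autocommit = True
cursor = conn.cursor()
cursor.execute(
    """
    COPY a2d2.bus_data
    FROM 's3://<your-bucket>/a2d2/bus-csv/'        -- placeholder prefix of the staged CSV files
    IAM_ROLE 'arn:aws:iam::<account>:role/<role>'  -- role with read access to the bucket
    FORMAT AS CSV
    IGNOREHEADER 1;
    """
)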

Runtime services

ADDS runtime services are deployed as Kubernetes Deployments in an Amazon EKS cluster. ADDS auto-scales using the Horizontal Pod Autoscaler and the Cluster Autoscaler.

Concretely, there are two manifestations of the ADDS runtime services:

  1. Rosbridge data service
  2. Kafka data service

Rosbridge data service

Rosbridge data service uses rosbridge as the communication channel. The data client connects to the data service via the rosbridge WebSocket. The data client sends the data request for sensor data on a pre-defined ROS topic, and the data service responds by publishing the requested sensor data on the requested ROS topics. If requested, the data service can serve the response data as ROS bags.

The data client for Rosbridge data service can be any ROS visualization tool that can communicate with rosbridge, for example, Foxglove Studio.

Kafka data service

Kafka data service uses Apache Kafka as the communication channel. The data client sends the data request for sensor data on a pre-defined Kafka topic. The request includes the name of a Kafka response topic. The data service stages the response ROS bag(s) in the ROS bag store and responds with each ROS bag location on the Kafka response topic.

The data client for the Kafka data service is a standalone Python application that runs on the desktop and is used in conjunction with the rviz visualization tool. The Python application plays back the response ROS bag files on the local ROS server on the desktop, and the rviz tool is used to visualize the playback.
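The repository ships its own Kafka data client (a2d2/src/data_client.py). The fragment below is only a rough sketch of the request/response pattern just described, using the kafka-python package; the broker addresses are placeholders, the MSK security settings are omitted, and the field naming the response topic is a hypothetical stand-in.

# Rough sketch of the Kafka request/response pattern; the real client is a2d2/src/data_client.py.
# Broker addresses are placeholders and MSK TLS/authentication settings are omitted.
import json
from kafka import KafkaProducer, KafkaConsumer

request = {
    "kafka_topic": "a2d2",
    "vehicle_id": "a2d2",
    "scene_id": "20190401145936",
    "sensor_id": ["camera/front_left"],
    "response_topic": "a2d2-response",  # hypothetical field name for the Kafka response topic
    "accept": "fsx/multipart/rosbag",
}

producer = KafkaProducer(bootstrap_servers=["<broker-1>:9092", "<broker-2>:9092"])
producer.send("a2d2", json.dumps(request).encode("utf-8"))  # data request on the pre-defined topic
producer.flush()

# The data service stages each response ROS bag and publishes its location on the response topic.
consumer = KafkaConsumer("a2d2-response", bootstrap_servers=["<broker-1>:9092", "<broker-2>:9092"])
for message in consumer:
    print("ROS bag staged at:", message.value.decode("utf-8"))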

High-level system architecture

Figure 1. High-level system architecture for the data service

The data service runtime uses Amazon Elastic Kubernetes Service (EKS). The raw sensor data store and ROS bag store can be configured to use Amazon S3, Amazon FSx for Lustre, or Amazon Elastic File System (EFS). The raw data manifest store uses Amazon Redshift Serverless. The data processing workflow for building and loading the raw data manifest uses AWS Batch with Amazon Fargate, AWS Step Functions, and AWS Glue. Amazon Managed Streaming for Apache Kafka (MSK) provides the communication channel for the Kafka data service.

The tutorial below walks through the Rosbridge service and, optionally, the Kafka service. The Rosbridge service feature set is a superset of the Kafka service feature set.

Data request for sensor data

Concretely, imagine the data client wants to request drive scene data from the Audi Autonomous Driving Dataset (A2D2) for vehicle id a2d2, drive scene id 20190401145936, starting at timestamp 1554121593909500 (microseconds), and stopping at timestamp 1554122334971448 (microseconds). The data client wants the response to include data only from the front-left camera in the sensor_msgs/Image ROS data type, and the front-left lidar in the sensor_msgs/PointCloud2 ROS data type. The data client wants the response data to be staged on an Amazon FSx for Lustre file system, partitioned across multiple ROS bag files. Such a data request can be encoded in a JSON object, as shown below:

{
	"vehicle_id": "a2d2",
	"scene_id": "20190401145936",
	"sensor_id": ["lidar/front_left", "camera/front_left"],
	"start_ts": 1554121593909500, 
	"stop_ts": 1554122334971448,
	"ros_topic": {"lidar/front_left": "/a2d2/lidar/front_left", 
			"camera/front_left": "/a2d2/camera/front_left"},
	"data_type": {"lidar/front_left": "sensor_msgs/PointCloud2",
			"camera/front_left": "sensor_msgs/Image"},
	"step": 1000000,
	"accept": "fsx/multipart/rosbag",
	"preview": false,
	...
}

The sensor_id values are keys in the ros_topic and data_type maps, which map the sensors to ROS topics and ROS data types, respectively. For a detailed description of each request field shown in the example above, see data request fields.
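A data request like the one above can also be published programmatically over the rosbridge WebSocket. The sketch below uses the roslibpy package and assumes the request is carried as a JSON-encoded std_msgs/String on the pre-defined /mozart/data_request topic (consistent with the control messages shown later in this document); the service endpoint and port are placeholders.

# Sketch: publish a data request over the rosbridge WebSocket with roslibpy.
# The endpoint is a placeholder; the std_msgs/String JSON encoding is an assumption.
import json
import time
import roslibpy

request = {
    "vehicle_id": "a2d2",
    "scene_id": "20190401145936",
    "sensor_id": ["lidar/front_left", "camera/front_left"],
    "start_ts": 1554121593909500,
    "stop_ts": 1554122334971448,
    "ros_topic": {"lidar/front_left": "/a2d2/lidar/front_left",
                  "camera/front_left": "/a2d2/camera/front_left"},
    "data_type": {"lidar/front_left": "sensor_msgs/PointCloud2",
                  "camera/front_left": "sensor_msgs/Image"},
    "step": 1000000,
    "accept": "rosmsg",
    "preview": False,
}

client = roslibpy.Ros(host="<rosbridge-service-dns>", port=9090)  # placeholder endpoint and port
client.run()
topic = roslibpy.Topic(client, "/mozart/data_request", "std_msgs/String")
topic.publish(roslibpy.Message({"data": json.dumps(request)}))
time.sleep(1)  # give the message time to go out before closing the connection
client.terminate()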

Step-by-step tutorial

Overview

In this tutorial, we use the A2D2 autonomous driving dataset. The high-level outline of the tutorial is as follows:

  1. Prerequisites
  2. Configure data service
  3. Build dataset
  4. Run Rosbridge data service
  5. Run Rosbridge data client

You may optionally run the Kafka service:

  1. Run Kafka data service
  2. Run Kafka data client

Prerequisites

This tutorial assumes you have an AWS account and system administrator job function access to the AWS Management Console.

To get started:

  • Select your AWS Region. The AWS Regions supported by this project include us-east-1, us-east-2, us-west-2, eu-west-1, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, and ap-south-1. The A2D2 dataset used in this tutorial is stored in eu-central-1.
  • If you do not already have an Amazon EC2 key pair, create a new Amazon EC2 key pair. You need the key pair name to specify the KeyName parameter when creating the AWS CloudFormation stack below.
  • You need an Amazon S3 bucket in your selected AWS region. If you don't have one, create a new Amazon S3 bucket in the selected AWS region. You use the S3 bucket name to specify the S3Bucket parameter in the stack. The bucket is used to store the A2D2 data.
  • Use the public internet address of your laptop as the base value for the CIDR used to specify the DesktopRemoteAccessCIDR parameter in the CloudFormation stack you create below.
  • For all passwords used in this tutorial, we recommend using strong passwords that follow the best practices recommended for the AWS root account user password.

Configure data service

Create AWS CloudFormation Stack

The AWS CloudFormation template cfn/mozart.yml in this repository creates AWS Identity and Access Management (IAM) resources, so when you create the CloudFormation Stack using the console, in the Review step, you must check I acknowledge that AWS CloudFormation might create IAM resources.

Create a new AWS CloudFormation stack using the cfn/mozart.yml template. The stack input parameters you must specify are described below:

Parameter Name Parameter Description
KeyPairName This is a required parameter whereby you select the Amazon EC2 key pair name used for SSH access to the desktop. You must have access to the selected key pair's private key to connect to your desktop.
RedshiftMasterUserPassword This is a required parameter whereby you specify the Redshift database master user password.
DesktopRemoteAccessCIDR This is a required parameter whereby you specify the public IP CIDR range from where you need remote access to your graphics desktop, e.g. 1.2.3.4/32, or 7.8.0.0/16.
DesktopInstanceType This is a required parameter whereby you select an Amazon EC2 instance type for the ROS desktop. The default value, g4dn.xlarge, may not be available for your selected region, in which case, we recommend you try one of the other available instance types.
S3Bucket This is a required parameter whereby you specify the name of the Amazon S3 bucket to store your data. The S3 bucket must already exist.

For all other stack input parameters, the default values are recommended during your first walkthrough. See the complete list of template input parameters below.

Key AWS resources

The key resources in the CloudFormation stack are listed below:

  • A ROS desktop EC2 instance (default type g4dn.xlarge)
  • An Amazon EKS cluster with 2 managed node groups: system-nodegroup and work-nodegroup. Both managed node groups auto-scale as needed.
  • Amazon Redshift Serverless workgroup and namespace
  • An Amazon EFS file system

If you choose to run the optional Kafka data service and client, the following additional resources are created:

  • An Amazon MSK cluster with 3 broker nodes (default type kafka.m5.large)
  • An Amazon FSx for Lustre file system (default size 7,200 GiB)

Connect to the graphics desktop using SSH

  • Once the stack status in CloudFormation console is CREATE_COMPLETE, find the desktop instance launched in your stack in the Amazon EC2 console, and connect to the instance using SSH as user ubuntu, using your SSH key pair.
  • When you connect to the desktop using SSH and you see the message "Cloud init in progress. Machine will REBOOT after cloud init is complete!!", disconnect and try again after about 20 minutes. The desktop installs the NICE DCV server on first-time startup, and reboots after the install is complete.
  • If you see the message NICE DCV server is enabled!, run the command sudo passwd ubuntu to set a new password for user ubuntu. Now you are ready to connect to the desktop using the NICE DCV client.

Connect to the graphics desktop using NICE DCV Client

  • Download and install the NICE DCV client on your laptop.
  • Use the NICE DCV client to log in to the desktop as user ubuntu.
  • When you first log in to the desktop using the NICE DCV client, you are asked if you would like to upgrade the OS version. Do not upgrade the OS version.

Now you are ready to proceed with the following steps. For all the commands in this tutorial, we assume the working directory to be ~/amazon-eks-autonomous-driving-data-service on the graphics desktop.

Configure EKS cluster access

In this step, you will be prompted for the AWS credentials of the IAM user you used to create the AWS CloudFormation stack above. If you instead used an IAM role to create the stack, you must first manually set up the AWS credentials in the ~/.aws/credentials file with the following fields:

[default]
aws_access_key_id=
aws_secret_access_key=
aws_session_token=

The AWS credentials are used once to enable EKS cluster access from the ROS desktop, and are automatically removed at the end of this step. After setting up the credentials, in the working directory, run the command:

./scripts/configure-eks-auth.sh

On successful execution of this command, you should see AWS Credentials Removed.

Setup developer environment

To set up the developer environment, in the working directory, run the command:

./scripts/setup-dev.sh

This step also builds and pushes the data service container image into Amazon ECR.

Build dataset

In this tutorial, we use the A2D2 autonomous driving dataset. This dataset is stored in compressed TAR format in the aev-autonomous-driving-dataset S3 bucket in eu-central-1. We need to extract the A2D2 dataset into the S3 bucket for your stack, build the raw data manifest, and load the manifest into the raw data manifest store. To execute these steps, we use an AWS Step Functions state machine. To run the state machine, run the following command in the working directory:

./scripts/a2d2-etl-steps.sh

The time to complete this step depends on many factors, including your choice of AWS Region, and may take anywhere from 12 to 24 hours, or possibly longer. The AWS Region eu-central-1 takes the least amount of time for this step because the A2D2 dataset is located in this region.

Note: If you have already run ./scripts/a2d2-etl-steps.sh before in another CloudFormation stack that uses the same Amazon S3 bucket as your current stack, you can complete this step in less than 30 minutes by running the following script, instead:

./scripts/a2d2-etl-steps-skip-raw.sh

Note the executionArn of the state machine execution in the output of the previous command. To check the status of the execution, use the following command, replacing executionArn below with your value:

aws stepfunctions describe-execution --execution-arn executionArn
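Equivalently, you can poll the execution status from Python with boto3; a small sketch, replacing the placeholder executionArn with your value:

# Poll the Step Functions execution status until it leaves the RUNNING state.
import time
import boto3

sfn = boto3.client("stepfunctions")
execution_arn = "<executionArn>"  # replace with the executionArn noted above

while True:
    status = sfn.describe_execution(executionArn=execution_arn)["status"]
    print("Execution status:", status)
    if status != "RUNNING":
        break
    time.sleep(300)  # the A2D2 ETL can run for many hours, so poll infrequently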

Run Rosbridge data service

To deploy the a2d2-rosbridge data service, run the following command in the working directory:

helm install --debug a2d2-rosbridge ./a2d2/charts/a2d2-rosbridge/

To verify that the a2d2-rosbridge deployment is running, run the command:

kubectl get pods -n a2d2

The deployment exposes a Kubernetes service for data client connections. To find the DNS endpoint for the service, run the command:

kubectl get svc -n a2d2

The service takes approximately 5 minutes to be ready after it is started, so you may not be able to connect to the service right away.

Run Rosbridge data client

To publish data requests and visualize the response data, open Foxglove Studio on the desktop client, and sign in using your Foxglove Studio sign-up credentials. Connect Foxglove Studio to your Rosbridge service. In Foxglove Studio, import the example layout file a2d2/config/rosbridge/foxglove/a2d2-ex1.json. Publish the data request, and wait approximately 60 seconds to visualize the response.

The data request is published on the pre-defined ROS topic /mozart/data_request. Notice that the accept field is set to rosmsg, which means the data for each requested sensor is published directly on its mapped ROS topic specified in the ros_topic map field. More examples can be found under the a2d2/config/rosbridge/foxglove/ folder.

You can control a running data request by publishing ROS messages on the pre-defined ROS topic /mozart/data_request/control. For example, to pause the request, you can publish:

{ "data": "{ \"command\": \"pause\" }" }

To resume the request, you can publish:

{ "data": "{ \"command\": \"play\" }" }

To stop the request, you can publish:

{ "data": "{ \"command\": \"stop\" }" }

When you are done with the Rosbridge data service, stop it by executing the command:

helm uninstall a2d2-rosbridge

(Optional) Run Kafka data service

Update the CloudFormation stack to set the parameter DataClientType to KafkaAndRosBridge, and FSxForLustre to enabled. After the CloudFormation update is complete, run the following command in the working directory:

./scripts/setup-dev.sh

To deploy the a2d2-data-service Kafka service, run the following command in the working directory:

helm install --debug a2d2-data-service ./a2d2/charts/a2d2-data-service/

To verify that the a2d2-data-service deployment is running, run the command:

kubectl get pods -n a2d2

The data service can be configured to use S3, FSx for Lustre, or EFS (see Preload A2D2 data from S3 to EFS) as the raw sensor data store. The default raw data store is fsx if FSx for Lustre is enabled (see the FSxForLustre parameter); otherwise it is s3.

Below is the Helm chart configuration for the various raw data store options, with recommended Kubernetes resource requests for pod memory and CPU. This configuration is set in a2d2/charts/a2d2-data-service/values.yaml:

  • fsx (default): a2d2.requests.memory: "72Gi", a2d2.requests.cpu: "8000m", configMap.data_store.input: "fsx"
  • efs: a2d2.requests.memory: "32Gi", a2d2.requests.cpu: "1000m", configMap.data_store.input: "efs"
  • s3: a2d2.requests.memory: "8Gi", a2d2.requests.cpu: "1000m", configMap.data_store.input: "s3"

For matching data staging options in data client request, see request.accept field in data request fields.

(Optional) Run Kafka data client

To visualize the response data, we use the rviz2 tool on the graphics desktop. Open a terminal on the desktop, and run rviz2 (rviz for ROS 1).

In the rviz2 tool, use File > Open Config to select /home/ubuntu/amazon-eks-autonomous-driving-data-service/a2d2/config/rviz2/a2d2.rviz as the rviz configuration. You should see the rviz2 tool configured with two windows for visualizing response data: image data on the left, and point cloud data on the right. This rviz2 configuration is specific to the examples we run below.

To run the Kafka data client with an example data request, run the following command in the working directory:

python ./a2d2/src/data_client.py --config ./a2d2/config/c-config-ex1.json

After a brief delay, you should be able to preview the response data in the rviz2 tool.

To preview data from a different drive scene, execute:

python ./a2d2/src/data_client.py --config ./a2d2/config/c-config-ex2.json 

You can set "preview": false in the data client config files, and run the above commands again to view the complete response.

The data client exits automatically at the end of each successful run. You can use CTRL+C to exit the data client manually.

When you are done with the Kafka data service, stop it by executing the command:

helm uninstall a2d2-data-service

Preload A2D2 data from S3 to EFS

This step can be executed anytime after Configure data service, and is required only if you plan to configure the data service to use EFS as the raw data store; otherwise, it may be safely skipped. Execute the following command to start preloading data from your S3 bucket to the EFS file system:

kubectl apply -n a2d2 -f a2d2/efs/stage-data-a2d2.yaml

To check if the step is complete, execute:

kubectl get pods stage-efs-a2d2 -n a2d2

If the pod is still Running, the step has not yet completed. This step takes approximately 6.5 hours to complete.

Extending ADDS to other datasets

This section describes how to extend ADDS to work with datasets other than a2d2. We recommend reading the entire section before executing any of its steps.

First, select a name for the dataset you wish to add to ADDS. The dataset name should start with a letter, be all lowercase, and contain only alphanumeric characters. For the purposes of this documentation, we assume you want to add a dataset named ds1. To add the new dataset to ADDS, start by executing the following command:

./scripts/add-dataset.sh ds1

The above command copies the a2d2 folder to the ds1 folder, and customizes the files in the ds1 folder to the extent possible automatically. The command also copies a2d2_ros_util.py to adds/src/ds1_ros_util.py, ready for your customization. You must complete the customization of the new dataset by following the steps below:

  1. Identify vehicle bus data attributes
  2. Create Redshift schema and tables
  3. Load vehicle and sensor data
  4. Extract and upload vehicle drive and bus data
  5. Define vehicle bus ROS message
  6. Extend RosUtil
  7. Specify calibration data path
  8. Customize data client configuration files
  9. Apply tutorial steps to your dataset

Identify vehicle bus data attributes

To extend ADDS to other datasets, you must identify the vehicle bus data attributes for your vehicle fleet. You will need this information to define the vehicle bus data table columns, and for defining a new custom ROS message for your vehicle bus data.

Create Redshift schema and tables

Create a Redshift schema for your dataset. Choose the same Redshift schema name as the dataset name, for example ds1. This is not a hard requirement, but it makes customization for your dataset much simpler.

Create the Redshift tables ds1.vehicle, ds1.sensor, and ds1.drive_data, using the DDL files in the folder ds1/ddl/.

Recall that when creating the DDL for the vehicle bus data table, you must define the table columns corresponding to the specific attributes in your vehicle bus data, while maintaining the prescribed primary key. Therefore, modify ds1/ddl/bus_data.ddl for your vehicle bus data, and create the ds1.bus_data table.

Load vehicle and sensor data

Load data into the ds1.vehicle and ds1.sensor tables.

See vehicle.csv and sensors.csv for the A2D2 vehicle and sensor data, but note that your data may be different, and will depend on your vehicle and sensor identifiers. It is recommended that you use user-friendly names to identify the vehicles and sensors, since they appear in each data request.

Extract and load vehicle drive and bus data

This is the step where you decompose the serialized data acquired in the vehicle into discrete timestamped 2D image and 3D point cloud data frames, and upload the data frames into the ADDS S3 bucket. You must also build a manifest for the data frames, and load the manifest into the drive data table, as sketched after this list of options.

You may want to use an automated workflow to implement this step. In AWS, you have the option of using AWS Step Functions or Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate the workflow. You may also find AWS Batch useful in implementing various steps in the workflow. For example, for the A2D2 dataset, we use AWS Step Functions and AWS Batch to extract and upload the drive data and the vehicle bus data.

You may want to combine the steps Create Redshift schema and tables, Load vehicle and sensor data, and this step into a single script (see, for example, scripts/a2d2-etl-steps.sh used for the A2D2 dataset).
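As a rough illustration of what this step produces, the sketch below uploads one extracted image frame to the ADDS S3 bucket and records the corresponding manifest row in the drive data table. The bucket name, key layout, identifiers, and connection values are placeholders, not a prescribed convention; for large manifests, a bulk load (for example, a Redshift COPY) is preferable to row-by-row inserts.

# Illustration only: store one extracted frame in S3 and record its manifest row in Redshift.
# Bucket name, key layout, identifiers, and connection values are placeholders.
import boto3
import redshift_connector

bucket = "<your-adds-bucket>"
vehicle_id, scene_id, sensor_id = "ds1-vehicle1", "scene0001", "camera/front_left"
data_ts = 1650000000000000  # acquisition timestamp in microseconds
s3_key = f"ds1/{vehicle_id}/{scene_id}/{sensor_id}/{data_ts}.png"  # placeholder key layout

boto3.client("s3").upload_file("frame.png", bucket, s3_key)

conn = redshift_connector.connect(host="<redshift-serverless-workgroup-endpoint>",
                                  database="mozart", user="admin", password="<password>")
conn.autocommit = True
conn.cursor().execute(
    "INSERT INTO ds1.drive_data (vehicle_id, scene_id, sensor_id, data_ts, s3_bucket, s3_key) "
    "VALUES (%s, %s, %s, %s, %s, %s)",
    (vehicle_id, scene_id, sensor_id, data_ts, bucket, s3_key),
)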

Define vehicle bus ROS message

Next, define a custom ROS 2 message for encapsulating your vehicle bus data. You may create a custom ROS 1 message instead if you plan to use ROS 1.

For example, for the A2D2 dataset, the custom vehicle bus ROS 2 message, a2d2_msgs/Bus, is defined in adds/colcon_ws/src/a2d2_msgs/, and for ROS 1 in adds/catkin_ws/src/a2d2_msgs/.

Keeping with our assumed dataset name, you could define the custom vehicle bus ROS 2 message in adds/colcon_ws/src/ds1_msgs/, and for ROS 1 in adds/catkin_ws/src/ds1_msgs/.

Extend RosUtil

In this step, you must extend the RosUtil abstract base class to implement its abstract methods. For example, for the A2D2 dataset, we implement the Python class a2d2_ros_util.DatasetRosUtil to extend the abstract RosUtil class.

Keeping with our assumed dataset name, modify adds/src/ds1_ros_util.py to implement the RosUtil abstract class for the ds1 dataset. Note that the extended class you implement must be placed under adds/src.
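The exact abstract methods are declared by the RosUtil class under adds/src; consult that file for the real names and signatures. Purely to show the shape of the customization, here is a hedged skeleton whose import path and method names are hypothetical stand-ins:

# Hedged skeleton only: the import path and method names below are hypothetical stand-ins.
# See the RosUtil abstract base class under adds/src for the actual abstract methods to implement.
from ros_util import RosUtil  # assumed module name within adds/src


class DatasetRosUtil(RosUtil):
    """ds1-specific helpers used by the data service to build ROS messages."""

    def __init__(self, calibration):
        # 'calibration' stands in for the data loaded from calibration.cal_key (see next section).
        self.calibration = calibration

    def build_bus_message(self, bus_row):  # hypothetical method name
        # Map one ds1.bus_data row to the ds1_msgs vehicle bus ROS message.
        raise NotImplementedError

    def build_point_cloud_message(self, frame_bytes, sensor_id, data_ts):  # hypothetical method name
        # Decode one ds1 point cloud frame and apply the sensor calibration.
        raise NotImplementedError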

Specify calibration data path

Next, we need to customize the Helm charts used to run ADDS with your dataset. Most of the customization has already been done automatically. You only need to customize the values of the calibration fields in the ds1/charts/ds1-data-service/values.yaml and ds1/charts/ds1-rosbridge/values.yaml files. For example, the A2D2 dataset uses the following calibration fields:

configMap:
{
	...

	"calibration": {
		"cal_bucket": "",
		"cal_key": "a2d2/cams_lidars.json"
	}
}

Leave calibration.cal_bucket empty, as shown above.

The cal_key must point to the bucket prefix where your vehicle calibration data is stored. The cal_key may point to a calibration file, as in the example above, or to an S3 bucket folder; this depends on how you store your vehicle calibration data. The Python class implemented in Extend RosUtil uses the calibration data to implement its abstract methods.

Customize data client configuration files

You must customize the "requests" in the data client JSON configuration files under ds1/config to work with your dataset. We will explain this customization using the example of c-config-ex1.json, which makes a data request for a2d2 data:

"requests": [{
	"kafka_topic": "a2d2", 
	"vehicle_id": "a2d2",
	"scene_id": "20190401121727",
	"sensor_id": ["bus", "lidar/front_left", "camera/front_left"],
	"start_ts": 1554115465612291, 
	"stop_ts": 1554115765612291,
	"ros_topic": {"bus": "/a2d2/bus", "lidar/front_left": "/a2d2/lidar/front_left", 
			"camera/front_left": "/a2d2/camera/front_left"},
	"data_type": {"bus": "a2d2_msgs/Bus", "lidar/front_left": "sensor_msgs/PointCloud2",
			"camera/front_left": "sensor_msgs/Image"},
	"step": 1000000,
	"accept": "fsx/multipart/rosbag",
	"preview": true
}]

All the fields above need to be customized for your dataset. For example, your kafka_topic value will be ds1; your vehicle_id and sensor_id values will depend on the values in the ds1.vehicle and ds1.sensor tables (except for the implicit sensor_id value bus, which is the same for all datasets); and the ROS message data type for your bus data, your scene_id, your start_ts and stop_ts, and possibly your ros_topic values will all be different. Customize all these values so you can request data from the ds1 dataset.
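One lightweight way to produce such a config is to load the a2d2 example and overwrite the dataset-specific fields programmatically, as in the sketch below; all the ds1 values shown are illustrative and must match the identifiers actually loaded into your ds1 tables.

# Sketch: derive a ds1 data client config from the a2d2 example by overwriting dataset-specific fields.
# All ds1 values below are illustrative; use the identifiers actually loaded into your ds1 tables.
import json

with open("a2d2/config/c-config-ex1.json") as f:
    config = json.load(f)

request = config["requests"][0]
request.update({
    "kafka_topic": "ds1",
    "vehicle_id": "ds1-vehicle1",                # a value from ds1.vehicle
    "scene_id": "scene0001",                     # a scene present in ds1.drive_data
    "sensor_id": ["bus", "camera/front_left"],   # values from ds1.sensor, plus the implicit "bus"
    "start_ts": 1650000000000000,
    "stop_ts": 1650000300000000,
    "ros_topic": {"bus": "/ds1/bus", "camera/front_left": "/ds1/camera/front_left"},
    "data_type": {"bus": "ds1_msgs/Bus", "camera/front_left": "sensor_msgs/Image"},
})

with open("ds1/config/c-config-ex1.json", "w") as f:
    json.dump(config, f, indent=2)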

Apply tutorial steps to your dataset

Next, walk through the step-by-step tutorial, but starting with the step Setup developer environment. You will need to make the following changes to the tutorial steps so you can use ADDS with the ds1 dataset:

  • Instead of a2d2, use ds1.
  • In the Build dataset step, execute your own ETL script instead of scripts/a2d2-etl-steps.sh to launch the workflow that uploads your data into the S3 bucket and Redshift tables.

Deleting the AWS CloudFormation stack

When you no longer need the ADDS data service, you may delete the AWS CloudFormation stack from the AWS CloudFormation console. Deleting the CloudFormation stack deletes all the resources in the stack (including FSx for Lustre and EFS), except for the Amazon S3 bucket.

Reference

Data client request fields

Below, we explain the semantics of the various fields in the data client request JSON object.

Request field name Request field description
servers The servers field identifies the Amazon MSK Kafka cluster endpoints.
delay The delay specifies how many seconds the data client waits before sending the request. Default value is 0.
use_time (Optional) The use_time specifies whether to use the received time, or the header time, when playing back the received messages. Default value is received.
requests The JSON document sent by the client to the data service must include an array of one or more data request objects.
request.kafka_topic For Kafka data service only. The kafka_topic specifies the Kafka topic on which the data request is sent from the client to the Kafka data service.
request.vehicle_id The vehicle_id is used to identify the relevant drive scene dataset.
request.scene_id The scene_id identifies the drive scene of interest, which in this example is 20190401145936, a string representing the date and time of the drive scene; in general it could be any unique value.
request.start_ts The start_ts (microseconds) specifies the start timestamp for the drive scene data request.
request.stop_ts The stop_ts (microseconds) specifies the stop timestamp for the drive scene data request.
request.ros_topic The ros_topic is a map from sensor ids in the vehicle to ROS topics.
request.data_type The data_type is a map from sensor ids to ROS data types.
request.step The step is the discrete time interval (microseconds) used to discretize the timespan between start_ts and stop_ts. If the request.accept value contains multipart, the data service responds with a ROS bag for each discrete step: See possible values below.
request.accept The accept specifies the response data staging format acceptable to the client: See possible values below.
request.image (Optional) The value undistorted undistorts the camera image. Undistorting an image slows down the image frame rate. The default is the original, distorted image.
request.lidar_view (Optional) The value vehicle transforms lidar points to the vehicle frame of reference. Default value is camera.
request.preview If the preview field is set to true, the data service returns requested data over a single time step starting from start_ts, and ignores the stop_ts.
request.no_playback (Optional) This only applies to the Kafka data client. If the no_playback field is set to true, the data client does not play back the response ROS bags. Default value is false.
request.storage_id For ROS 2 only. The storage id of the rosbag2 storage plugin. The default value is mcap. (See rosbag2_storage_mcap)
request.storage_preset_profile For ROS 2 only. The storage preset profile of the rosbag2 storage plugin. The default value is zstd_fast. (See rosbag2_storage_mcap)

Possible request.accept field values

Rosbridge data service publishes response data on ROS topic /mozart/data_response for all values of request.accept shown below, except rosmsg.

request.accept value Description
rosmsg For Rosbridge data service only. Publish response data on the requested ROS topics.
fsx/multipart/rosbag Stage response data on Amazon FSx for Lustre in multiple ROS bags.
efs/multipart/rosbag Stage response data on Amazon EFS in multiple ROS bags.
s3/multipart/rosbag Stage response data on Amazon S3 in multiple ROS bags.
fsx/singlepart/rosbag Stage response data on Amazon FSx for Lustre in a single ROS bag.
efs/singlepart/rosbag Stage response data on Amazon EFS in a single ROS bag.
s3/singlepart/rosbag Stage response data on Amazon S3 in a single ROS bag.
manifest Respond with a manifest of S3 paths to raw data.

AWS CloudFormation template input parameters

This repository provides an AWS CloudFormation template that is used to create the required stack.

Below, we describe the AWS CloudFormation template input parameters. Desktop below refers to the NICE DCV enabled high-performance graphics desktop that acts as the data service client in this tutorial.

Parameter Name Parameter Description
DesktopInstanceType This is a required parameter whereby you select an Amazon EC2 instance type for the desktop running in AWS cloud. Default value is g4dn.xlarge.
DesktopEbsVolumeSize This is a required parameter whereby you specify the size of the root EBS volume (default size is 200 GB) on the desktop. Typically, the default size is sufficient.
DesktopEbsVolumeType This is a required parameter whereby you select the EBS volume type (default is gp3).
DesktopHasPublicIpAddress This is a required parameter whereby you select whether a public IP address is associated with the desktop. Default value is true.
DesktopRemoteAccessCIDR This parameter specifies the public IP CIDR range from where you need remote access to your client desktop, e.g. 1.2.3.4/32, or 7.8.0.0/16.
DesktopType This parameter specifies support for Graphical desktop with NICE-DCV server enabled, or Headless desktop with NICE-DCV server disabled. Default value is Graphical.
DataClientType This parameter specifies support for RosBridge, and KafkaAndRosBridge. Default value is RosBridge.
EKSEncryptSecrets This is a required parameter whereby you select if encryption of EKS secrets is Enabled. Default value is Enabled.
EKSEncryptSecretsKmsKeyArn This is an optional advanced parameter whereby you specify the AWS KMS key ARN that is used to encrypt EKS secrets. Leave blank to create a new KMS key.
EKSNodeVolumeSizeGiB This is a required parameter whereby you specify EKS Node group instance EBS volume size. Default value is 200 GiB.
EKSSystemNodeGroupCapacityType This is a required parameter whereby you specify EKS system node group capacity type: SPOT, or ON_DEMAND. Default value is SPOT
EKSSystemNodeGroupInstanceType This is a required parameter whereby you specify EKS system node group instance types as a comma separated list. Default value is "t3a.small,t3a.medium,t3a.large,m5a.large,m7a.large"
EKSWorkNodeGroupCapacityType This is a required parameter whereby you specify EKS work node group capacity type: SPOT, or ON_DEMAND. Default value is SPOT
EKSWorkNodeGroupInstanceType This is a required parameter whereby you specify EKS work node group instance types as a comma separated list. Default value is "m5a.8xlarge,m5.8xlarge,m5n.8xlarge,m7a.8xlarge,r5n.8xlarge"
EKSWorkNodeGroupMaxSize This is a required parameter whereby you specify EKS work node group maximum size. Default value is 16 nodes. Cluster auto-scaler scales this node group as needed.
FargateComputeType This is a required parameter whereby you specify Fargate compute environment type. Allowed values are FARGATE_SPOT and FARGATE. Default value is FARGATE_SPOT.
FargateComputeMax This is a required parameter whereby you specify maximum size of Fargate compute environment in vCpus. Default value is 1024.
FSxForLustre This is a required parameter whereby you specify whether FSx for Lustre is enabled, or disabled. Default value is disabled.
FSxStorageCapacityGiB This is a required parameter whereby you specify the FSx Storage capacity, which must be in multiples of 2400 GiB. Default value is 7200 GiB.
FSxS3ImportPrefix This is an optional advanced parameter whereby you specify FSx S3 bucket path prefix for importing data from S3 bucket. Leave blank to import the complete bucket.
KeyPairName This is a required parameter whereby you select the Amazon EC2 key pair name used for SSH access to the desktop. You must have access to the selected key pair's private key to connect to your desktop.
KubectlVersion This is a required parameter whereby you specify EKS kubectl version. Default value is 1.28.3/2023-11-14.
KubernetesVersion This is a required parameter whereby you specify EKS cluster version. Default value is 1.28.
MSKBrokerNodeType This is a required parameter whereby you specify the type of node to be provisioned for the Amazon MSK broker.
MSKNumberOfNodes This is a required parameter whereby you specify the number of MSK Broker nodes, which must be >= 2.
PrivateSubnet1CIDR This is a required parameter whereby you specify the Private Subnet1 CIDR in Vpc CIDR. Default value is 172.30.64.0/18.
PrivateSubnet2CIDR This is a required parameter whereby you specify the Private Subnet2 CIDR in Vpc CIDR. Default value is 172.30.128.0/18.
PrivateSubnet3CIDR This is a required parameter whereby you specify the Private Subnet3 CIDR in Vpc CIDR. Default value is 172.30.192.0/18.
PublicSubnet1CIDR This is a required parameter whereby you specify the Public Subnet1 CIDR in Vpc CIDR. Default value is 172.30.0.0/24.
PublicSubnet2CIDR This is a required parameter whereby you specify the Public Subnet2 CIDR in Vpc CIDR. Default value is 172.30.1.0/24.
PublicSubnet3CIDR This is a required parameter whereby you specify the Public Subnet3 CIDR in Vpc CIDR. Default value is 172.30.2.0/24.
RedshiftNamespace This is a required parameter whereby you specify the Redshift Serverless namespace. Default value is mozart.
RedshiftWorkgroup This is a required parameter whereby you specify the Redshift Serverless workgroup. Default value is mozart.
RedshiftServerlessBaseCapacity This is a required parameter whereby you specify the Redshift Serverless base capacity in RPUs. Default value is 128.
RedshiftDatabaseName This is a required parameter whereby you specify the name of the Redshift database. Default value is mozart.
RedshiftMasterUsername This is a required parameter whereby you specify the Redshift master user name. Default value is admin.
RedshiftMasterUserPassword This is a required parameter whereby you specify the Redshift master user password.
RosVersion This is a required parameter whereby you specify the version of ROS. The supported versions are melodic on Ubuntu Bionic, noetic on Ubuntu Focal, and humble on Ubuntu Jammy. Default value is humble.
S3Bucket This is a required parameter whereby you specify the name of the Amazon S3 bucket to store your data.
UbuntuAMI This is an optional advanced parameter whereby you specify Ubuntu AMI (18.04 or 20.04).
VpcCIDR This is a required parameter whereby you specify the Amazon VPC CIDR for the VPC created in the stack. Default value is 172.30.0.0/16. If you change this value, all the subnet parameters above may need to be set, as well.
