
CMTouch Dataset

This repository contains the datasets for cross-modal representation learning used to develop rich touch representations in "Learning rich touch representations through cross-modal self-supervision" [1].

The datasets we provide are:

  1. CMTouch-Props
  2. CMTouch-YCB

The datasets consist of episodes collected by running a reinforcement learning agent on a simulated Shadow Dexterous Hand [2] interacting with different objects. From these interactions, observations from different sensory modalities are collected at each time step, including vision, proprioception (joint positions and velocities), touch, actions and object IDs. We used these data to learn rich touch representations through cross-modal self-supervision.

Bibtex

If you use one of these datasets in your work, please cite the reference paper as follows:

@InProceedings{zambelli20learning,
  author    = "Zambelli, Martina and Aytar, Yusuf and Visin, Francesco and Zhou, Yuxiang and Hadsell, Raia",
  title     = "Learning rich touch representations through cross-modal self-supervision",
  booktitle = "Conference on Robot Learning (CoRL)",
  year      = "2020",
}

Descriptions

Experimental setup

We run experiments in simulation with MuJoCo [3], using the simulated Shadow Dexterous Hand [2], which has five fingers and 24 degrees of freedom actuated by 20 motors. In simulation, each fingertip carries a spatial touch sensor with a 4×4 spatial resolution and three channels: one for the normal force and two for the tangential forces. We simplify this by summing across the spatial dimensions, obtaining a single force vector per fingertip that represents one normal and two tangential forces. The state consists of proprioception (joint positions and joint velocities) and touch.
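
As an illustration of this spatial pooling, here is a minimal NumPy sketch; the array name, the random data and the assumption of five fingers with a [4, 4, 3] reading each are ours, not part of the released data:

import numpy as np

# Hypothetical raw fingertip readings: 5 fingers, 4x4 taxels, 3 force channels
# (one normal + two tangential). Names and shapes here are illustrative assumptions.
raw_touch = np.random.rand(5, 4, 4, 3).astype(np.float32)

# Sum over the two spatial dimensions to get one 3D force vector per fingertip.
fingertip_forces = raw_touch.sum(axis=(1, 2))      # shape: (5, 3)

# Flattening yields the num_fingers x 3 touch vector stored in the dataset
# under 'shadowhand_motor/spatial_touch'.
touch_observation = fingertip_forces.reshape(-1)   # shape: (15,)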

Visual inputs are collected at a 64×64 resolution and are used only for representation learning; they are not provided as observations to control the robot’s actions. The action space is 20-dimensional. We use velocity control at a control rate of 30 Hz. Each episode has 200 time steps, which corresponds to about 6 seconds. The environment consists of the Shadow Hand, facing down and interacting with different objects. These objects have different shapes, sizes and physical properties (e.g. rigid or soft). We develop two versions of the task, the first using simple props and the second using YCB objects. In both cases, objects are fixed to their frame of reference, while their position and orientation are randomized.

CMTouch-Props

This is a dataset based on simple geometric 3D shapes (referred to as "props"). Props are simple 3D objects of different sizes, including cubes, spheres, cylinders and ellipsoids. We also generated a soft version of each prop, which can deform under the pressure of the touching fingers.

Soft deformable objects are complex to simulate: they are defined as a composition of multiple bodies (capsules) tied together to form a shape, such as a cube or a sphere. The main characteristic of these objects is their elastic behaviour, that is, they change shape when touched. The most difficult aspect to simulate in this context is contact handling, since the number of contacts grows rapidly with the number of colliding bodies.

Forty-eight different objects are generated by combining 6 sizes, 4 shapes (sphere, cylinder, cube, ellipsoid) and 2 material types (rigid or soft).
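
As a quick illustration of this combinatorics, the sketch below enumerates the 48 variants; the size labels are placeholders of our own, since the actual size values are not exposed by the dataset:

import itertools

# Placeholder size labels; the dataset does not expose the actual size values.
sizes = ['size_0', 'size_1', 'size_2', 'size_3', 'size_4', 'size_5']
shapes = ['sphere', 'cylinder', 'cube', 'ellipsoid']
materials = ['rigid', 'soft']

# 6 sizes x 4 shapes x 2 materials = 48 object variants.
props = list(itertools.product(sizes, shapes, materials))
assert len(props) == 48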

CMTouch-YCB

This is a dataset based on YCB objects. The YCB object dataset [4] consists of everyday objects with different shapes, sizes, textures, weights and rigidities.

We chose a set of ten objects: cracker box, sugar box, mustard bottle, potted meat can, banana, pitcher base, bleach cleanser, mug, power drill, scissors. These are generated in simulation at their standard size, which is also proportionate to the default dimensions of the simulated Shadow Hand.

The pose of each object is randomly selected from a set of 60 different poses, in which we vary the orientation of the object. These variations make identifying each object more difficult than in CMTouch-Props and require greater generalization from the learning method.

Download

The datasets can be downloaded from Google Cloud Storage. Each dataset is a single TFRecord file.

On Linux, to download a particular dataset, use the web interface, or run wget with the appropriate filename as follows:

wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_props_all_test.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_props_all_train.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_props_all_val.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_ycb_all_test.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_ycb_all_train.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_ycb_all_val.tfrecords

Usage

After downloading the dataset files, you can read them as tf.data.Dataset instances with the readers provided. The example below shows how to read the cmtouch-props dataset:

import tensorflow as tf

record_file = 'cmtouch_props_all_test.tfrecords'  # e.g. the CMTouch-Props test split
dataset = tf.data.TFRecordDataset(record_file)
parsed_dataset = dataset.map(_parse_tf_example)  # _parse_tf_example is defined by the provided readers (see the Colab)

(a complete example is provided in the Colab).

All dataset readers return the following set of observations:

'camera': tf.io.FixedLenFeature([], tf.string),
'camera/height': tf.io.FixedLenFeature([], tf.int64),
'camera/width': tf.io.FixedLenFeature([], tf.int64),
'camera/channel': tf.io.FixedLenFeature([], tf.int64),
'object_id': tf.io.FixedLenFeature([], tf.string),  # for both
'object_id/dim': tf.io.FixedLenFeature([], tf.int64),
'orientation_id': tf.io.FixedLenFeature([], tf.string),  # only for ycb
'orientation_id/dim': tf.io.FixedLenFeature([], tf.int64),
'shadowhand_motor/joints_vel': tf.io.FixedLenFeature([], tf.string),
'shadowhand_motor/joints_vel/dim': tf.io.FixedLenFeature([], tf.int64),
'shadowhand_motor/joints_pos': tf.io.FixedLenFeature([], tf.string),
'shadowhand_motor/joints_pos/dim': tf.io.FixedLenFeature([], tf.int64),
'shadowhand_motor/spatial_touch': tf.io.FixedLenFeature([], tf.string),
'shadowhand_motor/spatial_touch/dim': tf.io.FixedLenFeature([], tf.int64),
'actions': tf.io.FixedLenFeature([], tf.string),

  • 'camera': Tensor of shape [sequence_length, height, width, channels] and type uint8

  • 'shadowhand_motor/spatial_touch': Tensor of shape [sequence_length, num_fingers x 3] and type float32

  • 'shadowhand_motor/joints_pos': Tensor of shape [sequence_length, num_joint_positions] and type float32

  • 'shadowhand_motor/joints_vel': Tensor of shape [sequence_length, num_joint_velocities] and type float32

  • 'actions': Tensor of shape [sequence_length, num_actuated_joints] and type float32

  • 'object_id': Scalar indicating an object identification number

  • 'orientation_id': Scalar indicating a YCB object pose identification number (CMTouch-YCB only)
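
For reference, below is a minimal sketch of a parsing function along the lines of the _parse_tf_example used above, written from the feature specification; the decode dtypes and the exact reshaping are our assumptions, and the provided readers and Colab remain the authoritative implementation:

import tensorflow as tf

# Minimal subset of the feature specification listed above; the full dict also
# covers proprioception, actions and the object/orientation IDs.
features = {
    'camera': tf.io.FixedLenFeature([], tf.string),
    'camera/height': tf.io.FixedLenFeature([], tf.int64),
    'camera/width': tf.io.FixedLenFeature([], tf.int64),
    'camera/channel': tf.io.FixedLenFeature([], tf.int64),
    'shadowhand_motor/spatial_touch': tf.io.FixedLenFeature([], tf.string),
    'shadowhand_motor/spatial_touch/dim': tf.io.FixedLenFeature([], tf.int64),
}

def _parse_tf_example(example_proto):
  parsed = tf.io.parse_single_example(example_proto, features)
  # Assumption: each string feature holds the raw bytes of a flat tensor and can
  # be recovered with tf.io.decode_raw.
  camera = tf.io.decode_raw(parsed['camera'], tf.uint8)
  height = tf.cast(parsed['camera/height'], tf.int32)
  width = tf.cast(parsed['camera/width'], tf.int32)
  channels = tf.cast(parsed['camera/channel'], tf.int32)
  camera = tf.reshape(camera, [-1, height, width, channels])  # [T, H, W, C]

  touch = tf.io.decode_raw(parsed['shadowhand_motor/spatial_touch'], tf.float32)
  touch_dim = tf.cast(parsed['shadowhand_motor/spatial_touch/dim'], tf.int32)
  touch = tf.reshape(touch, [-1, touch_dim])  # [T, num_fingers x 3]

  return {'camera': camera, 'shadowhand_motor/spatial_touch': touch}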

Few-shot evaluations can be made by creating subsets of data to train and evaluate the models.
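
For instance, assuming parsed_dataset from the snippet above yields one episode per element (an assumption on our part), a few-shot split could be carved out with take and skip; the subset size below is arbitrary:

# Hypothetical few-shot split: a handful of episodes for training, the rest for evaluation.
few_shot_train = parsed_dataset.take(10)
few_shot_eval = parsed_dataset.skip(10)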

References

[1] M. Zambelli, Y. Aytar, F. Visin, Y. Zhou, R. Hadsell. Learning rich touch representations through cross-modal self-supervision. Conference on Robot Learning (CoRL), 2020.

[2] ShadowRobot, Shadow Dexterous Hand. https://www.shadowrobot.com/products/dexterous-hand/.

[3] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2012.

[4] B. Calli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. Srinivasa, P. Abbeel, and A. M. Dollar. Yale-CMU-Berkeley dataset for robotic manipulation research. The International Journal of Robotics Research, 36(3):261–268, 2017.

Disclaimers

This is not an official Google product.

Appendix and FAQ

Find this document incomplete? Leave a comment!