This project presents a novel approach to robotic grasping in cluttered environments, combining 3D vision with deep learning. The system integrates Mask R-CNN, 3D Convolutional Neural Networks (CNNs), and a UR5 robotic arm to identify, grasp, and manipulate objects accurately and adaptively.
Dependencies:
- Python 3.x
- NumPy
- OpenCV
- Matplotlib
- psutil
- Open3D (optional for point cloud visualization)
The dataset includes RGB-D images of ten objects: apples, bananas, Pringles cans, Cheez-It crackers, coffee cups, Coke cans, chocolate Jello boxes, mustard bottles, pears, and strawberries.
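For reference, one RGB-D pair from the dataset could be loaded and inspected as follows; the file paths and naming scheme here are placeholder assumptions, not the actual dataset layout:

```python
import cv2

# Placeholder paths; the real dataset layout may differ.
rgb = cv2.imread("dataset/apple/rgb_0001.png")                             # color image (BGR)
depth = cv2.imread("dataset/apple/depth_0001.png", cv2.IMREAD_UNCHANGED)   # raw depth map

print("RGB:", rgb.shape, rgb.dtype)        # e.g. (480, 640, 3) uint8
print("Depth:", depth.shape, depth.dtype)  # e.g. (480, 640) uint16
```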
Key scripts:
- `detection.py`: Handles image resizing, object detection, and masking.
- `point_cloud.py`: Creates point clouds from masked images and depth maps (requires Open3D).
- `voxel2.py`, `voxel_grid_augmentation2.py`: Manage voxel grid creation and augmentation (see the Open3D sketch after this list).
- `dir_pred.py`, `mode_pred.py`: Predict grasping directionality and modes using 3D CNNs.
- `imagecapture.py` (not included in the provided scripts): Captures RGB-D images (optional).
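As a rough illustration of what `point_cloud.py` and `voxel2.py` do, the snippet below back-projects a masked RGB-D pair into a point cloud with Open3D and voxelizes it. The camera intrinsics, file names, and voxel size are placeholder assumptions, not values from the thesis:

```python
import open3d as o3d

# Placeholder intrinsics for a 640x480 sensor; the real calibration differs.
intrinsics = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=480, fx=525.0, fy=525.0, cx=319.5, cy=239.5)

color = o3d.io.read_image("masked_rgb.png")  # RGB image with background masked out
depth = o3d.io.read_image("depth.png")       # matching depth map

rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, convert_rgb_to_intensity=False)

# Back-project the masked pixels into a 3D point cloud.
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsics)

# Quantize the cloud into a voxel grid for the 3D CNNs (voxel size is arbitrary here).
voxels = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd, voxel_size=0.005)

# Optional visualization.
o3d.visualization.draw_geometries([voxels])
```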
Key functions:
- `runCode(image, depth)`: Main entry point; processes an image and depth map for object detection, masking, and point cloud creation.
- `MatplotlibClearMemory()`: Clears memory used by Matplotlib figures (one plausible implementation is sketched below).
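The body of `MatplotlibClearMemory()` is not shown here; a minimal implementation matching its description would simply close all open figures:

```python
import matplotlib.pyplot as plt

def MatplotlibClearMemory():
    # Closing every open figure releases the buffers Matplotlib holds,
    # which matters when the pipeline is run repeatedly in a loop.
    plt.close("all")
```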
To run the pipeline:
- Import the necessary modules.
- Load RGB and depth images of the target object.
- Call `runCode(image, depth)` to process the images and obtain detection and point cloud outputs (see the end-to-end sketch after this list).
- (Optional) Visualize point clouds using Open3D.
- Use the outputs for further analysis or robotic arm control.
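Putting the steps together, an end-to-end call might look like the sketch below. The module name `detection`, the return value of `runCode`, and the file paths are assumptions; check the scripts for the actual interface:

```python
import cv2
from detection import runCode  # assumed module for runCode()

# Load an RGB-D pair of the target object (placeholder paths).
image = cv2.imread("data/coke_can_rgb.png")
depth = cv2.imread("data/coke_can_depth.png", cv2.IMREAD_UNCHANGED)

# Run detection, masking, and point cloud creation.
outputs = runCode(image, depth)

# The outputs can then feed voxelization and the grasp-prediction CNNs,
# or be visualized with Open3D as shown earlier.
```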
Notes:
- The project explores different network structures and learning rates for the 3D CNN models (an illustrative architecture is sketched after this list).
- Grasping modes include pinch grasp and hold actions for different object types.
- Future work includes expanding the dataset, refining models, and exploring neural network-driven haptic sensors.
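The network definitions are not reproduced here; purely as an illustration, a small voxel-based 3D CNN for the two grasp modes (pinch vs. hold) could look like the PyTorch sketch below. The framework, layer sizes, and 32-cube input resolution are all assumptions, not the architectures in `dir_pred.py` or `mode_pred.py`:

```python
import torch
import torch.nn as nn

class GraspModeNet(nn.Module):
    """Illustrative 3D CNN that classifies grasp mode from a voxel grid."""
    def __init__(self, n_modes: int = 2):  # pinch vs. hold
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),  # 32^3 -> 16^3
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),  # 16^3 -> 8^3
        )
        self.classifier = nn.Linear(32 * 8 * 8 * 8, n_modes)

    def forward(self, x):  # x: (batch, 1, 32, 32, 32) occupancy grid
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Smoke test with a dummy voxel grid.
logits = GraspModeNet()(torch.zeros(1, 1, 32, 32, 32))
print(logits.shape)  # torch.Size([1, 2])
```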
Refer to the thesis report for detailed methodology, experimental setup, and references.