TactileAloha: Learning Bimanual Manipulation with Tactile Sensing

Abstract

Tactile texture is vital for robotic manipulation but difficult to perceive with camera vision alone. To address this, we propose TactileAloha, an integrated tactile-vision robotic system built upon Aloha, with a tactile sensor mounted on the gripper to capture fine-grained texture information and support real-time visualization during teleoperation, facilitating efficient data collection and manipulation. Using data collected with our integrated system, we encode tactile signals with a pre-trained ResNet and fuse them with visual and proprioceptive features. The combined observations are processed by a transformer-based policy with action chunking to predict future actions. During training, a weighted loss function emphasizes near-future actions, and at deployment an improved temporal aggregation scheme enhances action precision. Experimentally, we introduce two bimanual tasks, zip tie insertion and Velcro fastening, both of which require tactile sensing to perceive object texture and to align the orientations of the two objects held by the two hands. Our method systematically adapts the generated manipulation sequence itself based on tactile sensing. Results show that our system, leveraging tactile information, can handle texture-related tasks that camera vision-based methods fail to address. Moreover, our method achieves an average relative improvement of approximately 11.0% over a state-of-the-art method with tactile input, demonstrating its effectiveness.
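The weighted loss and temporal aggregation mentioned above can be sketched as follows. This is a minimal NumPy illustration of the general ACT-style scheme, not the paper's exact implementation: the decay hyperparameters `decay` and `m`, and the precise form of the "improved" aggregation, are assumptions for illustration only.

```python
import numpy as np

def chunk_loss_weights(chunk_size, decay=0.01):
    """Exponentially decaying, normalized weights over an action chunk.
    Earlier (near-future) timesteps receive larger weight; `decay` is
    a hypothetical hyperparameter, not taken from the paper."""
    w = np.exp(-decay * np.arange(chunk_size))
    return w / w.sum()

def weighted_chunk_loss(pred, target, decay=0.01):
    """L1 loss over a predicted action chunk of shape (chunk, action_dim),
    weighted so that near-future actions dominate the objective."""
    per_step = np.abs(pred - target).mean(axis=-1)          # (chunk,)
    return float((chunk_loss_weights(len(per_step), decay) * per_step).sum())

def temporal_aggregate(buffered_actions, m=0.1):
    """At deployment, several overlapping chunks predict an action for the
    current timestep. Combine them with an exponentially weighted average
    (index 0 = oldest prediction, which gets the largest weight)."""
    k = len(buffered_actions)
    w = np.exp(-m * np.arange(k))
    w = w / w.sum()
    return (w[:, None] * np.asarray(buffered_actions)).sum(axis=0)
```

In this sketch the same exponential-weighting idea appears twice: once in the training loss to prioritize actions executed soonest, and once at inference to smooth the actions proposed by overlapping chunks.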


Data Collection Process

Real-time Monitoring Interface

Fig. 1. Real-time monitoring during teleoperation. The interface includes observations from three cameras and visualized tactile feedback from the GelSight sensor (top-right image). The timeline below shows the timestep progression during teleoperation data collection.

The teleoperator performs the task by directly observing the scene and by monitoring sensor outputs on the screen, especially the visualized tactile information; the next action is determined from this combined perception. After completing an episode, the teleoperator judges whether it was successful. If so, they press “C” to continue; otherwise, they press “R” to repeat the episode or “S” to stop. A foot pedal facilitates this process.
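The post-episode decision logic described above could be sketched as below. The key bindings follow the description in the text; the handler itself and the `accepted` episode list are hypothetical names, not part of the released system.

```python
def handle_episode_end(key, episode, accepted):
    """Post-episode handler mirroring the teleoperation workflow:
    'c' accepts the episode and continues, 'r' discards it and repeats,
    's' stops data collection. `accepted` is the list of kept episodes."""
    key = key.lower()
    if key == "c":
        accepted.append(episode)  # episode judged successful: keep it
        return "continue"
    if key == "r":
        return "repeat"           # episode discarded; re-record it
    if key == "s":
        return "stop"             # end the data-collection session
    raise ValueError(f"unexpected key: {key!r}")
```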

Autonomous Bimanual Skills with Tactile Information

1. Zip Tie Insertion Task

Our Method with Tactile Information (Case 1: Success)

Our Method with Tactile Information (Case 2: Success)

2. Velcro Fastening Task

Our Method with Tactile Information (Case 1: Success)

Our Method with Tactile Information (Case 2: Success)


Failure Cases: Other Methods Without Tactile Information

Other Methods Using Only Camera Vision (Failed Due to Incorrect Alignment)

Other Methods Using Only Camera Vision (Failed Due to Incorrect Velcro Orientation; the Two Pieces Could Not Engage)

Failure Case: Our Method Combined with Co-training

Our Method with Co-training (Failed Due to Premature Closing of the Gripper)

Ablation Videos

Ablation Results on the Zip Tie Insertion Task.