Acar, C., Binici, K., Tekirdağ, A., & Wu, Y. (2024). Visual-Policy Learning Through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks. IEEE Robotics and Automation Letters, 9(1), 691–698. https://doi.org/10.1109/lra.2023.3336245
Abstract:
The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, using multiple cameras in real-world scenarios can be challenging. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a “teacher” policy, pre-trained with multiple camera viewpoints, guides a “student” policy in learning from a single camera viewpoint. To enhance the student policy's robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated in both simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.
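The teacher-to-student distillation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the linear policies, the mean-squared-error action-matching loss, the synthetic feature vectors standing in for camera images, and all names (`distillation_loss`, `distill_step`, `train_student`). The data augmentation and extreme viewpoint changes mentioned in the abstract are omitted; the student here sees only one fixed view.

```python
import numpy as np


def distillation_loss(student_actions, teacher_actions):
    """Mean squared error between student and teacher actions (assumed loss)."""
    diff = student_actions - teacher_actions
    return float(np.mean(diff ** 2))


def distill_step(w_student, single_view_obs, teacher_actions, lr=0.1):
    """One gradient-descent step pulling a linear student toward the teacher."""
    pred = single_view_obs @ w_student
    # Gradient of mean((pred - target)^2) with respect to w_student.
    grad = 2.0 * single_view_obs.T @ (pred - teacher_actions) / pred.size
    return w_student - lr * grad


def train_student(n_samples=64, n_views=3, obs_dim=8, act_dim=4, steps=200, seed=0):
    """Distill a frozen multi-view teacher into a single-view student."""
    rng = np.random.default_rng(seed)
    # Synthetic per-view features; a real system would encode camera images.
    multi_view_obs = rng.normal(size=(n_samples, n_views, obs_dim))

    # Frozen "teacher": a random linear policy over all views concatenated.
    w_teacher = rng.normal(size=(n_views * obs_dim, act_dim))
    teacher_actions = multi_view_obs.reshape(n_samples, -1) @ w_teacher

    # "Student": a linear policy that only sees view 0.
    single_view_obs = multi_view_obs[:, 0, :]
    w_student = np.zeros((obs_dim, act_dim))

    losses = []
    for _ in range(steps):
        pred = single_view_obs @ w_student
        losses.append(distillation_loss(pred, teacher_actions))
        w_student = distill_step(w_student, single_view_obs, teacher_actions)
    return w_student, losses


w_student, losses = train_student()
print(f"initial loss: {losses[0]:.3f}, final loss: {losses[-1]:.3f}")
```

In this toy setup the student cannot match the teacher exactly, because the teacher's actions depend on views the student never observes; the loss falls only to the level that one view can explain.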
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the A*STAR - Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering domain)
Grant Reference no. : A19E4a0101
Description:
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.