Collaborative robots (cobots) are designed to be flexibly redeployed across different tasks. For each new task, the cobot must be trained to detect and recognize novel objects. With the dominant object detectors based on Faster R-CNN, a user has to train the model on a large number of manually annotated samples, which is inefficient and expensive. In this paper, we propose a self-teaching strategy that enables a cobot to learn to recognize novel objects efficiently and effectively. As in human-to-human teaching, the user provides only a few examples of a novel object captured by an RGB-D camera. The cobot obtains the ground-truth annotation of the object automatically through depth segmentation. To achieve robust object detection in real-world scenes, it generates augmented training samples by virtually placing the object in various backgrounds at varying scales and orientations (2D augmentation) and by varying the viewpoint through projective transformations (3D augmentation). A state-of-the-art Faster R-CNN is re-trained and evaluated on real-world scenarios for a gearbox assembly task. Comparison with conventional training approaches demonstrates the superiority of the proposed approach in efficiency and robustness for novel object detection.
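The 2D/3D augmentation described above can be sketched as follows. This is a minimal pure-NumPy illustration, not the paper's implementation: the function names, the parameter ranges (scale 0.5–1.5, perspective terms ±1e-3), and the nearest-neighbour inverse warping are all assumptions made for the example. A segmented RGBA object cutout is composed onto a background after a random scale/rotation (2D) plus a mild projective perturbation (3D).

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img by a 3x3 homography H (nearest neighbour)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ pts          # map output pixels back to source
    sx = (src[0] / src[2]).round().astype(int)
    sy = (src[1] / src[2]).round().astype(int)
    ok = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((h, w) + img.shape[2:], img.dtype)
    out[ys.ravel()[ok], xs.ravel()[ok]] = img[sy[ok], sx[ok]]
    return out

def augment(obj_rgba, background, rng):
    """Paste a segmented RGBA object into a background with random
    scale + rotation (2D) and a small projective perturbation (3D).
    Parameter ranges below are illustrative assumptions."""
    oh, ow = obj_rgba.shape[:2]
    cx, cy = ow / 2.0, oh / 2.0
    s = rng.uniform(0.5, 1.5)                       # random scale
    th = rng.uniform(0, 2 * np.pi)                  # random in-plane rotation
    Rs = np.array([[s * np.cos(th), -s * np.sin(th), 0],
                   [s * np.sin(th),  s * np.cos(th), 0],
                   [0, 0, 1]])
    Tc = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], float)
    Tb = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], float)
    P = np.eye(3)
    P[2, :2] = rng.uniform(-1e-3, 1e-3, 2)          # mild perspective (viewpoint) terms
    tx = rng.integers(0, background.shape[1] // 2)  # random placement offset
    ty = rng.integers(0, background.shape[0] // 2)
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], float)
    H = T @ P @ Tb @ Rs @ Tc                        # full object-to-background homography
    warped = warp_image(obj_rgba, H, background.shape[:2])
    alpha = warped[..., 3:4] / 255.0                # segmentation mask as alpha channel
    return (alpha * warped[..., :3] + (1 - alpha) * background).astype(np.uint8)
```

Each call to `augment` yields one synthetic training image; the object's bounding box (and thus the ground-truth annotation) can be recovered by mapping the cutout's corners through the same homography `H`.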