R. Tong, L. Wang and B. Ma, "Transfer learning for children's speech recognition," 2017 International Conference on Asian Language Processing (IALP), Singapore, 2017, pp. 36-39. doi: 10.1109/IALP.2017.8300540
Children’s speech processing is more challenging than adult speech processing because large-scale children’s speech corpora are lacking. Because children’s speech organs are still developing, their speech shows higher inter-speaker and intra-speaker variability. In addition, collecting data from children is difficult because they have limited language proficiency and usually shorter attention spans. In this paper, we aim to improve children’s automatic speech recognition (ASR) performance with transfer learning. We compare two transfer learning approaches for enhancing children’s ASR with adult data. The first obtains a children’s acoustic model by performing acoustic adaptation on a pre-trained adult model. The second is multi-task learning, in which adult and children’s acoustic characteristics are learnt jointly in shared hidden layers while the output layers are optimized with different targets. Our experimental results show that both approaches are effective in transferring rich phonetic and acoustic information from adult speech to children’s speech. The multi-task learning approach outperforms the acoustic adaptation approach. We further show that transfer learning is also effective in transferring speakers’ acoustic characteristics from other languages.
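
The multi-task arrangement described in the abstract (shared hidden layers, separate output layers per speaker group) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name `MultiTaskAcousticModel`, the layer sizes, the ReLU and softmax choices, and the toy input frame are all assumptions made for illustration, and training is omitted entirely.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply an affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskAcousticModel:
    """Hypothetical model: hidden layers shared by both tasks, one output head each."""

    def __init__(self, n_feats, n_hidden, n_adult_targets, n_child_targets, seed=0):
        rng = random.Random(seed)
        # Shared hidden layers: used for both adult and child input frames.
        self.shared = [
            make_layer(n_feats, n_hidden, rng),
            make_layer(n_hidden, n_hidden, rng),
        ]
        # Task-specific output layers, each with its own target set (sizes assumed).
        self.heads = {
            "adult": make_layer(n_hidden, n_adult_targets, rng),
            "child": make_layer(n_hidden, n_child_targets, rng),
        }

    def forward(self, frame, task):
        """Return posterior probabilities over the chosen task's targets."""
        hidden = frame
        for layer in self.shared:
            hidden = relu(dense(layer, hidden))
        return softmax(dense(self.heads[task], hidden))


# Toy usage: one 4-dimensional feature frame passed through both heads.
model = MultiTaskAcousticModel(n_feats=4, n_hidden=8, n_adult_targets=5, n_child_targets=3)
frame = [0.2, -0.1, 0.4, 0.0]
adult_post = model.forward(frame, "adult")
child_post = model.forward(frame, "child")
```

In a setup like this, gradients from both heads would update the shared layers during training, which is the route by which adult data could inform the children's model. Under the same assumptions, the acoustic adaptation approach would instead start from a fully trained adult model and continue training it on children's data.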