Distilling knowledge from Gaussian process teacher to neural network student

Title:
Distilling knowledge from Gaussian process teacher to neural network student
Journal Title:
INTERSPEECH
Publication Date:
20 August 2023
Citation:
Wong, J. H. M., Zhang, H., & Chen, N. F. (2023). Distilling knowledge from Gaussian process teacher to neural network student. INTERSPEECH 2023. https://doi.org/10.21437/interspeech.2023-190
Abstract:
Neural Networks (NN) and Gaussian Processes (GP) are different modelling approaches. The former stores characteristics of the training data in its many parameters, and then performs inference by passing inputs through these parameters. The latter instead performs inference by computing similarities between the test and training inputs, and then predicts test outputs that are correlated with the reference training outputs of similar inputs. These models may be combined to leverage their diversity. However, both model combination and the matrix computations required for GP inference are computationally expensive. This paper investigates whether a NN student is able to effectively learn from the information distilled from a GP or ensemble teacher, since inference with the student is computationally cheaper. Experiments on the speechocean762 spoken language assessment dataset suggest that this learning is effective.
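
For illustration, the sketch below shows the general distillation pattern the abstract describes, using scikit-learn with synthetic stand-in data. It is not the paper's setup: the actual teacher, student architecture, distillation loss, and speechocean762 features are not reproduced here. The idea is that a GP teacher is fit on the training data, and a NN student is then trained to match the teacher's predictive means, so that test-time inference needs only a forward pass through the student rather than GP matrix computations.

```python
# Minimal GP-to-NN distillation sketch (illustrative only; synthetic data,
# not the models or features used in the paper).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: inputs X and reference training outputs y.
X = rng.uniform(-3.0, 3.0, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)

# 1) Fit the GP teacher on the reference training outputs.
teacher = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
teacher.fit(X, y)

# 2) Distil: the student regresses onto the teacher's predictive means
#    (soft targets) instead of the hard reference outputs.
soft_targets = teacher.predict(X)
student = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
student.fit(X, soft_targets)

# 3) Inference with the student avoids the GP's costly kernel-matrix operations.
X_test = rng.uniform(-3.0, 3.0, size=(5, 2))
print(student.predict(X_test))
```
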
License type:
Publisher Copyright
Funding Info:
This research is supported by core funding from: AI3 seed fund
Grant Reference no.: SC20/21-816400
DOI:
10.21437/Interspeech.2023-190
Files uploaded:

wong23-interspeech-amended.pdf (326.63 KB, PDF)