Deep convolutional neural networks have emerged as strong candidates for a model of human vision, often outperforming competing models on both computer vision benchmarks and computational neuroscience benchmarks of neural response correspondence. The design of these models has undergone several refinements in recent years, drawing on both statistical and cognitive insights, and has in the process shown increasing correspondence to the representations of primate visual processing. However, their training methodology still stands in contrast to the process of primate visual development, and we believe it can benefit from closer alignment with this natural process. Primate visual development is characterized by low visual acuity and colour sensitivity, as well as high plasticity and neuronal growth, in the first year of infancy, prior to the development of specific visual-cognitive functions such as visual object recognition. In this work, we investigate the synergy between gradual variation in the distribution of visual input and the concurrent growth of a statistical model of vision on the task of large-scale object classification, and discuss how it may yield better approaches to training deep convolutional neural networks. Our experiments across multiple object classification benchmarks indicate that a growing statistical model trained on a gradually varying visual input distribution converges faster, and to better generalization, than traditional, more static training setups.
This research is supported by A*STAR under its Human-Robot Collaborative AI for Advanced Manufacturing and Engineering programme (Award A18A2b0046) and the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-RP-2019-010).