Neural encoding plays an important role in faithfully describing temporally rich patterns such as human speech and environmental sounds. When such spatio-temporal patterns are classified with Spiking Neural Networks (SNNs), the way they are encoded has a direct impact on the complexity of the task. In this paper, we study several existing temporal and population coding schemes for speech and audio recognition. We show that, with population neural coding, the encoded patterns are linearly separable using a Support Vector Machine (SVM). We note that population neural coding effectively projects the temporal information into the spatial domain, thus improving the linear separability of the patterns. With an SVM classifier, we achieve accuracies of 95% and 100% on the TIDIGITS and RWCP datasets, respectively. We further implement the Tempotron as an SNN-based classifier on the same datasets and achieve similar results. The study suggests that an effective neural coding scheme is just as important as the classifier.
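
To illustrate the claim that population coding can project temporal information into the spatial domain and thereby improve linear separability, the following toy sketch may help. It is not the coding scheme or the classifier evaluated in this paper: the Gaussian tuning curves, the helper names `population_encode` and `train_perceptron`, the parameter `sigma`, and the synthetic latencies are all assumptions made for illustration, and a plain perceptron stands in for the SVM as a simple linear classifier. A single spike latency whose class depends on whether it falls in the middle of the time window cannot be split by one threshold, but it becomes linearly separable once it is represented by the responses of three neurons tuned to different latencies.

```python
import math

def population_encode(latency, centers, sigma=0.15):
    """Map one scalar latency to a vector of Gaussian tuning-curve responses.

    Hypothetical encoder: each neuron responds most strongly when the
    latency is close to its preferred value (its center).
    """
    return [math.exp(-((latency - c) ** 2) / (2 * sigma ** 2)) for c in centers]

def train_perceptron(samples, labels, epochs=100):
    """Plain perceptron used as a stand-in linear classifier.

    Returns (weights, bias, converged); converged is True only if a full
    pass over the data produced no misclassification.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):  # y is -1 or +1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False

# Synthetic latencies (arbitrary time units): class +1 fires mid-window,
# class -1 fires either early or late.
latencies = [0.05, 0.10, 0.15, 0.45, 0.50, 0.55, 0.85, 0.90, 0.95]
labels = [-1, -1, -1, +1, +1, +1, -1, -1, -1]

# Raw temporal representation: one number per pattern.
raw = [[t] for t in latencies]

# Population representation: three neurons tuned to early, middle, late.
centers = [0.1, 0.5, 0.9]
encoded = [population_encode(t, centers) for t in latencies]

_, _, raw_ok = train_perceptron(raw, labels)
w, b, enc_ok = train_perceptron(encoded, labels)

print(raw_ok, enc_ok)  # prints: False True
```

The raw latencies defeat the linear classifier because the positive class lies between the two negative clusters, whereas in the encoded space the response of the middle-tuned neuron alone is enough to separate the classes.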