Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?

Page view(s)
24
Checked on Sep 09, 2025
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?
Title:
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?
Journal Title:
2024 Conference on Neural Information Processing Systems
DOI:
Keywords:
Publication Date:
10 December 2024
Citation:
Xiao, L. and He, Y., Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?. In The Thirty-eighth Annual Conference on Neural Information Processing Systems.
Abstract:
In ImageNet-condensation, the storage for auxiliary soft labels exceeds that of the condensed dataset by over 30 times. However, are large-scale soft labels necessary for large-scale dataset distillation? In this paper, we first discover that the high within-class similarity in condensed datasets necessitates the use of large-scale soft labels. This high within-class similarity can be attributed to the fact that previous methods use samples from different classes to construct a single batch for batch normalization (BN) matching. To reduce the within-class similarity, we introduce class-wise supervision during the image synthesizing process by batching the samples within classes, instead of across classes. As a result, we can increase within-class diversity and reduce the size of required soft labels. A key benefit of improved image diversity is that soft label compression can be achieved through simple random pruning, eliminating the need for complex rule-based strategies. Experiments validate our discoveries. For example, when condensing ImageNet-1K to 200 images per class, our approach compresses the required soft labels from 113 GB to 2.8 GB (40X compression) with a 2.6% performance gain. Code is available at: https://github.com/he-y/soft-label-pruning-for-dataset-distillation
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the National Research Foundation, Singapore, and Maritime and Port Authority of Singapore / Singapore Maritime Institute - Maritime Transformation Programme (Maritime AI Research Programme)
Grant Reference no. : SMI-2022-MTP-06

This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - Career Development Fund (CDF)
Grant Reference no. : C233312004
Description:
ISSN:
1049-5258
Files uploaded:

File Size Format Action
934-are-large-scale-soft-label.pdf 32.13 MB PDF Open