Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Page view(s)
5
Checked on Dec 23, 2024
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Title:
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Journal Title:
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Keywords:
Publication Date:
17 June 2024
Citation:
Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Joey Tianyi Zhou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26223-26232
Abstract:
Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting event and probability change, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which may not be sufficiently highlighted in an averaging manner. In this study, we propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS), to tackle this problem. TDDS utilizes a dual-depth strategy to achieve a balance between incorporating extensive training dynamics and identifying representative samples for dataset pruning. In the first depth, we estimate the series of each sample’s individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically on CIFAR-100, our method achieves 54.51% accuracy with only 10% training data, surpassing baselines methods by more than 12.69%.
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the Agency for Science, Technology and Research - Career Development Fund
Grant Reference no. : C233312004

This research / project is supported by the Agency for Science, Technology and Research - Science and Engineering Research Council (SERC) - Central Research Fund (Use-inspired Basic Research)
Grant Reference no. :
Description:
© 2024 IEEE.  Personal use of this material is permitted.  Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
1063-6919
Files uploaded:

File Size Format Action
231113613v3.pdf 5.85 MB PDF Request a copy