Cao, B., Li, X., He, T., Wang, B., Zhou, S., Wu, X., & Zhang, Q. (2026). Learning Structurally Stabilized Representations for Lossless DNA Storage. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 39-47. https://doi.org/10.1609/aaai.v40i1.36962
Abstract:
This paper presents Reed-Solomon coded single-stranded representation learning (RSRL), a novel end-to-end model for learning representations for lossless DNA data storage. In contrast to existing learning-based methods, RSRL is inspired by both error-correction coding and structural biology. Specifically, RSRL first learns representations for subsequent storage from binary data that have been transformed by the Reed-Solomon codec (RS code). The representations are then masked by an RS-code-informed mask so that the model focuses on correcting the burst errors that occur during learning. The synergy of RS masks and graph attention enables active error localization, overcoming the limitations of traditional passive error correction. Using the error-corrected decoded representations, a novel biologically stabilized loss is formulated to regularize the data representations toward stable single-stranded structures. By incorporating these novel strategies, RSRL can learn highly durable, dense, and lossless representations for subsequent storage as DNA sequences. RSRL has been compared with a number of baselines on real-world multi-type data storage tasks. The experimental results demonstrate that RSRL can store diverse types of data with much higher information density and durability, and with much lower error rates.
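The abstract does not specify how RSRL applies the RS code or maps symbols to nucleotides, so the following is only a rough, non-authoritative illustration of the background ideas it relies on. It assumes a textbook systematic Reed-Solomon encoder over GF(2^8) (primitive polynomial 0x11D), a naive 2-bits-per-base mapping (A=00, C=01, G=10, T=11), and two common DNA-storage constraint checks (GC content and longest homopolymer run). All function names, the mapping, and the checks are hypothetical; they are not the RSRL model, its RS-code-informed mask, or its biologically stabilized loss. The demo shows why RS codes suit burst errors: a burst confined to one 4-base group corrupts only one byte-level RS symbol, and the nonzero syndromes flag that the received word is no longer a valid codeword.

```python
# Hypothetical illustration only: textbook Reed-Solomon encoding plus a naive DNA mapping.
# None of these names or choices come from the RSRL paper.
PRIM = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 for GF(2^8)
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T

# Exponent/log tables for GF(2^8); GF_EXP is doubled so log sums never need a modulo.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two GF(2^8) elements via log tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def poly_mul(p, q):
    """Multiply polynomials whose coefficients are listed highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)  # addition in GF(2^8) is XOR
    return out

def poly_eval(p, x):
    """Evaluate a polynomial at x with Horner's method."""
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c
    return y

def rs_generator(nsym):
    """g(x) = (x - a^0)(x - a^1)...(x - a^(nsym-1)) with a = 2."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, GF_EXP[i]])
    return g

def rs_encode(msg, nsym):
    """Systematic encoding: message symbols followed by nsym parity symbols."""
    gen = rs_generator(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):  # synthetic division by the monic g(x)
        coef = buf[i]
        if coef != 0:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + buf[len(msg):]  # the remainder is the parity

def syndromes(word, nsym):
    """All zero for a valid codeword; any nonzero value signals an error."""
    return [poly_eval(word, GF_EXP[i]) for i in range(nsym)]

def bytes_to_dna(data):
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(BASES[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))

def dna_to_bytes(seq):
    """Inverse of bytes_to_dna; len(seq) must be a multiple of 4."""
    v = [BASES.index(ch) for ch in seq]
    return [(v[i] << 6) | (v[i + 1] << 4) | (v[i + 2] << 2) | v[i + 3]
            for i in range(0, len(v), 4)]

def gc_content(seq):
    """Fraction of G/C bases (an assumed stand-in constraint, not RSRL's loss)."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def max_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

if __name__ == "__main__":
    nsym = 4
    codeword = rs_encode(list(b"DNA"), nsym)
    strand = bytes_to_dna(codeword)
    print(strand, round(gc_content(strand), 2), max_homopolymer(strand))
    # Simulated burst: overwrite bases 1-3, all inside the first 4-base group.
    received = dna_to_bytes(strand[:1] + "TTT" + strand[4:])
    print(sum(a != b for a, b in zip(received, codeword)))  # 1 corrupted symbol
    print(syndromes(received, nsym))  # nonzero entries flag the error
```

An RS code with `nsym` parity symbols can correct up to `nsym // 2` symbol errors no matter how many bits inside each symbol are wrong, so packing one byte into four consecutive bases keeps a short nucleotide burst from turning into many symbol errors. A working pipeline would also need a decoder (for example Berlekamp-Massey with Forney's algorithm) and constraint-aware encoding, both omitted from this sketch.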
License type:
Publisher Copyright
Funding Info:
There was no specific funding for this research.
Description:
This material may not be retransmitted or redistributed without permission in writing from The Association for the Advancement of Artificial Intelligence. Permission to use document is granted, provided that (1) the copyright notice appears in all copies and that both the copyright notice and this permission notice appear, (2) use of such documents is for personal use only, and will not be copied or posted on any network computer or broadcast in any media, and (3) no modifications of any documents are made.