Durable, Dense, and Efficient: The Promise of DNA Data Storage

Is DNA the future of data storage?

Yaniv Erlich, a core member of the New York Genome Center and associate professor of Computer Science and Computational Biology at Columbia University, holds a small 3D-printed bunny in front of his webcam. The toy, he says, is actually a storage device. "The plastic fibers in the bunny have silica beads," he says, "and inside these beads is DNA that encodes a file with instructions on how to print an exact replica of this bunny."

As with a real rabbit, the 3D-printed toy, developed with chemical engineer Robert Grass at ETH Zürich, carries its own blueprints in the DNA within it. "You can chop off any part of the bunny," Erlich explains, "and there's DNA in every piece, and you can amplify it and print a new bunny. We think we can replicate them to about 10^21, or enough bunnies for everyone in the world until the end of humanity."

The project is less about toymaking than it is about the transformative potential of DNA data storage.

DNA boasts a rare combination of durability, low energy consumption, and phenomenal density. "We estimate that a DNA system could store one exabyte per cubic inch," says computer scientist Karin Strauss, a principal research manager at Microsoft. With a DNA data storage system, she says, "What requires a whole datacenter to store today would fit in the palm of your hand."

At a basic level, DNA storage involves taking the four bases in DNA (adenine, thymine, cytosine, and guanine, or A, T, C, and G) and mapping them to sequences of bits, so "A" might correspond to 00 and "T" to 01. Scientists take a sequence of bits, then synthesize and store DNA that represents those bits.
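
The bit-to-base mapping described above can be sketched in a few lines of Python. This is only an illustration, not the scheme used by any of the teams mentioned here: the article gives "A" as 00 and "T" as 01 as examples, while the assignments for "C" and "G" and the function names `encode` and `decode` are assumptions made for this sketch. Practical systems typically also add error correction and avoid problematic base sequences, which this sketch ignores.

```python
# Hypothetical 2-bit mapping. Only A=00 and T=01 come from the article;
# C=10 and G=11 are assumptions chosen to complete the table.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"hello")
print(strand)          # the base sequence a synthesizer would be asked to make
print(decode(strand))  # reading the sequence back recovers the original bytes
```

At two bits per base, each byte becomes four bases, so a five-letter word such as "hello" maps to a strand of 20 bases under this mapping.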

Strauss, computer scientist Luis Ceze of the University of Washington, and their interdisciplinary team recently developed a fully automated, end-to-end system. Previous systems required help from chemists and other scientists, but the new prototype automatically encodes the bits, makes the DNA, stores that DNA, retrieves and reads it, and then returns the data.

In the first iteration, they stored the word 'hello'. "It is by no means a high-performance system," Strauss says. "It was intended to be a first demonstration that automation in DNA data storage is indeed possible, end to end. But the maturity will improve. Eventually we could see DNA storage devices that look like racks, but with fluidics components, inside datacenters."

Strauss and Ceze were recently named to share the 2020 Maurice Wilkes Award for their work on DNA-based digital data storage.

Another recent breakthrough focused on efficiently reading and retrieving DNA-stored data. Computer engineer James M. Tuck, chemical engineer Albert Keung, and their colleagues at North Carolina State University recently published a paper detailing their novel approach, which they call Dynamic Operations and Reusable Information Storage, or DORIS. The technique employs what they call a toehold system, in which a single-stranded piece of DNA is attached to a double-stranded section that stores data. The single strand, or toehold, effectively carries the file name, or identifying information, which allows them to efficiently search for specific DNA data. Once they retrieve a file, they make RNA copies of the DNA and its stored data, then return the original DNA to the storage medium undamaged.

Previous systems relied on more involved chemistry or molecular manipulations that could degrade the stored data in the long run.

The system holds great potential, says Tuck, for a very dense, resilient storage system. "In a relatively small space, we'd be able to store lots of information, label it with distinct addresses, and pull out the information we want while having minimal degradation on the library that's there," he says.

As for applications, the storage density and durability of DNA make it ideal for archival storage, according to Strauss, who suspects the first iterations might appear in the controlled environment of a datacenter.

Erlich has additional applications in mind. In the future, car parts could be embedded with DNA that harbors data on how to manufacture the component, should it become obsolete. An artificial knee or hip could contain a patient's relevant medical details, so doctors operating on the prosthetic in the future could easily recover important health information.

Tuck adds that it would be a waste not to find a way to compute on DNA-stored data where it resides, and Strauss and Ceze have made advances in that area. Keung, meanwhile, hopes that instead of choosing a particular system, researchers will continue to explore creative approaches.

"We are at this inflection point with how we build computers right now, with the end of Moore's Law in sight, and different efforts into quantum computing and molecular computing," says Ceze. "It's becoming increasingly clear that these approaches are all good at different things, and we need to develop this portfolio of new technologies to ensure we can continue building better computers."


Gregory Mone is a Boston-based science writer and the author, with Bill Nye, of Jack and the Geniuses: At the Bottom of the World.
