DNA as a Data Storage Medium

More than 10 terabytes of data can be stored in the faint pink smear of DNA at the end of this test tube, equivalent to all the movies, images, emails, and other digital data in 625 16-gigabyte smartphones.

Archival data storage systems of the future may be based on the same storage material found in human cells: deoxyribonucleic acid (DNA). Researchers have successfully stored and perfectly retrieved digital images from a single sequence of DNA, a medium a million times smaller than semiconductor memory that can hold 10 terabytes in a single drop. What is more, the strand was immersed in a sea of lookalikes, and the team was still able to pull out the desired sequence.

This achievement was detailed at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2016) in Atlanta in April, in the paper "A DNA-Based Archival Storage System."

The research discussed in the paper was performed by a team at the University of Washington’s Molecular Information Systems Lab led by associate professor Luis Ceze, along with Georg Seelig, James Bornholt, and Randolph Lopez, as well as Microsoft Research’s Douglas Carmean and Karin Strauss.

Said independent observer Victor Zhirnov, chief scientist at Semiconductor Research Corp., "The work by Professor Ceze’s group is an important step towards developing practical DNA storage technology.

"In principle, DNA has an information storage density that is several orders of magnitude higher than any other known storage media in the universe. In theory, a few tens of kilograms of DNA could meet all of the world’s storage needs for centuries to come. Successful development of DNA-based massive information storage technologies would address the issue of the ongoing global explosion of data."

The idea of using DNA for data storage goes back decades, according to Ceze, but it has only become viable now due to the tremendous progress in manipulating DNA made during the Human Genome Project, as well as ongoing research into genetically modifying organisms and creating completely new genomes.

"The difference with our work," Ceze explained, "is that it approaches DNA from a computer architecture angle—taking what we learned from building computers/storage systems and applying it to biology. We are proposing that DNA can be a viable alternative for archival storage, what is today fulfilled by a mix of tape, hard drive and optical media."

DNA strands encode information in base four, using nucleotides that each carry one of four bases: cytosine (C), guanine (G), adenine (A), or thymine (T). Alternating sugar and phosphate molecules join the nucleotides into a chain: covalent bonds link the sugar molecule of one nucleotide to the phosphate molecule of the next, forming the storage medium called a strand. The molecular-scale system is so small that it could encode all the information in the world’s biggest datacenter in an archival storage unit the size of a sugar cube.
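To make the base-four idea concrete, the sketch below maps each pair of bits to one of the four bases, so every byte becomes four nucleotides. This direct mapping (and the function names `bytes_to_dna` and `dna_to_bytes`) is an illustrative assumption, not the encoding used in the paper; a practical scheme would also need to add redundancy and avoid sequences that are hard to synthesize or read.

```python
# Illustrative 2-bits-per-base mapping (an assumption for this sketch,
# not the encoding described in the paper): 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

Under this mapping the single byte `b"A"` (binary 01000001) becomes the strand `CAAC`, and decoding that strand returns the original byte.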

"I think using DNA for data storage will be inevitable," said Rob Carlson, managing director of Bioeconomy Capital and a consultant on the DNA storage project. "We already know that it works, and commercialization is just a matter of scaling the cost and throughput rather than any fundamental advance in technology."

While DNA data storage is based on biotechnology, Carlson said, "it is being driven by orthogonal technical and economic demands for information density and stability. That will have consequences that feed back on biotechnology, and those consequences should be acknowledged and discussed now.

"For example, just to compete with a tape drive available today, which has I/O speeds of about 2 gigabits per second, a ‘DNA drive’ would need to be able to write the equivalent of about 10 human genomes a minute worth of synthetic DNA. That is substantially greater than the current annual demand for synthetic DNA, which means that adopting DNA as a medium for storing digital information will completely marginalize the existing market for synthetic DNA."

In the course of a year, Carlson observed, "A single DNA drive would need to have the capacity to synthesize the equivalent of 10,000 human genomes. This production capacity will make synthetic DNA ubiquitously available for any use."
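The per-minute figure in Carlson's tape-drive comparison can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 2 bits per base and takes "a human genome" to mean the diploid set of roughly 6.4 billion base pairs; both the genome size and that reading are assumptions added here, not figures from the article.

```python
# Figure from the quote: a tape drive writes about 2 gigabits per second.
TAPE_BITS_PER_SECOND = 2e9

# Assumptions for this sketch (not from the article):
BITS_PER_BASE = 2             # four possible bases -> 2 bits each
DIPLOID_GENOME_BASES = 6.4e9  # ~3.2e9 base pairs per haploid genome, times two


def genomes_per_minute(bits_per_second: float,
                       genome_bases: float = DIPLOID_GENOME_BASES) -> float:
    """Genome-equivalents of DNA needed per minute to match a given bit rate."""
    bases_per_minute = bits_per_second * 60 / BITS_PER_BASE
    return bases_per_minute / genome_bases


print(round(genomes_per_minute(TAPE_BITS_PER_SECOND), 1))  # prints 9.4
```

Under these assumptions the result is a little over nine genome-equivalents per minute, in line with the "about 10" in the quote. Counting only the haploid genome would roughly double the figure.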

While a functional human genome cannot yet be built from synthetic DNA, Carlson added, "You can be sure that the scale of DNA synthesis required for DNA storage is going to bring de novo human genome synthesis into the spotlight sooner rather than later."

Technologically, the system is relatively simple, using techniques like the long-perfected polymerase chain reaction, which is just one of the tools genetic engineers use to create DNA memories in liquid form. The researchers say they see no major engineering hurdles to mass-producing such systems as dry thin films that can be integrated into the semiconductor design flow.
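One way to picture how a chain-reaction step could pull a desired sequence out of a pool of lookalikes is to treat a short prefix on each strand as a key. The toy model below is only a sketch under that assumption: the six-base keys, the example pool, and the function `select_by_primer` are all hypothetical, and a real reaction amplifies matching strands chemically rather than filtering a list.

```python
def select_by_primer(pool: list[str], primer: str) -> list[str]:
    """Return the payloads of strands whose prefix matches the primer key."""
    return [strand[len(primer):] for strand in pool if strand.startswith(primer)]


# Hypothetical pool: each strand is a 6-base key followed by a 6-base payload.
pool = [
    "ACGTAC" + "GGATTC",
    "TTGACA" + "CCATGA",
    "ACGTAC" + "TTAGGC",
]

# Only the strands tagged with the requested key are "read out".
print(select_by_primer(pool, "ACGTAC"))  # prints ['GGATTC', 'TTAGGC']
```

The point of the sketch is the access pattern: the reader names a key and gets back only the matching strands, without having to sequence the entire pool.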

"The overall system supports reads and writes indefinitely," said Ceze, "and if kept dry and away from environmental hazards like ultraviolet light and extreme temperatures, DNA memories could last for a long, long time, up to hundreds or even thousands of years."

To date, the team has successfully archived and retrieved exact copies, down to the individual bit, of hours of video files from the Voices from the Rwanda Tribunal project. The technique is amenable to any kind of digital data, from spreadsheets to photos to sensor streams.

The University of Washington’s Molecular Information Systems Lab is currently cooperating with Microsoft Research to develop a commercial DNA-based random-access archival storage system to meet the needs of datacenters by 2020. Their biggest challenges are reducing the cost (Ceze said the cost "needs to be brought down many orders of magnitude") and increasing the efficiency of manufacturing and accessing DNA-based memories.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.