Two researchers, Yaniv Erlich and Dina Zielinski, successfully encoded 2 MB of data into strands of DNA. Erlich, a computer science professor at Columbia Engineering, and Zielinski, an associate scientist at the New York Genome Center (NYGC), used an algorithm originally designed for streaming video to store six files in DNA. The six files were: an entire operating system (KolibriOS); an 1895 French film, “Arrival of a Train at La Ciotat”; a computer virus; a $50 Amazon gift card; Claude Shannon’s 1948 paper, “A Mathematical Theory of Communication”; and a Pioneer plaque (the plaques carried into space on the Pioneer probes beginning in 1972, engraved with a message based on universal measurements).
DNA was chosen as the storage medium because it is ultra-compact and doesn’t degrade over time the way CDs or DVDs do. DNA can last for hundreds or even thousands of years if conditions are favorable (i.e., cool and dry). The researchers combined all of the data into a master file and split it into short strings of binary code. The streaming-video algorithm, known as fountain codes, then packaged those strings into small packets called droplets, and the bits in each droplet were written into the four bases of DNA (A, C, G, and T). In the end, they generated 72,000 DNA strands, each 200 bases long.
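The pipeline described above (split the master file, combine pieces into droplets, write bits as bases) can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the 2-bits-per-base table, the function names, the segment size, and the uniform choice of how many segments go into a droplet are all assumptions made for the example. It also leaves out decoding and any screening of sequences that a real pipeline may apply.

```python
import random

# Assumed mapping: two bits per base. The article only says the data was
# written into the four bases; this particular table is illustrative.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Write each pair of bits as one base (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna: read bases back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_segments(data: bytes, size: int) -> list[bytes]:
    """Cut the master file into fixed-size segments, zero-padding the last."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """XOR a seed-selected subset of segments into one droplet payload.

    The seed travels with the droplet so a decoder can regenerate the same
    subset. Real fountain codes draw the subset size from a tuned
    distribution; the uniform draw here is a simplification.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))  # all-zero starting value
    for index in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[index]))
    return seed, payload


if __name__ == "__main__":
    segments = split_segments(b"hello DNA storage", size=4)
    seed, payload = make_droplet(segments, seed=42)
    print(seed, bytes_to_dna(payload))
```

Under the assumed table, `bytes_to_dna(b"\x1b")` returns `"ACGT"`, since the byte's bits `00 01 10 11` map to one base each. Because any droplet is a mix of several segments, a decoder does not need every specific strand, only enough droplets to solve for the original segments, which is the property that makes fountain codes attractive for a lossy medium.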
The information was sent as a text file to Twist Bioscience, a startup that turned the file into DNA molecules, which could then be read by a DNA sequencer. The test was so successful that the files were recovered without any errors, even from copies of copies of the DNA. Erlich believes this is the highest-density data storage ever created, with at least 215 petabytes (215,000,000 gigabytes) per gram of DNA. The main downside is cost: creating and reading the DNA file came to a total of $9,000.
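The figures quoted above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below uses only numbers from this article and assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); the variable names are illustrative.

```python
# Decimal unit sizes (an assumption; the article does not say which it uses).
MEGABYTE = 10**6
GIGABYTE = 10**9
PETABYTE = 10**15

# Density claim: 215 petabytes per gram, expressed in gigabytes.
gigabytes_per_gram = 215 * PETABYTE // GIGABYTE

# Size of the synthesized library: 72,000 strands of 200 bases each.
total_bases = 72_000 * 200

# Average number of data bits carried per base, for 2 MB of data.
data_bits = 2 * MEGABYTE * 8
bits_per_base = data_bits / total_bases

# Cost of the experiment per megabyte stored.
cost_per_megabyte = 9_000 / 2

print(f"{gigabytes_per_gram:,} GB per gram")
print(f"{total_bases:,} bases in total")
print(f"{bits_per_base:.2f} data bits per base")
print(f"${cost_per_megabyte:,.0f} per MB")
```

With these inputs the library comes to 14.4 million bases carrying roughly 1.1 data bits each, below the 2-bit ceiling of a four-letter alphabet; the gap would be taken up by whatever redundancy and bookkeeping each strand carries. The cost works out to $4,500 per megabyte.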
- Report from Data Science Center at Columbia
- All of Your Data in a Drop of DNA: an interview with Yaniv Erlich (video)
- Pioneer Plaques
- A Mathematical Theory of Communication