Bonnie Berger discusses "Computational Biology in the 21st Century" (cacm.acm.org/magazines/2016/8/205052), a Review Article in the August 2016 Communications of the ACM.
---
TRANSCRIPT
00:00 Who are you? I mean in your bones, your guts, your cells.
00:10 The Human Genome Project showed us by mapping the three billion base pairs that comprise a person's complete DNA.
00:18 The data hold secrets of why we fall prey to disease -- and how, perhaps, to overcome it. But without clever approaches, cracking those secrets is beyond even our most powerful computers.
00:31 Join us as we talk with ACM Fellow Bonnie Berger about a new way to interpret genetic data, in "Computational Biology in the 21st Century".
00:42 [Intro graphics/music]
00:52 Bonnie Berger was already a professor at MIT in 2004 when she helped finish sequencing the first human genome, unleashing a torrent of data.
01:03 DR. BERGER: And it just kept ballooning after that. Genomic data's going up by a factor of 10 every year.
01:11 That's faster than computer power. But only if technology wins the race can we use this ever-expanding dataset to diagnose and cure genetic diseases.
01:21 DR. BERGER: Computation is the only thing that's enabled us to handle all these thousands of genomes, and to find genes where you might have a mutation which would lead to susceptibility to a disease. And the idea is that we want to edit the genes so that you don't get the disease.
01:45 But searching the genomic data can be incredibly complicated.
01:49 DR. BERGER: It can even be on protein structures! It can be on DNA, RNA, amino acid sequences....
01:57 The space is much too big for brute-force searches. Fortunately, the structure of genomic data helps us out.
02:04 DR. BERGER: So there's a lot of commonality in these sequences, a lot more than you'd even get in a text of, let's say, the English language. And so we can take advantage of compressive algorithms.
02:16 In her paper, Dr. Berger collapses the multidimensional space and shows how two characteristics -- metric entropy and fractal dimension -- offer clues to speeding up searches.
02:27 DR. BERGER: Now it turns out that biological data has very low metric entropy. That is, it takes up a very small amount of the entire inhabitable space. ... And ... it has what's called low fractal dimension. ... What fractal dimension says is that when you expand the search out to look at neighboring clusters in the space, you don't have to look at too many neighboring clusters.
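[Editor's illustration, not part of the spoken transcript] The cluster-based search described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Dr. Berger's implementation: the Hamming distance, the greedy clustering, and the names hamming, build_clusters and search are all hypothetical choices made for illustration. The idea shown is a coarse pass over cluster representatives followed by a fine pass inside only the nearby clusters, with the triangle inequality guaranteeing that no match is missed.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(seqs: List[str], cluster_radius: int) -> Dict[str, List[str]]:
    """Greedily put each sequence into the first cluster whose
    representative lies within cluster_radius (illustrative scheme)."""
    clusters: Dict[str, List[str]] = {}
    for s in seqs:
        for rep in clusters:
            if hamming(s, rep) <= cluster_radius:
                clusters[rep].append(s)
                break
        else:
            # No representative is close enough: s starts a new cluster.
            clusters[s] = [s]
    return clusters


def search(query: str, clusters: Dict[str, List[str]],
           cluster_radius: int, search_radius: int) -> List[str]:
    """Return every stored sequence within search_radius of query."""
    hits: List[str] = []
    for rep, members in clusters.items():
        # Coarse step. By the triangle inequality, a member within
        # search_radius of the query forces its representative to be
        # within search_radius + cluster_radius, so other clusters
        # can be skipped without losing any match.
        if hamming(query, rep) <= search_radius + cluster_radius:
            # Fine step: check only the members of nearby clusters.
            hits.extend(m for m in members
                        if hamming(query, m) <= search_radius)
    return hits
```

When the data occupy only a small part of the space (few clusters) and few clusters sit near any query, the coarse step discards most of the database before any member-level comparison is made.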
02:54 What kinds of conditions could this technique help cure? The range is enormous.
03:00 DR. BERGER: Alzheimer's... substance abuse... Parkinson's disease... cancer is a huge one.
03:06 We could learn more about how drugs affect us.
03:09 DR. BERGER: It allows us to repurpose existing drugs or perhaps design novel drugs.
03:14 But our best chance of doing so comes from keeping pace with the wealth of biological data produced.
03:21 DR. BERGER: The key point in all of this is that our algorithm scales sublinearly with the size of the data. So we have sublinear time and space algorithms. So that the cost doesn't explode as the sizes of the databases increase exponentially.
03:40 Find out more in the review article, "Computational Biology in the 21st Century", in the August 2016 issue of Communications of the ACM.
03:52 [Outro and credits]