On Universal DNA Computing

Grigory Sapunov
12 min readMay 2, 2017

I found an interesting paper [0] that presents the physical design for a computer that has an exponential speedup over conventional and quantum computers on NP complete problems.

NUTM — The mythical creature

You probably heard (or did’t yet) of a Nondeterministic universal Turing machine (NUTM), a mythical and purely theoretical construct which is able to solve NP-complete (NPC) tasks in polynomial time. If you didn’t hear, they are worth looking at [1,2,3].

A Nondeterministic Turing Machine (NTM) differs from a Deterministic Turing Machine (DTM) in that each state allows several transitions instead of a single one. There are two ways of looking at NTM. One is to say that the machine is the “luckiest possible guesser”; it always picks a transition that eventually leads to an accepting state, if there is such a transition. The other is to imagine that the machine “branches” into many copies, each of which follows one of the possible transitions. Whereas a DTM has a single “computation path” that it follows, an NTM has a “computation tree”. If at least one branch of the tree halts with an “accept” condition, we say that the NTM accepts the input.

DTMs and NTMs are computationally equivalent in terms that NTMs can compute the same results as DTMs, that is, they are capable of computing the same values, given the same inputs. But the time complexity differs. The NTM can (in theory) solve NP tasks.

And the less known equivalent formal definition of a complexity class NP is the set of decision problems solvable in polynomial time by a theoretical non-deterministic Turing machine (so here comes the name of the class itself “nondeterministic, polynomial time”).

Quantum computer

From the first view it could look similar to a quantum computer (QC), but the similarity is misleading. Contrary to the myth, quantum computers are not known to be able to solve efficiently the very hard class of NP-complete problems [4]. But wait, doesn’t the Shor’s quantum algorithm factor numbers faster than ordinary one? Yes, it does. But despite a widespread misconception, the factoring problem is neither known nor believed to be NP-complete. So in general QCs cannot significantly help with NPC tasks. To know more about the potential work for QCs see the great introductory article by Scott Aaronson (and read his extremely cool blog here as well):

From [4]

Anyway, QCs are on their way, and some scientists are expecting they will appear in recent years [6,7,8,9]

NUTM is different. It pretends to be a universal machine capable of solving NPC tasks. There was many proposals for a physically achievable system to do it including soap bubbles, protein folding, a lot of quantum related ideas, analog computers, even closed timelike curves and “anthropic computing”. Scott Aaronson again has a great overview of such attempts [5], arguing they are not achievable or able to solve NPC tasks.

The paper

The interesting thing that it seems a NUTM could actually be engineered. And it’s not quantum. It’s DNA-based. A rather fresh paper called “Computing exponentially faster: Implementing a nondeterministic universal Turing machine using DNA” describes the physical design for a NUTM and shows that it is possible to physically create a NUTM using DNA molecules. The design exploits the ability of DNA to replicate to execute an exponential number of computational paths in P time. Computational modelling and in vitro molecular biology experimentation shows that this can be achievable.

The authors use a Thue rewriting system to implement a NUTM. Thue systems are a model of computation with equivalent power to Turing Machines. It is possible to translate any Turing machine into a Thue system, and vice-versa.

Informally a Thue system is a set of rules of the form w ↔ u where w, u are strings in a finite alphabet of symbols. A string e.g. v w v’ can be rewritten by the rule above to give v u v’. The application of a Thue rule to a string therefore produces a new string — equivalent to change of state in a UTM.

The starting-state (program) is a specific string, as is the accepting-state. The execution of a Thue program consists of repeated application of Thue rewrite rules until an accepting state is produced.

The order and position of application of Thue rules is naturally nondeterministic: multiple Thue rules may be applied to a string, and individual Thue rules may be applied to multiple positions in a string.

Example trace of the execution of a NUTM. The execution of NUTM is a tree of all possible computations. The root of the tree is the initial program. The child nodes of the root are the subsequent Thue sequences generated from the initial program by application of one of the seven Thue rules: note that the antecedent of a rule (e.g. ca — the reverse form of rule 1) may occur multiple times. Thue rules are recursively applied until the accepting state is produced, thus execution of a program generates a potentially exponential number of states in P time

Each Thue rewriting step is embodied in a DNA edit implemented using a novel combination of polymerase chain reactions and site-directed mutagenesis.

Starting-states (programs) and accepting states (read-outs) are sequences of DNA that encode strings of Thue symbols. The physical architecture of the computer, with a mixing chamber, and editing chambers for each for different Thue rule/direction, ensures that every Thue rule is applied to every NUTM state.

The design exploits DNA’s ability to replicate to execute an exponential number of computational paths in P time.

Authors demonstrated that this design works using both computational modelling and in vitro molecular biology experimentation.

For more details of the physical implementation please refer to the paper.

Surely, the approach is limited by physical constraints (as any other physical implementation of a computing device) like amount of space and DNA building blocks available (even for an ordinary Turing Machine the tape cannot be infinite). We should understand that we have a kind of trade-off here. What we actually do with such kind of a computer is trading space for time. These entities seem to be interchangeable for computations as matter and energy are interchangeable for physics:

“The theory of computational complexity treats time and space as fundamentally different: space is reusable while time is not. The resource limitation in a physical NUTM is space, not time. The speed of the computation increases exponentially, while the amount of space is polynomially bound: the light-cone is cubic, and the holographic bound on the maximum amount of information in a volume of space is quadratic. Computation therefore resembles an explosion [hello, Big Bang!]. Although DNA nucleotides are very small (Avogadro’s number is ~6 x 10²³) they are still of finite size, and this restricts the size of NP problem that a DNA NUTM could practically solve before running out of space — the Earth has ~10⁴⁹ atoms, and the observable Universe only ~10⁸⁰. We therefore argue that what protects cryptographic systems from being broken is not lack of time, as is generally argued, but lack of space.“

The proposed NUTM has potential advantages in speed, energy efficiency and information storage over electronics. By authors’ estimates the number of operations for a desktop DNA computer could plausibly be ~10²⁰s (~10³ times faster than the fastest current super-computer [10]); it could execute ~2 × 10¹⁹ operations per Joule (~10¹⁰ more than current computers); and utilise memory with an information density of ~1 bit per nm³ (~10¹² more dense than current memory). These advantages mean that it is feasible that a DNA NUTM based computer could outperform all standard computers on significant practical problems.

From [19]: “For the purposes of evaluating the “practicality” of DNA algorithms we assume that 10²¹ is an upper bound on the number of DNA strands that are available to an algorithm. It will be useful to keep in mind that 10²¹ ~=2⁷⁰.”

The question is what’s the amount of space (number of DNA molecules) is required to solve practical tasks. What is 1 gram of DNA (or 10 grams, or 100 grams, etc) is capable of solving? What size of a task can be encoded by 10¹⁹ 100-nt dsDNA molecules? It’s still a question.

DNA storage

Keeping in mind that we are focusing on DNA computing right now, it’s worth to mention that DNA is actively being researched as a storage as well. You might have read articles about the DNA storage actively explored by Microsoft [11,12] (stored 200MB of data in 2016; and by the way, Microsoft has its own research in computing as well [13]) or Columbia University and the New York Genome Center [14] (stored around 2MB but with an average of 1.6 bits into each base nucleotide,which is at least 60 percent more data than previously published methods, and close to the 1.8-bit limit). Both teams collaborated with a San Francisco DNA-synthesis startup, Twist Bioscience, that has a silicon-based DNA synthesis substrate that can make many different sequences in parallel.

DNA is an interesting substrate because of it’s ultra-compact and can last hundreds of thousands of years if kept in a cool, dry place (so, we were able to decode Neanderthal and Denisovan DNA). The researchers from Columbia University and the New York Genome Center showed that their coding strategy (fountain codes) packs 215 petabytes of data on a single gram of DNA. There are even ideas to use DNA storage for quantum computers [15].

But there are still problems. DNA is expensive to synthesize and takes a long time to read out.

To summarize, the main advantages of DNA as a storage is:

  • Stability (half-life of over 500 years)
  • Low weight (1 gram contains, for example, 9.89 x 10¹⁸ double stranded DNA molecules with 100 nucleotides)
  • Error-proof/noise tolerance (because of duplication and possibility to fix some errors, which is done on regular basis inside the cell)

A short history of DNA computing

Let’s return to the computing. This is not the first attempt to use DNA for computations. The field of DNA computing was started by Leonard Adleman who published a breakthrough paper in Science in 1994 [16]. In this paper, he solved the directed Hamiltonian Path problem on 7 vertices using DNA reactions. A brief summary of the paper can be found here [17].

Another work was dedicated to breaking the DES cryptography [18]. The authors reported: “Our analysis suggests that such an attack might be mounted on a tabletop machine using approximately a gram of DNA and might succeed even in the presence of a large number of errors”.

There were many attempts to solve other computational tasks many of which are NP-complete: 3SAT, 3-Coloring, Independent set etc. [19, 20, 21]. Abstract models of molecular computers are being discussed for a long time as well [22].

What puts the NUTM paper aside from all these works is its universality and practical approach. While the papers mentioned above develop molecular computing methods for solving particular hard tasks, the paper on NUTM presents the first physically demonstrated molecular UTM design, which means that there is no need for hardware redesign for different problems.

from http://www.rle.mit.edu/sbg/research/

This kind of approach is not like an ordinary synthetic biology approach as well, but pretends to be more general:

“A major motivation for this work is to engineer a general-purpose way of controlling cells. The natural way cells are controlled is a very complex combination of DNA, RNA, protein, and small-molecule interactions (supplemented by epigenetics, etc.) with multilevel control implemented through specific chemical interactions. This makes cells very difficult to reprogram. Synthetic biology has sought to control cells through the design of simple switches, circuits, etc. and has some notable successes. However, we argue that a more radical and general approach is required: a DNA UTM. This would in principle enable arbitrary biological processes to be programmed and executed. The UTM could receive biological signals from the environment through interaction with transcription factors, etc. It could also use as effectors RNA/proteins generated using special sequences and RNA polymerase, etc. Our current in vitro implementation of a NUTM is not suitable for this. However, it would seem possible to implement the core ideas in a biological substrate. One way to do this would be to use plasmids as programs, and rolling circle replication.”

The current design has limitations, such as restricted error-correction.

Due to noise the correspondence between a NUTM and its DNA implementations is less good than that between UTMs and electronic computers. Although noise was a serious problem in the early days of electronic computers, it has now essentially been solved. Compared to Quantum Computers (QCs) noise is far less a problem for DNA NUTM as there are multiple well established ways to control it, e.g. use of restriction enzymes/CRISPR to eliminate DNA sequences that do not encode symbols, use of error-correcting codes, repetition of computations, amplification of desired sequences, etc. Most significantly, when a NP problem is putatively solved the answer can be efficiently checked using an electronic computer in P time.”

Another possible limitation could be the length of the synthesized DNA which is required to encode the program. Currently Twist Biosciences (the startup that synthesizes DNA for experiments with DNA storage) can synthesize 300 bp — 1.8 kb Genes, with longer lengths to come. [23]

Anyway, this is a very interesting work. First electronic computers were clumsy and unimpressive, but the evolution went by an exponent. DNA sequencing and synthesis productivity goes the same way, so it could be not far away from production ready solutions.

https://synbiobeta.com/time-new-dna-synthesis-sequencing-cost-curves-rob-carlson/

And just one more beautiful quote:

“NUTMs and QCs both utilize exponential parallelism, but their advantages and disadvantages seem distinct. NUTM’s utilize general parallelism, but this takes up physical space. In a QC the parallelism is restricted, but does not occupy physical space (at least in our Universe). In principle therefore it would seem to be possible to engineer a NUTM capable of utilizing an exponential number of QCs in P time.

Advocates of the Many-Worlds interpretation of quantum mechanics argue that QCs work through exploitation of the hypothesised parallel Universes. Intriguingly, if the Multiverse were a NUTM this would explain the profligacy of worlds.”

Links

[0] Computing exponentially faster: Implementing a nondeterministic universal Turing machine using DNA, Andrew Currin, Konstantin Korovin, Maria Ababi, Katherine Roper, Douglas B. Kell, Philip J. Day, Ross D. King, https://arxiv.org/abs/1607.08078

[1] Non-deterministic Turing machine, https://en.wikipedia.org/wiki/Non-deterministic_Turing_machine

[2] What’s the difference between deterministic and non-deterministic Turing machines? https://www.quora.com/Computational-Complexity-Theory-Whats-the-difference-between-deterministic-and-non-deterministic-Turing-machines

[3] Theory of Computation course, Portland State University: Prof. Harry Porter, Lecture 29/65: Nondeterminism in Turing Machines, https://www.youtube.com/watch?v=eMchSJaJJVU

[4] The Limits of Quantum Computers, Scott Aaronson, http://www.cs.virginia.edu/~robins/The_Limits_of_Quantum_Computers.pdf

[5] NP-complete Problems and Physical Reality,
Scott Aaronson, http://www.scottaaronson.com/papers/npcomplete.pdf

[6] Google’s New Chip Is a Stepping Stone to Quantum Computing Supremacy, https://www.technologyreview.com/s/604242/googles-new-chip-is-a-stepping-stone-to-quantum-computing-supremacy/

[7] Quantum Computers Ready to Leap Out of the Lab in 2017, https://www.scientificamerican.com/article/quantum-computers-ready-to-leap-out-of-the-lab-in-2017

[8] IBM Will Unleash Commercial “Universal” Quantum Computers This Year, https://www.scientificamerican.com/article/ibm-will-unleash-commercial-universal-quantum-computers-this-year/

[9] Quantum Computers Are Coming. The World Might Not Be Ready. https://www.bloomberg.com/view/articles/2016-09-06/quantum-computers-are-coming-the-world-might-not-be-ready

[10] https://en.wikipedia.org/wiki/TOP500#Top_10_ranking

[11] Researchers break record for DNA data storage, https://phys.org/news/2016-07-dna-storage.html

[12] DNA Storage, https://www.microsoft.com/en-us/research/project/dna-storage

[13] Programming DNA Circuits, https://www.microsoft.com/en-us/research/project/programming-dna-circuits

[14] Researchers store computer operating system and short movie on DNA, https://phys.org/news/2017-03-short-movie-dna.html

[15] What if Quantum Computers Used Hard Drives Made of DNA? https://www.wired.com/2017/03/quantum-computers-used-hard-drives-made-dna/

[16] Adleman, Molecular computation of solutions to combinatorial problems, https://www.usc.edu/dept/molecular-science/papers/adleman-science.pdf

[17] A Brief Introduction to DNA Computing, https://luckytoilet.wordpress.com/2016/07/28/a-brief-introduction-to-dna-computing/

[18] On Applying Molecular Computation To The Data Encryption Standard, https://www.usc.edu/dept/molecular-science/papers/fp-des96.pdf

[19] On the computational power of DNA, http://www.sciencedirect.com/science/article/pii/S0166218X96000583

[20] DNA models and algorithms for NP-complete problems, https://www.cs.ubc.ca/~condon/papers/bcgt.pdf

[21] Using DNA to solve NP-complete problems, https://pdfs.semanticscholar.org/591d/33325588c1e92236cbd912b8282e714b5b5e.pdf

[22] Adleman, On constructing a molecular computer, https://www.usc.edu/dept/molecular-science/papers/fp-onconstructing95.pdf

[23] https://www.twistbioscience.com/products/genes/

[24] Scientists reveal new super-fast form of computer that ‘grows as it computes’, https://phys.org/news/2017-03-scientists-reveal-super-fast.html

--

--

Grigory Sapunov

ML/DL/AI expert. Software engineer with 20+ years programming experience. Loves Life Sciences. CTO and co-Founder of Intento. Google Developer Expert in ML.