
Communications of the ACM


Automating Organic Synthesis



The image of a chemist slaving away in a lab, haphazardly pouring steaming test tubes of multi-colored liquid into bubbling beakers amid stacks of leather-bound reference books has long been relegated to old Hollywood films or TV shows. However, while today's organic chemists generally spend as much or more time planning their work in advance, thinking and laying out the sequence of reactions that will be required to make a specific molecule, they still largely mix, filter, and combine substances by hand to try to recreate those planned sequences.

The advent of the modern computer and software packages capable of collecting, categorizing, and recombining vast amounts of chemical properties and reaction data may one day help to automate the process of creating molecules. Such a device, described as an organic synthesis machine, would be able to make a huge number of small molecules on demand, speeding the development of new chemical research and of end products across a wide range of industries.


Organic Chemistry

The process often used to conduct organic synthesis is a technique called retrosynthetic analysis. Chemists draw a completed molecule and then deconstruct it, erasing the chemical bonds that would be easy to form, while leaving fragments of the molecule that are stable or readily available. The chemist then tries to identify the raw materials needed to connect the missing pieces of the molecule, based on experience and expertise. Finally, the chemist must manually combine the raw materials in the lab to synthesize the new molecule.
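The backward search at the heart of retrosynthetic analysis can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the molecule names are placeholders rather than real structures, and the `DISCONNECTIONS` table, `STOCK` set, and `plan` function are hypothetical, not part of any tool mentioned in this article. The function works backward from a target, trying each known way to split it into precursors until every fragment is something that can be bought.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: names stand in for real molecular structures.
# Each molecule maps to alternative sets of precursors it could be made from.
DISCONNECTIONS: Dict[str, List[List[str]]] = {
    "target": [["fragment_a", "fragment_b"]],
    "fragment_a": [["stock_1", "stock_2"]],
    "fragment_b": [["exotic_x"], ["stock_2", "stock_3"]],
}

# Hypothetical set of readily available starting materials.
STOCK: Set[str] = {"stock_1", "stock_2", "stock_3"}


def plan(molecule: str, depth: int = 5) -> Optional[dict]:
    """Return a nested route to `molecule`, or None if no route is found."""
    if molecule in STOCK:
        return {"buy": molecule}
    if depth == 0:
        return None  # give up on routes that are too long
    # Try each known disconnection in turn; accept the first one whose
    # precursors can all be traced back to stock materials.
    for precursors in DISCONNECTIONS.get(molecule, []):
        sub_routes = [plan(p, depth - 1) for p in precursors]
        if all(route is not None for route in sub_routes):
            return {"make": molecule, "from": sub_routes}
    return None


if __name__ == "__main__":
    print(plan("target"))
```

In this toy table, the first option for `fragment_b` fails because `exotic_x` is neither in stock nor decomposable, so the search falls back to the second option. A real planner would also have to rank competing routes, which this sketch does not attempt.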

A few of the challenges involved with conducting organic synthesis in this manner are apparent. First, the human brain is relatively limited in terms of the number of molecular structures and rules it can quickly recall without needing to refer to a database or reference sources. Similarly, it takes significant time and effort to physically perform a synthesis in the lab, and real-world synthesis results often do not match the theoretical plan.

As such, chemists have increasingly turned to online databases of chemical compounds, reactions, and rules that can be used when trying to construct molecules. Commercial molecular databases such as SciFinder, an electronic interface to the American Chemical Society's Chemical Abstracts Service, or Reaxys, a commercial database service offered by Elsevier, can provide reference data that can be used as a jumping-off point for the creation of new molecules. These data repositories may be the content that helps power the organic synthesis machine of the future.


Viewing Molecules

One of the visionaries in the field who believe such a machine can and will be built is Richard Whitby, a chemist at the University of Southampton, in the U.K. Whitby is the leader of Dial-A-Molecule, a collaborative project that is working to identify the technical and research requirements to build such a machine.

The vision of the Dial-A-Molecule project centers on a machine that can quickly produce any molecule, based on a specific set of desired properties.


"Every molecule has different properties, and the rate-limiting step in finding the 'best' for a particular application is making them," Whitby says. "The key component is deciding how to put the molecule together. Even for a simple molecule, there are a vast number of possible routes, each comprising many steps."
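The scale Whitby describes can be made concrete with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are not from the article: every step is assumed to offer the same number of candidate reactions, and step yields are assumed to multiply independently. The function names `count_routes` and `overall_yield` are hypothetical.

```python
import math
from typing import Sequence


def count_routes(options_per_step: int, steps: int) -> int:
    """Number of distinct linear routes if each step has the same number of options."""
    return options_per_step ** steps


def overall_yield(step_yields: Sequence[float]) -> float:
    """Overall yield of a linear route, as the product of per-step yields (fractions)."""
    return math.prod(step_yields)


if __name__ == "__main__":
    # Illustrative numbers only: 10 candidate reactions at each of 6 steps.
    print(count_routes(10, 6))
    # Ten steps that each deliver 80% of the material.
    print(round(overall_yield([0.8] * 10), 3))
```

Under these assumptions, ten options at each of six steps already gives a million routes, and ten steps at 80% yield each leave only about 11% of the starting material as product, which is why misjudging individual steps is so costly.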

However, the organic chemistry community is currently "very poor at judging how well even individual steps will work" even if a reliable route is chosen, Whitby laments. Using a machine that is able to reference a large database of reaction and raw material data, and then automatically synthesize a molecule, could drastically increase the speed, breadth, and depth of chemical creation.

Currently, scientists are often forced to use the best readily available molecule, as manually constructing hundreds or thousands of new molecules is not time- or resource-efficient. "By making the delivery of a new molecule as quick and easy as it now is to order a 'stock' chemical, we aim to remove that bottleneck in development and allow the 'best' to be used," Whitby explains.



Dial-A-Molecule began in 2008 as a consultation between the Engineering and Physical Sciences Research Council, the Royal Society of Chemistry, the Institution of Chemical Engineers, and the Chemistry Innovation Knowledge Transfer Network. The consultation set out to identify a "Grand Challenge" in the field of chemistry, defined as an achievement that will have a transformative impact on science or the world at large, and which requires scientists or researchers from many disciplines to accomplish. The Dial-A-Molecule project was one of three projects, out of more than 150 submitted, that were selected for funding. It began in 2009, with $1.2 million committed via an initial and a continuing grant.

Whitby says the key hardware components are likely to include a variety of reactors for different functions (which are used to gradually build up the required molecule from simple starting materials); analytical instrumentation (to monitor the process and optimize the chemical process on the fly); and purification equipment (to remove chemical by-products that are present in nearly all chemical reactions). These components would then be linked together so the material can be routed as needed. While Whitby says that from a hardware engineering perspective, such a machine likely could be constructed today (albeit very expensively), it is the software and analytical components of a machine that have yet to be successfully worked out.

The software used in an organic synthesis machine likely would access databases containing information on chemical compounds and their respective properties, as well as the results from chemical reactions that have been conducted and cataloged by the chemistry community. By using these data, the software would be able to combine materials and automatically produce new molecules with a high degree of accuracy and very little human interaction.

The key issue on the software side is finding a way to accurately and efficiently apply the various rules and models that govern the way materials interact in combinatorial chemistry. The number of applicable rules and models can vary widely based on the raw materials used, as well as the specific combinations of these raw materials, adding significant complexity to a potential machine. In essence, the machine would need to calculate the result of each combination of materials, and then ensure the desired rule or model governing the combination was used, which could result in hundreds or thousands of permutations per combination.

Antony J. Williams is vice president of Strategic Development for the Royal Society of Chemistry (RSC) and leader of the Society's Cheminformatics team, which is working on a collection of reaction data within its ChemSpider chemical-structure database that will be hosted in the society's developing data repository. He notes that this information is key to the development of a machine capable of fully automating the organic synthesis process.

"I am assuming that the machine would be underpinned by a strong software platform that would utilize some form of retrosynthetic analysis using rules extracted from a reaction database," Williams says. "Basic rules will certainly get you some way along the path, but a large database combined with extracted rules is likely the most powerful approach. We are presently working on building out a 'reaction repository' as part of our development of our RSC data repository and we will be encouraging the community to contribute their reaction data."
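What "rules extracted from a reaction database" might mean in the simplest case can be sketched as follows. This is an illustration under stated assumptions, not a description of the RSC's repository or of any real cheminformatics tool: the `REACTIONS` records, their transformation labels, the yields, and the `extract_rules` function are all hypothetical. The idea is only that a transformation seen often enough in the recorded data is promoted to a reusable rule, with its average reported yield kept as a rough indicator of reliability.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical records: (transformation label, reported yield as a fraction).
REACTIONS: List[Tuple[str, float]] = [
    ("amide_coupling", 0.85),
    ("amide_coupling", 0.75),
    ("amide_coupling", 0.80),
    ("suzuki_coupling", 0.60),
    ("suzuki_coupling", 0.70),
    ("rare_rearrangement", 0.95),
]


def extract_rules(
    records: Iterable[Tuple[str, float]], min_support: int = 2
) -> Dict[str, Dict[str, float]]:
    """Group records by transformation and keep those seen at least `min_support` times."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for label, reported_yield in records:
        grouped[label].append(reported_yield)
    return {
        label: {"support": len(ys), "mean_yield": sum(ys) / len(ys)}
        for label, ys in grouped.items()
        if len(ys) >= min_support
    }


if __name__ == "__main__":
    for label, stats in extract_rules(REACTIONS).items():
        print(label, stats)
```

The single high-yield `rare_rearrangement` record is discarded here because one data point is too little to generalize from, which hints at why a large, community-contributed database matters for this approach.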

The Dial-A-Molecule project is not the only effort focused on finding ways to more quickly synthesize molecules that can be used in research, development, and manufacturing processes. Bartosz Grzybowski, a chemist at Northwestern University in Evanston, IL, is working on a synthesis machine of his own based on Chematica, a software package and database that uses algorithms and a collective database of 250 years of organic chemical information to predict and provide synthesis pathways for molecules. Chematica supports 3D modeling of individual molecules, as well as labeling of functional groups. Grzybowski is negotiating with Elsevier to incorporate the program into its Reaxys database, and is also said to be bidding for a $2.3-million grant from the Polish government to use Chematica as the brain of a synthesis machine that can plan and execute the synthesis of at least three drug molecules.

Despite the obvious benefits of an organic synthesis machine, it could be years before one actually comes to fruition, according to Whitby, who notes that less than $100,000 of the Dial-A-Molecule funding grant went to actual research, with the bulk of the money used to "identify how we might get to the target and the key challenges."

"The Grand Challenge has a 30- to 40-year estimated delivery time, so completion is not imminent," Whitby says, contrasting it with large projects that had a fixed, tangible goal, such as landing on the Moon. However, he notes that achievements made over the next 30 or 40 years on the path to the development of an organic synthesis machine likely will have a substantial impact on chemistry specifically and our world in general.

Still, while Grzybowski did not respond to a request for comment for this article, he has been quoted as stating that an organic synthesis machine could be built and available within five years. Because he has been shopping Chematica to various entities, few independent assessments of Grzybowski's efforts have been conducted.

"I have to believe that it is the chemistry itself that will be the largest limitation, [with] kinetics of reaction, side-products and issues such as precipitation/crystallization," Williams says. "I remember trying to do flow-kinetics in an NMR (nuclear magnetic resonance) probe, only to have solid drop out and clog the lines."

Indeed, work is being done to smooth this process. The Jamison Research Group, led by Massachusetts Institute of Technology chemistry professor Tim Jamison, is working on continuous-flow synthesis methods, through which reactions occur as the chemicals move through a machine (rather than in a step-by-step process), which can improve speed and yields. This type of continuous-flow reaction process is better suited to automation, and could be integral to the efficient and error-free design of a fully automated organic synthesis machine.

Furthermore, Williams notes the overall success of any future organic synthesis machine is predicated on the quality of the underlying reaction databases and the various rules or algorithms used to govern the choice of chemical reactions that can be performed.

"Any predictive algorithm, especially for retrosynthetic analysis, is massively influenced by the underpinning training set and extracted models," Williams says, which often renders an imperfect end result. Williams says any machine capable of conducting organic synthesis likely will require some form of self-learning capability, so it can grow more efficient over time.





Keith Kirkpatrick is principal of 4K Research & Consulting, LLC, based in Lynbrook, NY.

©2015 ACM  0001-0782/15/03

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.