Hello dear friends!
Here are some thoughts and ramblings about an expert system seeking syntheses for organic compounds.
Expert systems were among the first successful developments of artificial intelligence decades ago, well before machine learning: https://en.wikipedia.org/wiki/Expert_system
You can imagine them as a renunciation of the overly complicated "if then else" control of the program by the programmer. Instead, the expert system comprises a knowledge base, that is, a set of rules (possible actions applicable when their conditions are met), and an inference engine that often has zero knowledge of the topic but tries to apply billions of combinations of the rules to a situation proposed by the user.
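To make the split between knowledge base and inference engine concrete, here is a minimal sketch in Python. Everything in it (the names Rule, infer, RULES and the two toy rules) is my own illustration, not taken from any real expert system. The point is only that the engine loops over rules without knowing any chemistry.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    name: str
    condition: Callable[[FrozenSet[str]], bool]  # when the rule may fire
    adds: FrozenSet[str]                         # facts the rule produces


def infer(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Forward chaining: fire every applicable rule until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.condition(frozenset(known)) and not rule.adds <= known:
                known |= rule.adds
                changed = True
    return known


# Toy knowledge base (hypothetical rules, for illustration only).
RULES = [
    Rule("OH on carbon is an alcohol",
         lambda f: {"C-OH"} <= f,
         frozenset({"alcohol"})),
    Rule("alcohol plus oxidant may give a carbonyl",
         lambda f: {"alcohol", "oxidant"} <= f,
         frozenset({"carbonyl possible"})),
]

result = infer({"C-OH", "oxidant"}, RULES)
print(sorted(result))
```

Adding knowledge means appending to RULES. The infer function stays untouched, which is the whole idea.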
Expert systems are artificial intelligence in that they sometimes make "reasonings" far beyond what software is expected to produce, and in that the programmer doesn't try to predict their behaviour. Nice and successful ones existed for spectroscopy, geology, synthesis of digital circuits, and more.
Seeking to produce a compound from more easily available reactants by applying reactions from a set of (at least here) known ones, organic synthesis resembles what an expert system can do:
The case proposed by the user is the molecule to synthesize.
The transformation rules are the known reactions. Thousands exist, which is fine for an expert system. They have domains of validity, like the compatibility with other functions present on the intermediates: this is easy for expert systems. They are parameterized by some R, R', R"... parts of the compounds: nothing new for expert systems.
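As a rough sketch of how a reaction with its domain of validity could be stored, assume a molecule is reduced to the set of its functional groups. That is a gross simplification, and the names Reaction, applicable_retro and the two entries are only illustrative. Whatever is not touched by the reaction plays the role of the R, R' parts and is carried along unchanged.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Reaction:
    name: str
    makes: str                    # functional group created in the product
    needs: FrozenSet[str]         # groups required on the precursors
    incompatible: FrozenSet[str]  # groups elsewhere that would not survive


def applicable_retro(target: FrozenSet[str],
                     reactions: List[Reaction]) -> List[Reaction]:
    """Reactions that could have made one group of the target
    without destroying the other groups it carries."""
    found = []
    for r in reactions:
        if r.makes not in target:
            continue
        others = target - {r.makes}   # the "R" part, left untouched
        if others & r.incompatible:
            continue                  # outside the domain of validity
        found.append(r)
    return found


# Toy knowledge base with two entries (simplified for illustration).
REACTIONS = [
    Reaction("Grignard addition to aldehyde", "secondary alcohol",
             frozenset({"aldehyde", "organomagnesium halide"}),
             frozenset({"carboxylic acid", "ketone"})),
    Reaction("ketone reduction (NaBH4)", "secondary alcohol",
             frozenset({"ketone"}),
             frozenset({"aldehyde"})),
]

target = frozenset({"secondary alcohol", "carboxylic acid"})
for r in applicable_retro(target, REACTIONS):
    print(r.name, "<=", sorted(r.needs))
```

Here the carboxylic acid on the target rules out the Grignard route, so only the reduction is proposed. A real system would need true substructure matching instead of group labels.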
Knowing the reactions in the reactants-to-products direction too can be useful when the user asks "make this compound from this feed": starting the search from both ends greatly limits the combinatorial explosion.
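Searching from both ends could look like the sketch below. It assumes each step converts one molecule into one other molecule, which real reactions don't respect, and the names reachable, meet_in_the_middle, FORWARD and BACKWARD are made up for the example. Each side explores only a few steps, and a route exists if the two explored sets share an intermediate.

```python
from typing import Dict, List, Optional


def reachable(start: str, steps: Dict[str, List[str]],
              depth: int) -> Dict[str, List[str]]:
    """Map every molecule reachable within `depth` steps to the path reaching it."""
    paths = {start: [start]}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for mol in frontier:
            for other in steps.get(mol, []):
                if other not in paths:
                    paths[other] = paths[mol] + [other]
                    next_frontier.append(other)
        frontier = next_frontier
    return paths


def meet_in_the_middle(feed: str, target: str,
                       forward: Dict[str, List[str]],
                       backward: Dict[str, List[str]],
                       depth: int) -> Optional[List[str]]:
    fwd = reachable(feed, forward, depth)     # reactants to products
    bwd = reachable(target, backward, depth)  # products to reactants
    common = set(fwd) & set(bwd)
    if not common:
        return None
    # Shortest total route; the name breaks ties so the result is deterministic.
    best = min(common, key=lambda m: (len(fwd[m]) + len(bwd[m]), m))
    # Backward path is target...best; reverse it and drop the shared molecule.
    return fwd[best] + bwd[best][-2::-1]


# Toy reaction network: feed -> A -> B -> target.
FORWARD = {"feed": ["A"], "A": ["B"], "B": ["target"]}
BACKWARD = {"target": ["B"], "B": ["A"], "A": ["feed"]}

print(meet_in_the_middle("feed", "target", FORWARD, BACKWARD, depth=2))
```

With a branching factor b and a route of n steps, each side explores on the order of b^(n/2) candidates instead of b^n, which is where the gain comes from.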
The system must know a set of reactants that tell "the retrosynthesis is finished". This is less usual, but easy to program. Perhaps as classes of reactants rather than individual ones, like "all 1-alkenes".
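The stopping test could be sketched as follows, mixing individually named reactants with classes given as predicates. The Molecule record, the contents of AVAILABLE_NAMED and the C12 limit on 1-alkenes are assumptions of mine, chosen only to have something to run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Molecule:
    name: str
    carbons: int
    groups: FrozenSet[str]


# Individually known reactants (hypothetical catalogue).
AVAILABLE_NAMED = {"ethanol", "acetone", "benzene"}

# Classes of reactants, each described by a predicate.
AVAILABLE_CLASSES: Dict[str, Callable[[Molecule], bool]] = {
    # The C12 limit is an arbitrary assumption for this example.
    "1-alkenes up to C12": lambda m: (m.groups == frozenset({"terminal alkene"})
                                      and m.carbons <= 12),
}


def is_available(m: Molecule) -> bool:
    if m.name in AVAILABLE_NAMED:
        return True
    return any(test(m) for test in AVAILABLE_CLASSES.values())


def retrosynthesis_finished(precursors: List[Molecule]) -> bool:
    """A branch of the search ends when every precursor can be bought."""
    return all(is_available(m) for m in precursors)


hexene = Molecule("1-hexene", 6, frozenset({"terminal alkene"}))
ethanol = Molecule("ethanol", 2, frozenset({"primary alcohol"}))
print(retrosynthesis_finished([hexene, ethanol]))
```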
As an interesting option, the system could first build from the known reactants a huge base of easily obtained intermediates, or classes of them, that would short-circuit the time-consuming end of the retrosynthesis.
Guides are necessary so the engine applies reactions towards ever simpler, or more easily obtained, intermediates. As these intermediates are not individually known, some sort of evaluation function must be very general, maybe by comparing some quick but fuzzy "distance" to the known reactants. This educated guess is not standard practice for expert systems. But allowing for a limited number of steps in a seemingly wrong direction is common practice.
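One possible shape for such a guided search is a best-first search on the fuzzy "distance", with a small budget of steps that go the wrong way. This is only a sketch under strong assumptions: one precursor per step, a distance simply looked up in a table, and invented names (best_first_retro, RETRO, DIST, max_uphill).

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (reaction name, precursor)


def best_first_retro(target: str,
                     retro_steps: Dict[str, List[Step]],
                     distance: Callable[[str], float],
                     is_available: Callable[[str], bool],
                     max_uphill: int = 1,
                     max_depth: int = 20) -> Optional[List[Step]]:
    """Expand the most promising intermediate first; tolerate at most
    `max_uphill` steps per route that increase the distance."""
    counter = 0  # tie-breaker so the heap never compares paths
    queue = [(distance(target), counter, target, [], 0)]
    seen = set()
    while queue:
        d, _, mol, path, uphill = heapq.heappop(queue)
        if is_available(mol):
            return path
        if (mol, uphill) in seen or len(path) >= max_depth:
            continue
        seen.add((mol, uphill))
        for reaction, precursor in retro_steps.get(mol, []):
            used = uphill + (1 if distance(precursor) > d else 0)
            if used > max_uphill:
                continue  # too many steps in a seemingly wrong direction
            counter += 1
            heapq.heappush(queue, (distance(precursor), counter, precursor,
                                   path + [(reaction, precursor)], used))
    return None


# Toy data: X looks close to the reactants but is a dead end,
# Y looks farther away but leads to the available reactant S.
RETRO = {"T": [("rxn_a", "X"), ("rxn_b", "Y")],
         "Y": [("rxn_c", "Z")],
         "Z": [("rxn_d", "S")]}
DIST = {"T": 3, "X": 1, "Y": 4, "Z": 2, "S": 0}

route = best_first_retro("T", RETRO, DIST.__getitem__, lambda m: m == "S")
print(route)
```

With max_uphill=0 the same search finds nothing, because the only working route starts with a step that looks worse than the target.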
The combinatorial explosion is brutal in organic synthesis: thousands of known reactions, sometimes over 20 steps. The computer's brute force is the big argument of an expert system, but it won't suffice, by far. The evaluation function must reduce the combinatorial explosion. Guidelines that indicate which reactions probably make more sense in a given situation may also help the inference engine. This is less common for expert systems, which tend to apply all possible rules neutrally.
Some sort of "cost function" is necessary, not only to compare found syntheses, but also to guide the inference engine as it seeks syntheses. The function may consist of economic costs with all refinements (recurring, investment...) and also include the needs, desires, and beliefs of the user, like "no poisons, no explosives" or "reactants from renewable sources" or "by-products easily disposed of" or "kg cost less important here".
The cost function may compute more than a sum of the individual steps, for instance if no separation is needed between two reactions. Optional later refinement.
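A cost function along these lines might look like the sketch below. The Step fields, the arbitrary cost units, and the parameters forbidden and cost_weight are all assumptions for illustration. User wishes act as hard exclusions, and the cost is not a plain sum: a step whose crude product goes straight into the next one skips its separation.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Step:
    name: str
    reagent_cost: float        # arbitrary units per kg of product
    separation_cost: float     # workup after this step
    hazards: FrozenSet[str]
    telescoped: bool = False   # True: no separation before the next step


def route_cost(steps: List[Step],
               forbidden: FrozenSet[str] = frozenset(),
               cost_weight: float = 1.0) -> float:
    """Cost of a whole route; infinite if it violates a user exclusion."""
    total = 0.0
    for s in steps:
        if s.hazards & forbidden:
            return math.inf            # e.g. "no poisons, no explosives"
        total += s.reagent_cost
        if not s.telescoped:
            total += s.separation_cost
    return cost_weight * total         # low weight: "kg cost less important here"


plain = [Step("step 1", 10.0, 4.0, frozenset()),
         Step("step 2", 5.0, 3.0, frozenset())]
one_pot = [Step("step 1", 10.0, 4.0, frozenset(), telescoped=True),
           Step("step 2", 5.0, 3.0, frozenset())]
toxic = [Step("step 1", 1.0, 1.0, frozenset({"poison"}))]

print(route_cost(plain), route_cost(one_pot))
print(route_cost(toxic, forbidden=frozenset({"poison"})))
```

The same function can rank finished syntheses and, applied to partial routes, help prune the search while it runs.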
This expert system seems bigger than the historical ones. Reorganizing digital circuits is simple in comparison with the application of thousands of reactions. The availability of vast chemical databases should help.
The search for syntheses is naturally very parallel and can run on supercomputers. That's a detail, because artificial intelligence tends to succeed or fail by a factor of a zillion.
Maybe the first thoughts should define a simpler first trial. One expected to provide encouraging results from a limited effort, even if it finds only very simple syntheses using few known reactions, as a proof-of-concept.
Gentlemen, start your inference engines...
Marc Schaefer, aka Enthalpy