A basic problem in computational chemistry is retrosynthesis prediction: finding a set of reactants from which a target molecule can be synthesized. This technique is widely applied in drug discovery. The search space of all possible transformations is very large, and synthetic route planning still relies heavily on the knowledge of experienced chemists, so researchers have long explored computer-aided approaches to retrosynthetic route analysis. Advances in modern computing have allowed machine learning to be applied successfully to retrosynthetic route prediction. Computer-aided retrosynthetic route planning methods fall into two categories: template-based and template-free. In addition to template-based techniques, Alfa Chemistry applies a template-free automatic retrosynthetic route planning strategy to support your computer-aided chemical drug synthesis.
Advantages of Template-free Automatic Retrosynthetic Route Method
- No manual encoding is required to extract reaction rules into templates.
- The algorithm considers the global chemical environment of the molecule.
- The amount of computation required is small.
Figure 1. The workflow of AutoSynRoute. (Kang, Li.; et al. 2020)
At Alfa Chemistry, we propose a new template-free strategy for automatic retrosynthetic route planning. The process of template-free automatic retrosynthetic route design can be described as follows:
1. First, a Transformer architecture is built and trained as an end-to-end single-step retrosynthesis model on reactions from the USPTO database.
- Transformer architecture
We have built the Transformer architecture based entirely on the self-attention mechanism, which is good at capturing internal correlations within data or features.
In contrast, earlier seq2seq models rely on bidirectional long short-term memory (LSTM) units combined with an attention mechanism.
The Transformer architecture supports parallel computing, which can significantly reduce training time.
It can generate valid SMILES strings by efficiently modeling long-range dependencies within sequences.
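The self-attention idea described above can be illustrated with a minimal sketch in plain Python. It is not the model used here: the learned query, key, and value projections of a real Transformer are replaced by the identity (an assumption made to keep the example short), and the function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with identity projections.

    x is a list of token vectors. A real Transformer would first map x to
    queries, keys, and values with learned matrices; here Q = K = V = x
    (an assumption for illustration only).
    """
    d = len(x[0])
    weights = []
    for q in x:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    # Each output vector is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Every token attends to every other token in a single step, regardless of distance, and each row of `weights` can be computed independently of the others, which is what makes the computation easy to parallelize.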
- Data set and data preprocessing
RDKit is used to extract the reactants and products of each reaction and convert them into SMILES strings, which are then tokenized to serve as model inputs.
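As a rough illustration of this preprocessing, the sketch below splits a reaction SMILES (`reactants>agents>products`) and tokenizes each side for a sequence model. It uses only the Python standard library rather than RDKit, so the canonicalization and validity checks that RDKit would provide are omitted. The regular expression and the function names are assumptions, not the tokenizer actually used.

```python
import re

# Hypothetical token pattern: bracket atoms, two-letter halogens, organic-subset
# atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|%\d{2}|\d|[()=#\-+\\/:.~@?*$>]"
)

def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize: {smiles!r}")
    return tokens

def split_reaction(rxn_smiles):
    """Return (reactants, products) from 'reactants>agents>products'."""
    reactants, _agents, products = rxn_smiles.split(">")
    return reactants.split("."), products.split(".")

def to_training_pair(rxn_smiles):
    """For retrosynthesis the product is the source and the reactants the target."""
    reactants, products = split_reaction(rxn_smiles)
    source = " ".join(tokenize(".".join(products)))
    target = " ".join(tokenize(".".join(reactants)))
    return source, target

# Esterification of acetic acid with ethanol (water omitted).
pair = to_training_pair("CC(=O)O.OCC>>CC(=O)OCC")
# pair == ("C C ( = O ) O C C", "C C ( = O ) O . O C C")
```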
2. Then, the Monte Carlo Tree Search (MCTS) method is applied to search for intermediate molecules.
Selection step: Traverse the search tree from the root node to a leaf node, at each level selecting the child node with the largest upper confidence bound (UCB) score.
Expansion step: Create child nodes by sampling from the Transformer model.
Simulation step: Evaluate the searched positions in the state space to identify the most promising one, then continue the search from that position until a terminal node is reached, creating a complete path.
Backpropagation step: Calculate the reward of the terminal node and use it to update the UCB scores of the nodes along the path back to the root.
3. Finally, the reaction pathway leading to the target molecule at the root node is selected according to the heuristic scores obtained.
Figure 2. Monte Carlo tree search for retrosynthetic pathway search. (Kang, Li.; et al. 2020)
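The four search steps can be sketched as a generic Monte Carlo tree search with a UCB1 selection rule. This is a toy under stated assumptions, not the production search: the single-step Transformer is replaced by a hypothetical lookup table `TOY_MODEL`, each state is a single molecule name, the reward is 1 when a rollout reaches a purchasable building block, and the exploration constant `c` is arbitrary.

```python
import math
import random

# Hypothetical stand-ins for the single-step model and the building-block catalog.
TOY_MODEL = {"target": ["A", "B"], "A": ["block1"], "B": ["C"], "C": []}
BUILDING_BLOCKS = {"block1"}

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    """UCB1: mean reward plus an exploration bonus for rarely visited nodes."""
    if node.visits == 0:
        return float("inf")
    mean = node.value / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def rollout(state, max_depth=10):
    """Simulation: follow random precursors until a building block or dead end."""
    for _ in range(max_depth):
        if state in BUILDING_BLOCKS:
            return 1.0
        options = TOY_MODEL.get(state, [])
        if not options:
            return 0.0
        state = random.choice(options)
    return 0.0

def mcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        while node.children:                      # selection
            node = max(node.children, key=ucb)
        if node.state not in BUILDING_BLOCKS:     # expansion
            node.children = [Node(s, node) for s in TOY_MODEL.get(node.state, [])]
            if node.children:
                node = random.choice(node.children)
        reward = rollout(node.state)              # simulation
        while node is not None:                   # backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    return root

def best_route(root):
    """Read off the route by following the most visited child at each level."""
    route, node = [root.state], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        route.append(node.state)
    return route

random.seed(0)
route = best_route(mcts("target"))
```

In this toy, the branch through `A` ends in a building block and accumulates reward, while the branch through `B` reaches a dead end, so the search concentrates its visits on the former.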
Features of Our Template-free Automatic Retrosynthetic Route Planning
- The accuracy of our model, with prior reaction category information added, surpasses that of similarity-based and LSTM-based seq2seq models, and our method generates more valid molecules.
- Our experienced experts have added chiral reactions to the model, which enables it to handle simple chiral reactants and products.
- At Alfa Chemistry, our comprehensive scoring function accounts for the cost of building blocks, the yield of each step, the affinity of toxic compounds and functional groups, the length of the reaction pathway, and other factors.
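To show how several of these factors might be combined into a single number, here is a minimal sketch. The functional form, the weights, and the `Route` type are all hypothetical; only yield, building-block cost, and pathway length are included, and the toxicity and functional-group terms are left out.

```python
from dataclasses import dataclass
from math import prod
from typing import List

@dataclass
class Route:
    step_yields: List[float]           # fractional yield of each step, 0 to 1
    building_block_costs: List[float]  # cost of each building block, arbitrary units

def route_score(route, length_penalty=0.1, cost_weight=0.01):
    """Higher is better: overall yield minus penalties for length and cost.

    The weights are illustrative assumptions, not tuned values.
    """
    overall_yield = prod(route.step_yields)
    total_cost = sum(route.building_block_costs)
    n_steps = len(route.step_yields)
    return overall_yield - length_penalty * n_steps - cost_weight * total_cost

short_route = Route(step_yields=[0.9, 0.8], building_block_costs=[10.0, 5.0])
long_route = Route(step_yields=[0.9, 0.9, 0.9, 0.9], building_block_costs=[2.0, 2.0])
ranked = sorted([long_route, short_route], key=route_score, reverse=True)
```

With these weights, the two-step route outranks the four-step route even though its building blocks cost more, because the extra steps both lower the overall yield and incur the length penalty.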
Our template-free automatic retrosynthetic route planning services remarkably reduce costs, support further experiments, and accelerate the drug design process for customers worldwide. Our personalized, all-around services are designed to meet your research needs. If you are interested in our services, please don't hesitate to contact us. We are glad to cooperate with you and witness your success!
- Kang, Li.; et al. Automatic retrosynthetic route planning using template-free models. Chemical Science. 2020, 11, 3355-3364.