Researchers Rafael Gomez-Bombarelli from MIT and Simon Axelrod from Harvard collaborated on training and tuning models that generate three-dimensional molecular geometries from 2D definition strings for molecules. Their goal was a robust machine learning model that could replace expensive and time-consuming simulations. In the process, they released the GEOM dataset to encourage others to train and build better models for generating point clouds and more robust geometrical definitions. You can find the research paper here.
Q: What is your research and for whom is it most useful?
A: We use machine learning to find drugs that can be repurposed to treat COVID-19. The typical way to do this is to represent a molecule in terms of nodes (atoms) and edges (bonds). A neural network is then trained on this 2D representation to predict whether a molecule can be repurposed. In reality, however, molecules consist of atoms with 3D positions. Moreover, these positions fluctuate in time, leading to an ensemble of 3D structures called conformers. Drug binding is a 3D recognition event between a protein and one or several conformers. We are trying to leverage this 3D information to improve machine learning predictions.
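To make the distinction between the 2D graph and the 3D conformer ensemble concrete, here is a minimal sketch in Python. The `Molecule` class, its field names, and the coordinates are illustrative assumptions, not the researchers' data format or real geometries. The graph (atoms and bonds) is fixed, while each conformer is a different set of 3D positions for the same atoms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Molecule:
    """Hypothetical container: a 2D graph plus an ensemble of 3D conformers."""
    atoms: List[str]                      # nodes of the 2D graph
    bonds: List[Tuple[int, int]]          # edges, as pairs of atom indices
    conformers: List[List[Coord]] = field(default_factory=list)

    def add_conformer(self, coords: List[Coord]) -> None:
        # Every conformer must place every atom exactly once.
        if len(coords) != len(self.atoms):
            raise ValueError("a conformer needs one 3D position per atom")
        self.conformers.append(list(coords))

    def neighbors(self, atom_index: int) -> List[int]:
        # Bonded neighbors come from the graph alone, not from 3D positions.
        out = []
        for a, b in self.bonds:
            if a == atom_index:
                out.append(b)
            elif b == atom_index:
                out.append(a)
        return sorted(out)


# Carbon skeleton of butane; coordinates are made-up placeholders.
butane = Molecule(atoms=["C", "C", "C", "C"], bonds=[(0, 1), (1, 2), (2, 3)])
butane.add_conformer([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.5, 1.4, 0.0)])
butane.add_conformer([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (1.4, 2.2, 1.1)])

print(butane.neighbors(1))     # same graph for every conformer
print(len(butane.conformers))  # several 3D structures for one molecule
```

A 2D model would only ever see `atoms` and `bonds`; the 3D information the researchers want to exploit lives entirely in `conformers`.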
Q: How did you decide on your model architecture for this research?
A: To our knowledge, no one has represented molecules as an average over conformers. However, there are well-established models for predicting properties of individual geometries, and models for predicting properties from 2D representations. We combined well-known models in each area to create our pooled conformer model.
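The idea of representing a molecule as an average over conformers can be sketched as follows. This is a simplified illustration, not the researchers' architecture: `embed_conformer` is a hypothetical stand-in for a learned 3D model and simply computes two distance statistics, and `mean_pool` averages the per-conformer vectors into one molecule-level vector.

```python
import math
from itertools import combinations


def embed_conformer(coords):
    """Hypothetical stand-in for a 3D geometry model.

    Returns a fixed-length descriptor built from pairwise distances, so it
    does not change when the conformer is rotated or translated.
    """
    dists = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return [sum(dists) / len(dists), max(dists)]


def mean_pool(embeddings):
    """Average per-conformer vectors, feature by feature."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


# Two made-up conformers of the same three-atom molecule.
conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],  # stretched out
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],  # bent
]

per_conformer = [embed_conformer(c) for c in conformers]
pooled = mean_pool(per_conformer)
print(pooled)  # one vector for the whole molecule
```

In a real model, the pooled vector would be passed to a downstream predictor, and the embedding would be learned rather than hand-written.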
Q: How did you employ SigOpt? Did you consider tuning other parameters, and why or why not?
A: We used SigOpt to tune dropout rates during training. We considered tuning architecture hyperparameters as well, but found that the results were fairly insensitive to different choices.
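The general shape of a dropout-tuning loop is: propose dropout rates within a range, train and validate, record the score, and repeat. The sketch below shows that loop using plain random search as a stand-in optimizer. It does not use the SigOpt API. The parameter names, bounds, budget, and the synthetic `train_and_validate` function are all assumptions made for illustration.

```python
import random

# Hypothetical dropout parameters and search ranges.
BOUNDS = {
    "dropout_conv": (0.0, 0.5),
    "dropout_readout": (0.0, 0.5),
}


def suggest(rng):
    """Stand-in optimizer: draw each dropout rate uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def train_and_validate(params):
    """Placeholder for a real training run.

    Returns a synthetic validation score that peaks at arbitrary dropout
    values; a real version would train the model and evaluate it.
    """
    return (
        1.0
        - (params["dropout_conv"] - 0.2) ** 2
        - (params["dropout_readout"] - 0.1) ** 2
    )


def tune(budget=30, seed=0):
    """Run the suggest / evaluate / record loop and return the best result."""
    rng = random.Random(seed)
    observations = []
    for _ in range(budget):
        params = suggest(rng)
        score = train_and_validate(params)
        observations.append((score, params))
    return max(observations, key=lambda obs: obs[0])


best_score, best_params = tune()
print(best_score, best_params)
```

A hosted optimizer replaces `suggest` with a smarter strategy and stores the observations for later analysis, but the loop itself keeps this structure.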
Q: Why did you choose SigOpt over evolutionary search, grid search, or open source options?
A: We chose SigOpt because of its ease of use and organized web interface. After specifying hyperparameter ranges, the choice of optimization algorithm is handled by SigOpt without further input. Moreover, several different optimization jobs for different models can be run in parallel, and the results can be easily accessed online. Finally, the informative hyperparameter analysis online helps to interpret the results.
Q: Where do you feel that your research succeeded in its goal versus where did it fail to achieve the desired ends?
A: We have so far succeeded in generating accurate conformers and showing that their 3D information can marginally improve performance on basic prediction tasks. However, we have not shown any major benefit to using conformers over 2D representations. We are currently preparing the final training runs for COVID-19 predictions, and will know soon whether we can achieve those goals.
Q: What aspect of this process will you research next? Would it make sense to tune and optimize simulations as well?
A: We will explore different ways of pooling conformers to represent a single molecular species. We believe that the pooling operation is key to leveraging 3D information in an optimal way.
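As an illustration of what "different ways of pooling" could mean, the sketch below compares three common pooling operators applied to per-conformer feature vectors. The interview does not say which operators the researchers plan to try, so these choices are assumptions. The energy-based weighting in particular is a hypothetical example, and the embeddings and energies are made-up numbers.

```python
import math


def mean_pool(embeddings):
    """Treat every conformer as equally important."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def max_pool(embeddings):
    """Keep the largest value of each feature across conformers."""
    return [max(column) for column in zip(*embeddings)]


def weighted_pool(embeddings, energies, temperature=1.0):
    """Weight conformers by a softmax over negative energy.

    Lower-energy conformers get larger weights. Energies are shifted by
    their minimum before exponentiating for numerical stability.
    """
    shift = min(energies)
    raw = [math.exp(-(e - shift) / temperature) for e in energies]
    total = sum(raw)
    weights = [r / total for r in raw]
    return [
        sum(w * x for w, x in zip(weights, column))
        for column in zip(*embeddings)
    ]


# Made-up per-conformer embeddings and relative energies.
embeddings = [[1.0, 4.0], [3.0, 2.0]]
energies = [0.0, 1.0]

print(mean_pool(embeddings))
print(max_pool(embeddings))
print(weighted_pool(embeddings, energies))
```

All three map a variable number of conformers to one fixed-length vector, which is what lets a single downstream network handle molecules with different ensemble sizes.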
Q: Do you see a future in using deep learning to efficiently generate 3D models of molecules?
A: We definitely see a future in using deep learning to generate 3D molecular geometries. Generating conformers is a time-consuming task. We believe that researchers will be able to use our dataset to train generative models to bypass expensive conformer generation. These conformers can then be used as inputs to a network that predicts molecular properties.
SigOpt facilitates more efficient experimentation and results-oriented academic research, from reinforcement learning in robotics to materials science. If you’re an academic researcher, you can register to use our product free of charge here.