Integration of in vitro and in silico Models Using Bayesian Optimization With an Application to Stochastic Modeling of Mesenchymal 3D Cell Migration

Ruben Martinez-Cantin and Francisco Merino-Casallo

Note: This is a blog post summarizing a more detailed paper that is fully available online. To access the full paper, please navigate here:

Thank you to our colleagues who collaborated on this research and cowrote this paper: Francisco Merino-Casallo, Maria J. Gomez-Benito, Yago Juste-Lanas, Ruben Martinez-Cantin, Jose M. Garcia-Aznar.


Studying cellular migration is crucial to understanding many biological processes, such as immune response, wound healing, and even cancer formation. This knowledge can lead to better treatments for cancer and other diseases, improved healing, or even the design of artificial healing mechanisms. With this work, we are making research into those mechanisms more accessible and cost effective. Using SigOpt’s technology, we have devised a method to generate high-fidelity simulations of these biological processes.

There are three routes to study biological systems:

  • In vivo: the process is studied as it happens inside a living organism. This method provides the most accurate representation of the process, but it has many limitations. In vivo experiments are very expensive and may face technical or ethical constraints.
  • In vitro: this approach is an approximate reproduction of the physical, chemical and biological conditions within a living organism. The more realistic the experiment, however, the more complex and expensive it is to conduct.
  • In silico: the use of computer simulations as a valid alternative to real experiments. Growing computational power, combined with recent developments in simulation tools, allows researchers to recreate even the most complex biological processes with high precision. Compared to the previous approaches, this is the most cost-effective solution by a large margin, and it also allows experiments that would be implausible in vivo or in vitro. In silico models are transforming the world of biological trials and medical research in general.

To illustrate the differences between the three methods: a single in vivo experiment might require weeks of planning and data collection; a typical in vitro experiment takes several hours to days to prepare and analyze the samples; an in silico experiment can be performed in just a few minutes to hours, with minimal preparation.

In this work, we focus on the mesenchymal migration mode, in which the cell develops protrusions to pull or push nearby collagen fibers, as can be seen in the following animation. Note how slowly the cell moves: it takes minutes or even a couple of hours to grow and retract the protrusions. As a result, the cell needs several hours to move very small distances.

The growth of protrusions is based on the activation of some chemical receptors placed on the surface of the cell. These receptors allow the cell to find chemical cues in its surroundings. Finally, the interaction of the protrusions with the environment results in the actual cell movement.

We built a numerical simulator of the three processes that result in the cell behaviour, namely: a) chemical sensing, b) protrusion dynamics, and c) cell motion. The simulator was based on previous work (Ribeiro et al., 2017), which we improved to reduce the overall computational cost. Still, each simulation needed 1.5 hours on average on a computer cluster. In the next animations, we can see how the cell grows protrusions and moves within the extracellular matrix. In the first animation, the cell sensors are unable to detect the chemical gradient, resulting in a random trajectory. In the second animation, the cell detects higher concentrations on the left side and follows the chemical gradient in the environment.

To gather information about the mechanisms, each process is modeled by a series of equations with physical meaning, resulting in a plethora of free parameters that affect the simulation results. Examples of these parameters include the protrusion expansion rate and the amount of chemical signal needed to activate the growth of a protrusion. These parameters need to be calibrated so that the resulting simulation is a valid representation of the in vitro or in vivo experiments. To optimize the simulator parameters, we used SigOpt’s software, which is designed to automate this process.

Using SigOpt and its advanced features helped us solve a number of challenges that are common in this kind of optimization process:

  • Metric Selection: We wanted to find the parameters that produce a distribution of protrusion lengths and number of protrusions as close as possible to the same distribution for in vitro data that had previously been collected.
  • Complex parameter space: There were 9 parameters, combining both continuous and discrete (integer) parameters, which makes manual tuning and other automated approaches impractical.
  • Noise: Simulations were stochastic, producing noisy results.
  • Multimetric Optimization: Because the simulator is a numerical approximation of the physical processes, the parameters that produce the best match in lengths might not be the best match in number of protrusions. Thus, we did not want a single solution, but a set of optimal solutions that allowed us to adapt the simulator depending on the process that we wanted to analyze.
  • Parallelism: We wanted to run multiple simulations in parallel to exploit the computer cluster infrastructure.
  • Explainability: We wanted to analyze the importance of each parameter in the metrics that were considered.
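
The noise and parallelism points above can be illustrated with a minimal sketch: each candidate parameter set is evaluated several times with different random seeds, the results are averaged, and several candidates are evaluated concurrently. Everything here is an assumption made for illustration. `toy_simulation` is a hypothetical stand-in for the real migration simulator, the parameter names `rate` and `n_receptors` are invented, and none of this is SigOpt's API.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def toy_simulation(params, seed):
    """Hypothetical stand-in for the stochastic simulator (not the real model)."""
    rng = random.Random(seed)
    # Deterministic part peaks at rate=0.5 and n_receptors=4.
    score = 1.0 - (params["rate"] - 0.5) ** 2 - 0.01 * (params["n_receptors"] - 4) ** 2
    # Gaussian noise mimics run-to-run variability of a stochastic simulation.
    return score + rng.gauss(0.0, 0.05)


def evaluate(params, n_repeats=5):
    """Average several seeded runs to reduce the effect of noise."""
    return mean(toy_simulation(params, seed) for seed in range(n_repeats))


def evaluate_batch(candidates, max_workers=4):
    """Evaluate several candidate parameter sets concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, candidates))


candidates = [
    {"rate": 0.5, "n_receptors": 4},  # one continuous and one integer parameter
    {"rate": 0.9, "n_receptors": 8},
]
scores = evaluate_batch(candidates)
```

Threads are enough here because the toy function is cheap. For a simulator that runs for over an hour, each worker would more plausibly submit a job to the cluster and wait for its result.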

The metric that we used to match the distributions of the in vitro and in silico data was the Bhattacharyya coefficient. This coefficient measures how much two distributions overlap, with BC=1 for identical distributions and BC=0 for non-overlapping distributions. It makes no assumptions about the shape of the distributions, so it can be applied to discrete or multimodal data. In the next animation, we can see how SigOpt searches for in silico distributions that match the in vitro distribution.
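
As a minimal sketch of how this coefficient can be computed for binned data: both sample sets are histogrammed over the same bins and normalized, and the coefficient is the sum over bins of the square root of the product of the two probabilities. The bin edges and sample values below are made up for illustration, and the helper names are assumptions rather than the code used in the paper.

```python
import math
from bisect import bisect_right


def histogram(samples, edges):
    """Normalized histogram over half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        if edges[0] <= x < edges[-1]:
            counts[bisect_right(edges, x) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the bin range")
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i) over bins; p and q must share the same bins."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


edges = [0, 1, 2, 3, 4]                        # hypothetical bin edges
in_vitro = histogram([0.5, 0.5, 1.5, 1.5], edges)
in_silico = histogram([1.5, 1.5, 2.5, 2.5], edges)
bc = bhattacharyya_coefficient(in_vitro, in_silico)
```

In this toy case the two histograms share only one of their two occupied bins, so the coefficient falls between the two extremes described above.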

We used SigOpt’s multimetric optimization to find the set of parameterizations that were optimal for the metrics that we defined. The optimization proved difficult: the modeling simplifications of the in silico model make it virtually impossible to reach Bhattacharyya coefficients greater than 0.8 or 0.9. Even so, SigOpt was able to find a set of interesting parameterizations. The best sets of parameters were validated using different concentrations and chemical gradients in the cell environment. The calibrated simulator was able to correctly predict the behavior and velocity of the cell.
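
A set of optimal solutions for two competing metrics is usually called the Pareto front: the solutions for which no other solution is at least as good on both metrics and strictly better on one. The sketch below filters such a set from a list of results. It is an illustration of the concept, not SigOpt's implementation, and the metric pairs are invented values rather than results from the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (all metrics maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (BC for protrusion length, BC for number of protrusions) pairs.
results = [(0.8, 0.3), (0.6, 0.6), (0.3, 0.8), (0.5, 0.5), (0.2, 0.2)]
front = pareto_front(results)
```

Here the first three pairs survive, since each trades one metric against the other, while the last two are beaten on both metrics by (0.6, 0.6).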

Finally, we performed a parameter sensitivity analysis based on SigOpt’s parameter importance results. Some of the findings about which parameters are relevant were surprising and might be useful for future research.

Intuitively, the parameters for birth and retraction of protrusions are the most important for the number of protrusions, while the growth factors are the most relevant for the length-based metric. We also included a threshold for the binarization of the chemical signals, which turned out to be the most relevant parameter overall, highlighting the importance of signal processing for the cell.

Ruben Martinez-Cantin Advisor
Francisco Merino-Casallo Guest Author