Highlight: Beyond the Pareto Efficient Frontier: Constraint Active Search for Multiobjective Experimental Design, ICML 2021

Gustavo Malkomes, Harvey Cheng, Eric Lee, and Michael McCourt
Active Search, Advanced Optimization Techniques, Materials Science, Multimetric Optimization

We are excited to present our most recent work, “Beyond the Pareto Efficient Frontier: Constraint Active Search for Multiobjective Experimental Design,” at the upcoming International Conference of Machine Learning (ICML) 2021. In this blogpost, we will briefly discuss the key contributions of this work; for more details, please check the full version of our paper and come by our poster on July 22 9:00 – 11:00 AM PDT and checkout our 5 min spotlight presentation.

Collaborating with materials scientists

Our motivation for this work comes from collaborations with the outstanding researchers from the Laboratory for Advanced Materials at Pittsburgh (LAMP), directed by Professor Paul Leu. LAMP uses computational modeling and experimental research to design and understand nanomaterials and surfaces for several applications. We have discussed some of our joint projects in previous blog posts, here, here, and hereIn one of our more recent works, we designed an efficient Bayesian optimization strategy to accelerate the discovery of optimal design parameters for an omnidirectional anti-reflective glass. We discussed how this new glass, generated from this Bayesian optimization strategy, can be used to improve the performance of photovoltaic solar power panels in this paper.

Figure 1. Bayesian Optimization efficiency can help confirm physical properties. From Haghanifar et al. (2020) Optica 7(7).

Development and production discrepancy

One of the takeaways from this experience is that scientists and engineers actively seek to understand tradeoffs between competing objectives and their design parameters/configurations. The numerical simulations they run during the development of these new materials might be different from the results during fabrication, both in terms of metrics and parameters. This implies we should search for configurations that perform consistently despite these physical imprecisions of the equipment. Figure 2 shows the discrepancy between a CAD layout of a component and the actual fabricated sample.  

Figure 2: Discrepancy between CAD model and fabricated result

Let’s illustrate these challenges in a simple example. Figure 3 shows a two-metric problem in two dimensions. We don’t have access to the analytical form of these functions, only to the function values at the locations we queried. In the literature, we say that we have black-box access to these functions. We also assume that we are interested in metric values that are above some predefined thresholds. In this example, let’s say that we want function values that are above 80%, and the yellow color represents them. The left panel displays the two competitive metrics and their respective global optimum locations. The right panel combines both metrics into a single visualization. We also plot the range of possible metric values for both metrics alongside four observations placed precisely on the Pareto efficient frontier for this problem (blue line). Finally, the bottom right visualization shows the metric space graph. We see that the Pareto efficient frontier portrays the relationship between the metrics: it is impossible to maximize one metric without sacrificing the performance on the other metric.

Figure 3. An example of a two dimension problem with two competitive metrics (left). The Pareto efficient frontier reveals all points that tradeoff one metric vs the other (right).

If our knowledge about these two metrics were 100% accurate, then the best “answer” to this problem is to select as many points on the blue dashed line as possible. However, the observations we collect during development are not a perfect representation of what we will get during production. Consider, for example, that the “true” global optimum for both metrics is a little bit off from what we measure during development. In that case, the even the optimal Pareto front from development renders no information about the real problem in production!

Constraint Active Search

For that reason, we want to consider a more robust mathematical formulation. That’s it, one that gives us a better understanding of promising parameter configurations, even in the presence of minor deviations. That’s why we propose Constraint Active Search (CAS)! This new problem formulation focuses on searching for high-performing regions of the parameter space. Here, we consider that configurations that satisfy all user’s constraints are high-performing, and we call the set of points that meet the given constraints the satisfactory region. Figure 4 exhibits the difference between traditional multimetric optimization and our proposed Constraint Active Search formulation.

Figure 4. Multimetric optimization solution is limited to a “thin” line in the parameter space. Constraint active search seeks to reveal information about a more extensive, high-performing region of the parameter space, where high-performance is judged based on the user imposed constraints to the metric values.

To solve this problem, we also propose a new algorithm called Expected Coverage Increase (ECI). As the name suggests, ECI searches for points that can “cover” a volume of the expected satisfactory region. ECI needs a parameter that controls our coverage radius. The expectation we mentioned earlier is a consequence of our uncertainty over the unknown function values. In the paper, we give formal guarantees about ECI’s ability to cover the satisfactory region.

Figure 5. An illustration of our utility function when constraint active search is run on the 2D sinusoids problem in Bryan et al. (2005). Each of the four subfigures above plots a single realization of the underlying stochastic model and outlines the decision boundary of the satisfactory region in magenta. The observations within the boundary are marked with blue crosses, and a dotted blue ball of a given radius is drawn around each to indicate coverage. A ball is also drawn around the candidate point, itself marked with a black star. The improvement of the utility is the volume of the blue region, and expected coverage improvement is the average value of all realizations.

Finally, we demonstrate the performance of our proposed algorithms empirically. Crucially, we evaluate multiobjective problems using multiple criteria. To evaluate the diversity of our final recommendations, we consider two metrics: the fill distance and coverage recall. The former is a standard metric of point diversity and the later represents the fraction of volume induced by the selected points. We introduce the coverage recall in Section 4.2 of the paper. We also consider the hypervolume of the region bounded by the Pareto points and the thresholds values. The hypervolume tends to be large if we approximate the Pareto Frontier well. The last metric we keep track is the number of positive points, points that meet all constraints. We compare our algorithm against multiple baselines and across several domains. Here we refer the reader to the original paper to see all the experiments and understand all baselines (Section 4). Still, one representative result is the additive manufacturing example we used for designing new types of glass. Figure 6 shows the results of this experiment.

Figure 6. Partial results for the additive manufacture example. Some baselines were omitted from this exposition to keep this blogpost shorter.

ECI is designed to find diverse samples. As a result, it performs best at the coverage recall metric, which evaluates how much volume of the satisfactory region was covered by the selected points. It also finds diverse configurations, as indicated by the low fill distance value. These results give strong evidence that ECI excels at what it was designed to do: find diverse configurations that meet the user’s constraints. If finding the Pareto efficient frontier is essential for your application, Bayesian Optimization is the method of choice: it outperforms all baselines at maximizing the hypervolume of the region in metric space bounded by the Pareto frontier and the defined thresholds. If finding points above the thresholds is of utmost importance (that is more common for discrete search spaces), then the best methods are active search methods such as the one-step used here or the more recent ENS. Nevertheless, we think ECI, or more generally CAS, are great tools for accelerating scientific discovery as it informs experts about both the metrics and parameters tradeoffs. 

Notice that this phenomenon is not restricted to experimental design! In fact, in many practical situations (for example, model development), the results that we gather during development will be different from the results obtained during production, either because these settings are different (numerical simulation/actual fabrication) or simply because time has passed (covariate shift).

What’s Next?

Here at SigOpt, we are committed to empower the world’s experts. This work provides an exciting new tool for experts around the world to work on new cutting-edge research, using constraint active search as an accelerator for scientific discovery. In fact, we are excited to add Constraint Active Search to SigOpt’s toolkit in the upcoming weeks! If you haven’t signed up yet, sign up today to know more about this upcoming feature.

Use SigOpt free. Sign up today.

Gustavo Malkomes Research Engineer
Harvey Cheng Research Engineer
Eric Lee Research Engineer
Michael McCourt Research Engineer