Applying Prior Beliefs in SigOpt

Harvey Cheng and Michael McCourt

Over the past 5 years, SigOpt has been developed to accelerate and amplify the impact of modelers everywhere.  Our customers are experts in building and applying models in computer vision, finance, and risk management, among other fields.  In becoming experts in these fields, modelers often build up intuition about which parameter values tend to perform well.  SigOpt’s newest feature, Prior Beliefs, gives customers a structured way to pass this intuition on to our optimization engine.

Possible Prior Beliefs

In principle, prior beliefs can be very abstract; however, SigOpt has a very specific structure for defining prior beliefs over parameters.  We define prior beliefs through probability density functions: parameter assignments with pdf value 2 are interpreted to have prior preference twice as strong as those with pdf value 1.  Roughly speaking, users can create such a density for a given parameter to help guide SigOpt to initially favor searching regions with higher prior beliefs.
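To make the "twice as strong" reading concrete, here is a minimal sketch that compares the pdf values of a normal prior at two candidate assignments. The helper names and the prior values (mean 0, scale 1) are hypothetical and chosen only for illustration; this is not SigOpt's internal code.

```python
import math


def normal_pdf(x, mean, scale):
    """Density of a normal distribution; scale is the standard deviation."""
    z = (x - mean) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def relative_preference(x1, x2, mean, scale):
    """Ratio of pdf values: how much more strongly the prior prefers x1 over x2."""
    return normal_pdf(x1, mean, scale) / normal_pdf(x2, mean, scale)


# Hypothetical prior belief: centered at 0 with scale 1.
ratio = relative_preference(0.0, 1.0, mean=0.0, scale=1.0)
print(f"The prior prefers 0.0 over 1.0 by a factor of {ratio:.3f}")
```

With these values the ratio is about 1.65, so an assignment at the center of the prior is preferred roughly 1.65 times as strongly as one a full scale away.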

For parameters of type double, we allow two different prior distributions: normal and beta.  This will likely be expanded in the future, but the requests we currently receive from customers fall mostly into these two categories.  The figure below shows some examples of the kinds of prior beliefs that we allow.

Figure 1: Examples of prior beliefs, and a variety of parametrizations, which can be passed into SigOpt.  Left: Normal distribution.  Right: Beta distribution.

Using these prior beliefs requires choosing the parameters of the distribution.  For the normal pdf, there is a \(\mu\) parameter (passed as mean) defining the center of the prior belief and a \(\sigma\) parameter (passed as scale) defining its width.  For the beta pdf, there are \(a\) and \(b\) parameters (passed as shape_a and shape_b) defining the preference for low or high parameter values.  This SigOpt web page provides the ability to experiment with different values to sculpt your prior beliefs before defining them in a SigOpt experiment.
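As a rough illustration, the sketch below attaches a prior to each of two parameters and evaluates the resulting densities offline. The field names mean, scale, shape_a, and shape_b come from the text above; the surrounding dictionary layout, the parameter names log_learning_rate and dropout, and the helper functions are assumptions made for this example, so check the SigOpt documentation for the exact request format before using it.

```python
import math


def normal_pdf(x, mean, scale):
    """Density of a normal distribution; scale is the standard deviation."""
    z = (x - mean) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def beta_pdf(x, shape_a, shape_b):
    """Density of a beta distribution on the open interval (0, 1).

    For simplicity this sketch returns 0 at the endpoints and outside them.
    """
    if not 0.0 < x < 1.0:
        return 0.0
    norm = math.gamma(shape_a) * math.gamma(shape_b) / math.gamma(shape_a + shape_b)
    return x ** (shape_a - 1) * (1.0 - x) ** (shape_b - 1) / norm


# Assumed layout: one dictionary per parameter, with the prior nested inside.
# Parameter names and values are hypothetical.
parameters = [
    dict(
        name="log_learning_rate",
        type="double",
        bounds=dict(min=-5.0, max=0.0),
        prior=dict(name="normal", mean=-3.0, scale=0.5),
    ),
    dict(
        name="dropout",
        type="double",
        bounds=dict(min=0.0, max=1.0),
        prior=dict(name="beta", shape_a=2.0, shape_b=5.0),
    ),
]


def prior_density(parameter, value):
    """Evaluate the prior pdf attached to a parameter definition."""
    prior = parameter["prior"]
    if prior["name"] == "normal":
        return normal_pdf(value, prior["mean"], prior["scale"])
    if prior["name"] == "beta":
        return beta_pdf(value, prior["shape_a"], prior["shape_b"])
    raise ValueError(f"unsupported prior: {prior['name']}")


for parameter, value in zip(parameters, (-3.0, 0.2)):
    print(parameter["name"], value, round(prior_density(parameter, value), 4))
```

Here shape_a=2 and shape_b=5 express a preference for low values (the density peaks at 0.2), while swapping the two shapes would shift the preference toward high values.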

Figure 2: Demonstration of how to utilize the SigOpt prior beliefs development tool.

Prior Beliefs in SigOpt Experiments

A special SigOpt experiment was run using prior beliefs to explicitly show how these can affect the optimization process.  The figure below shows the impact of different prior beliefs on the early portion of a SigOpt experiment (this experiment was special as it was set to only use the prior beliefs and perform no additional modeling).  The effect of the prior beliefs is most noticeable early in an experiment, when few observations exist and the prior beliefs strongly guide the search.

Figure 3: Image from the Analysis page of a SigOpt experiment using Prior Beliefs; this is a special experiment, created to only use prior beliefs (and never learn from Observations).  The points are more densely sampled in the region with higher value of the prior densities (as shown with the adjoining graphs on the bottom and left).
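The clustering described in the caption can be imitated with a small simulation that draws points directly from two priors and never looks at any observations. This is only a sketch of the idea, not SigOpt's actual initialization procedure; the bounds, prior values, and function name are hypothetical.

```python
import random


def sample_from_priors(n, seed=0):
    """Draw n points (x, y) using only prior beliefs, with no modeling.

    x: normal prior (mean=1, scale=1) on bounds [-5, 5]; draws that fall
       outside the bounds are rejected and redrawn.
    y: beta prior (shape_a=2, shape_b=5) on bounds [0, 1].
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x = rng.gauss(1.0, 1.0)
        if not -5.0 <= x <= 5.0:
            continue  # outside the parameter bounds, try again
        y = rng.betavariate(2.0, 5.0)
        points.append((x, y))
    return points


points = sample_from_priors(500)

# Fraction of points within one scale of the normal prior's mean.
near_mean = sum(1 for x, _ in points if 0.0 <= x <= 2.0) / len(points)
# Fraction of points in the lower half of the beta-distributed parameter.
low_y = sum(1 for _, y in points if y < 0.5) / len(points)

print(f"x within one scale of the mean: {near_mean:.2f}")
print(f"y below 0.5: {low_y:.2f}")
```

Most of the sampled points land where the prior densities are highest, which is the same qualitative pattern as the dense regions in the figure.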

For a standard SigOpt experiment, setting effective prior beliefs can produce a lift in performance.  When they are set poorly, with high pdf values in poorly performing regions, SigOpt’s optimization process may suffer.  The figure below compares the results of SigOpt experiments conducted with effective prior beliefs against those conducted with ineffective prior beliefs.

Figure 4: A sample problem in 2D, with very effective prior beliefs and very ineffective prior beliefs.  Their performance is compared as a function of the number of observations, to demonstrate the gap between good and bad prior beliefs.

There is a clear gap in performance between the good prior beliefs and the bad prior beliefs. Eventually the SigOpt engine can overcome the bad prior beliefs, but this demonstrates that a performance boost depends on the prior knowledge being accurate. As problems become more complicated (note this is only a 2D problem), we expect the gap between good priors, no priors, and bad priors to widen. We hope that users find this tool beneficial and let us know which other prior beliefs they want SigOpt to support.

Harvey Cheng, Research Engineer
Michael McCourt, Research Engineer