In previous posts on SigOpt Fundamentals, we introduced the concept of Gaussian processes and covariance kernels. Using these and other tools, SigOpt helps companies efficiently optimize their critical metrics, including revenue in A/B tests, the accuracy of machine learning models, and the output of complex simulations. To do this for problems which can have many variables/parameters, we suggest a sequence of experiments for a company to conduct. These experiments create data with as many dimensions as the company has parameters, and one of our goals at SigOpt is to use this observed data to predict unobserved values so as to determine the optimal values for these critical metrics. We call this process of taking data and making predictions approximation.
The need for approximation may not immediately be obvious; if you are driving and you need to know your speed, you probably just check the speedometer rather than approximating based on the last five times you looked down. Indeed, in a perfect world, everyone could conduct arbitrarily many experiments, thereby easily producing data at any desired location and eliminating the need for approximation or optimization. Unfortunately, the experiments most often of interest are costly, requiring a heavy commitment of time and/or resources. SigOpt tries to minimize this expenditure by extracting as much information as possible from a limited number of experiments. The figure below has a graph on the left showing observed data and three possible approximations to that data which can be used to make predictions.
When SigOpt wants to create an approximation, we first turn to Gaussian processes; they are only part of a larger scheme, but probably the most important part. Approximation with Gaussian processes is significant because predictions from Gaussian processes perform interpolation on data. We say that a process interpolates observed data if the predictions perfectly reproduce all observations. The figure above on the right shows examples of predictions which interpolate given data — contrast the interpolants on the right with the approximations on the left which do not interpolate the data. Fidelity is one term we use to describe how exactly predictions match observations: an interpolant has perfect fidelity.
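To make the idea of interpolation concrete, here is a minimal sketch of a Gaussian process prediction that reproduces its observations. It assumes a zero-mean process, a squared-exponential kernel with length scale 1, and made-up data; the names `kernel`, `solve`, and `fit_gp_mean` are illustrative and are not part of any SigOpt API. It uses only the standard library, whereas real code would normally rely on a linear algebra package.

```python
import math

def kernel(x, z, length_scale=1.0):
    """Squared-exponential covariance kernel (an assumed choice)."""
    return math.exp(-0.5 * ((x - z) / length_scale) ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_gp_mean(xs, ys, length_scale=1.0):
    """Return the zero-mean GP posterior mean for noise-free observations."""
    K = [[kernel(a, b, length_scale) for b in xs] for a in xs]
    weights = solve(K, ys)  # weights = K^{-1} y

    def predict(x):
        return sum(w * kernel(x, xi, length_scale) for w, xi in zip(weights, xs))

    return predict

# Hypothetical observations: five experiments and their measured outcomes.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

predict = fit_gp_mean(xs, ys)
max_error = max(abs(predict(x) - y) for x, y in zip(xs, ys))
print(f"largest gap between prediction and observation: {max_error:.2e}")
print(f"prediction at the unobserved location x=2.5: {predict(2.5):.3f}")
```

The largest gap is zero up to floating-point rounding, which is what perfect fidelity means in practice.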
As suggested in the figure above, there is no unique approximation that can be used to make predictions from given data; there are in fact infinitely many [1] predictive curves, even if we demand perfect fidelity. How, then, can we decide which of them should be used to make predictions? To do this, we use a mechanism called regularization [2], which gives us a criterion by which we can distinguish different interpolants with the same fidelity. For this setting, the regularity of a prediction curve is defined as how “calm” it is; in the figure below, we depict predictions with a variety of regularities. In general, it is preferable to have curves with high regularity because predictions are better behaved and less prone to erratic action.
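As a rough illustration of two interpolants with identical fidelity but different regularity, the sketch below fits the same made-up data with two kernel length scales and scores each curve by its summed squared second differences. That score, the helper names (`fit_interpolant`, `roughness`), and the data are assumptions chosen for illustration; the score is a simple stand-in for "calmness" and is not the specific regularity measure this post has in mind.

```python
import math

def kernel(x, z, length_scale):
    """Squared-exponential covariance kernel (an assumed choice)."""
    return math.exp(-0.5 * ((x - z) / length_scale) ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_interpolant(xs, ys, length_scale):
    """Kernel interpolant through (xs, ys) for a given length scale."""
    K = [[kernel(a, b, length_scale) for b in xs] for a in xs]
    weights = solve(K, ys)
    return lambda x: sum(w * kernel(x, xi, length_scale) for w, xi in zip(weights, xs))

def fidelity_gap(f, xs, ys):
    """Largest absolute difference between predictions and observations."""
    return max(abs(f(x) - y) for x, y in zip(xs, ys))

def roughness(f, lo, hi, steps=80):
    """Sum of squared second differences on a grid; larger means less calm."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(1, steps):
        x = lo + i * h
        total += (f(x - h) - 2.0 * f(x) + f(x + h)) ** 2
    return total

# Hypothetical observations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

calm = fit_interpolant(xs, ys, length_scale=1.0)
erratic = fit_interpolant(xs, ys, length_scale=0.3)

calm_gap, erratic_gap = fidelity_gap(calm, xs, ys), fidelity_gap(erratic, xs, ys)
calm_rough, erratic_rough = roughness(calm, 0.0, 4.0), roughness(erratic, 0.0, 4.0)
print(f"fidelity gaps: {calm_gap:.1e} and {erratic_gap:.1e}")
print(f"roughness: {calm_rough:.4f} (length scale 1.0) vs {erratic_rough:.4f} (length scale 0.3)")
```

Both curves pass through every observation, yet the short length scale produces a curve that spikes at each data point and collapses toward zero in between, so it scores as far less calm.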
So, let’s rehash where we are:
- We can produce infinitely many interpolating predictions to observed data.
- Each of these has perfect fidelity.
- We can (somehow) measure the regularity of a prediction curve.
- We prefer high regularity curves to prevent erratic predictions.
- High fidelity (predictions closely follow the data) and high regularity (predictions are not erratic) are not related.
Based on what we have seen thus far, one strategy to make effective predictions is to create an interpolant with high regularity. Luckily, Gaussian processes (which interpolate observed data) automatically maximize a key measurement [3] of regularity, making them a great tool for this job. Unfortunately, Gaussian process predictions are not appropriate in every situation. We will see below that perfect fidelity is not always desirable, requiring a modification from interpolation to make viable predictions.
One important reason that SigOpt modifies our Gaussian process predictions is that SigOpt customers often have observations in the presence of uncertainty. We refer to such observations as noisy [4] data, where any observed value has some variance and cannot be fully trusted. The figure below introduces the idea that measurements can be accompanied by an estimate of variance, which signals that an observed value is only a guide to the possible values at that location.
When dealing with noisy data (as almost all of us are) the fidelity of predictions can be a tricky issue. On one hand, we want to respect the observed results, if for no other reason than because they were expensive to obtain and it is hard to justify an expensive experiment if the results are unused. On the other hand, for an observation with uncertainty, we should want to predict the most likely value at that location, not simply the value that we happened to observe. Below we see a figure that depicts this conundrum; in particular, as more noisy data is included, interpolating predictions can have unreasonable oscillations. In our search for optimal parameters, these oscillations could be misleading and cause unnecessary experimentation, thereby costing customers time and resources.
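One common way to relax perfect fidelity, sketched below, is to add each observation's variance to the diagonal of the kernel matrix before solving for the prediction weights. This is a standard Gaussian process construction and is shown only as an illustration; the post does not say that SigOpt does exactly this. The variances, the data, and the helper names are all hypothetical.

```python
import math

def kernel(x, z, length_scale=1.0):
    """Squared-exponential covariance kernel (an assumed choice)."""
    return math.exp(-0.5 * ((x - z) / length_scale) ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_gp_mean(xs, ys, noise_variances, length_scale=1.0):
    """Zero-mean GP posterior mean with a per-observation noise variance."""
    K = [[kernel(a, b, length_scale) for b in xs] for a in xs]
    for i, variance in enumerate(noise_variances):
        K[i][i] += variance  # less trust in observation i as its variance grows
    weights = solve(K, ys)
    return lambda x: sum(w * kernel(x, xi, length_scale) for w, xi in zip(weights, xs))

def fidelity_gap(f, xs, ys):
    """Largest absolute difference between predictions and observations."""
    return max(abs(f(x) - y) for x, y in zip(xs, ys))

# Hypothetical observations and hypothetical variance estimates for each one.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
variances = [0.1, 0.1, 0.5, 0.1, 0.5]

exact = fit_gp_mean(xs, ys, [0.0] * len(xs))  # interpolates the data
smoothed = fit_gp_mean(xs, ys, variances)     # trades fidelity for calmness

exact_gap = fidelity_gap(exact, xs, ys)
smoothed_gap = fidelity_gap(smoothed, xs, ys)
print(f"gap with zero variance: {exact_gap:.1e}")
print(f"gap with reported variances: {smoothed_gap:.3f}")
```

With nonzero variances the prediction no longer passes through every observation, and each observation's pull on the curve shrinks as its reported variance grows.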
Any attempt to predict unobserved outcomes from noisy data requires a balance between fidelity and regularity: a balance between believing your eyes and recognizing that they may lie to you [5]. Choosing this balance appropriately is itself an interesting problem which may be addressed with, e.g., cross-validation. SigOpt gives customers the opportunity to report uncertainty alongside their observations. Using this knowledge, we can balance observed results against their variance to make predictions and identify the true behavior behind the uncertainty. By making this tradeoff appropriately, we can guide you to better results with less trial and error. Sign up today for a free trial to help you cut through the noise and get to better results.
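For readers curious how cross-validation can choose the balance between fidelity and regularity, here is a small leave-one-out sketch. It assumes the balance is controlled by a single noise variance shared by all observations, and it scores each candidate value by how well the model predicts each held-out point. The data, the candidate values, and the function names are hypothetical, and this is not a description of SigOpt's internal procedure.

```python
import math

def kernel(x, z, length_scale=1.0):
    """Squared-exponential covariance kernel (an assumed choice)."""
    return math.exp(-0.5 * ((x - z) / length_scale) ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_gp_mean(xs, ys, noise_variance, length_scale=1.0):
    """Zero-mean GP posterior mean with one shared noise variance."""
    K = [[kernel(a, b, length_scale) for b in xs] for a in xs]
    for i in range(len(xs)):
        K[i][i] += noise_variance
    weights = solve(K, ys)
    return lambda x: sum(w * kernel(x, xi, length_scale) for w, xi in zip(weights, xs))

def loo_cv_error(xs, ys, noise_variance):
    """Mean squared error when each point is predicted from all the others."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit_gp_mean(train_x, train_y, noise_variance)
        total += (predict(xs[i]) - ys[i]) ** 2
    return total / len(xs)

# Hypothetical noisy readings of a smooth underlying response.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.1, 0.7, 1.1, 0.2, -0.9, -0.8, -0.4]

# Small values favor fidelity; large values favor regularity.
candidates = [1e-6, 1e-2, 1e-1, 1.0]
cv_errors = {c: loo_cv_error(xs, ys, c) for c in candidates}
best_noise_variance = min(cv_errors, key=cv_errors.get)

for c in candidates:
    print(f"noise variance {c:g}: leave-one-out error {cv_errors[c]:.4f}")
print(f"selected noise variance: {best_noise_variance:g}")
```

When variance estimates are supplied with the observations, as described above, they can take the place of a single tuned value.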