$\def\xx{\mathbf{x}} \def\yy{\mathbf{y}} \def\zz{\mathbf{z}} \def\mD{\mathbf{D}} \def\mG{\mathbf{G}} \def\mW{\mathbf{W}} \def\mX{\mathbf{X}} \def\mY{\mathbf{Y}} \def\RR{\mathbb{R}} \def\cF{\mathcal{F}} \def\cN{\mathcal{N}} \def\cR{\mathcal{R}} \def\cX{\mathcal{X}} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\me}{e}$

# Highlight: Efficient Nonmyopic Batch Active Search

Our research team at SigOpt has been very fortunate to collaborate with outstanding researchers around the world, including through our academic and internship programs.  In our Highlight blogs, we take the opportunity to feature work by these collaborators.  This post introduces the article Efficient Nonmyopic Batch Active Search by Shali Jiang, Gustavo Malkomes, Matthew Abbott, Benjamin Moseley and Roman Garnett, appearing in the upcoming NeurIPS 2018 proceedings. Gustavo is a former intern on the SigOpt research team who is currently pursuing his Ph.D. at Washington University in St. Louis.

Bayesian optimization is the efficient search for the optimum of a function.  This article considers an adjacent topic: active search. Instead of a continuous objective function over a continuous domain, we consider a finite domain $$\cX=\{\xx_i\}_{i=1}^n$$ containing a rare, valuable subset $$\cR\subset\cX$$.  We can test a selected point $$\xx$$, at great cost, to determine whether it is valuable (with value 1) or irrelevant (with value 0). Our goal is to identify as many of the items $$\xx\in\cR$$ as possible within a limited budget of selections. This is an analog of Bayesian optimization in which rewards are binary and we wish to maximize the cumulative reward.
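To make the setting concrete, here is a minimal sketch (not the paper's implementation) of a sequential active search loop under the simplest, one-step greedy policy: always query the point the model currently deems most likely to be valuable. Here `probs` and `oracle` are hypothetical stand-ins for a posterior model and the costly labeling procedure.

```python
import numpy as np

def greedy_active_search(probs, oracle, budget):
    """Myopic (one-step) active search sketch.

    probs:  hypothetical callable; given the labels observed so far
            (dict: index -> 0/1), returns a float array with the
            posterior probability of value for every point in the domain.
    oracle: hypothetical callable; the costly test returning 1 or 0.
    """
    labeled = {}   # index -> observed binary label
    found = 0
    for _ in range(budget):
        p = probs(labeled)             # posterior over all candidates
        p[list(labeled)] = -np.inf     # never re-query a tested point
        i = int(np.argmax(p))
        labeled[i] = oracle(i)         # spend one unit of budget
        found += labeled[i]
    return found, labeled
```

The nonmyopic policies discussed below improve on this loop by also reasoning about how many queries remain in the budget, rather than greedily maximizing the immediate probability of success.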

This active search strategy has proven useful in drug discovery and product recommendations.  Garnett et al. (2012) introduced the Bayesian optimal policy and its myopic approximations for active search, and we recently extended this work by accounting for the remaining budget when conducting the search. A similar idea has been used in Bayesian optimization (González et al. 2016). Our nonmyopic approach demonstrates remarkable empirical improvement over myopic policies, but has been limited by its sequential nature.  Our new article addresses this by providing two strategies for selecting a batch of points for parallel evaluation. Batch selection is fundamentally difficult due to a combinatorial explosion: a batch acquisition function is a set function over proposed batch members, and maximizing it requires searching over exponentially many candidate batches.

One strategy is sequential simulation. Analogous to the “Kriging believer” or “constant liar” strategies for batch construction in Bayesian optimization (Ginsbourger et al. 2010), we build a batch by simulating a sequential policy to select each member, using fictional labels (e.g. the most-likely label) to update the model before selecting the next point. The second strategy generalizes the sequential nonmyopic score (the number of successfully identified valuable items) to the batch setting. This score becomes a set function of the batch, and we adopt a greedy approach to approximately maximize it, inspired by the conjecture that it is submodular.
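The first strategy can be sketched as follows. This is a simplified illustration, not the paper's code: `probs` is a hypothetical model callable, and the default fictional label is the always-negative choice that the experiments below find works best.

```python
import numpy as np

def sequential_simulation_batch(probs, batch_size, fictional_label=0):
    """Build a batch by simulating a sequential policy: after each pick,
    pretend the chosen point received `fictional_label` (a "constant
    liar" in the sense of Ginsbourger et al. 2010) and condition the
    model on that fake observation before choosing the next point.

    probs: hypothetical callable; given a dict of (fictional) labels,
           returns posterior probabilities of value for every point.
    """
    fake = {}    # fictional labels, used only during batch construction
    batch = []
    for _ in range(batch_size):
        p = probs(fake)
        p[list(fake)] = -np.inf       # don't pick the same point twice
        i = int(np.argmax(p))
        batch.append(i)
        fake[i] = fictional_label     # e.g. assume a negative outcome
    return batch
```

With a model whose predictions are correlated across similar points, a fictional negative label lowers the probabilities of a pick's neighbors, pushing later picks elsewhere; this is one way to see why assuming negative labels tends to produce more diverse batches.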

Both strategies perform well across a wide range of batch sizes and application domains (e.g. drug/materials discovery). Interestingly, we find that sequential simulation performs best when we always assume negative fictional labels. Our hypothesis is that assuming negative labels yields a more diverse batch, and this may offer insight into oracle design for batch Bayesian optimization strategies.

Querying points in parallel certainly improves efficiency. However, each decision must then be made in a less adaptive fashion than in the sequential case. This introduces an inherent, unavoidable cost in terms of the expected number of positives found given the same budget: the “cost of parallelism.” We prove that the cost of parallelism grows at least linearly in the batch size, and this theoretical bound matches our empirical results.

If you are not attending NeurIPS, you can still check out our poster.  Additionally, you can watch the video below to learn more about this work.  If you’re attending NeurIPS 2018, we invite you to the spotlight talk (Thu Dec 6th 03:35 — 03:40 PM @ Room 220 CD) and to stop by the poster presentation:

Thu Dec 6th 05:00 — 07:00 PM @ Room 210 & 230 AB #131

Michael McCourt Research Engineer
Shali Jiang Guest Author
Roman Garnett Guest Author