In an earlier post, we introduced the topic of multicriteria optimization and some mechanisms for solving multicriteria problems (also called multiobjective problems) using the SigOpt optimization tool; SigOpt provides a RESTful API to solve scalar black-box optimization problems for customers from fields such as industrial design and image processing. The earlier introduction discussed a somewhat simple problem of choosing the driving speed on a trip subject to concerns about minimizing both the time in transit and the cost of fuel. In this post, we want to analyze a more complex situation in which the parameters of a given model produce a random output, and our multicriteria problem involves maximizing the mean while minimizing the variance of that random variable.

The standard example in literature of simultaneously considering the mean behavior of a model/process and the variance is in financial portfolio management. Generally, investors want to maximize expected gain while minimizing risk (which exists in the form of variance) which motivated the development of the Nobel prize-winning theory of mean-variance analysis. The original work and a 2014 retrospective by the same author serve as helpful introductions to the theory of defining (even implicitly) a utility function and, potentially, an associated quadratic (mean-variance) approximation. Under some conditions (discussed in those references) an expert’s selection of a portfolio from the mean-variance Pareto frontier should approximately optimize the utility function.

Our example today is less practical and more fun^{1}. We start with a mouse that has some ability to intelligently traverse a maze and study how long it takes the mouse to solve a randomly generated maze. Then we try to tune the parameters of the random maze generation algorithm to create mazes which are very difficult for the mouse to solve. This will be phrased as a multicriteria optimization problem, with the goal of maximizing the mean time to solve while minimizing its variance.

**Variations on a theme**

The motivating goal of mean-variance analysis involves both wanting good performance (high portfolio return) and wanting to avoid bad performance (high risk and potential for loss). These are certainly related but are not necessarily the same thing.

For example, gamblers who play slot machines are told on the machine exactly how much they will win/lose on average by playing the slot machine. If the slot machine paid out $0.90 for every $1.00 put into it, with no variance whatsoever, I doubt anyone would play^{2}. But, gamblers may be willing to accept a low mean payout in exchange for a high variance because the higher variance implies a greater chance for a million dollar payout. In this setting, the gambler embracing the high likelihood of a bad performance in exchange for a potentially exceptional performance.

Other circumstances where our goal would not be to maximize the mean and minimize variance include many industrial processes: in the formation of a tool such as a nail, the goal is not to maximize the mean length of the nail but rather to minimize deviation from the desired length. This goal was formalized with the development of the Taguchi loss function^{3}, which served as a scalarization of the multicriteria optimization problem involving minimizing both the variance and the deviation from the desired outcome.

**Maze solver description**

Our maze solver strategy is the very simple wall follower strategy, which will, admittedly, only work on simply connected mazes (no loops, no levels). Given that, it is quite easy to implement and has predictable (non-random) behavior which helps push the entirety of the randomness to the maze generation algorithm.

For those of you interested, you can see the implementation of the maze construction and solution in our examples Github repo.

The strategy involves the mouse continually following the wall on the right all the way through the maze until reaching the end. If the mouse can turn right, it does so, otherwise, it moves forward; if both those paths are blocked it attempts to turn left. If all three directions are blocked, the mouse has reached a dead end and reverses course, continuing to hug the right wall to guide it forward. There are an outstanding explanation and graphic of this wall follower strategy available here which I will not try to supersede. The figure below, instead, shows why the wall follower strategy succeeds for simply connected mazes but may fail for mazes which have their exit inside a loop.

## Maze generator description

It is quite tempting for myself, as a research-oriented person, to see the problem of designing a maze as an opportunity to spend several days devising diabolical labyrinths under the cover of “developing content marketing”. I restrained, however, if only because so many great resources are already available on the internet. In particular, Wikipedia already had available a perfectly acceptable starting point involving a random depth-first search. Furthermore, the resulting maze structure guarantees that the failure to solve observed in Figure 1 cannot be encountered.

The only shortcoming with the code already available is that there were no tunable parameters. We needed to introduce a mechanism for making the mazes harder or easier for the mouse to solve; we did this by varying the relative probability of the depth-first search choosing each of the possible directions with which it could move, after eliminating those directions which it has already traversed. Essentially, at each step of the maze generation, there is a relative probability of punching through a wall. The figure below shows one such randomly generated maze, which has no preference for any direction.

By allowing the creation algorithm the possibility of different probabilities, mazes will start to gain a more identifiable structure, as demonstrated in the figure below. The probability of moving left, up, right and down are the parameters that we tune using Bayesian optimization.

**Problem Structure**

At this point, we have a relatively smart mouse with a fixed maze solving strategy and a random maze generation tool with 3 tunable parameters (the four construction direction relative probabilities minus the need for probabilities to sum to^{4}. Our goal is to create a maze which maximizes the average time required to solve the maze while minimizing the standard deviation^{5} of that time. Basically, how can we make a maze which is consistently difficult?

A precursor to that involves studying the impact randomness has on the mouse’s solution time, and how that random variable changes as a function of the various parameters. The histograms in the figure below would not be practical in a production setting but are provided here to hint at the impact of the parameters on the mouse’s performance. Mouse maze solves times are described in terms of the proportion of total cells in the maze; the minimum possible value is 0 and the max is 2^{6}. The mouse always starts at the top left cell and the exit is always at the bottom right cell, for simplicity.

**Devising a solution strategy**

So, how can we use scalar Bayesian optimization to provide a solution to this problem? Depending on the situation, there may be multiple ways to interpret “solution” for a multicriteria optimization problem:

- Some scalar function of the multicriteria components could be formed, which is then optimized to define the “solution”. This would require only a single scalar optimization but would require prior understanding as to how the scalarization balances the mean and variance.
- An approximation to the Pareto frontier can be formed by solving several scalarized versions of the problem. A selection from these scalarized solutions can then be made by a domain expert to determine which is most appropriate given the level of risk aversion. This strategy is the focus of our work today.

As a baseline, the figure below was generated using a 1000 point low discrepancy grid sampling over the 3D parameter space (varying *left, up, right* construction probabilities while holding *down* fixed): it shows the spread of mean/variance mouse solve times estimated over 100 randomly generated 30×30 mazes.

*left, up and right*turn parameters will be estimated over 100 maze constructions. The figure below shows the result of such optimizations, as conducted using SigOpt.

The mean and variance times computed for a given set of parameters are not unique to the weight in the scalarization; as such, we can reuse information from previous weight values as “initial data” for the next weighted sum experiment^{7}. This allows us to use only 20 function evaluations per scalarized experiment without a substantial loss in quality. Because the mean and variance are not convex functions of the turn parameters, however, some of the optimization solutions are not present on the true Pareto frontier.

To work around any potential lack of convexity in the component functions, we can instead phrase the problem as an epsilon-constrained scalar optimization, similarly to what was discussed in our earlier multicriteria post. Each of these scalar optimization problems can be phrased as minimizing only the standard deviation, subject to the mean being greater than a pre-chosen constraint. The image below was created with 10 constraint-based experiments, using mean constraints evenly spaced in [.95, 1.2].

**Choosing a solution**

So, after all of this, it is important to remember that simply forming the Pareto frontier does not solve the practical issue of designing the product: an expert must choose the “best” solution from the approximate frontier. Because we are not maze experts, we simply present some of the possible choices in the figure below and allow the reader to make their own judgment. In a more practical/industrial setting, the balance imposed by the expert may take the name utility or risk aversion.

**Conclusion**

In many settings there is a need to balance performance against robustness: Toyota may want to increase fuel efficiency but it cannot produce vehicles which gain 20% more efficiency at the cost of failing to start 1 out of every 5 times they are used. This tradeoff between improving average behavior and introducing a greater potential for bad behavior is the crux of this post, which we have addressed in the context of making a better mouse-stumping maze.

We have used SigOpt to identify multiple approximately Pareto efficient maze configurations from which an expert (maze designer, not an expert mouse) could select the maze which best balances difficulty and consistency. For these multicriteria problems, the need to efficiently conduct black box optimization is amplified because of the need to generate multiple solutions on the Pareto frontier. Using Bayesian optimization has a multiplicative benefit because each point on the Pareto frontier is generated at less experimental cost than with, e. g., grid search.

Again, any interested readers can feel free to experiment on their own using our implementation of the maze construction and solution available in our examples Github repo.

**Special Thanks**

Again, I would like to extend my thanks to Devon Sigler, an expert in multicriteria optimization who will shortly be finishing his Ph.D. He has helped introduce me to the wide range of literature that discusses multicriteria optimization and have provided valuable insights regarding the theoretical implications of popular solution strategies.