Deciding exactly what you want is hard. In a search system, you care about accuracy, but user engagement and revenue also matter. In medical research, efficacy is important, but side effects and cost also play a role. Boiling all of the key metrics you care about down to a single value, an Overall Evaluation Criterion (OEC), is a difficult but important step toward achieving your goals.
Picking the wrong metrics can hurt your business. Microsoft ran an experiment that decreased search relevance but increased the queries-per-user and ad-revenue-per-user metrics. Users submitted more searches and clicked more ads precisely because they were struggling to find the results they were looking for. It is easy to pick an OEC that looks good and beats your control but ultimately does the wrong thing for your business.
Picking A Single Metric
When optimizing several metrics at once, it can become difficult to make progress. If a change only counts as a win when none of the metrics decrease, very few changes will ever ship. In practice, almost every change involves tradeoffs that must be analyzed. A slight drop in revenue might be acceptable when it comes with a large surge in user engagement. A much more effective drug is worth pursuing if it doesn't raise the cost too much. Instead of deciding case by case whether these tradeoffs are worth making, collapsing them into a single metric helps you make consistent decisions and gives your team a single target to build toward.
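A minimal sketch of collapsing several metrics into one score, assuming the OEC is a simple weighted sum of each metric's relative change against the control. The metric names, the values in `WEIGHTS`, and the helpers `relative_change`, `oec`, and `is_win` are hypothetical illustrations, not SigOpt's method; a production OEC would usually also need normalization and a variance-aware comparison rather than a bare sign check.

```python
# Hypothetical weights (illustrative only): how much a relative change in each
# metric is worth in a common unit. Choosing them is the real business decision.
WEIGHTS = {
    "revenue": 1.0,
    "engagement": 0.5,
    "relevance": 2.0,  # weighted heavily as a proxy for long-term user happiness
}


def relative_change(treatment: float, control: float) -> float:
    """Relative lift of the treatment over the control, e.g. 0.05 for +5%."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treatment - control) / control


def oec(changes: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Collapse per-metric relative changes into a single weighted score."""
    missing = set(weights) - set(changes)
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return sum(weights[name] * changes[name] for name in weights)


def is_win(changes: dict[str, float], weights: dict[str, float] = WEIGHTS) -> bool:
    """A change counts as a win only if its combined score beats the control (0.0)."""
    return oec(changes, weights) > 0.0


if __name__ == "__main__":
    # Small revenue dip, large engagement gain, relevance unchanged.
    tradeoff = {"revenue": -0.01, "engagement": 0.05, "relevance": 0.0}
    # More queries and ad revenue, but users are finding worse results.
    bad_win = {"revenue": 0.02, "engagement": 0.04, "relevance": -0.03}
    print(round(oec(tradeoff), 3), is_win(tradeoff))
    print(round(oec(bad_win), 3), is_win(bad_win))
```

With these illustrative weights, the first scenario scores about 0.015 and counts as a win, while the second, which resembles the Microsoft example of more queries and ad revenue bought with worse relevance, scores about -0.02 and is rejected. The large weight on relevance is where the long-term judgment lives; once it is encoded, a single comparison replaces case-by-case debate.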
The key to building a good OEC is focusing on long-term metrics that are core to the business, such as user retention, active users, and user happiness. If long-term goals are sacrificed for short-term gains, you can inadvertently cause significant damage. A single metric also enables faster, more quantitative iteration and lets you apply powerful optimization techniques.
At SigOpt, we can help you raise this metric automatically and optimally whether you are optimizing an A/B test, machine learning system, or physical experiment. We can also help you design an Overall Evaluation Criterion that fits your business and users. Check us out for free today!