Modeling with the Modern Machine Learning Stack

Scott Clark and Nick Payton

This post is part of a five-part series.

Enterprise machine learning is a messy, experiment-driven, performance-oriented process that is different from either analytics or software engineering. Machine learning problems often lack linear solutions, which clouds the entire process in uncertainty and unpredictability. Modeling itself often requires a combination of skills, including domain, data science, machine learning, optimization and devops expertise. And any model is only as useful as the applicability of the rules it produces. The modern machine learning stack is designed to give modelers the best odds of navigating this messy situation to produce models with impact. Why do an already hard job with one hand tied behind your back?

This is the fourth post in this series. The first post explained why Enterprise AI is actually a combination of three distinct markets. The second post made the case for differentiated modeling as the future of AI in the enterprise. And the third post compared the technology needs of teams building more basic models with those of teams building more differentiated ones.

This post lays out a summary of the modern machine learning stack, including the core components and some of the capabilities that enterprise modeling teams need to build differentiated models. For a more comprehensive overview, check out TWIML’s Definitive Guide to Machine Learning Platforms, which provides Facebook, Airbnb and LinkedIn case studies before breaking down the software capabilities that enable their world-class artificial intelligence.

Summary of the Modern Machine Learning Stack

Data Management & Feature Engineering

Except for certain reinforcement learning or unsupervised learning techniques, the machine learning process starts with data and features of this data. Feature engineering and management also play a big role in the experimentation process, so they should be considered as having one foot in each camp. Here is a short overview of capabilities and a brief look at the landscape:

  • Data preparation, such as labeling, annotating or cleaning data, is typically among the first steps. Labeling is so critical to computer vision that multiple startups have raised more than $100M to solve this problem, largely for companies building autonomous vehicles. 
  • Data pipelines are a separate critical component. This is an evolving space with a mix of startups offering managed open source solutions, incumbent big data providers with embedded solutions and a collection of homegrown solutions, some of which have been open sourced by bigger companies. 
  • Feature stores support the feature engineering process by giving modelers a searchable database of features and relevant insights about them. These are mostly homegrown to date, but there are startups working to make this a standardized part of the modeling stack (see the sketch after this list). 
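
To make the concept concrete, here is a minimal sketch of the lookup pattern a feature store provides. The `FeatureStore` class, its methods and the feature names are hypothetical illustrations, not any particular vendor's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical in-memory feature store, for illustration only. Production
# feature stores add versioning, point-in-time correctness, online/offline
# parity and access controls on top of this basic lookup pattern.
@dataclass
class FeatureStore:
    _values: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def put(self, entity_id: str, feature_name: str, value: float) -> None:
        """Register a computed feature value for an entity."""
        self._values[(entity_id, feature_name)] = value

    def get(self, entity_id: str, feature_names: List[str]) -> Dict[str, float]:
        """Fetch a feature vector for an entity, e.g. at serving time."""
        return {name: self._values[(entity_id, name)] for name in feature_names}

# A pipeline writes features once; training and serving read the same values.
store = FeatureStore()
store.put("user_42", "days_since_signup", 17.0)
store.put("user_42", "avg_session_minutes", 12.5)
print(store.get("user_42", ["days_since_signup", "avg_session_minutes"]))
```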

Experimentation, Training, & Tuning

Developing models is a messy process. There is an engineering component that requires coding to get a model up and running. But this is paired with experimentation that typically means running into a few dead ends before arriving at a model-architecture-hyperparameter combination that works – and, ideally, is best suited – for a given problem. As a consequence, solutions in this space largely focus on automating menial tasks that do not change from problem to problem, tidying up this messy process with workflow-oriented features, sharing algorithmically derived insights that help modelers advance their work, or some combination of these. Unlike data-focused solutions such as labeling, most modeling solutions are designed as lightweight APIs that fit easily into any coding environment. In the interest of full disclosure, SigOpt offers a full suite of solutions that covers all of these capabilities. Here are a few of note: 

  • Experiment management includes experiment tracking, code snapshots, model artifact tracking, collaboration tooling and, where applicable, version control. This software automatically organizes a full history of these modeling attributes in a web dashboard that any individual or team can search, filter and analyze collaboratively (a minimal tracking sketch follows this list). This space includes open source solutions that tend to be rather prescriptive, commercial solutions that are far more flexible and fully featured, and basic offerings bundled into end-to-end cloud platforms. 
  • Training support is largely focused on deep learning, where training models is a challenge in and of itself. This feature set includes automated early stopping, checkpoint-enabled convergence monitoring, training curve visualizations and training run comparisons, among other analyses (see the early stopping sketch after this list). There is open source software for components of this, like TensorBoard for visualization, but most often this combination is only available in specialized commercial solutions or integrated into some end-to-end platform offerings.
  • Automated hyperparameter optimization includes an API for applying automated tuning search algorithms (including neural architecture search), parallelization, multitask optimization, best-seen traces, parallel coordinates and parameter importance, among other analytics (a tuning sketch also follows this list). Fully featured tuning solutions give modelers the freedom to use any combination of parameters, apply proprietary intelligent algorithms like Bayesian optimization, make it easy to parallelize jobs to take advantage of available compute and are engineered to return new suggestions in milliseconds. There are a variety of open source algorithms, but only commercial offerings include complete solutions fit for enterprise processes.
  • Metric tracking includes the ability to track a high number of metrics, apply them as constraints in any given experiment and tune multiple metrics at the same time. This systematic approach helps teams evolve their metric strategy, define metrics more comprehensively and select the right set of metrics for any given modeling job. These capabilities are often related to and enabled by the other components, so they are typically embedded in specialized commercial solutions. 
  • Cluster orchestration enables distributed training, distributed tuning, automated scheduling, resource allocation and resource optimization, all designed to abstract away the devops complexity of training models from individual modelers and modeling teams. Most companies still implement homegrown solutions to this problem, but the space is evolving quickly and includes popular open source projects, specialized commercial vendors and some capabilities embedded into end-to-end platforms. 
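
As a rough illustration of what experiment tracking records, here is a minimal sketch that appends one JSON record per training run. The `log_run` helper and file layout are hypothetical; commercial tools layer dashboards, code snapshots, artifacts and collaboration on top of records like these.

```python
import json
import time
import uuid

# Hypothetical minimal experiment tracker: one JSON record per run.
def log_run(params, metrics, path="runs.jsonl"):
    record = {
        "run_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "params": params,    # hyperparameters used for this run
        "metrics": metrics,  # resulting evaluation metrics
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_run({"learning_rate": 0.01, "num_layers": 4}, {"val_accuracy": 0.93})
```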
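
Next, a minimal sketch of one training-support capability, patience-based early stopping with checkpointing. The validation losses here are toy values standing in for a real training loop.

```python
import math

# Toy validation losses standing in for a real training loop:
# improvement stalls after the fifth epoch.
val_losses = [0.90, 0.62, 0.48, 0.41, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45]

best_loss = math.inf
best_epoch = None
patience, bad_epochs = 3, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss - 1e-4:  # meaningful improvement
        best_loss, best_epoch, bad_epochs = val_loss, epoch, 0
        # In a real loop, save a model checkpoint here.
    else:
        bad_epochs += 1
    if bad_epochs >= patience:  # progress has stalled
        print(f"early stop at epoch {epoch}; best was epoch {best_epoch} "
              f"with loss {best_loss:.2f}")
        break
```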
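
And here is a minimal sketch of automated hyperparameter tuning, using the open source scikit-optimize library as one example of Bayesian optimization. The objective is a stand-in for training a model and returning a validation metric, and the search space is illustrative.

```python
from skopt import gp_minimize          # pip install scikit-optimize
from skopt.space import Real, Integer

# Stand-in objective: in practice this would train a model with the
# suggested hyperparameters and return a validation metric to minimize.
def objective(params):
    learning_rate, num_layers = params
    return (learning_rate - 0.01) ** 2 + 0.001 * (num_layers - 4) ** 2

search_space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(1, 8, name="num_layers"),
]

# Bayesian optimization: fit a surrogate model to past observations and
# choose the next hyperparameters to try, balancing explore and exploit.
result = gp_minimize(objective, search_space, n_calls=25, random_state=0)
print("best hyperparameters:", result.x, "best value:", result.fun)
```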

Deployment & Monitoring

This end of the modeling process involves the most engineering and the least data science work, though there is still valuable overlap. Once a model is built, it must be served as usable code (typically an API) for whichever products or services it will power. In some cases, models may need to be rolled out in phases or with online experimentation to continuously fine-tune their performance. And because the circumstances under which these models were built are constantly evolving, models need to be continuously monitored for performance drift. 

  • Serving: Serving is sometimes straightforward and at other times needs to be specialized to a particular set of needs. Autonomous vehicles, for example, have specialized needs for how they serve their models when deployed to the edge. A combination of open source tools, specialized commercial products and services embedded in big cloud offerings provide various capabilities related to these needs (a minimal serving sketch follows this list). 
  • Monitoring: It is critical to continuously monitor models in production so that as data drifts, you know when a model is no longer performing for a given task (see the drift check sketched after this list). There are specialized commercial offerings for this capability and some embedded in end-to-end systems. Most often these are bundled with serving functionality in a complete solution.
  • Online Experimentation: Continuously testing and improving models in production is the gold standard where possible or reasonable (which is not the case in many circumstances). There are capabilities for enabling A/B model testing in production, such as TensorFlow Extended and some embedded in big cloud offerings, but most of these capabilities are designed in house so they can meet specific requirements.
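
To illustrate the serving pattern in its simplest form, here is a sketch of a model behind an HTTP endpoint using Flask. The `predict` function is a stand-in for a real trained model, and production serving adds batching, scaling, authentication and versioning on top of this.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    """Stand-in for a real model's inference call."""
    return 1 if sum(features) > 1.0 else 0

@app.route("/predict", methods=["POST"])
def serve():
    # Expects a JSON body like {"features": [0.3, 1.2]}
    features = request.get_json()["features"]
    return jsonify({"prediction": predict(features)})

if __name__ == "__main__":
    app.run(port=8080)
```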
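
And here is a minimal sketch of one common drift check, the population stability index (PSI) computed for a single feature. The data, bin count and alert threshold are illustrative assumptions; monitoring products wrap checks like this with scheduling, dashboards and alerting.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and production (actual) sample
    of one feature; larger values indicate more drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)  # distribution at training time
live_feature = rng.normal(0.4, 1.2, 10_000)   # shifted production traffic

psi = population_stability_index(train_feature, live_feature)
# A common rule of thumb: PSI above roughly 0.2 suggests drift worth investigating.
print(f"PSI = {psi:.3f}", "-> investigate drift" if psi > 0.2 else "-> stable")
```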

This ends our abbreviated guide to the capabilities that define the modern machine learning stack. In the next post, we will come back to our own domain expertise and dive deeper into how we think about solutions in the model experimentation and engineering space. 

As always, we are happy to discuss in more detail with anyone interested in a conversation. Email [email protected]. Separately, you can try our solution, sign up for blog updates, or join our beta program for new functionality.

Scott Clark, Ph.D.
Co-Founder & Chief Executive Officer

Nick Payton
Head of Marketing & Partnerships

Want more content from SigOpt? Sign up now.