ICYMI – SigOpt Summit Recap: Democratizing End-to-End Recommendation Systems with Jian Zhang

Steven Stein


In this blog post we review the SigOpt Summit presentation by Intel on how to democratize end-to-end recommendation systems. During this talk, Jian Zhang, a Software Engineering Manager at Intel, gave an overview of how Intel is improving recommendation systems for customers. He reviewed the different types of recommendation systems in use today, some challenges with deploying them, and how his team improved their performance, training times, and ease of deployment.

This blog details how Intel used SigOpt to improve the training time for Wide and Deep (WnD) and DLRM models. As an example, with SigOpt Intel improved training times on WnD networks by 2.86x while gaining key insights into which parameters affected their models the most.

Recommendation Systems

Recommendation systems are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They aim to predict whether an item would be useful to a user based on given information (context). Their use has grown steadily over the last few years across industries including retail and e-commerce, healthcare, and transportation. Recommendations account for as much as 35% of the revenue on some commercial platforms, and optimizing these systems yields large returns for customers. A 1% improvement in the quality of recommendations can translate into billions of dollars in revenue.

Recommendation systems vary in design based on the availability of exploitable data, the use of implicit and explicit user feedback, and the characteristics of the domain.

There are four types of recommendation systems organizations typically create:

  • Collaborative recommendation leverages community data (the taste information of many users) to make predictions, as in “tell me what’s popular among my peers”. It collects and analyzes users’ behavioral information, such as feedback, ratings, preferences, and activities, then exploits similarities among users or items to make suitable recommendations.
  • Content-based recommendation works like “show me more of what I’ve liked”. It is based on the description of the item and a profile of the user’s tastes, and it recommends items similar to those the user has already rated highly. It is suited to situations where there is known data on an item (name, location, description, etc.).
  • Knowledge-based recommendation relies on explicit knowledge about the item assortment, user preferences, and recommendation criteria. It works like “tell me what fits based on my needs”; for example, when I buy a car, I have specific requirements for the color I want. Its strength is the absence of cold-start (ramp-up) problems.
  • Hybrid recommendation combines various inputs and/or different mechanisms. Most recommender systems now use a hybrid approach, combining collaborative filtering, content-based filtering, and other approaches.
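To make the collaborative idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is only an illustration of the general technique, not something shown in the talk: the ratings, the user and item names, and the `cosine` and `recommend` helpers are all made up for this example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Values are made up for illustration.
ratings = {
    "alice": {"book": 5, "lamp": 3, "mug": 4},
    "bob": {"book": 4, "lamp": 1, "mug": 5},
    "carol": {"book": 5, "mug": 4, "desk": 5},
    "dave": {"lamp": 2, "desk": 4},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by similarity-weighted peer ratings."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("alice", ratings))
```

A production system would work on far larger, sparser data and would typically learn embeddings rather than compare raw rating vectors, which is where models such as WnD and DLRM come in.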


Challenges with Recommendation Systems

Creating effective recommendation systems is not without its challenges. Here are some of the major challenges modelers face when creating recommendation systems.

  • Huge datasets: Recommendation systems are often trained on large datasets (TB or PB), which require a large cluster to store and process and usually lead to slow data movement.
  • Data preprocessing: Datasets need to be loaded, cleaned, preprocessed, and transformed into a suitable format for DL models and frameworks. This requires various data processing technologies, such as batch and streaming processing.
  • Feature engineering: Numerous sets of features need to be created and tested.
  • Models & algorithms: Complex models and algorithms create an entry barrier for citizen data scientists.
  • Repeated experiments: The ETL, feature engineering, training, and evaluation processes need to be repeated many times with many model architectures, and models need periodic retraining to maintain high accuracy.
  • Huge embedding tables: Categorical features require embeddings, which need a large amount of memory, and lookups are bandwidth intensive.
  • Scale-out training: The extremely high computation requirements call for distributed training, which makes scalability critical.
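As a rough illustration of why embedding tables dominate memory, the sketch below estimates the footprint of dense embedding tables as rows × dimension × bytes per value. The feature names, cardinalities, and embedding dimension are hypothetical and were not given in the talk.

```python
def embedding_table_bytes(num_rows, embedding_dim, bytes_per_value=4):
    """Memory for one dense embedding table: rows x dim x bytes per value (fp32 = 4)."""
    return num_rows * embedding_dim * bytes_per_value


# Hypothetical categorical features: name -> number of distinct values.
cardinalities = {"user_id": 100_000_000, "item_id": 10_000_000, "category": 5_000}
embedding_dim = 64  # assumed; not a figure from the talk

total = sum(embedding_table_bytes(n, embedding_dim) for n in cardinalities.values())
print(f"{total / 1e9:.1f} GB")
```

Under these assumptions the tables alone need tens of gigabytes before any optimizer state is counted, which is why reducing the embedding size comes up repeatedly among the optimizations.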

End-to-End Democratization

Today, AI is restricted to data scientists and data analysts who are specifically trained in AI. Intel’s goal is to make AI accessible and affordable to everyone and to extend it to citizen data scientists. One major issue is that AI requires high-end hardware, so one solution is to make it scale on commodity hardware.

What to democratize?

  • Data accessibility & quality: Make data access easier, simpler, and faster by democratizing data management: data ingestion, data warehouses, and data lakes. The goal is to explore, visualize, and process the data seamlessly.
  • Storage & compute platforms: The infrastructure that E2E AI should run on: cloud computing, auto-scaling, GPU vs. CPU. The goal is to democratize the hardware so the models can run on scalable infrastructure built from commodity hardware.
  • Algorithms: Improve the use, development, and sharing of ML & DL algorithms. The goal is to democratize the algorithms: reduce entry barriers, enable automatic model searching and AutoML, and improve the explanation of results.
  • Model development: Select the most suitable model and democratize end-to-end model development, training, and deployment.
  • Marketplace: Improve access to the models and democratize the use, exchange, and monetization of data, algorithms, models, and outcomes.

Intel’s Strategy

  • Prototype with selected top models (DLRM, DIEN, WnD), optimize them from an end-to-end perspective, and reduce the E2E time to an acceptable range
  • Identify the critical optimizations and missing functions, and implement new features
  • Build a scale-out democratization solution with user-guided automation via parameterized models and SigOpt AutoML


  • E2E democratization delivered 103.61x, 17.48x, and 9.45x E2E (ETL & training) performance speedups for DLRM, WnD, and DIEN respectively
  • Key optimizations/contributors:
    • Scale-out data processing with Spark to replace single-threaded Python data processing (DLRM, DIEN), and scalable distributed training with multiple Xeon nodes
    • Intel optimized training frameworks: IPEX (Intel Extension for PyTorch) and Intel optimized TensorFlow
    • Lighter models: reduced MLP & NN layers, reduced communication overhead on Ethernet
    • Optimizer tuning (DLRM) to converge faster with a larger batch size
    • Feature engineering (embedding table, encoding) optimizations
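To give a feel for what "lighter models" means in terms of size, the sketch below counts the weights and biases in a stack of fully connected layers. The two hidden-layer configurations are the WnD ones quoted in the training section of this post; the input width of 1024 is an assumption made only for this example, and `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(input_dim, hidden_units):
    """Weights plus biases for a stack of fully connected layers."""
    total, fan_in = 0, input_dim
    for units in hidden_units:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total


INPUT_DIM = 1024  # assumed width of the input to the deep tower; not from the talk

original = dense_param_count(INPUT_DIM, [1024, 1024, 1024, 1024, 1024])
lighter = dense_param_count(INPUT_DIM, [1024, 512, 256])
print(original, lighter, round(original / lighter, 2))
```

Parameter count is only a proxy. The measured speedup depends on the rest of the pipeline (embedding lookups, communication, input loading), so a reduction in dense parameters should not be expected to translate one-to-one into training time.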

Spark Speed-ups

  • Significant speedup w/ Spark parallel data processing
  • Spark tunings & scale out delivered 2.32x speedup
  • RecDP: enhanced RecSys data processing
    • Pluggable module based on PySpark
    • Easy-to-use API (Categorify, FillNA, DropNA, UDF…) that is familiar to data scientists
    • Transparent performance optimizations on PySpark: (1) adaptive join strategy, (2) inline Scala UDFs, (3) leveraging the Intel OAP Native Engine to improve Spark performance
    • Supports multiple recommendation workloads such as DLRM, DIEN, WnD, and RecSys, with more in progress
    • Plugin mode that can be easily integrated into other E2E platforms
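For readers unfamiliar with operations like FillNA and Categorify, here is a small pure-Python sketch of what such steps typically do to a categorical column. This is not the RecDP API (whose signatures are not shown in this post) and it does not use Spark; `fill_na`, `categorify`, and the sample rows are hypothetical stand-ins that only illustrate the transformation.

```python
from collections import Counter


def fill_na(rows, column, default):
    """Return copies of rows with missing (None) values in `column` replaced."""
    return [
        {**r, column: default if r.get(column) is None else r[column]} for r in rows
    ]


def categorify(rows, column):
    """Map each distinct value to an integer id, most frequent value first."""
    counts = Counter(r[column] for r in rows)
    # Sort by descending count, then by value so that ties are deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    vocab = {value: idx for idx, value in enumerate(ordered)}
    return [{**r, column: vocab[r[column]]} for r in rows], vocab


# Made-up sample data with missing values.
rows = [
    {"site": "news", "clicks": 3},
    {"site": None, "clicks": 1},
    {"site": "shop", "clicks": None},
    {"site": "news", "clicks": 7},
]
rows = fill_na(rows, "site", "unknown")
rows = fill_na(rows, "clicks", 0)
encoded, vocab = categorify(rows, "site")
print(vocab)
print(encoded)
```

The integer ids produced this way are what an embedding table is indexed by, so the vocabulary size directly sets the number of embedding rows. On real datasets the same logic runs as distributed Spark jobs rather than in-memory Python.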

Training Speed-ups

Key optimizations

  • Intel optimized TensorFlow: apply OpenMP and KMP settings (AFFINITY, NUM_THREADS, etc.) for CPU
    • Distributed training: Horovod scaling delivered a 1.93x speedup from 1 node to 4 nodes
    • Lighter model: reducing the sparse embedding size reduced the volume of data Horovod communicates and improved scaling, so 4-node training delivered a 2.7x speedup over 1 node; reducing the deep hidden units from [1024, 1024, 1024, 1024, 1024] to [1024, 512, 256] delivered a 1.14x speedup
    • Early stop: stop training when the MAP metric reaches a pre-defined value (0.6553); training took 904 steps and delivered a 4.14x speedup
  • Input pipeline optimization: data loader prefetch, plus tuned data loader thread count and buffer size, to improve efficiency
  • Hyper-parameter tuning: learning rate tuned from 0.00048 to 0.001
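The early-stop idea from the list above can be sketched in a few lines: keep training until the evaluation metric reaches a target, then stop. The loop below is a framework-agnostic illustration, not Intel's implementation. `train_with_early_stop`, the no-op training step, and the toy metric that grows linearly with the step count are all assumptions; only the target value 0.6553 comes from the talk.

```python
def train_with_early_stop(train_step, evaluate, target, max_steps, eval_every=1):
    """Run training steps until `evaluate` reaches `target` or `max_steps` is hit.

    Returns (steps_run, last_metric).
    """
    metric = float("-inf")
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            metric = evaluate(step)
            if metric >= target:
                return step, metric  # target reached: stop early
    return max_steps, metric


# Toy stand-ins: a no-op training step and a metric that rises with the step count.
def toy_train_step(step):
    pass


def toy_evaluate(step):
    return step / 1000


steps, metric = train_with_early_stop(
    toy_train_step, toy_evaluate, target=0.6553, max_steps=3000, eval_every=10
)
print(steps, metric)
```

How often to evaluate is a trade-off: evaluating rarely saves evaluation cost but can overshoot the target by up to `eval_every - 1` steps.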

SigOpt: enabling SigOpt for hyperparameter tuning, with multimetric optimization and metric thresholds, delivered a 2.86x speedup

Why SigOpt?

  • Key features Intel used from SigOpt
    • A model development platform that makes it easy to track runs, visualize training, and scale hyperparameter optimization for any type of model built with any library on any infrastructure.
    • An experimentation platform that brings together experiment management, job orchestration, and intelligent optimization in a seamless user experience.
  • Scale-out democratization w/ SigOpt
    • Leverage SigOpt for hyper-parameter tuning
    • Leverage SigOpt to keep track of the experiments
    • Leverage SigOpt for AutoML
    • Customized, parameterized models through SigOpt YAML configuration files for advanced optimizations
    • Integrate SigOpt with RecDP and other frameworks and libraries

SigOpt Results

SigOpt E2E democratization

The SigOpt-optimized models delivered excellent results by turning the heavy lifting of manual tuning and optimization into AutoML. The SigOpt optimizations delivered the same model performance as manual tuning (the same MAP value for WnD and the same AUC for DLRM). This means modelers can automatically get the best tuned results without having to manually tune their parameters.

  • Training time for the best-value experiments was reduced 3.18x for WnD and 1.07x for DLRM

SigOpt E2E Democratization – parameter analysis

SigOpt provided an easy-to-use, standard user interface to help compare and understand the models. Its visualizations show which parameters matter most and which parameter ranges have the greatest effect on the models.

SigOpt experiments analysis

  • The graph with multiple vertical axes (one per metric and parameter) showed that a larger dnn_hidden1 and a smaller learning_rate yield a higher MAP, so users can change the parameter ranges and rerun the optimization loop to find optimal parameters.
  • The horizontal axis is ordered by parameter importance. As the SigOpt optimization proceeds, the most important parameters converge to a small range, so for further optimization those parameters can be frozen to reduce the number of parameters being tuned.
  • With metric thresholds, SigOpt can focus its sampling on more promising parameter regions.
  • With SigOpt automatic optimization, we can get parameters that give a higher MAP and lower training time than the manually optimized parameters.

SigOpt E2E Democratization – multimetric and metric threshold

Performance regression after switch to SigOpt AutoML

  • Best-value training time was better than that of the manually tuned model, and it required fewer experiments

SigOpt multimetric and metric thresholds

  • It considers multiple competing metrics that reach their optimal values at different parameter settings, here maximizing model performance while minimizing training time
  • Reduced the number of experiments needed to reach the desired best models
  • Significant training time reduction compared with running without multimetric optimization and metric thresholds
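As a rough sketch of how such an experiment might be declared, the snippet below builds a multimetric definition with a threshold on MAP and passes it to the classic SigOpt Python client. It assumes the `sigopt` package's `Connection(...).experiments().create(...)` interface. The parameter names `learning_rate` and `dnn_hidden1` appear in the talk, but the bounds, the budget, the experiment name, and the reuse of 0.6553 as the MAP threshold are assumptions chosen for illustration, not Intel's actual configuration.

```python
def build_experiment_config(map_threshold):
    """Experiment definition with two competing metrics and a threshold on MAP."""
    return dict(
        name="WnD multimetric sketch",  # hypothetical experiment name
        parameters=[
            # Bounds are illustrative assumptions, not values from the talk.
            dict(name="learning_rate", type="double", bounds=dict(min=1e-4, max=1e-2)),
            dict(name="dnn_hidden1", type="int", bounds=dict(min=64, max=1024)),
        ],
        metrics=[
            # Only results at or above the threshold count as acceptable.
            dict(name="MAP", objective="maximize", threshold=map_threshold),
            dict(name="training_time", objective="minimize"),
        ],
        observation_budget=40,  # assumed budget
    )


def create_experiment(api_token, config):
    """Submit the definition to SigOpt. Needs the `sigopt` package and a valid token."""
    from sigopt import Connection  # imported lazily so the sketch loads without it

    conn = Connection(client_token=api_token)
    return conn.experiments().create(**config)


config = build_experiment_config(map_threshold=0.6553)
print([m["name"] for m in config["metrics"]])
```

Calling `create_experiment("<your token>", config)` would register the experiment; the training script then reports both metric values for each suggested parameter set, and the optimizer trades them off while steering toward regions that satisfy the threshold.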


Use Case | SigOpt Improvement
WnD Training Speedup | 3.18x
DLRM Speedup | 1.07x
AutoML Model Performance | Similar performance without manual tuning
DLRM Best Model with SigOpt Multimetric and multi-Threshold | 1.07x
WnD Training Speedup - Multi-metric and multi-threshold | 2.86x
DLRM Training Speedup - Multi-metric and multi-threshold | 1.36x

  • E2E AI democratization brings AI to commodity hardware through assisted AutoML, improves AI accessibility, and delivers good-enough performance with lower TCO
  • Prototypes on DLRM, DIEN, WnD, and RecSys showed that E2E AI democratization delivered significant improvements in E2E training time with the same model performance
  • The major democratization techniques include parallel data processing, scale-out training, lighter models, and hyper-parameter tuning
  • Those optimizations can be scaled out through enhanced data processing and assisted SigOpt AutoML. SigOpt with SDA delivered results similar to those of manually optimized models and makes it easy to scale out the democratized models and workloads

Take Action

To try SigOpt today, you can access it for free at sigopt.com/signup.

Steven Stein Product Marketing Lead