Transformers have unlocked significant potential for natural language processing (NLP). No architecture has demonstrated this potential more than BERT, which beat state-of-the-art benchmarks in question answering, general language understanding, and commonsense inference. But BERT is also very large. Using it in an applied setting therefore requires the capacity to tune at scale, to track and visualize runs in order to understand model behavior, and to weigh tradeoffs between model size and accuracy.
In this talk at Ray Summit, SigOpt Machine Learning Engineer Meghana Ravikumar applies Experiment Management, Metric Management, and Multimetric Bayesian Hyperparameter Optimization with Ray to weigh practical tradeoffs for BERT that carry implications for applied machine learning settings. Through these experiments, Meghana develops Efficient BERT: configurations of BERT that are much smaller than a relevant benchmark model while attempting to retain similar accuracy.
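To make the multimetric idea concrete, here is a toy sketch of the tradeoff it optimizes: each candidate configuration yields two metrics (accuracy to maximize, parameter count to minimize), and the interesting configurations are the Pareto-efficient ones, where accuracy cannot improve without growing the model. The `pareto_frontier` helper and all numbers below are illustrative assumptions, not SigOpt's implementation or Meghana's actual results.

```python
def pareto_frontier(candidates):
    """Return the candidates not dominated by any other.

    A candidate (acc, size) is dominated if some other candidate has
    accuracy >= acc and size <= size, with at least one comparison strict.
    """
    frontier = []
    for name, acc, size in candidates:
        dominated = any(
            (a >= acc and s <= size) and (a > acc or s < size)
            for _, a, s in candidates
        )
        if not dominated:
            frontier.append((name, acc, size))
    return frontier

# Hypothetical (accuracy, parameters in millions) for BERT-like variants.
candidates = [
    ("bert-base", 0.88, 110),
    ("small-a",   0.86, 30),
    ("small-b",   0.84, 40),  # dominated by small-a: less accurate AND larger
    ("tiny",      0.79, 10),
]
print(pareto_frontier(candidates))
```

A multimetric Bayesian optimizer such as SigOpt's explores the configuration space to approximate this frontier efficiently, rather than exhaustively evaluating every candidate as this sketch does.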
Are you interested in reproducing Meghana’s work? Here are a few resources to get you started:
- Free access to SigOpt
- Blog post explaining how to get BERT up and running
- Colab notebook to get started
- Code repo to get started
- Example experiment to see how training and tuning are tracked in the dashboard
- Blog post explaining how to integrate SigOpt and Ray for distributed tuning jobs
Are you interested in learning more? Here are a variety of resources on Efficient BERT:
- A short literature review on Transformers and their importance for NLP
- How to integrate SigOpt and Ray for scalable, distributed hyperparameter optimization
- How to get Efficient BERT up and running with code examples
- Explanation of SigOpt features that are useful for Efficient BERT
- Recommendations for how to understand BERT model behavior
- First blog post introducing Efficient BERT with a focus on the architecture
- Second blog post defining the experiments run for Efficient BERT
- Third blog post discussing the results of the Efficient BERT experiments
- Summary of the three-part blog post series explaining these Efficient BERT experiments
Use SigOpt free. Sign up today.