# OpenVINO Quantization with SigOpt


Deep learning for classification tasks involves training the parameters of a neural network so that it can identify a variety of object classes. This is achieved by feeding many labeled images to the network while updating its parameters to improve accuracy. However, these networks have millions of parameters, which makes inference slow and memory-intensive.

Quantization helps reduce the size of the neural network while maintaining high accuracy. This is especially important for on-device applications, which have limited memory and compute. Quantization for deep learning is the process of approximating a neural network that uses floating-point numbers with one that uses low-bit-width numbers, such as 8-bit integers. This significantly reduces the memory footprint and computational cost of running the network.
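To make the idea concrete, here is a minimal sketch of 8-bit affine quantization in plain Python. It only illustrates the arithmetic and is not how NNCF implements quantization internally; the function names and the sample weights are made up for this example.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1] (affine scheme)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, where hi == lo would give a zero scale
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    quantized = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.6, -0.25, 0.0, 0.5, 1.0]  # made-up FP32 weights
q_weights, scale, zero_point = quantize(weights)
restored = dequantize(q_weights, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, zero_point, max_error)
```

Each weight now fits in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error that stays below one quantization step (`scale`) in this example.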

Today, I’m going to walk you through the quantization of a simple ImageNet classifier built on a residual network. I selected ResNet-18, the ResNet variant with the fewest layers (18). Using a smaller model and dataset speeds up training and download time. We can further speed up inference with compression techniques such as quantization, and quantization itself can be improved with hyperparameter tuning. Step by step, I’ll explain how you can use SigOpt to test multiple hyperparameter configurations in an automated fashion, arriving at a classification model with much lower memory requirements and computational costs.

Intel’s Neural Network Compression Framework (NNCF) provides a suite of advanced algorithms for optimizing neural network inference in Intel OpenVINO™ with minimal accuracy drop. Today, we’ll start off with a PyTorch example of a ResNet-18 classification model. To see other ResNet models, visit PyTorch Hub.

### Starting with data:

To keep things simple, we’ll use a standard dataset to train our model: Tiny ImageNet-200. The dataset is a subset of the larger ImageNet dataset that consists of 200 classes. The image is the input and the class label is the output. It has 100,000 training images of shape 3x64x64, with classes such as snake, spider, cat, truck, grasshopper, and gull.

If you uncomment the first two lines, they will install OpenVINO (pinned here to the 2021.4 release) and NNCF, in case you don’t have them in your environment already:

#!pip install openvino-dev[onnx,tensorflow2]==2021.4.*
#!pip install nncf[torch]

from pathlib import Path

import time
import zipfile

from urllib.request import urlretrieve

import torch
import nncf  # Important - should be imported directly after torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model
from nncf.torch import register_default_init_args

import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms

The pip install method will work with virtualenv or your own preferred Python environment management system.
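The snippets in this post refer to `MODEL_DIR`, `OUTPUT_DIR`, `BASE_MODEL_NAME`, and `image_size` without defining them. A minimal sketch of that setup could look like the following; the directory names and the model name are assumptions, so adjust them to your own layout.

```python
from pathlib import Path

# Assumed values, not given in this post: adjust them to your own layout
MODEL_DIR = Path("model")
OUTPUT_DIR = Path("output")
BASE_MODEL_NAME = "resnet18"
image_size = 64  # Tiny ImageNet images are 3x64x64

# Create the directories so later downloads and NNCF logs have somewhere to go
MODEL_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)
```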

### Setting a baseline:

Using NNCF for model compression assumes that you already have a pre-trained model and a training pipeline. Here we demonstrate one possible training pipeline: a ResNet-18 model pre-trained on the 1000 ImageNet classes is fine-tuned on the 200 classes of Tiny ImageNet. Let’s go ahead and retrieve the pre-trained model:

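fp32_pth_url = "https://storage.openvinotoolkit.org/repositories/nncf/openvino_notebook_ckpts/302_resnet18_fp32.pth"
fp32_pth_path = (MODEL_DIR / (BASE_MODEL_NAME + "_fp32")).with_suffix(".pth")
urlretrieve(fp32_pth_url, fp32_pth_path)
model = models.resnet18(pretrained=False)
model.fc = nn.Linear(512, 200)  # 200-class head for Tiny ImageNet
# Load the downloaded weights; assumes the checkpoint keeps them under "state_dict"
checkpoint = torch.load(str(fp32_pth_path), map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])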

We’ve now downloaded the pretrained ResNet-18 model and applied the pre-trained weights. From here, we can start the quantization process.

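nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, image_size, image_size]},
    "log_dir": str(OUTPUT_DIR),  # log directory for NNCF-specific logging outputs
    "compression": {
        "algorithm": "quantization",  # specify the algorithm here
        "initializer": {
            "range": {
                "num_init_samples": 128
            }
        }
    }
}
nncf_config = NNCFConfig.from_dict(nncf_config_dict)
# train_loader, val_loader, criterion and validate() come from your training pipeline (not shown here)
nncf_config = register_default_init_args(nncf_config, train_loader)  # data for range initialization
compression_ctrl, model_quantization = create_compressed_model(model, nncf_config)
acc1 = validate(val_loader, model_quantization, criterion)  # assumed to return top-1 accuracy
print(f"Accuracy of initialized quantization model: {acc1:.3f}")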

Here we define a baseline, non-tuned model, and then proceed to score it. We’ll use this to compare the full set of our SigOpt-optimized models in a moment. The above code shows a compression algorithm using quantization with default values for most parameters and some baseline values for other parameters. Depending on your use case, these values may make or break your model in production.

### Setting up SigOpt:

# Install SigOpt's client library
!pip install sigopt
import sigopt

# Create a connection to SigOpt using either your Development or API token
from sigopt import Connection

api_token = "YOUR_API_TOKEN_HERE"
conn = Connection(client_token=api_token)

Now that we’ve established our connection to SigOpt, it’s time to define the functions that create and evaluate our model:

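import copy

def create_config_dict(assignments):
    ret_nncf_config_dict = {
        "input_info": {"sample_size": [1, 3, image_size, image_size]},
        "log_dir": str(OUTPUT_DIR),  # log directory for NNCF-specific logging outputs
        "compression": {
            "algorithm": "quantization",  # specify the algorithm here
            "initializer": {
                "range": {
                    "num_init_samples": assignments['num_init_samples']
                }
            },
        },
    }
    return ret_nncf_config_dict

def evaluate_model(assignments):
    nncf_config_dict = create_config_dict(assignments)
    nncf_config = NNCFConfig.from_dict(nncf_config_dict)
    # train_loader, val_loader, criterion and validate() come from your training pipeline (not shown here)
    nncf_config = register_default_init_args(nncf_config, train_loader)
    # Quantize a copy so that every trial starts from the same FP32 model
    compression_ctrl, model_quantization = create_compressed_model(copy.deepcopy(model), nncf_config)
    ret_acc1 = validate(val_loader, model_quantization, criterion)  # assumed to return top-1 accuracy
    return ret_acc1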

Note that we’ll only be tuning one parameter in this model for simplicity; however, several other parameters can be tuned as well. Sweeping many parameters at once is resource-intensive for grid search, and even for random search, because the number of configurations grows multiplicatively with each added parameter. With Bayesian optimization and the other global algorithms in SigOpt, we can run a multi-parameter hyperparameter optimization job efficiently.
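To get a feel for how quickly a grid grows, here is a small back-of-the-envelope sketch. The candidate values below are hypothetical and chosen only for illustration; they are not recommended settings.

```python
from math import prod

# Hypothetical grid: candidate values per parameter, chosen only for illustration
grid = {
    "num_init_samples": [64, 128, 256, 512, 1024, 2048],
    "range_type": ["min_max", "mean_min_max"],
    "iter_number": [50, 100, 200, 400],
    "tolerance": [1e-5, 1e-4, 1e-3],
    "compression_ratio": [1.2, 1.5, 2.0],
}

# A full grid search evaluates every combination once
grid_runs = prod(len(candidates) for candidates in grid.values())
budget = 120  # a fixed number of runs, for comparison
print(f"Full grid: {grid_runs} runs, versus a fixed budget of {budget}")
```

Every one of those runs means quantizing and validating the model, so even a coarse five-parameter grid costs several times more than a fixed budget of 120 runs.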

Now it’s time to configure the SigOpt experimentation loop, including its metrics, the parameters you want to test, and their bounds (minimum and maximum):

experiment = conn.experiments().create(
    name="Tiny ResNet Compression - Vanilla SigOpt",
    parameters=[
        dict(name="num_init_samples", bounds=dict(min=64, max=2048), type="int")
    ],
    metrics=[
        dict(name="acc1", objective="maximize", strategy="optimize")
    ],
    observation_budget=120,
)

print("Explore your experiment: https://app.sigopt.com/experiment/" + experiment.id + "/analysis")

We set a generous observation_budget of 120 here, not only to get better results, but also because each evaluation is fast: the model is small and so is the dataset. You can visit the link output by the above cell, but you won’t see any data until you run the following code block, which executes the experiment loop you just set up:

# Optimization loop
for _ in range(experiment.observation_budget):
    suggestion = conn.experiments(experiment.id).suggestions().create()
    assignments = suggestion.assignments
    value = evaluate_model(assignments)

    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        value=value
    )

    # Update the experiment object
    experiment = conn.experiments(experiment.id).fetch()

assignments = conn.experiments(experiment.id).best_assignments().fetch().data[0].assignments

print("BEST ASSIGNMENTS \n", assignments)

At the bottom of your notebook or interpreter’s output, you should see the best set of parameters SigOpt was able to find in 120 automated training runs.

### Background on NNCF’s parameters:

If you’d like to experiment with tuning more parameters, here is some background on the parameters from the Quantization configuration:

• range: num_init_samples, number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
• range: type, type of initializer. Determines which statistics gathered during initialization will be used to initialize the quantization ranges. “mean_min_max” is used by default.
• precision: type, type of precision initialization – either “manual” or “hawq”. With “manual”, precisions are defined explicitly via “bitwidth_per_scope”. With “hawq”, these are determined automatically using the HAWQ algorithm.
• precision: bits, list of bitwidths to choose from when performing precision initialization. Overrides the bitwidth constraints specified in the “weight” and “activation” sections.
• precision: iter_number, maximum number of iterations of Hutchinson algorithm to estimate Hessian trace, 200 by default.
• precision: tolerance, minimum relative tolerance for stopping the Hutchinson algorithm. It’s calculated between mean average trace from previous iteration and current one. 1e-4 by default.
• precision: compression_ratio, desired ratio between bits complexity of fully INT8 model and mixed-precision lower-bit one.
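As a starting point for tuning several of these at once, the sketch below extends the config builder to take more values from a SigOpt suggestion. Treat it as an untested assumption rather than a verified setup: the function name `create_extended_config_dict`, the parameter name `range_type`, the `[4, 8]` bit list, and all bounds are placeholders made up for this example, and HAWQ initialization may need extra pieces from your training pipeline (such as a loss function) that are not shown here.

```python
def create_extended_config_dict(assignments, image_size=64, log_dir="output"):
    """Build an NNCF-style quantization config from a dict of suggested values."""
    return {
        "input_info": {"sample_size": [1, 3, image_size, image_size]},
        "log_dir": log_dir,
        "compression": {
            "algorithm": "quantization",
            "initializer": {
                "range": {
                    "num_init_samples": assignments["num_init_samples"],
                    "type": assignments["range_type"],
                },
                "precision": {
                    "type": "hawq",
                    "bits": [4, 8],  # placeholder list of candidate bitwidths
                    "iter_number": assignments["iter_number"],
                    "tolerance": assignments["tolerance"],
                    "compression_ratio": assignments["compression_ratio"],
                },
            },
        },
    }


# Hypothetical SigOpt parameter definitions; the bounds are placeholders, not recommendations
parameters = [
    dict(name="num_init_samples", bounds=dict(min=64, max=2048), type="int"),
    dict(name="range_type", categorical_values=["min_max", "mean_min_max"], type="categorical"),
    dict(name="iter_number", bounds=dict(min=50, max=400), type="int"),
    dict(name="tolerance", bounds=dict(min=1e-5, max=1e-3), type="double"),
    dict(name="compression_ratio", bounds=dict(min=1.1, max=2.0), type="double"),
]

# Stand-in for suggestion.assignments, so the sketch runs without a SigOpt connection
sample_assignments = {
    "num_init_samples": 256,
    "range_type": "mean_min_max",
    "iter_number": 200,
    "tolerance": 1e-4,
    "compression_ratio": 1.5,
}
config = create_extended_config_dict(sample_assignments)
print(config["compression"]["initializer"])
```

The `parameters` list would replace the single-parameter list passed when creating the experiment, and the builder would take the place of `create_config_dict` inside `evaluate_model`.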

## Wrapping up:

While we explored classification today on a toy dataset, you can also apply these techniques to ResNet-50, ImageNet-1000, or any number of other models and datasets in which you want to classify an outcome based on many numerical features. Keep in mind that your application may require extensive data cleaning, which you’ll have to account for on your own. But once you’ve cleaned and prepped your data, SigOpt facilitates a robust, efficient, and well-tracked experimentation process. Aside from ease of use, SigOpt can deliver better optimization performance than hand-tuned parameters or an exhaustive approach like grid search. Next time we’ll explore a more complex setup.