Optimizing Your Model for Inference with PyTorch Quantization

What is Quantization

Quantization refers to techniques for performing computations and storing tensors at lower bit widths than floating point, most commonly 8-bit integers. Because int8 arithmetic is cheaper and int8 tensors are 4x smaller than fp32, a quantized model typically runs faster at inference time and uses less memory, at the cost of a small amount of numerical accuracy.
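As a quick, minimal illustration (not part of the walkthrough below), torch.quantize_per_tensor maps an fp32 tensor to int8 values using a scale and zero point; the scale and zero point here are hand-picked for the example, whereas in the workflow below they are chosen automatically by observers:

import torch

x = torch.randn(4)
# quantize with a hand-picked scale and zero point
xq = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)
print(xq.int_repr())    # the underlying int8 values
print(xq.dequantize())  # an approximate reconstruction of the original fp32 tensor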

How to Use PyTorch Quantization

The snippet below applies post-training static quantization to a ResNet-50 model using the FX graph mode quantization API: we attach a quantization configuration, insert observers, calibrate on sample data, convert the model to int8, and benchmark it against the fp32 baseline.

import copy

import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
from torchvision.models import resnet50
fp32_model = resnet50().eval()
model = copy.deepcopy(fp32_model)
# `qconfig` means quantization configuration; it specifies how we should
# observe the activations and weights of an operator
# `qconfig_dict` specifies the `qconfig` for each operator in the model
# we can specify `qconfig` for certain types of modules
# we can specify `qconfig` for a specific submodule in the model
# we can specify `qconfig` for some functional calls in the model
# we can also set `qconfig` to None to skip quantization for some operators
# (a commented-out example of these forms follows the `qconfig_dict` definition below)
qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {"": qconfig}
# `prepare_fx` inserts observers in the model based on the configuration in `qconfig_dict`
model_prepared = prepare_fx(model, qconfig_dict)
# calibration runs the model with some sample data, which allows the observers to record
# the statistics of the activations and weights of the operators
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(100)]
for i in range(len(calibration_data)):
    model_prepared(calibration_data[i])
# `convert_fx` converts a calibrated model to a quantized model; this includes inserting
# quantize/dequantize operators into the model and swapping floating point operators with quantized operators
model_quantized = convert_fx(copy.deepcopy(model_prepared))
# benchmark
x = torch.randn(1, 3, 224, 224)
%timeit fp32_model(x)
%timeit model_quantized(x)
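Besides latency, it is often worth checking the reduction in model size. Here is a minimal sketch (not part of the benchmark above) that serializes each state dict to a temporary file and compares file sizes; the helper name and file name are just placeholders:

import os

def print_model_size(model, label):
    # save the state dict to a temporary file and report its size in MB
    torch.save(model.state_dict(), "tmp_size_check.pt")
    print(f"{label}: {os.path.getsize('tmp_size_check.pt') / 1e6:.2f} MB")
    os.remove("tmp_size_check.pt")

print_model_size(fp32_model, "fp32")
print_model_size(model_quantized, "int8")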

How to Do Numerical Debugging after Quantization

Quantization can reduce model accuracy. To find out where precision is being lost, we can use the FX Numeric Suite (torch.ao.ns) to compare the fp32 and int8 models layer by layer, starting with their weights.

# Compare the weights of the fp32 model and the quantized model.
import torch.ao.ns._numeric_suite_fx as ns
# Note: when comparing weights in models with Conv-BN for PTQ, we need to compare
# weights after Conv-BN fusion for a proper comparison. Because of this, we use
# `model_prepared` instead of `fp32_model` when comparing weights.
# Extract conv and linear weights from corresponding parts of the two models, and save
# them in `resnet50_wt_compare_dict`.
resnet50_wt_compare_dict = ns.extract_weights(
    'fp32',           # string name for model A
    model_prepared,   # model A
    'int8',           # string name for model B
    model_quantized,  # model B
)
# calculate SQNR between each pair of weights
# SQNR is a measure of quantization error; a large SQNR value means the quantization loss is small
ns.extend_logger_results_with_comparison(
    resnet50_wt_compare_dict,           # results object to modify in place
    'fp32',                             # string name of model A (from previous step)
    'int8',                             # string name of model B (from previous step)
    torch.ao.ns.fx.utils.compute_sqnr,  # the function to use to compare two tensors
    'sqnr',                             # the name to use to store the results under
)
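# For reference: the SQNR reported by `compute_sqnr` is a signal-to-quantization-noise ratio in
# decibels, roughly of the form 20 * log10(norm(w_fp32) / norm(w_fp32 - w_int8)), so higher
# values mean the quantized weights track the fp32 weights more closely.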
# massage the data into a format easy to graph and print
# Note: no util function for this since use cases may be different per user
# Note: there is a lot of debugging data, and it will be challenging to print all of it
# and fit on a laptop screen. It is up to the user to decide which data is useful for them.
resnet50_wt_to_print = []
for idx, (layer_name, v) in enumerate(resnet50_wt_compare_dict.items()):
    resnet50_wt_to_print.append([
        idx,
        layer_name,
        v['weight']['int8'][0]['prev_node_target_type'],
        v['weight']['int8'][0]['values'][0].shape,
        v['weight']['int8'][0]['sqnr'][0],
    ])
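# Optionally (not required for the plot below), the same data can be viewed as a table,
# e.g. with pandas; the column names here are just for illustration:
import pandas as pd
print(pd.DataFrame(resnet50_wt_to_print, columns=['idx', 'layer_name', 'type', 'shape', 'sqnr']))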
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
# a simple line graph
def plot(xdata, ydata, xlabel, ylabel, title):
    fig = plt.figure(figsize=(10, 5), dpi=100)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    ax = plt.axes()
    ax.plot(xdata, ydata)
# plot the SQNR between fp32 and int8 weights for each layer
# Note: we may explore easier-to-read charts (bar chart, etc.) at a later time; for now,
# a line chart + table is good enough.
plot([x[0] for x in resnet50_wt_to_print], [x[4] for x in resnet50_wt_to_print], 'idx', 'sqnr', 'weights, idx to sqnr')
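Weight comparison is only part of the picture: the numeric suite can also log and compare intermediate activations of the fp32 and int8 models on the same input. Below is a minimal sketch, assuming the add_loggers / extract_logger_info APIs in torch.ao.ns._numeric_suite_fx behave as in the PyTorch numeric suite tutorial (check your PyTorch version for the exact signatures):

# instrument both models so the outputs of matching layers are recorded
mp_ns, mq_ns = ns.add_loggers(
    'fp32', copy.deepcopy(model_prepared),
    'int8', copy.deepcopy(model_quantized),
    ns.OutputLogger,
)
# run the same input through both instrumented models
datum = torch.randn(1, 3, 224, 224)
mp_ns(datum)
mq_ns(datum)
# collect the logged activations, keyed by the int8 model's layer names
resnet50_act_compare_dict = ns.extract_logger_info(mp_ns, mq_ns, ns.OutputLogger, 'int8')
# attach SQNR to each pair of activations, just as we did for the weights
ns.extend_logger_results_with_comparison(
    resnet50_act_compare_dict, 'fp32', 'int8',
    torch.ao.ns.fx.utils.compute_sqnr, 'sqnr',
)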

Summary

In this post we walked through post-training static quantization with PyTorch's FX graph mode API: preparing a ResNet-50 model with observers, calibrating it on sample data, converting it to an int8 model, and benchmarking it against the fp32 baseline. We then used the FX Numeric Suite to compare the weights of the fp32 and int8 models and plot the per-layer SQNR, which is a good starting point for debugging any accuracy loss introduced by quantization.
