Olive - hardware-aware model optimization tool

Olive is an easy-to-use, hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation. It works with ONNX Runtime as an end-to-end inference optimization solution.

Given a model and target hardware, Olive composes the most suitable optimization techniques to output the most efficient model(s) and runtime configurations for inference with ONNX Runtime, subject to constraints such as accuracy and latency. Techniques Olive has integrated include ONNX Runtime Transformer optimizations, ONNX Runtime performance tuning, hardware-dependent tunable post-training quantization, quantization-aware training, and more. Olive is the recommended tool for model optimization for ONNX Runtime.
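To make the idea of constraint-driven selection concrete, here is a minimal Python sketch. It is not Olive's API and does not reflect how Olive is implemented. The `Candidate` class, the `pick_best` function, the candidate names, and all numbers are hypothetical and made up for illustration. It only shows the general pattern described above: among several optimized variants of a model, keep those that satisfy the accuracy and latency constraints, then choose the fastest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One optimized model variant with its measured metrics (hypothetical)."""
    name: str
    accuracy: float    # e.g. fraction of correct predictions on a validation set
    latency_ms: float  # e.g. average inference latency on the target hardware


def pick_best(candidates: List[Candidate],
              baseline_accuracy: float,
              max_accuracy_drop: float,
              max_latency_ms: Optional[float] = None) -> Optional[Candidate]:
    """Return the lowest-latency candidate that meets the constraints, or None."""
    feasible = [
        c for c in candidates
        if baseline_accuracy - c.accuracy <= max_accuracy_drop
        and (max_latency_ms is None or c.latency_ms <= max_latency_ms)
    ]
    # min() with default=None handles the case where nothing is feasible.
    return min(feasible, key=lambda c: c.latency_ms, default=None)


# Made-up numbers for illustration only; these are not measurements.
candidates = [
    Candidate("fp32_baseline", accuracy=0.900, latency_ms=40.0),
    Candidate("transformer_opt", accuracy=0.900, latency_ms=30.0),
    Candidate("int8_ptq", accuracy=0.885, latency_ms=15.0),
    Candidate("int8_aggressive", accuracy=0.850, latency_ms=12.0),
]

best = pick_best(candidates, baseline_accuracy=0.900, max_accuracy_drop=0.02)
print(best.name if best else "no candidate meets the constraints")
```

In this sketch the most aggressive variant is the fastest but is rejected because it loses too much accuracy. A real tool would also have to produce and measure the candidates on the target hardware, which this example leaves out.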


Examples:

  1. BERT optimization on CPU (with post-training quantization)
  2. BERT optimization on CPU (with quantization-aware training)

For more details, please refer to the Olive repo and the Olive documentation.