Olive - hardware-aware model optimization tool

Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation. Together with ONNX Runtime, it provides an end-to-end inference optimization solution.

Given a model and targeted hardware, Olive composes the most suitable optimization techniques to output the most efficient model(s) and runtime configurations for inference with ONNX Runtime, while taking a set of constraints such as accuracy and latency into consideration. Techniques Olive has integrated include ONNX Runtime Transformer optimizations, ONNX Runtime performance tuning, hardware-dependent tunable post-training quantization, quantization-aware training, and more. Olive is the recommended tool for model optimization for ONNX Runtime.
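For illustration, a workflow is typically described as a config of composable passes and handed to Olive's run entry point. The sketch below is an assumption-laden outline, not a verbatim recipe: the config schema and pass names (`HfModel`, `OnnxConversion`, `OrtTransformersOptimization`, `OnnxQuantization`) vary across Olive releases, so check the documentation of the version you install.

```python
# A minimal sketch of an Olive workflow (schema and pass names are
# assumptions based on recent Olive releases and may differ in yours).
from olive.workflows import run as olive_run

config = {
    # Input model: a Hugging Face BERT checkpoint (hypothetical example).
    "input_model": {
        "type": "HfModel",
        "model_path": "bert-base-uncased",
        "task": "text-classification",
    },
    # Passes are composed in order: export to ONNX, apply ONNX Runtime
    # transformer optimizations, then post-training quantization.
    "passes": {
        "conversion": {"type": "OnnxConversion"},
        "transformer_opt": {"type": "OrtTransformersOptimization"},
        "quantization": {"type": "OnnxQuantization"},
    },
    "output_dir": "optimized_bert",
}

# olive_run also accepts a path to a JSON config file.
olive_run(config)
```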

Examples:

  1. BERT optimization on CPU (with post-training quantization)
  2. BERT optimization on CPU (with quantization-aware training)

For more details, please refer to the Olive repo and the Olive documentation.