ONNX Runtime Performance Tuning

ONNX Runtime provides high performance for running deep learning models on a range of hardware. Depending on the usage scenario, performance is commonly measured along several dimensions: latency, throughput, memory utilization, and model or application size.

While ONNX Runtime (ORT) aims to provide good out-of-the-box performance for the most common usage patterns, model optimization techniques and runtime configurations can improve performance for specific use cases and models.
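
Before changing any configuration, it helps to have a baseline for the dimensions named above. The following is a minimal sketch, using only the Python standard library, of how latency and throughput might be measured. The `benchmark` helper and the `fake_inference` function are hypothetical names introduced for illustration and are not part of ONNX Runtime; in practice `fake_inference` would be replaced by a call that runs the model.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object],
              warmup: int = 5,
              iterations: int = 50) -> Dict[str, float]:
    """Time repeated calls to `run_once` and summarize latency and throughput."""
    # Warm-up calls are excluded so one-time costs (lazy initialization,
    # cache fills) do not skew the measurements.
    for _ in range(warmup):
        run_once()

    latencies_ms = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    total_s = time.perf_counter() - start

    latencies_ms.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_per_s": iterations / total_s,
    }


def fake_inference() -> int:
    # Hypothetical stand-in for a real inference call; assumption for
    # illustration only, so the sketch runs without a model file.
    return sum(i * i for i in range(10_000))


stats = benchmark(fake_inference)
print(stats)
```

Reporting a tail percentile alongside the mean is a deliberate choice: a configuration change can improve average latency while making the slowest requests worse, and the mean alone would hide that.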


Table of contents