ONNX Runtime Performance Tuning

ONNX Runtime provides high performance for running deep learning models on a range of hardware. Depending on the usage scenario, performance is commonly measured along several dimensions: latency, throughput, memory utilization, and model or application size.
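As a rough illustration of the latency and throughput dimensions, the sketch below times repeated calls to an arbitrary function and summarizes the results. It uses only the Python standard library. The `benchmark` helper and its `run_once`, `warmup`, and `iterations` parameters are hypothetical names chosen for this example and are not part of ONNX Runtime; in practice `run_once` might wrap a call such as `session.run(None, inputs)` on an existing inference session.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object],
              warmup: int = 3,
              iterations: int = 20) -> Dict[str, float]:
    """Time repeated calls to run_once and summarize latency and throughput.

    run_once is a hypothetical zero-argument callable standing in for one
    inference call; it is an assumption of this sketch, not an ORT API.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    # Warm-up calls are excluded from timing so one-time setup costs
    # (lazy initialization, cache population) do not skew the numbers.
    for _ in range(warmup):
        run_once()

    latencies_ms = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    total_s = time.perf_counter() - start

    # Nearest-rank 95th percentile over the sorted samples.
    latencies_ms.sort()
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)

    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_per_s": iterations / total_s,
    }
```

Comparing the returned figures before and after a configuration change gives a simple baseline check. Memory utilization and model size are not covered by this sketch and would need separate measurement.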

While ONNX Runtime (ORT) aims to provide good out-of-the-box performance for the most common usage patterns, model optimization techniques and runtime configurations can improve performance further for specific use cases and models.

Profiling tools
Logging & Tracing
Memory consumption
Thread management
I/O Binding
Troubleshooting
