Welcome to ONNX Runtime

ONNX Runtime is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries. ONNX Runtime can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks.

How to use ONNX Runtime

Get started with ORT
API Docs
Tutorials
Ecosystem
ONNX Runtime YouTube

Contribute and Customize

Build ORT Packages

ONNX Runtime GitHub

QuickStart Template

ORT Web JavaScript Site Template

ORT C# Console App Template

ONNX Runtime for Inferencing

ONNX Runtime Inference powers machine learning models in key Microsoft products and services across Office, Azure, and Bing, as well as in dozens of community projects.

Example use cases for ONNX Runtime Inferencing include:

Improve inference performance for a wide variety of ML models
Run on different hardware and operating systems
Train in Python but deploy into a C#/C++/Java app
Train and perform inference with models created in different frameworks

How it works

The premise is simple.

1. Get a model. The model can be trained in any framework that supports export or conversion to the ONNX format. See the tutorials for some of the popular frameworks and libraries.
2. Load and run the model with ONNX Runtime. See the basic tutorials for running models in different languages.
3. (Optional) Tune performance using various runtime configurations or hardware accelerators. Many options are available; see the Performance section as a starting point.
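
Loading and running a model, with an optional choice of hardware accelerator, might look like the following Python sketch. The helper names choose_providers and run_model are hypothetical and not part of ONNX Runtime. The model path and input names are supplied by the caller, and "CUDAExecutionProvider" is only an example preference that is used when the installed build reports it as available. The onnxruntime package is assumed to be installed when run_model is called.

```python
from typing import Dict, List, Sequence


def choose_providers(available: Sequence[str], preferred: Sequence[str]) -> List[str]:
    """Keep the preferred providers that are available, in preference order.

    The CPU provider is always appended last as a fallback.
    """
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def run_model(model_path: str, feeds: Dict[str, object],
              preferred: Sequence[str] = ("CUDAExecutionProvider",)):
    """Load an ONNX model and run it once (hypothetical helper).

    `feeds` maps each model input name to a NumPy array.
    """
    # Imported here so choose_providers stays usable without the package.
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers(), preferred)
    session = ort.InferenceSession(model_path, providers=providers)

    # Passing None as the first argument requests every model output.
    return session.run(None, feeds)
```

A call such as run_model("model.onnx", {"input": array}) would return a list of output arrays; both the file name and the input name "input" are placeholders that depend on the exported model.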

Even without step 3, ONNX Runtime will often provide performance improvements compared to the original framework.

ONNX Runtime applies a number of graph optimizations to the model graph, then partitions it into subgraphs based on the available hardware-specific accelerators. Optimized computation kernels in core ONNX Runtime provide performance improvements, and each assigned subgraph benefits from further acceleration by its Execution Provider.
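
The idea of partitioning can be illustrated with a toy Python model. This is a deliberate simplification and not ONNX Runtime's actual partitioner: it treats the graph as a linear chain of operator names, uses made-up provider labels ("GPU", "CPU"), and assumes a hypothetical table of which operators each provider supports. Each operator goes to the first provider in priority order that supports it, with "CPU" as the fallback, and consecutive operators with the same assignment are grouped into one subgraph.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Set, Tuple


def assign_provider(op: str, providers: Sequence[str],
                    supported: Dict[str, Set[str]]) -> str:
    """Return the first provider (in priority order) that supports `op`."""
    for provider in providers:
        if op in supported.get(provider, set()):
            return provider
    return "CPU"  # Assumed fallback that can run every operator.


def partition(ops: Sequence[str], providers: Sequence[str],
              supported: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Group consecutive operators that share a provider into subgraphs."""
    assigned = [(assign_provider(op, providers, supported), op) for op in ops]
    return [
        (provider, [op for _, op in group])
        for provider, group in groupby(assigned, key=lambda pair: pair[0])
    ]


# Toy chain: the hypothetical "GPU" provider supports Conv and Relu only.
subgraphs = partition(
    ["Conv", "Relu", "NonMaxSuppression", "Conv"],
    providers=["GPU"],
    supported={"GPU": {"Conv", "Relu"}},
)
```

With these inputs the chain splits into three subgraphs: Conv and Relu on the accelerator, the unsupported operator on the fallback, and the final Conv back on the accelerator.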