ONNX Runtime for Inferencing

ONNX Runtime provides a performant way to run inference on models from various source frameworks (PyTorch, Hugging Face, TensorFlow) across different software and hardware stacks. ONNX Runtime Inference takes advantage of hardware accelerators, supports APIs in multiple languages (Python, C++, C#, C, Java, and more), and works on cloud servers, edge and mobile devices, and in web browsers.
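As a rough illustration of the Python API, the sketch below wraps a session's `get_inputs`, `get_outputs`, and `run` calls in a small helper. The helper names `run_model` and `load_and_run` are assumptions made for this example, not part of ONNX Runtime, and the model path is whatever file you supply. `run_model` accepts any object that exposes those three methods, so it can be exercised without a real model.

```python
def run_model(session, *arrays):
    """Feed positional arrays to a session and return outputs keyed by name.

    `session` is expected to behave like onnxruntime.InferenceSession:
    get_inputs() and get_outputs() return objects with a `.name`, and
    run(output_names, input_feed) returns a list of outputs.
    """
    input_names = [arg.name for arg in session.get_inputs()]
    if len(arrays) != len(input_names):
        raise ValueError(
            f"model expects {len(input_names)} inputs, got {len(arrays)}"
        )
    feed = dict(zip(input_names, arrays))
    # Passing None as the first argument requests every model output.
    outputs = session.run(None, feed)
    output_names = [arg.name for arg in session.get_outputs()]
    return dict(zip(output_names, outputs))


def load_and_run(model_path, *arrays):
    """Hypothetical convenience wrapper: load a model on CPU and run it once.

    Requires the onnxruntime package, imported lazily so that run_model
    stays usable without it.
    """
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path, providers=["CPUExecutionProvider"]
    )
    return run_model(session, *arrays)
```

The inputs passed to `load_and_run` would typically be NumPy arrays whose shapes and dtypes match what the model declares.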

Learn how to install ONNX Runtime for inferencing →

Benefits

Improve inference latency, throughput, memory utilization, and binary size

Run on different hardware using device-specific accelerators

Use a common interface to run models trained in different frameworks

Deploy a classic ML Python model in a C#/C++/Java app
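To make the hardware point above more concrete, here is a minimal sketch of choosing execution providers in the Python API. It assumes the standard provider names (such as `CUDAExecutionProvider` and `CPUExecutionProvider`); the helpers `pick_providers` and `create_session` and the default preference order are hypothetical and chosen only for illustration.

```python
def pick_providers(available, preferred):
    """Return the preferred providers that are available, in priority order.

    CPUExecutionProvider is appended as a fallback when it is available
    and was not already selected.
    """
    chosen = [p for p in preferred if p in available]
    cpu = "CPUExecutionProvider"
    if cpu in available and cpu not in chosen:
        chosen.append(cpu)
    return chosen


def create_session(
    model_path,
    preferred=("CUDAExecutionProvider", "CoreMLExecutionProvider"),
):
    """Create an InferenceSession with the best available providers.

    Requires the onnxruntime package, imported lazily so that
    pick_providers can be used without it. The default `preferred`
    tuple is an assumption, not a recommendation.
    """
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers(), preferred)
    return ort.InferenceSession(model_path, providers=providers)
```

Which providers are actually available depends on the ONNX Runtime package that is installed and on the hardware and drivers present on the machine.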

ONNX Runtime Mobile

ONNX Runtime Mobile runs models on mobile devices using the same API used for cloud-based inferencing. Developers can use their mobile language and development environment of choice to add AI to Android, iOS, React Native, and MAUI/Xamarin applications in Swift, Objective-C, Java, Kotlin, JavaScript, C, and C++.

Examples

Image Classification

This example app uses image classification to continuously classify the objects seen by the device's camera in real time and displays the most probable inference results on the screen.

Android Image Classifier →
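Showing the "most probable" results usually means converting raw model scores to probabilities and keeping the top few. The sketch below is not taken from the example app; it assumes the classifier emits one logit per label, and the function name `top_k` is hypothetical. It is written in plain Python rather than the app's own language to keep the idea self-contained.

```python
import math


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs."""
    if not logits:
        raise ValueError("logits must not be empty")
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by probability, highest first, and keep the first k entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

In a camera loop, a function like this would run on each frame's output before the labels are drawn on screen.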

Speech Recognition

This example app uses speech recognition to transcribe speech from the audio recorded by the device.

iOS Speech Recognition →

Object Detection

This example app uses object detection to continuously detect the objects in the frames seen by the iOS device's back camera and displays each detected object's bounding box, class, and corresponding inference confidence.

Android Object Detection → iOS Object Detection →

Question Answering

This example app showcases the use of question answering models with pre- and post-processing.

Android Question Answering → iOS Question Answering →
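One common form of post-processing for question answering is sketched below. It assumes an extractive model that outputs a start score and an end score per input token, which may not match the models used in the example apps. The function name `best_answer_span` and the `max_answer_len` default are hypothetical.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring valid token span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens. Returns None if no span can be formed.
    """
    best = None
    for start, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start position.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best
```

The returned token indices would then be mapped back to the original text through the tokenizer's offsets to produce the answer string.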

See more examples of ONNX Runtime Mobile on GitHub. →

ONNX Runtime Web

ONNX Runtime Web allows JavaScript developers to run and deploy machine learning models in browsers, which provides cross-platform portability with a common implementation. This can simplify distribution because it avoids installing additional libraries and drivers.

Video Tutorial: Inference in JavaScript with ONNX Runtime Web →

Examples

ONNX Runtime Web Demo is an interactive demo portal that showcases live use of ONNX Runtime Web in VueJS. View these examples to experience the power of ONNX Runtime Web.

MobileNet, trained on ImageNet → SqueezeNet, trained on ImageNet → Emotion FerPlus → Yolo → MNIST →

Image Classification

This example demonstrates how to use a GitHub repository template to build an image classification web app using ONNX Runtime Web.

Classify images in a web application →

Speech Recognition

This example demonstrates how to run whisper tiny.en in your browser using ONNX Runtime Web and the browser's audio interfaces.

Run whisper tiny.en in your browser →

Natural Language Processing (NLP)

This example demonstrates how to create custom Excel functions to implement BERT NLP models with ONNX Runtime Web to enable deep learning in spreadsheet tasks.

Custom Excel Functions for BERT NLP Tasks → Custom Excel Functions for BERT NLP →

On-Device Training

ONNX Runtime on-device training extends the inference ecosystem by using data on the device to train models.

Learn more about on-device training →