ONNX Runtime Release Roadmap

ONNX Runtime is released on a quarterly basis. Patch releases are published between major releases as necessary.

  • Previous release: 1.19.2 (released 9/4/2024)
  • In-progress release: 1.20 (target release date: 10/30/2024)
  • Next release: 1.21 (target release date: February 2025)

Announcements

  • All ONNX Runtime Training packages have been deprecated. ORT 1.19.2 was the last release for which onnxruntime-training (PyPI), onnxruntime-training-cpu (PyPI), Microsoft.ML.OnnxRuntime.Training (NuGet), onnxruntime-training-c (CocoaPods), onnxruntime-training-objc (CocoaPods), and onnxruntime-training-android (Maven Central) were published.
  • ONNX Runtime packages will stop supporting Python 3.8 and Python 3.9, in line with NumPy's Python version support. To continue using ORT with Python 3.8 or 3.9, use ORT 1.19.2 or earlier.

New Packages

We are planning to start releasing the following packages:

  • Maven package with Android support for QNN EP
  • CocoaPods package with Mac / iOS support for ORT GenAI

Versioning Updates

We are planning to upgrade ONNX Runtime support for the following dependencies (the first value is the highest version previously supported; the second is the version that will be supported as of ORT 1.20):

  • ONNX 1.16.1 --> 1.17.0
  • TensorRT 10.2 --> 10.4
  • DirectML 1.15.1 --> 1.15.2

Major Updates

In addition to various bug fixes and performance improvements, ORT 1.20 will include the following major updates:

  • Add MultiLoRA support.
  • Improve CPU FP16 and INT4 performance.
  • Increase GenAI API model support, including Whisper, Phi-3.5-vision multi-frame, and more.
  • Publish Phi-3.5 ONNX model variants to Hugging Face.
  • Expand mobile support, including a GPU EP and FP16 support for the CoreML EP and XNNPACK kernels.
  • Add Apple support for AI Toolkit for VS Code.

Feature Requests

To request new ONNX Runtime features for inclusion in a future release, please submit a feature request through GitHub Issues or GitHub Discussions.

To ensure that your request is addressed as quickly as possible, please:

  • Include a detailed title.
  • Provide as much detail as possible in the body of your request (e.g., use case for the feature, the platform(s) or EP(s) this feature is needed for, etc.).
  • Apply a label corresponding to the appropriate ONNX Runtime area (e.g., "platform:mobile", "platform:web", "ep:CUDA", etc.) if you know it.

Note: All timelines and features listed on this page are subject to change.

ONNX Runtime 1.20

Tentative release date: 10/30/2024

Build System & Packages
  • Upgrade ONNX support from 1.16.1 to 1.17.0.
  • Add Python 3.12 support for Windows ARM64.
  • Add vcpkg support.
  • Digitally sign DLLs in Maven build.
Core
  • Add MultiLoRA support.
  • Improve ThreadPool to spend less time busy waiting.
  • Improve memory utilization, particularly related to external weights.
  • Improve partitioning.
Performance
  • Add FP16 SLM model support on CPU.
  • Add INT4 quantized embedding support on CPU and CUDA (see the quantization sketch below).
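
The embedding-specific INT4 work is new in 1.20, but for context, here is a minimal sketch of ONNX Runtime's existing INT4 weight-only quantization tooling for MatMul weights; the model paths are placeholders, and block_size=32 is an assumed choice, not a recommendation from this roadmap.

```python
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

# Quantize MatMul weights to blockwise INT4 ahead of time.
# "model.onnx" / "model_int4.onnx" are placeholder paths.
model = onnx.load("model.onnx")
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()
quantizer.model.save_model_to_file("model_int4.onnx", use_external_data_format=True)
```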
EPs

TensorRT

  • Upgrade TensorRT support from 10.2 to 10.4.
  • Enable data-dependent shape (DDS) output support, including performance fixes for NonMaxSuppression (NMS); a session-setup sketch follows below.
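
For readers new to the TensorRT EP, it is selected per session through provider options; a minimal sketch, assuming placeholder model and cache paths:

```python
import onnxruntime as ort

# Run with the TensorRT EP, caching compiled engines on disk so they are
# reused across process restarts; unsupported subgraphs fall back to CUDA/CPU.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "./trt_cache",  # placeholder directory
    }),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
```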

QNN

  • Add HTP shared weights context binary.
  • Add runtime support for HTP shared weights across multiple ORT sessions (see the EP context sketch below).
  • Add efficient mode support.
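
The shared-weights work builds on the QNN EP's context binary support. As a rough sketch of that flow using the existing EP context session options (the paths and the HTP backend library name are assumptions, and the new 1.20 weight-sharing knobs may be separate options):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Dump a pre-compiled EP context model on first load so later sessions can
# skip HTP compilation. These are the existing EP context config keys; the
# 1.20 shared-weights controls may differ.
so.add_session_config_entry("ep.context_enable", "1")
so.add_session_config_entry("ep.context_file_path", "model_ctx.onnx")  # placeholder

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    sess_options=so,
    providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
)
```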

OpenVINO

  • Add context generation memory optimizations.
  • Add efficient mode support.

DirectML

  • Upgrade DirectML support from 1.15.1 to 1.15.2.
Mobile
  • Add Android QNN support, including a pre-built package, performance improvements, and Phi-3 model support.
  • Add GPU EP support for ORT Mobile.
  • Add FP16 support for the CoreML EP and XNNPACK kernels (EP selection is sketched below).
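
Execution providers are chosen per session in priority order; a minimal Python sketch of preferring CoreML, then XNNPACK, then CPU (on-device apps do the equivalent through the mobile APIs; the model path is a placeholder):

```python
import onnxruntime as ort

# Try CoreML first on Apple hardware, then the XNNPACK kernels, then the
# default CPU EP. FP16 models can now run natively on the first two.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[
        "CoreMLExecutionProvider",
        "XnnpackExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```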
Web
  • Add quantized embedding support.
  • Add on-demand weight loading support, which keeps weights out of the wasm32 heap and enables 8B-parameter LLM models.
  • Add support for wasm64 through a custom build (will not be included in released packages).
  • Add GQA support.
  • Improve performance for integrated Intel GPU.
  • Add support for Opset 21, including Reshape, Shape, and Gelu.
GenAI
  • Add continuous decoding support, including chat mode and system prompt caching (see the sketch after this list).
  • Introduce MultiLoRA API.
  • Add Whisper model support.
  • Add Phi-3.5-vision multi-frame model support.
  • Add Phi-3.5 and Llama-3.1 model support on Qualcomm NPU.
  • Introduce packages for Mac/iOS.
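
To make the continuous-decoding and MultiLoRA items concrete, here is a rough sketch against the onnxruntime-genai Python API. The model folder and adapter file are placeholders, and the method names follow the 0.5-era examples, so treat the exact signatures as assumptions.

```python
import onnxruntime_genai as og

model = og.Model("path/to/genai-model")  # placeholder model folder
tokenizer = og.Tokenizer(model)

# MultiLoRA: load a LoRA adapter and activate it for this generator.
adapters = og.Adapters(model)
adapters.load("travel.onnx_adapter", "travel")  # placeholder adapter file

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
generator = og.Generator(model, params)
generator.set_active_adapter(adapters, "travel")

# Continuous decoding: append turns to the same generator so the system
# prompt and earlier turns stay cached between requests.
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
```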
Extensions
  • Improve performance profiling and optimize tokenization.
  • Increase multi-modal model support, including more kernel attributes.
  • Add Unigram tokenization model support (see the sketch below).
  • Remove OpenCV dependency from C API build.
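
For context, extensions kernels (including the tokenizers) are registered into an ORT session as a custom-ops library; a minimal sketch, assuming a placeholder tokenizer model path:

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the onnxruntime-extensions custom ops (tokenizers, vision ops,
# etc.) before loading a model that uses them.
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())
session = ort.InferenceSession("tokenizer.onnx", so)  # placeholder path
```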