ONNX Runtime Release Roadmap

ONNX Runtime is released on a quarterly basis. Patch releases are published between major releases as necessary.

Previous release: 1.19.2 (released 9/4/2024)
In-progress release: 1.20 (target release date: 10/30/2024)
Next release: 1.21 (target release date: February 2025)

Announcements

  • All ONNX Runtime Training packages have been deprecated. ORT 1.19.2 was the last release for which onnxruntime-training (PyPI), onnxruntime-training-cpu (PyPI), Microsoft.ML.OnnxRuntime.Training (NuGet), onnxruntime-training-c (CocoaPods), onnxruntime-training-objc (CocoaPods), and onnxruntime-training-android (Maven Central) were published.
  • ONNX Runtime packages will stop supporting Python 3.8 and Python 3.9. This decision aligns with NumPy's Python version support. To continue using ORT with Python 3.8 or Python 3.9, use ORT 1.19.2 or earlier.
  • ONNX Runtime 1.20 CUDA packages will include new dependencies that were not required in 1.19 packages. The following dependencies are new: libcudnn_adv.so.9, libcudnn_cnn.so.9, libcudnn_engines_precompiled.so.9, libcudnn_engines_runtime_compiled.so.9, libcudnn_graph.so.9, libcudnn_heuristic.so.9, libcudnn_ops.so.9, libnvrtc.so.12, and libz.so.1. A quick way to check that these libraries resolve is sketched below.
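
For CUDA package users, a quick preflight check can confirm that the new libraries resolve on the target machine. The following is a minimal sketch (Linux library names taken from the list above, loaded via ctypes; adjust names for other platforms):

    import ctypes

    # New shared libraries required by the ONNX Runtime 1.20 CUDA packages
    # (names taken from the announcement above).
    NEW_DEPS = [
        "libcudnn_adv.so.9", "libcudnn_cnn.so.9",
        "libcudnn_engines_precompiled.so.9",
        "libcudnn_engines_runtime_compiled.so.9",
        "libcudnn_graph.so.9", "libcudnn_heuristic.so.9",
        "libcudnn_ops.so.9", "libnvrtc.so.12", "libz.so.1",
    ]

    for lib in NEW_DEPS:
        try:
            ctypes.CDLL(lib)  # resolves through the standard loader search path
            print(f"OK      {lib}")
        except OSError:
            print(f"MISSING {lib}")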

New Packages

We are planning to start releasing the following packages:

  • Maven package with Android support for QNN EP
  • CocoaPods package with Mac / iOS support for ORT generate() API

Versioning Updates

We are planning to upgrade ONNX Runtime support for the following (the first value is the highest version supported through ORT 1.19.2; the second is the version that ORT 1.20 will add support for):

  • TensorRT 10.2 --> 10.4
  • DirectML 1.15.1 --> 1.15.2

Python 3.13 support will also be added. ONNX 1.17 support will be included in a future release.
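
After upgrading, the installed ONNX Runtime version and the execution providers available in that build can be confirmed from Python:

    import onnxruntime as ort

    print(ort.__version__)                # e.g. "1.20.0" once the release ships
    print(ort.get_available_providers())  # e.g. ["CUDAExecutionProvider", ...]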

Major Updates

In addition to various bug fixes and performance improvements, ORT 1.20 will include the following major updates:

  • Add MultiLoRA support (see the adapter sketch after this list).
  • Improve CPU FP16 and INT4 performance.
  • Increase generate() API model support, including Phi-3.5-vision multi-frame and more.
  • Expand mobile support to include GPU EP and FP16 support for CoreML EP and XNNPACK kernels.
  • Add Apple support for AI Toolkit for VS Code.
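
To make the MultiLoRA direction concrete, the sketch below shows adapter switching through the ORT generate() API. It is illustrative only: it assumes the onnxruntime-genai Python package's Adapters interface, and the model and adapter paths are placeholders; exact names may differ in the released package.

    import onnxruntime_genai as og

    # Placeholder paths; substitute a real exported model and adapter file.
    model = og.Model("./phi-3-base")
    adapters = og.Adapters(model)                      # assumed MultiLoRA API
    adapters.load("./travel_adapter.onnx_adapter", "travel")

    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=256)

    generator = og.Generator(model, params)
    generator.set_active_adapter(adapters, "travel")   # switch LoRA per request
    generator.append_tokens(tokenizer.encode("Plan a day in Kyoto."))
    while not generator.is_done():
        generator.generate_next_token()
    print(tokenizer.decode(generator.get_sequence(0)))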

Feature Requests

To request new ONNX Runtime features for inclusion in a future release, please submit a feature request through GitHub Issues or GitHub Discussions.

To ensure that your request is addressed as quickly as possible, please:

  • Include a detailed title.
  • Provide as much detail as possible in the body of your request (e.g., use case for the feature, the platform(s) or EP(s) this feature is needed for, etc.).
  • Apply a label corresponding to the appropriate ONNX Runtime area (e.g., "platform:mobile", "platform:web", "ep:CUDA", etc.) if you know it.

Note: All timelines and features listed on this page are subject to change.

ONNX Runtime 1.20

Tentative release date: 10/30/2024

A release candidate is now available on GitHub.

Announcements
  • See the Announcements section at the top of this page (training package deprecation, Python 3.8/3.9 support removal, and new CUDA package dependencies).
Build System & Packages
  • Python 3.13 support is included in PyPI packages.
  • ONNX 1.17 support will be delayed until a future release, but the ONNX version used by ONNX Runtime has been patched to include a shape inference change to the Einsum op.
  • DLLs in the Maven build are now digitally signed.
  • (Experimental) vcpkg support added for the CPU EP. The DML EP does not yet support vcpkg, and other EPs have not been tested.
Core
  • MultiLoRA support.
  • Improvements to memory utilization (specifically for external weights) and graph partitioning.
Performance
  • FP16 small language model (SLM) support on the CPU EP.
  • INT4 quantized embedding support on the CPU and CUDA EPs (an offline quantization sketch follows this list).
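
As one way to exercise the INT4 path, MatMul weights can be quantized offline with the 4-bit quantizer in onnxruntime's quantization tooling. This is a minimal sketch, assuming a float model at a placeholder path; defaults for block size and symmetry may differ across versions:

    from pathlib import Path

    from onnxruntime.quantization import quant_utils
    from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

    # Placeholder paths; point these at a real FP32/FP16 model.
    src, dst = Path("model.onnx"), Path("model_int4.onnx")

    model = quant_utils.load_model_with_shape_infer(src)
    quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
    quantizer.process()
    quantizer.model.save_model_to_file(str(dst), use_external_data_format=True)
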
EPs

TensorRT

  • TensorRT 10.4 support (a typical session setup is sketched after this list).
  • Data-dependent shape (DDS) op enablement and performance improvements for NMS.
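
Enabling the TensorRT EP follows the usual provider-options pattern. A typical sketch with engine caching and CUDA/CPU fallback (option names are standard TensorRT EP provider options; the model path is a placeholder):

    import onnxruntime as ort

    providers = [
        ("TensorrtExecutionProvider", {
            "trt_engine_cache_enable": True,       # reuse built engines across runs
            "trt_engine_cache_path": "./trt_cache",
            "trt_fp16_enable": True,
        }),
        "CUDAExecutionProvider",   # fallback for nodes TensorRT cannot take
        "CPUExecutionProvider",
    ]
    sess = ort.InferenceSession("model.onnx", providers=providers)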

QNN

  • HTP shared weights context binary (offline tool).
  • Runtime support for QNN HTP shared weights across multiple ORT sessions (a basic QNN session setup is sketched after this list).
  • Efficient mode support.
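
For reference, creating a session on the QNN HTP (NPU) backend looks like the following minimal sketch; the model path is a placeholder, and the shared-weights and efficient-mode options above are configured separately and omitted here:

    import onnxruntime as ort

    # QNN EP with the HTP backend; use "libQnnHtp.so" on Android/Linux.
    sess = ort.InferenceSession(
        "model.qdq.onnx",
        providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
    )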

OpenVINO

  • Context generation memory optimizations.
  • Efficient mode support.

DirectML

  • DirectML 1.15.2 support.
Mobile
  • Android QNN support, including a pre-built Maven package, performance improvements, and Phi-3 model support.
  • Mobile GPU EP support.
  • FP16 support for CoreML EP and XNNPACK kernels.
Web
  • Quantized embedding support.
  • On-demand weight loading support (offloads weights from the Wasm32 heap and enables 8B-parameter LLMs).
  • wasm64 support (available in custom builds but not included in released packages).
  • GQA support.
  • Integrated Intel GPU performance improvements.
  • Opset-21 support (Reshape, Shape, Gelu).
generate() API
  • Continuous decoding support, including chat mode and system prompt caching (a chat-loop sketch follows this list).
  • MultiLoRA API.
  • Additional model support, including Phi-3.5 Vision Multi-Frame and Qualcomm NPU support for Phi-3.5 and Llama-3.1.
  • Mac/iOS support available in pre-built packages.
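
Continuous decoding keeps one generator (and its KV cache) alive across chat turns instead of rebuilding it per prompt. A rough sketch, assuming the onnxruntime-genai streaming API and a Phi-3-style chat template (the model path and template are placeholders):

    import onnxruntime_genai as og

    model = og.Model("./phi-3.5-mini")      # placeholder model path
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=4096)
    generator = og.Generator(model, params)  # lives across turns; KV cache kept

    while True:                              # Ctrl+C to exit
        user = input("user> ")
        prompt = f"<|user|>\n{user}<|end|>\n<|assistant|>\n"
        generator.append_tokens(tokenizer.encode(prompt))
        while not generator.is_done():
            generator.generate_next_token()
            print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
        print()
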
Extensions
  • Tokenization performance improvements.
  • Additional multi-modal model support (CLIP and Mllama), including more kernel attributes.
  • Unigram tokenization model support.
  • OpenCV dependency removed from C API build.

Full release notes for ONNX Runtime Extensions v0.13 will be published on GitHub once available (10/30 target).

Olive
  • The Olive command line interface (CLI) is now available, with support for executing well-defined, concrete workflows without manually creating or editing configs.
  • Additional improvements, including support for YAML-based workflow configs, streamlined DataConfig management, simplified workflow configuration, and more.
  • Llama and Phi-3 model updates, including an updated MultiLoRA example using the ORT generate() API.

Full release notes for Olive v0.7.0 are available on GitHub.