Generative AI with ONNX Runtime

Note: this API is in preview and is subject to change.

Run generative AI models with ONNX Runtime. Source code: https://github.com/microsoft/onnxruntime-genai

This library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.

Users can call a high-level generate() method, or run the model one iteration at a time in a loop, generating one token per iteration and optionally updating generation parameters inside the loop.
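The difference between the two calling styles can be sketched in plain Python. This is only an illustration of the control flow, not this library's API: `toy_model`, `step`, and `generate` are hypothetical names invented for the example, and the "model" is a stand-in function that returns fake logits.

```python
from typing import Callable, List, Optional, Sequence

Logits = List[float]
Model = Callable[[Sequence[int]], Logits]


def toy_model(tokens: Sequence[int]) -> Logits:
    """Hypothetical stand-in for a real model.

    Gives the highest score to (last token + 1) modulo a vocabulary of 5.
    """
    vocab_size = 5
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits


def step(model: Model, tokens: Sequence[int]) -> int:
    """Run one iteration: score the sequence and pick the best next token."""
    logits = model(tokens)
    return max(range(len(logits)), key=logits.__getitem__)


def generate(
    model: Model,
    prompt: Sequence[int],
    max_length: int,
    eos_token: Optional[int] = None,
) -> List[int]:
    """High-level helper: repeat step() until max_length or end-of-sequence."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        next_token = step(model, tokens)
        tokens.append(next_token)
        if next_token == eos_token:
            break
    return tokens
```

Calling `generate(toy_model, [0], max_length=4)` runs the whole loop in one call, while calling `step` directly from your own loop leaves room to inspect each token or change settings between iterations.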

It supports greedy search, beam search, and TopP and TopK sampling for generating token sequences, along with built-in logits processing such as repetition penalties. You can also add custom scoring.
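To make these terms concrete, here is a minimal pure-Python sketch of a repetition penalty followed by TopK and TopP filtering and sampling. It is not this library's implementation: all function names are hypothetical, and the penalty uses one common formulation (divide positive logits, multiply negative ones) that may differ from what the library does internally.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def apply_repetition_penalty(
    logits: Sequence[float], generated: Sequence[int], penalty: float
) -> List[float]:
    """Lower the score of tokens that were already generated (assumed formula)."""
    out = list(logits)
    for token in set(generated):
        out[token] = out[token] / penalty if out[token] > 0 else out[token] * penalty
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(
    probs: Sequence[float], top_k: int, top_p: float
) -> Dict[int, float]:
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p; renormalize what is left."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept: List[int] = []
    cumulative = 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_next_token(
    logits: Sequence[float],
    generated: Sequence[int],
    top_k: int = 50,
    top_p: float = 0.9,
    penalty: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Penalize repeats, filter candidates, then sample one token."""
    rng = rng or random.Random()
    penalized = apply_repetition_penalty(logits, generated, penalty)
    candidates = top_k_top_p_filter(softmax(penalized), top_k, top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

In this sketch, setting `top_k=1` reduces sampling to greedy search, since only the single most likely token survives the filter. Custom scoring would slot in at the same place as the repetition penalty, as another function that rewrites the logits before filtering.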


Table of contents