ONNX Runtime can be used to accelerate well over 130,000 of the models available on Hugging Face.
The top 30 most popular model architectures on Hugging Face are all supported by ONNX Runtime, and over 80 Hugging Face model architectures in total have ORT support. This list includes BERT, GPT-2, T5, Stable Diffusion, Whisper, and many more.
ONNX models can be found directly on the Hugging Face Model Hub in its ONNX model library.
Hugging Face also provides ONNX support for a variety of other models not listed in the ONNX model library. With Hugging Face Optimum, you can easily convert pretrained models to ONNX, and Transformers.js lets you run Hugging Face Transformers directly from your browser!
Hugging Face also provides an Open LLM Leaderboard with more detailed tracking and evaluation of recently released LLMs from the community.
Models accelerated by ONNX Runtime can be easily deployed to the cloud through Azure Machine Learning, which improves time-to-value, streamlines MLOps, provides built-in AI governance, and helps you design responsible AI solutions.
Azure Machine Learning publishes a curated model list that is updated regularly and includes the most popular models. You can run the vast majority of the models on the curated list with ONNX Runtime, using Hugging Face Optimum.
Transformers.js, powered by ONNX Runtime Web, enables you to execute cutting-edge machine learning tasks in areas such as natural language processing, computer vision, audio, and multimodal workloads directly within your web browser, eliminating the need for a server.