
Azure Execution Provider (Preview)

The Azure Execution Provider enables ONNX Runtime to invoke a remote Azure endpoint for inference. The endpoint must be deployed in advance. To consume the endpoint, a model with the same inputs and outputs must first be loaded locally.

One use case for the Azure Execution Provider is the small-big model pattern: a smaller model is deployed on edge devices for faster inference, while a bigger model is deployed on Azure for higher precision. With the Azure Execution Provider, switching between the two is straightforward (assuming both models have the same inputs and outputs).

The Azure Execution Provider is in the preview stage; all APIs and usage are subject to change.



Install

Pre-built Python binaries of ONNX Runtime with the Azure EP are published on PyPI: onnxruntime-azure
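Assuming a standard pip setup, the package named above can be installed with:

```shell
# Install the preview ONNX Runtime build that includes the Azure EP
pip install onnxruntime-azure
```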


Requirements

For Windows, please install zlib and re2, and add their binaries to the system path. If ONNX Runtime was built from source, the zlib and re2 binaries can be located with:

cd <build_output_path>
dir /s zlib1.dll re2.dll

For Linux, please make sure OpenSSL is installed.
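On Debian/Ubuntu-based systems, for example, OpenSSL can be checked and installed as follows (the package name is an assumption and varies by distribution):

```shell
# Verify that the OpenSSL runtime is available
openssl version
# If it is missing, install it (Debian/Ubuntu package name shown)
sudo apt-get install -y openssl
```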


Build

For build instructions, please see the BUILD page.



Usage

from onnxruntime import InferenceSession, RunOptions, SessionOptions
import numpy as np

sess_opt = SessionOptions()
sess_opt.add_session_config_entry('azure.endpoint_type', 'triton')  # only Triton server is supported for now
sess_opt.add_session_config_entry('azure.uri', 'https://...')
sess_opt.add_session_config_entry('azure.model_name', 'a_simple_model')
sess_opt.add_session_config_entry('azure.model_version', '1')  # optional, default '1'
sess_opt.add_session_config_entry('azure.verbose', 'true')  # optional, default 'false'

sess = InferenceSession('a_simple_model.onnx', sess_opt,
                        providers=['CPUExecutionProvider', 'AzureExecutionProvider'])

run_opt = RunOptions()
run_opt.add_run_config_entry('use_azure', '1')  # optional, default '0' (run inference locally)
run_opt.add_run_config_entry('azure.auth_key', '...')  # required only when use_azure is set to '1'

x = np.array([1, 2, 3, 4]).astype(np.float32)
y = np.array([4, 3, 2, 1]).astype(np.float32)

z = sess.run(None, {'X': x, 'Y': y}, run_opt)[0]

Current Limitations

  • Only Triton Inference Server on AML is supported.
  • Builds and runs only on Windows and Linux.
  • Available only as a Python package, but can be built from source and used via the C/C++ APIs.
  • Known issue: on certain Ubuntu versions, HTTPS calls made by the Azure EP might report the error “error setting certificate verify location …”. To silence it, please create the file “/etc/pki/tls/certs/ca-bundles.crt” as a link to “/etc/ssl/certs/ca-certificates.crt”.
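The workaround described in the last item can be applied with, for example:

```shell
# Create the expected directory and link the existing CA bundle (requires root)
sudo mkdir -p /etc/pki/tls/certs
sudo ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundles.crt
```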