ONNX Runtime
The OrtEp struct provides functions to implement for an execution provider.
#include <onnxruntime_ep_c_api.h>
Public Member Functions | |
| OrtStatus * | GetCapability (OrtEp *this_ptr, const OrtGraph *graph, OrtEpGraphSupportInfo *graph_support_info) |
| Get information about the nodes supported by the OrtEp instance. | |
| OrtStatus * | Compile (OrtEp *this_ptr, const OrtGraph **graphs, const OrtNode **fused_nodes, size_t count, OrtNodeComputeInfo **node_compute_infos, OrtNode **ep_context_nodes) |
| Compile OrtGraph instances assigned to the OrtEp. The implementer must set an OrtNodeComputeInfo instance for each OrtGraph in order to define its computation function. | |
| OrtStatus * | GetPreferredDataLayout (OrtEp *this_ptr, OrtEpDataLayout *preferred_data_layout) |
| Get the EP's preferred data layout. | |
| OrtStatus * | ShouldConvertDataLayoutForOp (OrtEp *this_ptr, const char *domain, const char *op_type, OrtEpDataLayout target_data_layout, int *should_convert) |
| Given an op with domain domain and type op_type, determine whether an associated node's data layout should be converted to target_data_layout. If the EP prefers a non-default data layout (see GetPreferredDataLayout()), this function will be called during layout transformation with target_data_layout set to the EP's preferred data layout. | |
| OrtStatus * | SetDynamicOptions (OrtEp *this_ptr, const char *const *option_keys, const char *const *option_values, size_t num_options) |
| Set dynamic options on this EP. | |
| OrtStatus * | OnRunStart (OrtEp *this_ptr, const OrtRunOptions *run_options) |
| Called by ORT to notify the EP of the start of a run. | |
| OrtStatus * | OnRunEnd (OrtEp *this_ptr, const OrtRunOptions *run_options, bool sync_stream) |
| Called by ORT to notify the EP of the end of a run. | |
| OrtStatus * | CreateAllocator (OrtEp *this_ptr, const OrtMemoryInfo *memory_info, OrtAllocator **allocator) |
| Create an OrtAllocator for the given OrtMemoryInfo for an OrtSession. | |
| OrtStatus * | CreateSyncStreamForDevice (OrtEp *this_ptr, const OrtMemoryDevice *memory_device, OrtSyncStreamImpl **stream) |
| Create a synchronization stream for the given memory device for an OrtSession. | |
| OrtStatus * | GetKernelRegistry (OrtEp *this_ptr, const OrtKernelRegistry **kernel_registry) |
| Gets the execution provider's kernel registry, if any. | |
| OrtStatus * | IsConcurrentRunSupported (OrtEp *this_ptr, bool *is_supported) |
| Gets whether the execution provider supports concurrent run calls made on the session. | |
| OrtStatus * | Sync (OrtEp *this_ptr) |
| Called by ORT to block until the device has completed all preceding requested tasks. | |
| OrtStatus * | CreateProfiler (OrtEp *this_ptr, OrtEpProfilerImpl **profiler) |
| Return a new profiler for the execution provider. | |
| OrtStatus * | ReplayGraph (OrtEp *this_ptr, int graph_annotation_id) |
| Run the instantiated (captured) graph. | |
| OrtStatus * | GetAvailableResource (const OrtEp *this_ptr, OrtResourceCount *available) |
| Query the available device resource for partitioning budget. | |
| OrtStatus * | OnSessionInitializationEnd (OrtEp *this_ptr) |
| Called by ORT when session initialization is complete. | |
| OrtStatus * | ReleaseCapturedGraph (OrtEp *this_ptr, int graph_annotation_id) |
| Release a previously captured graph and its associated resources. | |
Public Attributes | |
| uint32_t | ort_version_supported |
| The ONNX Runtime version the execution provider was compiled with. | |
| const char *(* | GetName )(const OrtEp *this_ptr) |
| Get the execution provider name. | |
| void(* | ReleaseNodeComputeInfos )(OrtEp *this_ptr, OrtNodeComputeInfo **node_compute_infos, size_t num_node_compute_infos) |
| Release OrtNodeComputeInfo instances. | |
| const char *(* | GetCompiledModelCompatibilityInfo )(OrtEp *this_ptr, const OrtGraph *graph) |
| Get a string with details about the EP stack used to produce a compiled model. | |
| bool(* | IsGraphCaptureEnabled )(const OrtEp *this_ptr) |
| Indicate whether the graph capturing mode (e.g., CUDA graph) is enabled for the provider. | |
| bool(* | IsGraphCaptured )(const OrtEp *this_ptr, int graph_annotation_id) |
| Indicate whether a graph has been captured and instantiated. | |
| OrtGraphCaptureNodeAssignmentPolicy(* | GetGraphCaptureNodeAssignmentPolicy )(const OrtEp *this_ptr) |
| Get the node assignment validation policy for graph capture. | |
The OrtEp struct provides functions to implement for an execution provider.
| OrtStatus * OrtEp::Compile | ( | OrtEp * | this_ptr, |
| const OrtGraph ** | graphs, | ||
| const OrtNode ** | fused_nodes, | ||
| size_t | count, | ||
| OrtNodeComputeInfo ** | node_compute_infos, | ||
| OrtNode ** | ep_context_nodes | ||
| ) |
Compile OrtGraph instances assigned to the OrtEp. The implementer must set an OrtNodeComputeInfo instance for each OrtGraph in order to define its computation function.
If the session is configured to generate a pre-compiled model, the execution provider must return EPContext nodes, as OrtNode instances, that ONNX Runtime uses to create a pre-compiled model, known as an "EPContext model". An EPContext model contains EPContext nodes. Each EPContext node encapsulates the pre-compiled binary data for an OrtGraph compiled for a specific execution provider. For more details about the EPContext design, refer to: EPContext design document.
| [in] | this_ptr | The OrtEp instance. |
| [in] | graphs | Array of count OrtGraph instances to compile. Each graph contains only the nodes for which the execution provider indicated support. Nested subgraphs contained by a node, such as an If or Loop, have separate OrtGraph instances. |
| [in] | fused_nodes | Array of count fused nodes that will replace the compiled graphs. Each fused node is an OrtNode initialized with the intended fused node name and input/output information. |
| [in] | count | The number of OrtGraph instances to compile. |
| [out] | node_compute_infos | Array of count OrtNodeComputeInfo instances that define each OrtGraph instance's computation function. The implementer allocates the OrtNodeComputeInfo instances. ORT calls ReleaseNodeComputeInfos() to release multiple instances in a batch. |
| [out] | ep_context_nodes | Output array of count OrtNode instances, each representing an EPContext node for a compiled OrtGraph. The execution provider must use OrtModelEditorApi::CreateNode to create the OrtNode instances. ONNX Runtime takes ownership of the OrtNode instances, so the execution provider must NOT call OrtApi::ReleaseNode. This parameter should be ignored if the session is not configured to generate an EPContext model. |
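The ownership rule for node_compute_infos (the implementer allocates, ORT later releases the whole batch through ReleaseNodeComputeInfos()) can be illustrated with a minimal sketch. It does not include the ONNX Runtime headers: DemoNodeComputeInfo, demo_compile and demo_release_node_compute_infos are hypothetical stand-ins, and the integer return code replaces OrtStatus for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for OrtNodeComputeInfo. The real struct carries the
   function pointers that define a compiled graph's computation. */
typedef struct DemoNodeComputeInfo {
  size_t graph_index; /* which compiled graph this entry belongs to */
} DemoNodeComputeInfo;

/* Plays the role of ReleaseNodeComputeInfos(): frees a batch of entries. */
void demo_release_node_compute_infos(DemoNodeComputeInfo **infos, size_t count) {
  assert(infos != NULL || count == 0);
  for (size_t i = 0; i < count; ++i) {
    free(infos[i]);
    infos[i] = NULL;
  }
}

/* Plays the role of the node_compute_infos output of Compile():
   allocates one entry per graph. Returns 0 on success, -1 on failure. */
int demo_compile(size_t count, DemoNodeComputeInfo **infos) {
  assert(infos != NULL || count == 0);
  for (size_t i = 0; i < count; ++i) {
    infos[i] = malloc(sizeof *infos[i]);
    if (infos[i] == NULL) {
      /* Undo the partial work so the caller never sees a half-filled array. */
      demo_release_node_compute_infos(infos, i);
      return -1;
    }
    infos[i]->graph_index = i;
  }
  return 0;
}
```

Cleaning up partially allocated entries on failure is a choice made for this sketch; the text above does not say how ORT treats the output array when Compile returns an error.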
| OrtStatus * OrtEp::CreateAllocator | ( | OrtEp * | this_ptr, |
| const OrtMemoryInfo * | memory_info, | ||
| OrtAllocator ** | allocator | ||
| ) |
Create an OrtAllocator for the given OrtMemoryInfo for an OrtSession.
The OrtMemoryInfo instance will match one of the values set in the OrtEpDevice using EpDevice_AddAllocatorInfo. Any allocator specific options should be read from the session options.
If this function pointer is nullptr, OrtEpFactory::CreateAllocator will be used.
| [in] | this_ptr | The OrtEp instance. |
| [in] | memory_info | The OrtMemoryInfo to create the allocator for. May be nullptr. |
| [out] | allocator | The created OrtAllocator instance. Set to nullptr if the default CPU allocator is used. |
| OrtStatus * OrtEp::CreateProfiler | ( | OrtEp * | this_ptr, |
| OrtEpProfilerImpl ** | profiler | ||
| ) |
Return a new profiler for the execution provider.
If the EP supports profiling, it should create and return an OrtEpProfilerImpl instance. ORT takes ownership of each non-NULL instance returned and will call OrtEpProfilerImpl::Release when it is no longer needed.
ORT may call this function multiple times over the lifetime of a single OrtEp instance, for example during EP registration and again per run if run-level profiling is enabled. Each call is independent and the EP must return a new profiler instance (or NULL if profiling is not supported).
| [in] | this_ptr | The OrtEp instance. |
| [out] | profiler | Output parameter set to a new OrtEpProfilerImpl instance created by the EP. Set to NULL if the EP does not support profiling. |
| OrtStatus * OrtEp::CreateSyncStreamForDevice | ( | OrtEp * | this_ptr, |
| const OrtMemoryDevice * | memory_device, | ||
| OrtSyncStreamImpl ** | stream | ||
| ) |
Create a synchronization stream for the given memory device for an OrtSession.
This is used to create a synchronization stream for the execution provider and is used to synchronize operations on the device during model execution. Any stream specific options should be read from the session options.
If this function pointer is nullptr, OrtEpFactory::CreateSyncStreamForDevice will be used.
| [in] | this_ptr | The OrtEp instance. |
| [in] | memory_device | The OrtMemoryDevice to create the synchronization stream for. |
| [out] | stream | The created OrtSyncStreamImpl instance. nullptr if the execution provider is not stream aware. |
| OrtStatus * OrtEp::GetAvailableResource | ( | const OrtEp * | this_ptr, |
| OrtResourceCount * | available | ||
| ) |
Query the available device resource for partitioning budget.
Called by ORT during graph partitioning when no explicit resource budget threshold has been configured via session options. The EP should query its device for the currently available resource (e.g., free GPU memory) and return it as an OrtResourceCount.
If the EP does not support resource querying, set this function pointer to NULL. ORT will skip threshold-based budget enforcement in that case.
| [in] | this_ptr | The OrtEp instance. |
| [out] | available | The available device resource. |
| OrtStatus * OrtEp::GetCapability | ( | OrtEp * | this_ptr, |
| const OrtGraph * | graph, | ||
| OrtEpGraphSupportInfo * | graph_support_info | ||
| ) |
Get information about the nodes supported by the OrtEp instance.
IMPORTANT: This is not the final version of this API function. This is currently experimental but will be stabilized by the ONNX Runtime 1.23 release.
| [in] | this_ptr | The OrtEp instance. |
| [in] | graph | The OrtGraph instance for which to populate node support. The OrtGraph could be a nested subgraph contained by a node (e.g., an If or Loop node). ONNX Runtime calls this function separately for each nested subgraph. |
| [in,out] | graph_support_info | OrtEpGraphSupportInfo instance that the implementer must fill out in order to specify the supported nodes. |
| OrtStatus * OrtEp::GetKernelRegistry | ( | OrtEp * | this_ptr, |
| const OrtKernelRegistry ** | kernel_registry | ||
| ) |
Gets the execution provider's kernel registry, if any.
A kernel registry contains kernel creation information for operator kernels supported by an EP.
| [in] | this_ptr | The OrtEp instance. |
| [out] | kernel_registry | Output parameter set to the EP's kernel registry, which must remain valid throughout the lifetime of the EP. Can be NULL if the EP doesn't use a kernel registry. |
| OrtStatus * OrtEp::GetPreferredDataLayout | ( | OrtEp * | this_ptr, |
| OrtEpDataLayout * | preferred_data_layout | ||
| ) |
Get the EP's preferred data layout.
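Note: Implementation of this function is optional. If it is not implemented, ORT assumes that the EP prefers OrtEpDataLayout::NCHW.
| [in] | this_ptr | The OrtEp instance. |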
| [out] | preferred_data_layout | The EP's preferred data layout. |
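| OrtStatus * OrtEp::IsConcurrentRunSupported | ( | OrtEp * | this_ptr, |
| bool * | is_supported | ||
| ) |
Gets whether the execution provider supports concurrent run calls made on the session.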
| [in] | this_ptr | The OrtEp instance. |
| [out] | is_supported | Whether concurrent runs are supported. |
| OrtStatus * OrtEp::OnRunEnd | ( | OrtEp * | this_ptr, |
| const OrtRunOptions * | run_options, | ||
| bool | sync_stream | ||
| ) |
Called by ORT to notify the EP of the end of a run.
| [in] | this_ptr | The OrtEp instance. |
| [in] | run_options | The run options for this run. |
| [in] | sync_stream | Whether any associated stream should be synchronized during this call. Only applicable if there is such a stream. |
| OrtStatus * OrtEp::OnRunStart | ( | OrtEp * | this_ptr, |
| const OrtRunOptions * | run_options | ||
| ) |
Called by ORT to notify the EP of the start of a run.
| [in] | this_ptr | The OrtEp instance. |
| [in] | run_options | The run options for this run. |
| OrtStatus * OrtEp::OnSessionInitializationEnd | ( | OrtEp * | this_ptr | ) |
Called by ORT when session initialization is complete.
This provides an opportunity for execution providers to optionally synchronize and clean up temporary resources to reduce memory usage and ensure the first inference run is fast.
| [in] | this_ptr | The OrtEp instance. |
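| OrtStatus * OrtEp::ReleaseCapturedGraph | ( | OrtEp * | this_ptr, |
| int | graph_annotation_id | ||
| ) |
Release a previously captured graph and its associated resources.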
Called when the caller no longer needs the captured graph for the given annotation ID. This allows the EP to free buffers and other resources tied to this graph.
| [in] | this_ptr | The OrtEp instance. |
| [in] | graph_annotation_id | The annotation ID of the graph to release. |
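| OrtStatus * OrtEp::ReplayGraph | ( | OrtEp * | this_ptr, |
| int | graph_annotation_id | ||
| ) |
Run the instantiated (captured) graph.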
Called by ORT instead of normal execution when IsGraphCaptured() returns true.
| [in] | this_ptr | The OrtEp instance. |
| [in] | graph_annotation_id | Identifies which captured graph to replay. Applications can set this value via OrtApi::AddRunConfigEntry() with the key "gpu_graph_id". The default value is 0 when the run config entry is not set. A value of -1 means graph replay should be skipped for this run. |
Note: This function is required if OrtEp::IsGraphCaptureEnabled is implemented and may return true.
| OrtStatus * OrtEp::SetDynamicOptions | ( | OrtEp * | this_ptr, |
| const char *const * | option_keys, | ||
| const char *const * | option_values, | ||
| size_t | num_options | ||
| ) |
Set dynamic options on this EP.
Dynamic options can be set by the user at any time after session creation with OrtApi::SetEpDynamicOptions().
| [in] | this_ptr | The OrtEp instance. |
| [in] | option_keys | The dynamic option keys. |
| [in] | option_values | The dynamic option values. |
| [in] | num_options | The number of dynamic options. |
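As an illustration of how the parallel option_keys and option_values arrays might be consumed, here is a minimal sketch. The option names ("demo.power_saver", "demo.queue_depth"), the DemoEpState struct and the integer return code are all invented for the example. Rejecting unknown keys and applying a batch atomically are design choices of this sketch, not documented requirements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical EP state. Both option names below are invented. */
typedef struct DemoEpState {
  int power_saver;  /* "demo.power_saver": "0" or "1" */
  long queue_depth; /* "demo.queue_depth": positive integer */
} DemoEpState;

/* Applies parallel key/value arrays in the shape SetDynamicOptions() receives.
   Returns 0 on success, -1 on an unknown key or an invalid value. */
int demo_set_dynamic_options(DemoEpState *state, const char *const *keys,
                             const char *const *values, size_t num_options) {
  assert(state != NULL);
  DemoEpState updated = *state; /* validate the whole batch before committing */
  for (size_t i = 0; i < num_options; ++i) {
    if (keys[i] == NULL || values[i] == NULL) return -1;
    if (strcmp(keys[i], "demo.power_saver") == 0) {
      if (strcmp(values[i], "0") != 0 && strcmp(values[i], "1") != 0) return -1;
      updated.power_saver = (values[i][0] == '1');
    } else if (strcmp(keys[i], "demo.queue_depth") == 0) {
      char *end = NULL;
      long depth = strtol(values[i], &end, 10);
      if (end == values[i] || *end != '\0' || depth <= 0) return -1;
      updated.queue_depth = depth;
    } else {
      return -1; /* unknown key */
    }
  }
  *state = updated; /* commit only if every option was valid */
  return 0;
}
```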
| OrtStatus * OrtEp::ShouldConvertDataLayoutForOp | ( | OrtEp * | this_ptr, |
| const char * | domain, | ||
| const char * | op_type, | ||
| OrtEpDataLayout | target_data_layout, | ||
| int * | should_convert | ||
| ) |
Given an op with domain domain and type op_type, determine whether an associated node's data layout should be converted to target_data_layout. If the EP prefers a non-default data layout (see GetPreferredDataLayout()), this function will be called during layout transformation with target_data_layout set to the EP's preferred data layout.
| [in] | this_ptr | The OrtEp instance. |
| [in] | domain | The op domain. An empty string means the ONNX domain. |
| [in] | op_type | The op type. |
| [in] | target_data_layout | The target data layout. |
| [out] | should_convert | Whether the associated node's data layout should be converted to target_data_layout. If greater than 0, convert. If 0, don't convert. Otherwise, if less than 0, leave the decision to ORT. |
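The three-way meaning of should_convert (greater than 0, equal to 0, less than 0) can be sketched as follows. DemoDataLayout is a hypothetical stand-in for OrtEpDataLayout, the NHWC value is an assumption, and the per-op decisions (Conv converts, Resize does not) are invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OrtEpDataLayout. */
typedef enum DemoDataLayout {
  DEMO_LAYOUT_NCHW = 0,
  DEMO_LAYOUT_NHWC = 1 /* assumed second layout for the example */
} DemoDataLayout;

/* Writes >0 (convert), 0 (do not convert), or <0 (leave the decision to ORT). */
void demo_should_convert(const char *domain, const char *op_type,
                         DemoDataLayout target, int *should_convert) {
  assert(domain != NULL && op_type != NULL && should_convert != NULL);
  *should_convert = -1;                   /* default: defer to ORT */
  if (target != DEMO_LAYOUT_NHWC) return; /* this sketch only has NHWC rules */
  if (domain[0] != '\0') return;          /* empty string means the ONNX domain */
  if (strcmp(op_type, "Conv") == 0) {
    *should_convert = 1; /* assume the EP has an NHWC Conv kernel */
  } else if (strcmp(op_type, "Resize") == 0) {
    *should_convert = 0; /* assume the EP's Resize kernel is NCHW-only */
  }
}
```

Starting from the "let ORT decide" value keeps the function safe for ops the EP has no opinion about.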
| OrtStatus * OrtEp::Sync | ( | OrtEp * | this_ptr | ) |
Called by ORT to block until the device has completed all preceding requested tasks.
Currently this is primarily used by the IOBinding object to ensure that all inputs have been copied to the device before execution begins.
| [in] | this_ptr | The OrtEp instance. |
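| const char *( * OrtEp::GetCompiledModelCompatibilityInfo) (OrtEp *this_ptr, const OrtGraph *graph) |
Get a string with details about the EP stack used to produce a compiled model.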
This function gets a compatibility information string that contains details about the execution provider used to compile a given model. This string can later be used with ValidateCompiledModelCompatibilityInfo to determine if a compiled model is compatible with the EP.
The returned string should be a null-terminated, UTF-8 encoded string. ORT will copy it.
| [in] | this_ptr | The OrtEp instance. |
| [in] | graph | The OrtGraph instance for which to generate compatibility information. |
| OrtGraphCaptureNodeAssignmentPolicy( * OrtEp::GetGraphCaptureNodeAssignmentPolicy) (const OrtEp *this_ptr) |
Get the node assignment validation policy for graph capture.
When graph capture is enabled, ORT validates that nodes are assigned to EPs in a way that is compatible with graph capture. This function tells ORT which validation policy to apply.
| [in] | this_ptr | The OrtEp instance. |
| const char *( * OrtEp::GetName) (const OrtEp *this_ptr) |
Get the execution provider name.
The returned string should be a null-terminated, UTF-8 encoded string. ORT will copy it.
| [in] | this_ptr | The OrtEp instance. |
| bool( * OrtEp::IsGraphCaptured) (const OrtEp *this_ptr, int graph_annotation_id) |
Indicate whether a graph has been captured and instantiated.
ORT calls this before each Session::Run(). If true, ORT calls ReplayGraph() instead of normal execution. After a run where this returns false, ORT automatically retries until it returns true (handling warm-up runs transparently).
| [in] | this_ptr | The OrtEp instance. |
| [in] | graph_annotation_id | Identifies which captured graph to query. Applications can set this value via OrtApi::AddRunConfigEntry() with the key "gpu_graph_id". The default value is 0 when the run config entry is not set. Setting different IDs allows the EP to capture and manage multiple graphs (e.g., one per distinct input shape). A value of -1 means graph capture/replay should be skipped for this run. |
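One possible way for an EP to keep track of several captured graphs keyed by graph_annotation_id is sketched below. The table type, its fixed capacity and all function names are hypothetical. Only the special meaning of -1 and the idea of one graph per ID come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_GRAPHS 8 /* arbitrary capacity chosen for the example */

/* Hypothetical bookkeeping: the annotation IDs that have a captured graph. */
typedef struct DemoGraphTable {
  int ids[DEMO_MAX_GRAPHS];
  size_t count;
} DemoGraphTable;

/* Plays the role of IsGraphCaptured(): -1 is never reported as captured. */
bool demo_is_graph_captured(const DemoGraphTable *t, int graph_annotation_id) {
  assert(t != NULL);
  if (graph_annotation_id == -1) return false;
  for (size_t i = 0; i < t->count; ++i) {
    if (t->ids[i] == graph_annotation_id) return true;
  }
  return false;
}

/* Records a finished capture. Returns false for -1 or when the table is full. */
bool demo_mark_captured(DemoGraphTable *t, int graph_annotation_id) {
  assert(t != NULL);
  if (graph_annotation_id == -1) return false;
  if (demo_is_graph_captured(t, graph_annotation_id)) return true;
  if (t->count == DEMO_MAX_GRAPHS) return false;
  t->ids[t->count++] = graph_annotation_id;
  return true;
}

/* Plays the role of ReleaseCapturedGraph(): forgets a single ID. */
void demo_release_captured(DemoGraphTable *t, int graph_annotation_id) {
  assert(t != NULL);
  for (size_t i = 0; i < t->count; ++i) {
    if (t->ids[i] == graph_annotation_id) {
      t->ids[i] = t->ids[--t->count]; /* swap-remove, order is irrelevant */
      return;
    }
  }
}
```

A real EP would store device-side graph handles next to each ID; the sketch tracks only the IDs.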
Note: This function is required if OrtEp::IsGraphCaptureEnabled is implemented and may return true.
| bool( * OrtEp::IsGraphCaptureEnabled) (const OrtEp *this_ptr) |
Indicate whether the graph capturing mode (e.g., CUDA graph) is enabled for the provider.
Graph capture allows an EP to record a sequence of device (e.g., GPU) operations during an initial run and replay them on subsequent runs, bypassing per-kernel CPU launch overhead.
Applications enable graph capture via EP-specific provider options (e.g., enable_cuda_graph=1 for the CUDA EP). An EP should return true from this function if it has been configured to enable graph capture/replay.
ORT graph capture/replay summary: During OrtSession initialization, ORT calls OrtEp::IsGraphCaptureEnabled() on each EP in the order specified during provider registration with the session. If an EP returns true, ORT validates that the graph is suitable for graph capture, and if so, caches the EP for graph capture during the next run. The graph validation ensures that there are no control flow nodes and that node-to-EP assignments are compatible with the policy specified by the EP via OrtEp::GetGraphCaptureNodeAssignmentPolicy(). Note that an OrtSession only supports graph capture for one EP (i.e., the first EP to claim support).
During the first call to OrtApi::Run() for the OrtSession, ORT performs multiple internal runs of the model until the EP indicates that the graph has been captured by returning true from OrtEp::IsGraphCaptured(). If the EP is unable to capture the graph within 8 runs, the call to OrtApi::Run() returns an error OrtStatus. Each internal run invokes OrtEp::OnRunStart(), normal execution, and OrtEp::OnRunEnd(). EPs should use these run callbacks to track the number of necessary warm-up runs and begin/end graph capture when ready.
After successful graph capture, subsequent calls to OrtApi::Run() skip normal execution and ORT instead calls OrtEp::ReplayGraph() directly.
Applications can capture and replay multiple graphs (e.g., one per distinct input shape) by setting the "gpu_graph_id" run config entry via OrtApi::AddRunConfigEntry() to different integer values. ORT passes the value as the graph_annotation_id parameter to OrtEp::IsGraphCaptured() and OrtEp::ReplayGraph().
| [in] | this_ptr | The OrtEp instance. |
Note: OrtEp::IsGraphCaptured and OrtEp::ReplayGraph must also be implemented. If either is NULL, ORT will log a warning and ignore this EP for graph capture.
| uint32_t OrtEp::ort_version_supported |
The ONNX Runtime version the execution provider was compiled with.
Implementations should set this to ORT_API_VERSION. ORT will use this to ensure it does not call functions that were not available when the library was compiled.
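The idea behind the version check can be sketched with a trimmed-down struct. Everything here is hypothetical: DemoEp is not the real OrtEp layout, and the version number at which Sync appears is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down EP struct. The real OrtEp has many more members. */
typedef struct DemoEp {
  uint32_t version_supported;     /* set by the EP when it is compiled */
  int (*Sync)(struct DemoEp *ep); /* pretend this member was added in version 3 */
  int sync_calls;                 /* test-only counter */
} DemoEp;

#define DEMO_SYNC_SINCE_VERSION 3u /* invented for the example */

/* Example Sync implementation that only counts its invocations. */
int demo_sync(DemoEp *ep) {
  assert(ep != NULL);
  ep->sync_calls++;
  return 0;
}

/* Calls Sync only if the EP was compiled against a version that declares it.
   Returns 1 if the call was made, 0 if it was skipped. */
int demo_try_sync(DemoEp *ep) {
  assert(ep != NULL);
  /* Check the version first: an EP built against an older header would have
     a smaller struct, so the newer member must not be read at all. */
  if (ep->version_supported < DEMO_SYNC_SINCE_VERSION) return 0;
  ep->Sync(ep);
  return 1;
}
```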
| void( * OrtEp::ReleaseNodeComputeInfos) (OrtEp *this_ptr, OrtNodeComputeInfo **node_compute_infos, size_t num_node_compute_infos) |
Release OrtNodeComputeInfo instances.
| [in] | this_ptr | The OrtEp instance. |
| [in,out] | node_compute_infos | The OrtNodeComputeInfo instances to release. |
| [in] | num_node_compute_infos | The number of OrtNodeComputeInfo instances. |