ONNX Runtime
Contains functions that an OrtEp implements to specify the computation for an operator kernel.
#include <onnxruntime_ep_c_api.h>
Public Member Functions
| OrtStatus * | Compute (OrtKernelImpl *this_ptr, OrtKernelContext *context) |
| Computation function called to execute the kernel on an EP. | |
| OrtStatus * | PrePackWeight (OrtKernelImpl *this_ptr, const OrtValue *tensor, int input_index, OrtAllocator *allocator, OrtSharedPrePackedWeightCache *prepacked_weight_cache, bool *is_packed) |
| Optional function to pre-pack a constant tensor (i.e., a weight) to the kernel's preferred data layout. | |
| OrtStatus * | SetSharedPrePackedWeight (OrtKernelImpl *this_ptr, const void *const *buffer_data_ptrs, const size_t *buffer_data_sizes, size_t num_buffers, int input_index) |
| Optional function that receives data for a shared pre-packed weight from ORT. | |
Public Attributes
| uint32_t | ort_version_supported |
| Must be initialized to ORT_API_VERSION. | |
| uint32_t | flags |
| EP must initialize to 0. Used internally by ORT. | |
| void(* | Release )(OrtKernelImpl *this_ptr) |
| Called by ORT to release the OrtKernelImpl instance and its resources. | |
Contains functions that an OrtEp implements to specify the computation for an operator kernel.
| OrtStatus * OrtKernelImpl::Compute | ( | OrtKernelImpl * | this_ptr, |
| OrtKernelContext * | context | ||
| ) |
Computation function called to execute the kernel on an EP.
| [in] | this_ptr | The OrtKernelImpl instance. |
| [in] | context | The OrtKernelContext instance that provides access to the inputs and outputs. |
| OrtStatus * OrtKernelImpl::PrePackWeight | ( | OrtKernelImpl * | this_ptr, |
| const OrtValue * | tensor, | ||
| int | input_index, | ||
| OrtAllocator * | allocator, | ||
| OrtSharedPrePackedWeightCache * | prepacked_weight_cache, | ||
| bool * | is_packed | ||
| ) |
Optional function to pre-pack a constant tensor (i.e., a weight) to the kernel's preferred data layout.
For example, a Conv kernel can define this function to pack input W to the channel-last data layout before inference.
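To make the layout change concrete, here is a minimal sketch of the repacking step alone, assuming a float Conv weight stored as [O][I][H][W] that the kernel wants as [O][H][W][I]. The helper name `pack_oihw_to_ohwi` and its parameters are hypothetical and not part of the ORT API. A real PrePackWeight implementation would read the source data from the `tensor` argument and write into memory obtained from `allocator`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an ORT function): repack a Conv weight from
 * channel-first [O][I][H][W] to channel-last [O][H][W][I].
 * `src` and `dst` must not overlap and must each hold O*I*H*W floats. */
void pack_oihw_to_ohwi(const float *src, float *dst,
                       size_t O, size_t I, size_t H, size_t W) {
  assert(src != NULL && dst != NULL && src != dst);
  for (size_t o = 0; o < O; ++o) {
    for (size_t i = 0; i < I; ++i) {
      for (size_t h = 0; h < H; ++h) {
        for (size_t w = 0; w < W; ++w) {
          /* Row-major offset in the source layout [O][I][H][W]. */
          size_t src_idx = ((o * I + i) * H + h) * W + w;
          /* Row-major offset in the packed layout [O][H][W][I]. */
          size_t dst_idx = ((o * H + h) * W + w) * I + i;
          dst[dst_idx] = src[src_idx];
        }
      }
    }
  }
}
```

After packing, the input-channel values for one spatial position sit next to each other in memory, which is the property a channel-last kernel would typically rely on.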
Pre-packing can operate in three different modes: no pre-packing mode, sharing mode, and non-sharing mode.
1) No pre-packing mode: The kernel can forgo any weight pre-packing for the given input_index by setting is_packed to false and returning a successful OrtStatus. In this mode, the kernel's OrtKernelImpl::SetSharedPrePackedWeight() function is not called for that specific input_index.
2) Sharing mode: Sharing is allowed if the prepacked_weight_cache argument is not NULL and the EP stores weight data in CPU-accessible memory. In this case, the kernel can optionally choose to share the packed weight with other kernels that use the same weight (compared by content hash). To do so, the kernel must allocate the packed weight with the provided allocator, then it stores the packed weight data into prepacked_weight_cache via SharedPrePackedWeightCache_StoreWeightData(), sets is_packed to true, and returns a successful OrtStatus. ORT will subsequently call OrtKernelImpl::SetSharedPrePackedWeight() to provide this kernel with the actual shared weight data, whose memory location could differ (i.e., if shared data was allocated by a previously processed kernel).
3) Non-sharing mode: In non-sharing mode, the prepacked_weight_cache argument is ignored. In this mode, the implementation allocates the packed data with the provided allocator, sets is_packed to true, and returns a successful OrtStatus. The kernel is ultimately responsible for releasing the packed data for the weight with allocator. ORT may release the original (unpacked) weight, which must not be accessed in OrtKernelImpl::Compute(). Note that in this mode, the kernel's OrtKernelImpl::SetSharedPrePackedWeight() function is not called by ORT for that specific input_index.
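The choice between the three modes can be sketched as a small decision function. This is an illustration of the rules as described here, not ORT code: the enum `PrepackMode`, the function names, and the boolean inputs (`kernel_packs_input`, `kernel_wants_sharing`, `weights_cpu_accessible`) are all hypothetical, and the cache is represented by an opaque pointer so the sketch compiles without the ORT headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three modes described above. */
typedef enum {
  PREPACK_NONE,    /* is_packed = false, no SetSharedPrePackedWeight call */
  PREPACK_SHARED,  /* store in cache, expect SetSharedPrePackedWeight     */
  PREPACK_PRIVATE  /* kernel keeps and later releases its own packed copy */
} PrepackMode;

/* Decide the mode for one input_index.
 * `prepacked_weight_cache` stands in for the argument of the same name
 * and may be NULL; the other flags are assumptions about the kernel/EP. */
PrepackMode choose_prepack_mode(bool kernel_packs_input,
                                bool kernel_wants_sharing,
                                const void *prepacked_weight_cache,
                                bool weights_cpu_accessible) {
  if (!kernel_packs_input) {
    return PREPACK_NONE;
  }
  /* Sharing requires a non-NULL cache and CPU-accessible weight data,
   * and remains the kernel's choice even when both hold. */
  if (kernel_wants_sharing && prepacked_weight_cache != NULL &&
      weights_cpu_accessible) {
    return PREPACK_SHARED;
  }
  return PREPACK_PRIVATE;
}

/* Value the kernel would write to the is_packed output for a mode. */
void report_is_packed(PrepackMode mode, bool *is_packed) {
  assert(is_packed != NULL);
  *is_packed = (mode != PREPACK_NONE);
}

/* Whether a follow-up SetSharedPrePackedWeight call is expected. */
bool expects_shared_weight_callback(PrepackMode mode) {
  return mode == PREPACK_SHARED;
}
```

Only the sharing mode leads to a later SetSharedPrePackedWeight call. In the other two modes the kernel either uses the original weight or its own packed copy.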
| [in] | this_ptr | The OrtKernelImpl instance. |
| [in] | tensor | The OrtValue instance representing the constant tensor (weight). Do not cache in the kernel. |
| [in] | input_index | The input index of the tensor in this kernel. |
| [in] | allocator | Allocator for allocating the pre-packed data. Its use is required in sharing mode and recommended, but not required, in the non-sharing mode. This will be an allocator set by the application for the session/environment (e.g., via CreateAndRegisterAllocator[V2] or RegisterAllocator), or an allocator on the OrtEpDevice (read-only or default) otherwise. The allocator remains valid throughout the lifetime of the OrtKernelImpl instance. |
| [in] | prepacked_weight_cache | May be NULL. If not NULL, the kernel may choose to share a packed weight by first storing it in the OrtSharedPrePackedWeightCache instance and then receiving the actual shared weight data in the call to OrtKernelImpl::SetSharedPrePackedWeight(). See the above description for "sharing mode". |
| [out] | is_packed | Output parameter that the implementation sets to true if the kernel packed the tensor data. |
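Implementing this function is optional. If it is not implemented, ORT assumes the kernel does not pre-pack weights (i.e., is_packed defaults to false).
| OrtStatus * OrtKernelImpl::SetSharedPrePackedWeight | ( | OrtKernelImpl * | this_ptr, |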
| const void *const * | buffer_data_ptrs, | ||
| const size_t * | buffer_data_sizes, | ||
| size_t | num_buffers, | ||
| int | input_index | ||
| ) |
Optional function that receives data for a shared pre-packed weight from ORT.
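ORT calls this function after calling OrtKernelImpl::PrePackWeight for a specific input_index if both of the following hold:
1) OrtKernelImpl::PrePackWeight set the output parameter is_packed to true.
2) OrtKernelImpl::PrePackWeight stored the weight data to share in the provided OrtSharedPrePackedWeightCache (prepacked_weight_cache) via the API SharedPrePackedWeightCache_StoreWeightData.
Refer to the description of the "sharing-mode" in the documentation for OrtKernelImpl::PrePackWeight().
ORT does not call this function for an input_index that a previous call to OrtKernelImpl::PrePackWeight() did not elect to pre-pack and share.
| [in] | this_ptr | The OrtKernelImpl instance. |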
| [in] | buffer_data_ptrs | An array of buffer data pointers that collectively hold the pre-packed data for a single shared weight. The buffers are provided in the same order and with the same contents (in a potentially different memory location) as the buffers passed into SharedPrePackedWeightCache_StoreWeightData() within the OrtKernelImpl::PrePackWeight() call for the same input_index. |
| [in] | buffer_data_sizes | An array of buffer byte sizes, one per element in buffer_data_ptrs. |
| [in] | num_buffers | The number of buffers used to store the data for the shared pre-packed weight. Specifies the number of elements in the buffer_data_ptrs and buffer_data_sizes arrays. |
| [in] | input_index | The input index of the tensor in this kernel. This index identifies the weight. |
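The parameters above suggest a simple receiving side: record the buffer pointers and sizes for the given input so that the computation can read the shared data later. The sketch below assumes a hypothetical `SharedWeightView` struct and a function `set_shared_weight` that returns an int where a real implementation would return an OrtStatus pointer. It also assumes the kernel only borrows the buffers and does not free them, which is an assumption of this sketch rather than something stated here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARED_BUFFERS 4 /* arbitrary limit chosen for this sketch */

/* Hypothetical per-kernel record of one shared pre-packed weight. */
typedef struct {
  int input_index;                      /* which input the buffers belong to */
  size_t num_buffers;                   /* number of valid entries below     */
  const void *ptrs[MAX_SHARED_BUFFERS]; /* borrowed pointers, not freed here */
  size_t sizes[MAX_SHARED_BUFFERS];     /* byte size of each buffer          */
} SharedWeightView;

/* Mirrors the parameter list of SetSharedPrePackedWeight, with the kernel
 * state passed as `view`. Returns 0 on success and 1 on failure. */
int set_shared_weight(SharedWeightView *view,
                      const void *const *buffer_data_ptrs,
                      const size_t *buffer_data_sizes,
                      size_t num_buffers, int input_index) {
  assert(view != NULL);
  if (num_buffers > MAX_SHARED_BUFFERS) {
    return 1;
  }
  if (num_buffers > 0 &&
      (buffer_data_ptrs == NULL || buffer_data_sizes == NULL)) {
    return 1;
  }
  /* Buffers arrive in the same order in which they were stored. */
  for (size_t i = 0; i < num_buffers; ++i) {
    view->ptrs[i] = buffer_data_ptrs[i];
    view->sizes[i] = buffer_data_sizes[i];
  }
  view->num_buffers = num_buffers;
  view->input_index = input_index;
  return 0;
}
```

Because the shared data may live at a different address than the data the kernel originally packed, the kernel should read the weight through the pointers received here rather than through any pointer it kept from its own packing step.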
| uint32_t OrtKernelImpl::flags |
EP must initialize to 0. Used internally by ORT.
| uint32_t OrtKernelImpl::ort_version_supported |
Must be initialized to ORT_API_VERSION.
| void( * OrtKernelImpl::Release) (OrtKernelImpl *this_ptr) |
Called by ORT to release the OrtKernelImpl instance and its resources.
| [in] | this_ptr | The OrtKernelImpl instance. |
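The attributes above follow a common C pattern: a base struct of version, flags, and function pointers that an implementation embeds as the first member of its own state. The sketch below imitates that pattern with stand-in types so that it compiles without the ORT headers. `KernelBase`, `MyKernel`, `SKETCH_API_VERSION`, and the use of malloc/free in place of an OrtAllocator are all assumptions of this example. With the real header, the base would be OrtKernelImpl and the version field would be set to ORT_API_VERSION.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_API_VERSION 1u /* placeholder standing in for ORT_API_VERSION */

/* Stand-in for the base struct; only the three attributes are modeled. */
typedef struct KernelBase KernelBase;
struct KernelBase {
  uint32_t version_supported;
  uint32_t flags;
  void (*Release)(KernelBase *this_ptr);
};

/* Hypothetical kernel state. The base is the first member so that a
 * KernelBase pointer can be cast back to the enclosing MyKernel. */
typedef struct {
  KernelBase base;
  float *packed_weight; /* owned by the kernel (non-sharing mode) */
} MyKernel;

/* Releases the kernel instance and the resources it owns. */
void my_kernel_release(KernelBase *this_ptr) {
  assert(this_ptr != NULL);
  MyKernel *kernel = (MyKernel *)this_ptr;
  free(kernel->packed_weight);
  free(kernel);
}

/* Allocates a kernel and initializes the base fields as documented:
 * version set to the API version, flags set to 0, Release assigned. */
MyKernel *my_kernel_create(size_t packed_elems) {
  MyKernel *kernel = calloc(1, sizeof *kernel);
  if (kernel == NULL) {
    return NULL;
  }
  kernel->base.version_supported = SKETCH_API_VERSION;
  kernel->base.flags = 0;
  kernel->base.Release = my_kernel_release;
  if (packed_elems > 0) {
    kernel->packed_weight = malloc(packed_elems * sizeof *kernel->packed_weight);
    if (kernel->packed_weight == NULL) {
      free(kernel);
      return NULL;
    }
  }
  return kernel;
}
```

The caller never frees the kernel directly. It invokes `Release` through the base pointer, which lets the implementation free whatever it allocated, including any privately packed weights.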