
Custom operators

ONNX Runtime provides options to run custom operators that are not official ONNX operators.


Register a custom operator

A new op can be registered with ONNX Runtime using the Custom Operator API in onnxruntime_c_api.

  1. Create an OrtCustomOpDomain with the domain name used by the custom ops (OrtApi::CreateCustomOpDomain).
  2. Create an OrtCustomOp structure for each op and add it to the OrtCustomOpDomain with OrtApi::CustomOpDomain_Add.
  3. Call OrtApi::AddCustomOpDomain to add the custom op domain to the session options.


Calling a native operator from custom operator

To simplify the implementation of custom operators, native ONNX Runtime operators can be invoked directly from within them. For example, some custom ops might need to run GEMM or TopK in between other computations. This can also be useful for preprocessing and postprocessing around a node such as Conv, for state management: the Conv node can be wrapped by a custom operator such as CustomConv, within which the inputs and outputs can be cached and processed.

This feature is supported in ONNX Runtime 1.12.0 and later. See the API documentation and examples.

CUDA custom ops

When a model runs on a GPU, ONNX Runtime inserts a MemcpyToHost op before a CPU custom op and a MemcpyFromHost op after it, so that the tensors are accessible on the host for the duration of the call.

When using CUDA custom ops, ORT's CUDA kernels and the custom CUDA kernels must all use the same CUDA compute stream to stay synchronized. To ensure this, first create a CUDA stream and pass it to the underlying session via SessionOptions (using the OrtCUDAProviderOptions struct). ORT's CUDA kernels will then use that stream, and if the custom CUDA kernels are launched on the same stream, synchronization is handled implicitly.

For an example, see how the MyCustomOp test op is launched and how the session that uses it is created.

Contrib ops

The contrib ops domain contains ops that are built into the runtime by default. However, most new operators should not be added here, to avoid increasing the binary size of the core runtime package.

See for example the Inverse op added in #3485.

The custom op's schema and shape inference function should be added using ONNX_CONTRIB_OPERATOR_SCHEMA:

ONNX_CONTRIB_OPERATOR_SCHEMA(Inverse)
    .SetDomain(kMSDomain) // kMSDomain = "com.microsoft"
    .SinceVersion(1) // Same version used at op (symbolic) registration
    ...

A new operator should have complete reference implementation tests and shape inference tests.

Reference implementation Python tests should be added in onnxruntime/test/python/contrib_ops.

Shape inference C++ tests should be added in onnxruntime/test/contrib_ops.

The operator kernel should be implemented as a Compute function in the contrib namespace, under onnxruntime/contrib_ops/cpu/ for CPU and onnxruntime/contrib_ops/cuda/ for CUDA.

namespace onnxruntime {
namespace contrib {

class Inverse final : public OpKernel {
 public:
  explicit Inverse(const OpKernelInfo& info) : OpKernel(info) {}
  Status Compute(OpKernelContext* ctx) const override;
};

ONNX_OPERATOR_KERNEL_EX(
    Inverse,
    kMSDomain,
    1,
    kCpuExecutionProvider,
    KernelDefBuilder()
        .TypeConstraint("T", BuildKernelDefConstraints<float, double, MLFloat16>()),
    Inverse);

Status Inverse::Compute(OpKernelContext* ctx) const {
... // kernel implementation
}

}  // namespace contrib
}  // namespace onnxruntime
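The kernel body is elided above. To illustrate only the math it has to compute, here is a self-contained Gauss-Jordan inverse with partial pivoting for a single row-major n-by-n float matrix. This is a swapped-in textbook technique, not ONNX Runtime's actual Inverse kernel; the function name InvertMatrix is hypothetical, and a real kernel would also need to cover the double and MLFloat16 types from the type constraint.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: inverts one row-major n-by-n matrix using Gauss-Jordan
// elimination with partial pivoting. Not ONNX Runtime's Inverse kernel.
// Returns false if the matrix is singular (pivot magnitude below 1e-12).
bool InvertMatrix(const std::vector<float>& in, std::size_t n,
                  std::vector<float>& out) {
  const std::size_t w = 2 * n;  // row width of the augmented matrix [A | I]
  std::vector<double> aug(n * w, 0.0);
  for (std::size_t r = 0; r < n; ++r) {
    for (std::size_t c = 0; c < n; ++c) aug[r * w + c] = in[r * n + c];
    aug[r * w + n + r] = 1.0;  // identity on the right-hand side
  }

  for (std::size_t col = 0; col < n; ++col) {
    // Partial pivoting: choose the row with the largest |value| in this column.
    std::size_t pivot = col;
    for (std::size_t r = col + 1; r < n; ++r) {
      if (std::fabs(aug[r * w + col]) > std::fabs(aug[pivot * w + col])) pivot = r;
    }
    if (std::fabs(aug[pivot * w + col]) < 1e-12) return false;  // singular
    if (pivot != col) {
      for (std::size_t c = 0; c < w; ++c) std::swap(aug[col * w + c], aug[pivot * w + c]);
    }

    // Scale the pivot row so that the pivot element becomes 1.
    const double p = aug[col * w + col];
    for (std::size_t c = 0; c < w; ++c) aug[col * w + c] /= p;

    // Eliminate this column from every other row.
    for (std::size_t r = 0; r < n; ++r) {
      if (r == col) continue;
      const double f = aug[r * w + col];
      for (std::size_t c = 0; c < w; ++c) aug[r * w + c] -= f * aug[col * w + c];
    }
  }

  // The right half of the augmented matrix now holds the inverse.
  out.assign(n * n, 0.0f);
  for (std::size_t r = 0; r < n; ++r) {
    for (std::size_t c = 0; c < n; ++c) out[r * n + c] = static_cast<float>(aug[r * w + n + c]);
  }
  return true;
}
```

For the 2-by-2 input used in the contrib op test, {4, 7, 2, 6}, this returns {0.6, -0.7, -0.2, 0.4} up to float rounding.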

The kernel must also be registered for CPU and, if it has a CUDA implementation, for CUDA.

Now you should be able to build and install ONNX Runtime to start using your custom op.

Contrib Op Tests

Tests should be added in onnxruntime/test/contrib_ops/. For example:

#include "gtest/gtest.h"
#include "test/providers/provider_test_utils.h"

namespace onnxruntime {
namespace test {

// Add a comprehensive set of unit tests for the custom op kernel implementation

TEST(InverseContribOpTest, two_by_two_float) {
  OpTester test("Inverse", 1, kMSDomain); // custom opset version and domain
  test.AddInput<float>("X", {2, 2}, {4, 7, 2, 6});
  test.AddOutput<float>("Y", {2, 2}, {0.6f, -0.7f, -0.2f, 0.4f});
  test.Run();
}

}  // namespace test
}  // namespace onnxruntime
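The expected values for Y in this test follow from the closed-form inverse of a 2-by-2 matrix, (a b; c d)^-1 = 1/(ad - bc) (d -b; -c a), applied to the row-major input X. Working small cases by hand like this is a quick way to derive reference outputs:

```latex
\begin{aligned}
X &= \begin{pmatrix} 4 & 7 \\ 2 & 6 \end{pmatrix}, \qquad \det X = 4 \cdot 6 - 7 \cdot 2 = 10, \\
Y = X^{-1} &= \frac{1}{\det X} \begin{pmatrix} 6 & -7 \\ -2 & 4 \end{pmatrix}
            = \begin{pmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{pmatrix}.
\end{aligned}
```

Flattened in row-major order, this is {0.6f, -0.7f, -0.2f, 0.4f}, matching the AddOutput call.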