Deploying ONNX Runtime Web

This document provides some guidance on how to deploy ONNX Runtime Web in a production environment.

Assets

When deploying ONNX Runtime Web in a production environment, the following assets are required:

  • JavaScript code bundle: The bundle that contains the application code and, depending on how the application is built, possibly the ONNX Runtime Web JavaScript code as well.

  • WebAssembly binaries: The WebAssembly binary file(s) of the ONNX Runtime Web library.

  • Model file(s): The ONNX model file(s) that you want to run in the browser.

JavaScript code bundle

The JavaScript code bundle is usually a minified JavaScript file that contains the application code, generated by a bundler such as Webpack, Rollup or ESBuild. Depending on the bundler’s configuration, the ONNX Runtime Web JavaScript code may be included in the bundle or not (if specified as an external dependency).

Conditional Importing

To reduce the size of the JavaScript code bundle, you can use Conditional Importing to import only the parts of the ONNX Runtime Web library that you need. For example, if you only use the WebAssembly execution provider, you can import onnxruntime-web/wasm instead of the full package.
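
As a minimal sketch of this idea, the helper below loads the WebAssembly-only entry point lazily with a dynamic import(). The name loadOrtWasmOnly is hypothetical, and the sketch assumes a bundler or runtime that can resolve the onnxruntime-web/wasm specifier mentioned above.

```javascript
// Cache the module promise so the package is requested only once.
let ortModulePromise;

// Hypothetical helper: load only the WebAssembly build of ONNX Runtime Web.
function loadOrtWasmOnly() {
  if (!ortModulePromise) {
    // A string-literal specifier lets the bundler resolve the smaller entry point.
    ortModulePromise = import('onnxruntime-web/wasm');
  }
  return ortModulePromise;
}
```

A static import of the same specifier at the top of a module should have the same effect on which code is bundled. The dynamic form additionally defers loading until the first call.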

Inlined worker

The ONNX Runtime Web JavaScript code includes three pieces of inlined source code:

  1. the web worker for the proxy feature
  2. the web worker for the WebAssembly multi-threading feature
  3. the WebAssembly entry point, generated by function.toString(), which (2) requires for the multi-threading feature

Inlining the workers keeps ONNX Runtime Web in a single JavaScript file, which is easier to deploy and use. However, inlined workers may not work in some environments, such as those restricted by a Content Security Policy (CSP). See Security considerations for more details.

WebAssembly binaries

The standard ONNX Runtime Web library includes the following WebAssembly binary files:

File                              SIMD  Multi-threading  JSEP  Training
ort-wasm.wasm                     -     -                -     -
ort-wasm-simd.wasm                ✔️    -                -     -
ort-wasm-threaded.wasm            -     ✔️               -     -
ort-wasm-simd-threaded.wasm       ✔️    ✔️               -     -
ort-wasm-simd.jsep.wasm           ✔️    -                ✔️    -
ort-wasm-simd-threaded.jsep.wasm  ✔️    ✔️               ✔️    -
ort-training-wasm-simd.wasm       ✔️    -                -     ✔️

The columns indicate whether the feature is supported by the WebAssembly artifact.

  • SIMD: whether the Single Instruction, Multiple Data (SIMD) feature is supported.
  • Multi-threading: whether the WebAssembly multi-threading feature is supported.
  • JSEP: whether the JavaScript Execution Provider (JSEP) feature is enabled. This feature powers the WebGPU and WebNN execution providers.
  • Training: whether the training feature is enabled.

When deploying ONNX Runtime Web in a production environment, you should consider which WebAssembly binary file(s) to include in the application. By default, ONNX Runtime Web JavaScript code will check the environment and load the appropriate WebAssembly binary file(s) automatically. This means you should include all combinations of WebAssembly binary file(s) in the deployment for the best compatibility.

However, when your application code imports ONNX Runtime Web with WebGPU or WebNN support, you only need to include the two JSEP WebAssembly binary files. Furthermore, if you set ort.env.wasm.numThreads to 1, you only need to include ort-wasm-simd.jsep.wasm in your deployment.
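
The selection rules above can be sketched as a small build-time helper. The function name requiredWasmFiles and its option names are hypothetical. The sketch also assumes that "all combinations" for the default (non-JSEP, non-training) import means the four ort-wasm*.wasm files from the table.

```javascript
// File names taken from the table of WebAssembly binaries above.
const DEFAULT_WASM_FILES = [
  'ort-wasm.wasm',
  'ort-wasm-simd.wasm',
  'ort-wasm-threaded.wasm',
  'ort-wasm-simd-threaded.wasm',
];
const JSEP_WASM_FILES = [
  'ort-wasm-simd.jsep.wasm',
  'ort-wasm-simd-threaded.jsep.wasm',
];

// Hypothetical helper: list the binaries to copy into the deployment.
//   jsep:       true when the app imports the WebGPU/WebNN-enabled build
//   numThreads: the value the app assigns to ort.env.wasm.numThreads, if any
function requiredWasmFiles({ jsep = false, numThreads } = {}) {
  if (!jsep) {
    // Assumption: the default build may pick any of the four non-JSEP files.
    return [...DEFAULT_WASM_FILES];
  }
  if (numThreads === 1) {
    // Single-threaded JSEP only needs the non-threaded JSEP binary.
    return ['ort-wasm-simd.jsep.wasm'];
  }
  return [...JSEP_WASM_FILES];
}
```

A build script could pass the returned list to its copy step so that the files which are shipped match the flags the application sets.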

Ensure the WebAssembly binary file(s) are correctly served

You should ensure that the WebAssembly binary file(s) are correctly served on the server. If you didn’t copy the necessary WebAssembly binary file(s) when building the application, or if the WebAssembly binary file(s) are not in the expected path, ONNX Runtime Web will fail to initialize.
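
One way to catch a missing or misplaced file before users do is a small smoke test against the deployed site. The sketch below is an assumption-laden example, not part of ONNX Runtime Web: findBadWasmAssets is a hypothetical name, baseUrl is assumed to be an absolute URL ending in '/', and the server is assumed to answer HEAD requests. It also flags files that are not served as application/wasm, which is the type browsers expect for streaming WebAssembly compilation.

```javascript
// Hypothetical helper: request each file and report the ones that are
// missing or served with an unexpected Content-Type.
async function findBadWasmAssets(baseUrl, fileNames, fetchFn = fetch) {
  const problems = [];
  for (const name of fileNames) {
    const url = new URL(name, baseUrl).href;
    const response = await fetchFn(url, { method: 'HEAD' });
    const type = response.headers.get('content-type') || '';
    if (!response.ok) {
      problems.push(`${url}: HTTP ${response.status}`);
    } else if (!type.startsWith('application/wasm')) {
      // Typical symptom of a server that falls back to an HTML page.
      problems.push(`${url}: unexpected Content-Type "${type}"`);
    }
  }
  return problems;
}
```

An empty result means every listed file was found and served with the expected type.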

Override WebAssembly file path

ONNX Runtime Web tries to locate the WebAssembly binary file(s) by using the relative path of the JavaScript code bundle. If the WebAssembly binary file(s) are not located in the same directory as the JavaScript code bundle, you can override the file path by setting the value of ort.env.wasm.wasmPaths.

You can also set ort.env.wasm.wasmPaths to an absolute URL on a public CDN, such as jsDelivr or unpkg, if you are using a published version of ONNX Runtime Web. The examples below point at the dev tag:

// Set the WebAssembly binary file path to jsdelivr CDN for latest dev version
ort.env.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web@dev/dist/';

// Set the WebAssembly binary file path to unpkg CDN for latest dev version
ort.env.wasm.wasmPaths = 'https://unpkg.com/onnxruntime-web@dev/dist/';

See API reference: env.wasm.wasmPaths for more details.
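
The two CDN URLs above follow the same pattern, so they can be built from a CDN name and a package version or tag. The sketch below is illustrative only: cdnWasmPaths and setWasmPaths are hypothetical helper names, and the ort argument stands for the imported ONNX Runtime Web module.

```javascript
// URL prefixes copied from the examples above.
const CDN_PREFIXES = {
  jsdelivr: 'https://cdn.jsdelivr.net/npm/onnxruntime-web@',
  unpkg: 'https://unpkg.com/onnxruntime-web@',
};

// Hypothetical helper: build the dist/ URL for a given CDN and version or tag.
function cdnWasmPaths(cdn, version) {
  const prefix = CDN_PREFIXES[cdn];
  if (!prefix) {
    throw new Error(`Unknown CDN: ${cdn}`);
  }
  return `${prefix}${version}/dist/`;
}

// Hypothetical helper: apply a path prefix to the given ort module.
function setWasmPaths(ort, prefix) {
  // The value is used as a prefix for the file names, so it should end with '/'.
  ort.env.wasm.wasmPaths = prefix.endsWith('/') ? prefix : `${prefix}/`;
  return ort.env.wasm.wasmPaths;
}
```

For example, setWasmPaths(ort, cdnWasmPaths('jsdelivr', 'dev')) reproduces the first assignment above. Using the version of the onnxruntime-web package that the application was built with, instead of a moving tag, keeps the JavaScript code and the binaries in step.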

Model file(s)

If your ONNX model file(s) are large and take a long time to download, consider caching them in IndexedDB so that the model is not downloaded again every time the page is refreshed.
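
A minimal sketch of such a cache follows. It is one possible approach, not an ONNX Runtime Web feature: the helper names, the database name 'model-cache' and the object store name 'models' are all hypothetical, and the model URL is used as the cache key.

```javascript
// Wrap an IndexedDB request in a Promise.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Browser-only key-value store backed by IndexedDB (names are hypothetical).
function createIndexedDbStore(dbName = 'model-cache', storeName = 'models') {
  const open = () => {
    const request = indexedDB.open(dbName, 1);
    // Runs only when the database is created for the first time.
    request.onupgradeneeded = () => request.result.createObjectStore(storeName);
    return requestToPromise(request);
  };
  return {
    async get(key) {
      const db = await open();
      const store = db.transaction(storeName, 'readonly').objectStore(storeName);
      return requestToPromise(store.get(key));
    },
    async put(key, value) {
      const db = await open();
      const store = db.transaction(storeName, 'readwrite').objectStore(storeName);
      return requestToPromise(store.put(value, key));
    },
  };
}

// Return the model bytes, downloading them only on a cache miss.
async function loadModelBytes(url, store, fetchFn = fetch) {
  const cached = await store.get(url);
  if (cached) {
    return new Uint8Array(cached);
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Failed to download ${url}: HTTP ${response.status}`);
  }
  const buffer = await response.arrayBuffer();
  await store.put(url, buffer);
  return new Uint8Array(buffer);
}
```

In a browser, the bytes returned by loadModelBytes(url, createIndexedDbStore()) can be passed to ort.InferenceSession.create() in place of the URL. Because the key is the URL, a changed model should be published under a new URL, or the cached entry will keep being used.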

If the model contains external data, you need to pass the external data information to ONNX Runtime Web. See External Data for more details.
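
As a rough illustration, the sketch below builds session options for a model whose weights live in separate files next to the .onnx file. It assumes the externalData session option, a list of { path, data } entries, as described on the External Data page. The helper name externalDataOptions is hypothetical, and the option should be checked against the version you deploy.

```javascript
// Hypothetical helper: build the externalData session option.
//   baseUrl:   absolute URL of the directory that hosts the files, ending in '/'
//   fileNames: external data paths as they are recorded inside the model
function externalDataOptions(baseUrl, fileNames) {
  return {
    externalData: fileNames.map((name) => ({
      // Path as recorded inside the model file.
      path: name,
      // Where the browser can download that file from.
      data: new URL(name, baseUrl).href,
    })),
  };
}
```

The returned object is meant to be passed as the second argument of ort.InferenceSession.create(), alongside the model URL.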

File size considerations

The size of the artifacts is an important factor to consider when deploying ONNX Runtime Web in a production environment. Reducing the file size can improve the load time of the application and reduce the memory consumption on the client’s device.

To reduce the deployment size, you can consider the following options:

  • Use Conditional Importing to import only the necessary parts of the ONNX Runtime Web library.
  • Serve only the necessary WebAssembly binaries, or set ort.env.wasm.wasmPaths so that the WebAssembly binaries are loaded from a public CDN.

If you want ultimate control over the size of the artifacts, you can also perform a custom build of ONNX Runtime Web.

Custom build

A custom build lets you compile ONNX Runtime Web with only the kernels that your model requires, which can significantly reduce the size of the WebAssembly binary file(s). The steps are more complex, however, and require some knowledge of the ONNX Runtime Web build system.

The content of this part is under construction.

Security considerations

Secure Context

WebGPU is accessible only in secure contexts. In short, a page loaded over HTTPS, or over HTTP from localhost/127.0.0.1, is considered a secure context.

See Secure Context and WebGPU: Troubleshooting tips and fixes for more details.
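
An application can check this at runtime before asking for the WebGPU execution provider. The sketch below uses the standard isSecureContext and navigator.gpu browser properties. The function name webGpuStatus and its return values are hypothetical.

```javascript
// Hypothetical helper: explain why WebGPU is or is not usable on this page.
// Pass a stand-in object for testing; it defaults to the global object.
function webGpuStatus(env = globalThis) {
  if (!env.isSecureContext) {
    // Plain HTTP from a non-local host: WebGPU is not exposed.
    return 'insecure-context';
  }
  if (!env.navigator || !env.navigator.gpu) {
    // Secure context, but the browser does not expose WebGPU.
    return 'unsupported';
  }
  return 'available';
}
```

When the result is not 'available', the application can fall back to the WebAssembly execution provider instead of failing at session creation.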

Content Security Policy (CSP) restricted environments

Currently, ONNX Runtime Web uses inline web workers to enable the proxy feature and the WebAssembly multi-threading feature. As a result, these two features may not work in a CSP-restricted environment. We are working on a solution for such environments.
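
Until then, one possible workaround is to turn off both worker-based features before creating the first inference session. The sketch below assumes that doing so is enough to avoid creating the inline workers, which this document does not confirm. The helper name disableWorkerFeatures is hypothetical, and the ort argument stands for the imported ONNX Runtime Web module.

```javascript
// Hypothetical helper: disable the features that rely on inline web workers.
function disableWorkerFeatures(ort) {
  // Run inference on the main thread instead of in the proxy worker.
  ort.env.wasm.proxy = false;
  // A single thread does not need the multi-threading worker.
  ort.env.wasm.numThreads = 1;
  return ort;
}
```

This trades performance for compatibility, since inference then runs single-threaded on the main thread.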