Working with Large Models

The size of ONNX models can vary greatly depending on the complexity of the model and the number of parameters. They can be as small as a few KBs or as large as several GBs. While ONNX Runtime Web is designed to run all models in the browser, there are some considerations to keep in mind when working with large models.

Platform restrictions

There are some platform restrictions that you should be aware of when working with large models in the browser:

Maximum size of ArrayBuffer

Although there is no hard limit on the size of an ArrayBuffer in JavaScript, each browser has its own limitations. For example, the maximum size of an ArrayBuffer in Chrome is 0x7fe00000 bytes (around 2GB). Be careful when using the fetch API to load large models, as it may fail if you call response.arrayBuffer() on a large file.
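As a rough guard, you can check the Content-Length header before calling response.arrayBuffer(), and fall back to passing the URL directly to ONNX Runtime Web when the file is too large. This is only a sketch: the threshold mirrors Chrome's limit, other browsers may differ, and a server may omit the Content-Length header.

```javascript
// Chrome's maximum ArrayBuffer size (~2GB); other browsers differ.
const MAX_ARRAY_BUFFER_BYTES = 0x7fe00000;

// Pure helper: would a buffer of this size exceed Chrome's limit?
function exceedsArrayBufferLimit(byteLength) {
  return byteLength >= MAX_ARRAY_BUFFER_BYTES;
}

// Fetch the model bytes if they fit in one ArrayBuffer; otherwise return
// the URL so ONNX Runtime Web can load the model itself.
async function loadModelOrUrl(url) {
  const headResponse = await fetch(url, { method: 'HEAD' });
  const length = Number(headResponse.headers.get('Content-Length') ?? 0);
  if (exceedsArrayBufferLimit(length)) {
    return url; // let InferenceSession.create(url, ...) handle the download
  }
  const response = await fetch(url);
  return new Uint8Array(await response.arrayBuffer());
}
```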

ONNX Runtime Web bypasses this limitation for buffers larger than 2GB by using new WebAssembly.Memory(). However, an ArrayBuffer instance created by new WebAssembly.Memory() is not transferable, so it cannot be used with the Proxy feature.

Protobuf file size limit

The ONNX model is serialized in the protobuf format. The maximum size of a protobuf file is 2GB. If an ONNX model is larger than 2GB, it’s usually generated with external data. See External Data for more details.

WebAssembly memory limit

WebAssembly has a memory limit of 4GB: because of 32-bit addressing, this is the maximum amount of memory a WebAssembly module can access. Currently, there is no way for ONNX Runtime Web to run models larger than 4GB. We may support this in the future, either by using WASM64 or by loading weights directly onto the GPU.

Cache the model

To avoid loading the model every time the page is refreshed, you can cache the model by using the Cache API or Origin private file system. This way, the model can be loaded from the cache instead of being fetched from the server every time.

See Cache API and Origin private file system for more details.
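For example, here is a minimal sketch using the Cache API. It assumes a secure browser context where the caches global is available; the cache name is an arbitrary placeholder.

```javascript
// Fetch a model, serving it from the Cache API when available so a page
// refresh does not re-download it. The cache name is an assumption.
async function fetchModelCached(url) {
  const cache = await caches.open('onnx-models-v1');
  let response = await cache.match(url);
  if (!response) {
    response = await fetch(url);
    // Store a copy for next time; clone() is needed because a Response
    // body can only be consumed once.
    await cache.put(url, response.clone());
  }
  return new Uint8Array(await response.arrayBuffer());
}
```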

External Data

When you work with a large ONNX model, it is usually generated with external data. Because of the protobuf file size limit, ONNX models larger than 2GB have to use external data. The external data consists of one or more separate files, usually generated by an ONNX exporter and placed in the same directory as the ONNX model file.

ONNX Runtime supports loading models with external data. This is done automatically in C/C++/Python APIs without any extra steps because ONNX Runtime for these language bindings can access the file system. However, in the browser, the JavaScript code cannot access the file system directly. Therefore, you need one more step to pass the external data information to ONNX Runtime Web.

How external data works

Before we dive into the details, let’s first understand how external data works in ONNX Runtime. This background makes the steps in the following sections easier to follow.

An ONNX model is technically a protobuf file that contains the model graph and the weights. The ONNX spec allows a weight to be stored either inside the protobuf file or in an external data file. When a weight is stored in the protobuf file, its data is fully embedded there. When it is stored in an external data file, the protobuf file instead records the following information for that weight:

  • “location” (a string) that specifies the relative file path of the external data file
  • “offset” (an integer) that specifies the byte offset in the external data file where the weight starts
  • “length” (an integer) that specifies the length of the weight in bytes
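For illustration, the external-data record for one weight can be pictured as the following object. The field names follow the ONNX spec; the values are hypothetical.

```javascript
// Illustrative external-data record for a single weight (the values are
// made up, not taken from a real model).
const externalDataInfo = {
  location: './model_a.data', // relative path of the external data file
  offset: 0,                  // byte offset where this weight's data starts
  length: 4194304,            // size of this weight's data in bytes
};
```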

The “location” is usually determined by the ONNX exporter. For example, an exporter may output the model file as model_a.onnx and the external data file as model_a.data in the same directory. Some weights in the model are stored in the model_a.data file, so the “location” of these weights is set to ./model_a.data. This information is stored in the file model_a.onnx.

This explains why, on native platforms, it is important never to rename the external data file. If you rename it, the “location” recorded in the protobuf file will no longer match the actual file name, and ONNX Runtime will fail to load the model.

For ONNX Runtime Web, you always need to pass the external data information explicitly. It’s important to understand that the “location” defined in the protobuf file is a different concept from the actual external file path. These two concepts are represented as “path” and “data” in the JavaScript code in the following section.

Load the model with external data in ONNX Runtime Web

We use an example to illustrate how to load an ONNX model with external data in the browser. Suppose we have an ONNX model model_a.onnx and an external data file model_a.data. The following code shows how to load the model with external data in the browser:

const modelUrl = 'https://example.com/path/model_a.onnx';
const externalDataUrl = 'https://example.com/path/model_a.data';

const mySession = await ort.InferenceSession.create(modelUrl, {
    ...,
    externalData: [
        {
            path: './model_a.data',
            data: externalDataUrl
        }
    ]
});

In the code above, we pass the external data information to the InferenceSession.create() method. The externalData option is an array of objects, each representing one external data file. Each object has two properties:

  • path (a string) that should match the weights’ “location” info in the protobuf file
  • data (a string) that specifies the external data file. It can be a URL, a Blob or a Uint8Array.
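For instance, if you prefer to fetch the external data yourself, you can pass the bytes directly as a Uint8Array. This sketch reuses the file names from the example above and assumes the ort global from the ONNX Runtime Web library is available.

```javascript
// Fetch the external data manually and hand the bytes to ONNX Runtime Web.
// The 'path' must still match the "location" recorded in the .onnx file.
async function createSessionWithBytes(modelUrl, externalDataUrl) {
  const response = await fetch(externalDataUrl);
  const dataBytes = new Uint8Array(await response.arrayBuffer());
  return await ort.InferenceSession.create(modelUrl, {
    externalData: [
      {
        path: './model_a.data',
        data: dataBytes,
      },
    ],
  });
}
```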

If you store the model and external data in IndexedDB, you can load them from there as well. The following code shows how to load the model with external data from IndexedDB:

// assume loadFromIndexedDB() is a function implemented by your app that loads the data from the IndexedDB

// Load the model and external data from the IndexedDB
const modelBlob = await loadFromIndexedDB('model_a.onnx');
const externalDataBlob = await loadFromIndexedDB('model_a.data');

const mySession = await ort.InferenceSession.create(modelBlob, {
    ...,
    externalData: [
        {
            path: './model_a.data',
            data: externalDataBlob
        }
    ]
});
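For completeness, here is one possible implementation of the loadFromIndexedDB() helper, assuming your app previously stored the files as Blobs in an object store keyed by file name. The database name ('model-db') and store name ('files') are assumptions; adjust them to your app's schema.

```javascript
// Hypothetical helper: read a previously-stored Blob from IndexedDB.
// Assumes a database 'model-db' with an object store 'files' keyed by
// file name (both names are placeholders).
function loadFromIndexedDB(fileName) {
  return new Promise((resolve, reject) => {
    const openRequest = indexedDB.open('model-db', 1);
    openRequest.onerror = () => reject(openRequest.error);
    openRequest.onsuccess = () => {
      const db = openRequest.result;
      const tx = db.transaction('files', 'readonly');
      const getRequest = tx.objectStore('files').get(fileName);
      getRequest.onsuccess = () => resolve(getRequest.result);
      getRequest.onerror = () => reject(getRequest.error);
      tx.oncomplete = () => db.close();
    };
  });
}
```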

See ONNX External Data for more details.

Troubleshooting

This section is under construction.