Build for On-Device Training


Prerequisites

  • Python 3.x
  • CMake

Build Instructions for the Training Phase

  1. Clone the repository

     git clone --recursive
     cd onnxruntime
  2. Build ONNX Runtime for On-Device Training

    a. For Windows

     .\build.bat --config RelWithDebInfo --cmake_generator "Visual Studio 17 2022" --build_shared_lib --parallel --enable_training_apis

    b. For Linux

     ./build.sh --config RelWithDebInfo --build_shared_lib --parallel --enable_training_apis

    c. For Android

    Refer to the Android build instructions and add the --enable_training_apis build flag.

    d. For macOS

    Refer to the macOS inference build instructions and add the --enable_training_apis build flag.

    e. For iOS

    Refer to the iOS build instructions and add the --enable_training_apis build flag.

    f. For web

    Refer to the web build instructions.


  • To build the C# bindings, add the --build_nuget flag to the build command above.

  • To build the Python wheel:
    • add the --build_wheel flag to the build command above.
    • install the wheel using python -m pip install build/Linux/RelWithDebInfo/dist/*.whl
  • The config flag can be one of Debug, RelWithDebInfo, Release, MinSizeRel. Use the one that suits your use case.

  • The --enable_training_apis flag can be used in conjunction with the --minimal_build flag.

  • The offline phase of generating the training artifacts can only be done with Python (using the --build_wheel flag).

  • The build commands above build only for the CPU execution provider. To build for the CUDA execution provider, add these flags:
    • --use_cuda
    • --cuda_home {path to your CUDA home, for example /usr/local/cuda/}
    • --cudnn_home {path to your cuDNN home, for example /usr/local/cuda/}
    • --cuda_version={version, for example 11.8}
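The notes above describe flags that are combined with the base build command. As a minimal sketch of how they fit together, the function below prints (but does not run) a Linux build command. The helper name compose_build_cmd is hypothetical, the script name ./build.sh is assumed to be the Linux build script at the repository root, and reusing one path for both --cuda_home and --cudnn_home is an assumption that mirrors the /usr/local/cuda/ example above.

```shell
#!/bin/sh
# Sketch only: assemble an on-device training build command from the flags
# listed in this document. compose_build_cmd is a hypothetical helper.
compose_build_cmd() {
    config="$1"      # Debug, RelWithDebInfo, Release, or MinSizeRel
    with_wheel="$2"  # "yes" adds --build_wheel
    cuda_home="$3"   # empty string means a CPU-only build

    # Reject values that are not one of the documented configurations.
    case "$config" in
        Debug|RelWithDebInfo|Release|MinSizeRel) ;;
        *) echo "unknown config: $config" >&2; return 1 ;;
    esac

    # Assumption: ./build.sh is the Linux build script.
    cmd="./build.sh --config $config --build_shared_lib --parallel --enable_training_apis"
    if [ "$with_wheel" = "yes" ]; then
        cmd="$cmd --build_wheel"
    fi
    if [ -n "$cuda_home" ]; then
        # Assumption: cuDNN is installed under the same directory as CUDA.
        cmd="$cmd --use_cuda --cuda_home $cuda_home --cudnn_home $cuda_home"
    fi
    printf '%s\n' "$cmd"
}

# Print two example commands without executing them.
compose_build_cmd RelWithDebInfo yes ""
compose_build_cmd Release no /usr/local/cuda
```

Printing the command first makes it easy to review the flag combination before starting a long build.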

Build for Large Model Training



./build.sh --config RelWithDebInfo --build_shared_lib --parallel --enable_training



The default NVIDIA GPU build requires the CUDA runtime libraries to be installed on the system.

Build instructions

  1. Check out the code repository:

     git clone
     cd onnxruntime
  2. Set the environment variables, adjusting the paths to match your build machine:
     export CUDA_HOME=<location for CUDA libs> # e.g. /usr/local/cuda
     export CUDNN_HOME=<location for cuDNN libs> # e.g. /usr/local/cuda
     export CUDACXX=<location for NVCC> # e.g. /usr/local/cuda/bin/nvcc
  3. Create the ONNX Runtime Python wheel

     ./build.sh --config RelWithDebInfo --enable_training --build_wheel --use_cuda --cuda_home {location of CUDA libs, e.g. /usr/local/cuda/} --cudnn_home {location of cuDNN libs, e.g. /usr/local/cuda/} --cuda_version={version, e.g. 11.8}
  4. Install the .whl file in ./build/Linux/RelWithDebInfo/dist for ONNX Runtime Training.

     python -m pip install build/Linux/RelWithDebInfo/dist/*.whl

That’s it! Once the build is complete, you should be able to use the ONNX Runtime libraries and executables in your projects. Note that these steps are general and may need to be adjusted based on your specific environment and requirements. For more information, you can ask for help on the ONNX Runtime GitHub community.
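Because the CUDA build reads CUDA_HOME, CUDNN_HOME, and CUDACXX from the environment (step 2 of the NVIDIA instructions), it can help to check them before starting. The following is a minimal sketch, not part of the ONNX Runtime tooling: check_cuda_env is a hypothetical helper that only verifies each variable is set and points to an existing path.

```shell
#!/bin/sh
# Sketch only: preflight check for the variables exported in step 2.
# check_cuda_env is a hypothetical helper name.
check_cuda_env() {
    status=0
    for name in CUDA_HOME CUDNN_HOME CUDACXX; do
        # Read the variable whose name is stored in $name (empty if unset).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -e "$value" ]; then
            echo "$name points to a missing path: $value" >&2
            status=1
        fi
    done
    return "$status"
}

if check_cuda_env; then
    echo "CUDA environment variables look complete"
else
    echo "fix the variables reported above before building"
fi
```

The check reports every problem it finds rather than stopping at the first one, so a single run shows all variables that need attention.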



The default AMD GPU build requires the ROCm software toolkit to be installed on the system.

Build instructions

  1. Check out the code repository:

     git clone
     cd onnxruntime
  2. Create the ONNX Runtime Python wheel

     ./build.sh --config Release --enable_training --build_wheel --parallel --skip_tests --use_rocm --rocm_home /opt/rocm
  3. Install the .whl file in ./build/Linux/Release/dist for ONNX Runtime Training. The directory name matches the --config Release value used in step 2.

     python -m pip install build/Linux/Release/dist/*.whl
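The wheel location depends on the value passed to --config. The sketch below assumes the output follows the build/Linux/<config>/dist pattern that the paths in this document suggest. Both wheel_dir and find_wheel are hypothetical helper names used for illustration only.

```shell
#!/bin/sh
# Sketch only: derive the wheel directory from the build configuration.
# Assumption: wheels are written to build/<platform>/<config>/dist.
wheel_dir() {
    # $1: value passed to --config; $2: platform folder (defaults to Linux)
    printf 'build/%s/%s/dist\n' "${2:-Linux}" "$1"
}

# Print the first wheel found for a configuration, or fail if none exists.
find_wheel() {
    dir="$(wheel_dir "$1")"
    for f in "$dir"/*.whl; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no wheel found in $dir" >&2
    return 1
}

wheel_dir Release   # prints build/Linux/Release/dist

# After a successful build, the result could be passed to pip, for example:
#   python -m pip install "$(find_wheel Release)"
```

Deriving the path from the configuration avoids installing a stale wheel left over from a build with a different --config value.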


Build Instructions


./build.sh --enable_training --use_dnnl


.\build.bat --enable_training --use_dnnl

Add --build_wheel to build the ONNX Runtime wheel.

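This produces a .whl file for ONNX Runtime Training in the dist directory of the build output, for example build/Linux/RelWithDebInfo/dist when building on Linux with --config RelWithDebInfo.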