pytorch-onnx
Overview
ONNX (Open Neural Network Exchange) is an open format built to represent machine learning models. Exporting PyTorch models to ONNX allows them to be executed in environments without Python or PyTorch, using high-performance engines like ONNX Runtime.
When to Use
Use ONNX for cross-language deployment (C++, Java, C#), for edge deployment (mobile/IoT), or to target specialized hardware accelerators (such as TensorRT) that accept ONNX as an input format.
Decision Tree
- Does your model accept variable batch sizes?
- SPECIFY: `dynamic_axes` in the `torch.onnx.export` call (see the sketch after this list).
- Do you need the fastest possible inference on a CPU?
- APPLY: Quantization using the ONNX Runtime quantization tool (see the Workflows sketch below).
- Are you deploying to a C++ environment without Python?
- EXPORT: To ONNX and load using the ONNX Runtime C++ API.
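For the dynamic-batch case above, here is a minimal export sketch. The two-layer toy model, the tensor names `input`/`output`, the file name `model.onnx`, and opset 17 are illustrative assumptions, not values prescribed by this skill:

```python
import torch
import torch.nn as nn

# Toy model standing in for a real network (illustrative assumption).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

# The batch size of the example input is only used for tracing;
# dynamic_axes below is what makes dim 0 variable at inference time.
dummy_input = torch.randn(1, 10)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    # Mark dimension 0 of both tensors as a named dynamic axis ("batch").
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```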
Workflows
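A hedged end-to-end sketch, assuming the `model.onnx` file produced above: first verify that the dynamic batch axis works by running ONNX Runtime with a batch size different from the trace input, then apply post-training dynamic quantization for faster CPU inference. The output file name `model.int8.onnx` is illustrative.

```python
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic

# 1. Verify the FP32 export with a batch size (8) that differs from
#    the batch size used during tracing (1).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = sess.run(None, {"input": np.random.randn(8, 10).astype(np.float32)})
print(outputs[0].shape)  # (8, 2) if the dynamic axis was exported correctly

# 2. Post-training dynamic quantization: weights are stored as INT8 and
#    activations are quantized on the fly, which usually speeds up
#    linear-heavy models on CPU.
quantize_dynamic("model.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)

# 3. The quantized model is loaded exactly like the FP32 one.
sess_q = ort.InferenceSession("model.int8.onnx", providers=["CPUExecutionProvider"])
outputs_q = sess_q.run(None, {"input": np.random.randn(8, 10).astype(np.float32)})
```

The same `model.onnx`/`model.int8.onnx` files can also be loaded from the ONNX Runtime C++ API for Python-free deployment, per the last decision-tree item.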