Overview

TorchVision provides models, datasets, and transforms for computer vision. Its current transforms API is the "v2" namespace (torchvision.transforms.v2), which handles bounding boxes and masks alongside images through a single, unified API.

When to Use

Use TorchVision for standard CV tasks like classification, detection, or segmentation. Use the v2 transforms for performance-critical pipelines or when applying augmentations like MixUp/CutMix, which mix samples within a batch and therefore run on collated batches rather than on individual samples.

Decision Tree

Are you starting a new project?
- YES: Use torchvision.transforms.v2.
Do you need a pretrained model?
- YES: Use the weights parameter (e.g., ResNet50_Weights.DEFAULT).
Do you have bounding boxes that need to move with the image?
- YES: Use TVTensors for automatic coordinate transformation.

