torchaudio
Overview
TorchAudio provides signal processing tools for PyTorch, enabling users to treat audio processing as part of the neural network graph. This allow transforms to be run on GPUs and handled via nn.Sequential pipelines.
When to Use
Use TorchAudio for converting raw audio waveforms into features like Mel Spectrograms, performing data augmentation (SpecAugment), or when high-performance resampling is required.
Decision Tree
- Do you need to transform many audio files quickly?
- MOVE: The transform module to GPU using
.to('cuda').
- MOVE: The transform module to GPU using
- Are you training an Automatic Speech Recognition (ASR) model?
- USE: SpecAugment (TimeMasking, FrequencyMasking) on the spectrogram.
- Do you need to align text to audio?
- USE: The
forced_alignfunctional API with a Wav2Vec2 model.
- USE: The
Workflows
More from cuba6112/skillfactory
ollama-rag
Build RAG systems with Ollama local + cloud models. Latest cloud models include DeepSeek-V3.2 (GPT-5 level), Qwen3-Coder-480B (1M context), MiniMax-M2. Use for document Q&A, knowledge bases, and agentic RAG. Covers LangChain, LlamaIndex, ChromaDB, and embedding models.
17unsloth-sft
Supervised fine-tuning using SFTTrainer, instruction formatting, and multi-turn dataset preparation with triggers like sft, instruction tuning, chat templates, sharegpt, alpaca, conversation_extension, and SFTTrainer.
6pytorch-onnx
Exporting PyTorch models to ONNX format for cross-platform deployment. Includes handling dynamic axes, graph optimization in ONNX Runtime, and INT8 model quantization. (onnx, onnxruntime, torch.onnx.export, dynamic_axes, constant-folding, edge-deployment)
5unsloth-lora
Configuring and optimizing 16-bit Low-Rank Adaptation (LoRA) and Rank-Stabilized LoRA (rsLoRA) for efficient LLM fine-tuning using triggers like lora, qlora, rslora, rank selection, lora_alpha, lora_dropout, and target_modules.
4pytorch-quantization
Techniques for model size reduction and inference acceleration using INT8 quantization, including Post-Training Quantization (PTQ) and Quantization Aware Training (QAT). (quantization, int8, qat, fbgemm, qnnpack, ptq, dequantize)
3torchvision
Computer vision library for PyTorch featuring pretrained models, advanced image transforms (v2), and utilities for handling complex data types like bounding boxes and masks. (torchvision, transforms, tvtensor, resnet, cutmix, mixup, pretrained models, vision transforms)
3