
Multimodal Models

Pre-trained models for vision, audio, and cross-modal tasks.


Model Overview

| Model | Modality | Task |
|-------|----------|------|
| CLIP | Image + Text | Zero-shot classification, similarity |
| Whisper | Audio → Text | Transcription, translation |
| Stable Diffusion | Text → Image | Image generation, editing |

CLIP (Vision-Language)

CLIP performs zero-shot image classification: it scores an image against arbitrary text labels, so no training on those specific labels is required.
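
The scoring step behind zero-shot classification can be sketched in plain Python. This is a minimal illustration and not a call into any CLIP library. The embeddings are made-up toy vectors standing in for real image and text encoder outputs, and the names `zero_shot_classify` and `logit_scale` are hypothetical. The value 100.0 is an assumed temperature, chosen to be roughly in the range CLIP-style models use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """Score one image embedding against a dict of {label: text embedding}.

    logit_scale is an assumed temperature, not a value read from a real model.
    """
    labels = list(label_embeddings)
    logits = [
        logit_scale * cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Toy vectors (assumptions): a real pipeline would get these from the
# image encoder and the text encoder of a pre-trained model.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}

probs = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(probs, key=probs.get)
print(best_label)
```

Because the labels are supplied as text at inference time, changing the set of classes only means changing the keys and text embeddings passed in; nothing is retrained.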

Repository
eyadsibai/ltk
First Seen
Jan 28, 2026