gemma-tuner-multimodal


Gemma Multimodal Fine-Tuner

Skill by ara.so, from the Daily 2026 Skills collection.

Fine-tune Gemma 4 and Gemma 3n models on text, images, and audio data entirely on Apple Silicon (MPS), with support for streaming large datasets from GCS/BigQuery without filling local storage.
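Streaming is possible because training only needs one batch at a time. Below is a minimal, generic sketch of that pattern using Python's standard `csv` module. Rows are read lazily from any iterable of text lines and grouped into fixed-size batches, so only the current batch is ever materialized. The `stream_batches` helper and the `prompt`/`response` column names are hypothetical, and the sketch does not show how the skill actually connects to GCS or BigQuery.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def stream_batches(lines: Iterable[str], batch_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield CSV rows in fixed-size batches without reading the whole source.

    `lines` can be any iterable of text lines: a local file object, or
    (assumed, not shown) a line iterator over a remote object stream.
    """
    reader = csv.DictReader(lines)  # consumes `lines` lazily, row by row
    while True:
        batch = list(islice(reader, batch_size))  # pull at most batch_size rows
        if not batch:
            return  # source exhausted
        yield batch


if __name__ == "__main__":
    # Hypothetical in-memory CSV standing in for a large remote file.
    data = io.StringIO("prompt,response\nhi,hello\n2+2?,4\nbye,goodbye\n")
    for batch in stream_batches(data, batch_size=2):
        print(len(batch), batch)
```

Because the generator never holds more than `batch_size` rows, the same loop works whether the source is a few kilobytes or far larger than local storage.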


What It Does

  • Text LoRA: instruction tuning or completion fine-tuning from a local CSV file
  • Image + Text LoRA: captioning and visual question answering (VQA) from a local CSV file
  • Audio + Text LoRA: the only Apple-Silicon-native path for this modality
  • Cloud streaming: train on terabytes of data from GCS/BigQuery without a local copy
  • MPS-native: no NVIDIA GPU required; runs on MacBook Pro, MacBook Air, and Mac Studio
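All three LoRA modes above rely on the same idea: the pretrained weight W stays frozen and only a low-rank correction is trained. The pure-Python sketch below shows the standard LoRA forward pass, y = Wx + (alpha / r) * B(Ax), along with the parameter savings that make laptop-scale fine-tuning feasible. It illustrates the technique only and is not this skill's implementation; the names `LoRALinear` and `trainable_params` and the 2048x2048, rank-8 layer size are hypothetical.

```python
from dataclasses import dataclass

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRALinear:
    weight: Matrix  # frozen base weight W, shape (d_out, d_in)
    a: Matrix       # trainable down-projection A, shape (r, d_in)
    b: Matrix       # trainable up-projection B, shape (d_out, r)
    alpha: float    # scaling hyperparameter

    @property
    def rank(self) -> int:
        return len(self.a)

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)               # W x (frozen path)
        delta = matvec(self.b, matvec(self.a, x))   # B (A x) (trainable path)
        scale = self.alpha / self.rank
        return [y + scale * d for y, d in zip(base, delta)]


def trainable_params(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one d_out x d_in layer."""
    return d_out * d_in, r * (d_out + d_in)


if __name__ == "__main__":
    full, lora = trainable_params(2048, 2048, 8)
    print(full, lora, f"{lora / full:.2%}")

    # With B initialized to zeros, the adapted layer matches the base layer.
    layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], a=[[1.0, 0.0]], b=[[0.0], [0.0]], alpha=16.0)
    print(layer.forward([1.0, 1.0]))
```

For the hypothetical 2048x2048 layer at rank 8, `trainable_params` returns `(4194304, 32768)`, so LoRA trains under 1% of that layer's weights. Initializing `b` to zeros, as LoRA conventionally does, means training starts from exactly the base model's behavior.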

Installation

Installs: 405 · GitHub Stars: 4 · First Seen: Apr 8, 2026