MiniMax Multi-Modal Toolkit
Generate voice, music, video, and image content via MiniMax APIs. This is the unified entry point for MiniMax multimodal use cases (audio, music, video, and image). It includes voice cloning and voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video format conversion, concatenation, trimming, and extraction.
Setup & Configuration
Prerequisites
brew install ffmpeg jq # macOS
sudo apt install ffmpeg jq # Linux (Debian/Ubuntu)
bash scripts/check_environment.sh # verify environment
No Python or pip required — all scripts are pure bash using curl, ffmpeg, jq, and xxd.
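As a rough illustration of what an environment check involves, the sketch below looks for the four tools named above on `PATH`. It is not the contents of `scripts/check_environment.sh`; the function name `missing_tools` is hypothetical and the real script may check more (versions, API keys, and so on).

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not taken from scripts/check_environment.sh.
# Prints the name of each tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites above.
missing="$(missing_tools curl ffmpeg jq xxd)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If anything is reported missing, install it with the package manager commands listed under Prerequisites, then run the check again.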
Note: ffmpeg is required for TTS voice bubble conversion (.mp3 → .opus). Without it, TTS audio is sent as a file attachment instead of a native voice bubble.
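For reference, an MP3-to-Opus conversion with ffmpeg might look like the sketch below. The helper names `opus_path_for` and `mp3_to_opus` are hypothetical, and the 32k bitrate is an assumed value for speech; the toolkit's own scripts may use different settings. It also assumes an ffmpeg build that includes the `libopus` encoder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the toolkit's own scripts may differ.

# Derive the output path by swapping a trailing .mp3 for .opus
# (names without a .mp3 suffix simply get .opus appended).
opus_path_for() {
  printf '%s\n' "${1%.mp3}.opus"
}

# Re-encode an MP3 file as Opus. Returns 1 without converting if ffmpeg is absent.
mp3_to_opus() {
  local src="$1" dst
  dst="$(opus_path_for "$src")"
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found; leaving $src as MP3" >&2
    return 1
  fi
  # -c:a libopus selects the Opus encoder; -b:a 32k is an assumed speech bitrate.
  ffmpeg -y -loglevel error -i "$src" -c:a libopus -b:a 32k "$dst"
}

# Usage: mp3_to_opus speech.mp3   # writes speech.opus next to the input
```

The early return mirrors the fallback described in the note: when ffmpeg is unavailable the MP3 is left untouched, so a caller can still send it as a plain file attachment.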