ai-multimodal
AI Multimodal
Process audio, images, videos, and documents, and generate images and videos, using Google Gemini's multimodal API.
Setup
export GEMINI_API_KEY="your-key" # Get from https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow
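Before running anything, it can help to confirm that the key exported above is actually visible to Python. The following is a minimal sketch, not the bundled scripts/check_setup.py; the function name missing_setup and the placeholder check against "your-key" are assumptions made for illustration.

```python
import os


def missing_setup(env=None, var="GEMINI_API_KEY"):
    """Return a list of problems with the API key; an empty list means it looks set.

    `missing_setup` is a hypothetical helper, not part of the skill's scripts.
    """
    env = os.environ if env is None else env
    value = env.get(var, "").strip()
    problems = []
    if not value:
        problems.append(f"{var} is not set")
    elif value == "your-key":
        # The Setup line uses "your-key" as a placeholder; catch a copy-paste slip.
        problems.append(f"{var} still holds the placeholder value")
    return problems
```

Calling missing_setup() with no arguments reads the current process environment. It only checks that a value is present and does not validate the key against the API.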
Quick Start
Verify setup: python scripts/check_setup.py
Analyze media: python scripts/gemini_batch_process.py --files <file> --task <analyze|transcribe|extract>
- TIP: When you're asked to analyze an image, check if the gemini command is available, then use the "<prompt to analyze image>" | gemini -y -m gemini-2.5-flash command. If the gemini command is not available, use the python scripts/gemini_batch_process.py --files <file> --task analyze command.
Generate content: python scripts/gemini_batch_process.py --task <generate|generate-video> --prompt "description"
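The image-analysis tip above (prefer the gemini CLI, fall back to the batch script) can be sketched as a small selection helper. This is an illustrative sketch only: build_analyze_command is a hypothetical name, and it assumes the prompt text itself tells the CLI which image to look at, since the tip shows only the prompt being piped in.

```python
import shutil


def build_analyze_command(image_path, prompt, which=shutil.which):
    """Return (argv, stdin_text) for analyzing one image.

    Hypothetical helper: `which` is injectable so the choice can be tested
    without the gemini CLI installed.
    """
    if which("gemini"):
        # CLI route from the tip: the prompt is piped to gemini on stdin.
        # Assumption: the prompt mentions the image path itself.
        return ["gemini", "-y", "-m", "gemini-2.5-flash"], prompt
    # Fallback route from the tip: the batch script with --task analyze.
    return (
        ["python", "scripts/gemini_batch_process.py",
         "--files", image_path, "--task", "analyze"],
        None,
    )
```

The returned pair can be passed to subprocess.run(argv, input=stdin_text, text=True). In the fallback case stdin_text is None, so nothing is piped to the script.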