voice-note-to-midi
π΅ Voice Note to MIDI
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
What It Does
This skill provides a complete audio-to-MIDI conversion pipeline that:
- Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
- Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
Pipeline Architecture
Audio Input (WAV/M4A/MP3)
β
More from thinkfleetai/thinkfleet-engine
local-whisper
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.
160flyio-cli-public
Use the Fly.io flyctl CLI for deploying and operating apps on Fly.io: deploys (local or remote builder), viewing status/logs, SSH/console, secrets/config, scaling, machines, volumes, and Fly Postgres (create/attach/manage databases). Use when asked to deploy to Fly.io, debug fly deploy/build/runtime failures, set up GitHub Actions deploys/previews, or safely manage Fly apps and Postgres.
25kagi-search
Web search using Kagi Search API. Use when you need to search the web for current information, facts, or references. Requires KAGI_API_KEY in the environment.
24feishu-bridge
Connect a Feishu (Lark) bot to ThinkFleet via WebSocket long-connection. No public server, domain, or ngrok required. Use when setting up Feishu/Lark as a messaging channel, troubleshooting the Feishu bridge, or managing the bridge service (start/stop/logs). Covers bot creation on Feishu Open Platform, credential setup, bridge startup, macOS launchd auto-restart, and group chat behavior tuning.
13bambu-local
Control Bambu Lab 3D printers locally via MQTT (no cloud). Supports A1, A1 Mini, P1P, P1S, X1C.
12video-subtitles
Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.
12