chat-with-anyone
Clone real voices from online video or design voices from photos, then roleplay as that person with synthetic speech.
- Two workflows: look up a public figure by name and extract their voice from public video (interviews, speeches), or design a matching voice from an uploaded photo of a person who cannot be identified by name
- Requires ffmpeg, yt-dlp, the tts skill, and a Noiz API key; includes setup verification and dependency installation steps
- Built-in ethical guardrails: the agent must refuse requests targeting non-consenting private individuals or clearly intended for deception, harassment, or fraud
- Automated reference extraction uses the downloaded video's subtitles to find the densest speech segment, cuts that segment from the audio, then reuses the clip across multiple generated replies for voice consistency
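The reference-extraction step described above can be sketched in Python. This is a minimal illustration, not the skill's actual code: it assumes the downloaded subtitles are in SRT format, treats subtitle coverage as a proxy for speech density, and uses a hypothetical 10-second target clip length. `parse_srt`, `densest_window`, and `ffmpeg_cut_cmd` are illustrative names. Overlapping cues (common in auto-generated subtitles) are merged first so speech time is not double-counted, and each speech start is tried as a candidate window start.

```python
import re
from dataclasses import dataclass

# Hypothetical default: aim for a ~10 s reference clip.
DEFAULT_WINDOW_SECONDS = 10.0

_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float  # seconds
    text: str


def parse_timestamp(ts: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    match = _TIMESTAMP.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {ts!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT blocks (index, 'start --> end' line, text lines) into cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            if "-->" in line:
                left, right = line.split("-->", 1)
                start = parse_timestamp(left)
                end = parse_timestamp(right.split()[0])
                cues.append(Cue(start, end, " ".join(lines[i + 1:])))
                break
    return cues


def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs so speech time is counted once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def densest_window(cues, window=DEFAULT_WINDOW_SECONDS):
    """Return (start, end, speech_seconds) for the window with most subtitle coverage.

    Candidate windows begin at each merged speech start; this simple heuristic
    is quadratic in the number of cues, which is fine for a single video.
    """
    if not cues:
        raise ValueError("no subtitle cues to search")
    speech = merge_intervals((c.start, c.end) for c in cues)
    best = None
    for win_start, _ in speech:
        win_end = win_start + window
        covered = sum(
            max(0.0, min(seg_end, win_end) - max(seg_start, win_start))
            for seg_start, seg_end in speech
        )
        if best is None or covered > best[2]:
            best = (win_start, win_end, covered)
    return best


def ffmpeg_cut_cmd(src, dst, start, duration):
    """Build an ffmpeg command that cuts a mono, audio-only reference clip."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",   # seek to the window start
        "-t", f"{duration:.3f}",  # read only the window length
        "-i", src,
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        dst,
    ]
```

The returned command list can be passed to `subprocess.run(cmd, check=True)`. Cutting the clip once and reusing the same file as the reference for every reply is what keeps the cloned voice consistent across turns.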
Chat with Anyone
Clone a real person's voice from online video, or design a voice from a photo, then roleplay as that person with TTS.
Important: Ethical Use & Copyright
This skill synthesizes speech that imitates real voices. Before proceeding, the agent must:
- Never impersonate someone to deceive, defraud, or harass.
- Only use publicly available media (public speeches, interviews, press conferences) as reference audio.
- Inform the user that generated audio is synthetic and should not be presented as genuine recordings.
- Decline requests that target private individuals who have not consented, or that are clearly intended for deception, harassment, or defamation.
If the user's intent appears harmful, refuse politely and explain why.
Prerequisites
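The summary above lists ffmpeg, yt-dlp, the tts skill, and a Noiz API key as requirements. A minimal sketch of a setup check follows, assuming "installed" means the executable is on `PATH` and that the key is supplied through an environment variable; the name `NOIZ_API_KEY` is hypothetical. It does not check for the tts skill, whose install location is not specified here.

```python
import os
import shutil

# Command-line tools listed as requirements.
REQUIRED_TOOLS = ("ffmpeg", "yt-dlp")
# Hypothetical environment variable name for the Noiz API key;
# the skill's real configuration may use a different name or a config file.
API_KEY_ENV = "NOIZ_API_KEY"


def missing_prerequisites(env=None, which=shutil.which):
    """Return human-readable problems; an empty list means the checks passed.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment. The tts skill itself is not checked.
    """
    env = os.environ if env is None else env
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not env.get(API_KEY_ENV):
        problems.append(f"{API_KEY_ENV} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites() or ["all checks passed"]:
        print(problem)
```

Returning a list of problems rather than stopping at the first one lets the agent report every missing dependency in a single message before moving on to installation.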