defuddle
Summary
Extract clean article content from web pages, removing ads and clutter to return readable Markdown with metadata.
- Parses URLs or local HTML files and outputs clean Markdown with frontmatter (title, author, publication date, word count)
- Supports JSON metadata extraction including featured images, domain, favicon, and parse timing
- Includes a guided workflow: extract the content, preview a summary, save to a user-specified directory, and confirm the file location
- Works best on article-style pages (blogs, news, documentation); not designed for JavaScript-heavy pages or single-page applications
- Requires Node.js and npm, with jsdom installed as a peer dependency
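The last two steps of the guided workflow (saving to a user-specified directory and confirming the file location) can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: `save_markdown` is a hypothetical function, not part of defuddle. It assumes the extracted Markdown arrives on standard input and derives the file name from the article title.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of defuddle): save Markdown read from stdin
# into a user-chosen directory and print the file's absolute path.
save_markdown() {
  local dir="$1" title="$2"
  local slug abs_dir
  # Turn the title into a lowercase, hyphen-separated file name.
  slug=$(printf '%s' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  slug=${slug:-article}   # fall back if the title had no usable characters
  mkdir -p "$dir" || return 1
  abs_dir=$(cd "$dir" && pwd) || return 1
  # Write stdin to the target file, then print its location for confirmation.
  cat > "$abs_dir/$slug.md" || return 1
  printf '%s\n' "$abs_dir/$slug.md"
}
```

For example, `printf '# Example\n' | save_markdown ~/Notes "Example Article"` writes `~/Notes/example-article.md` and prints its absolute path, which can be reported back to the user as the confirmation step. Note that this sketch silently overwrites an existing file with the same name.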
SKILL.md
Defuddle - Web Content Extraction
Extract main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.
Prerequisites
Before first use, check whether defuddle is installed, and install it globally along with jsdom if it is missing:
command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom
Default Workflow
When the user provides a URL, follow this workflow:
Step 1: Extract content as Markdown + JSON metadata
Always pass both the -m (Markdown) and -j (JSON) flags so the output contains the Markdown content along with full metadata: