content-extraction


This skill extracts ALL content from an existing website and outputs it as structured, reusable data files. It crawls every page, downloads every asset, and produces a complete content inventory.

The user provides the URL of the site to extract from and, optionally, the target format (TypeScript, JSON, or Markdown).

What Gets Extracted

For each page on the site:

| Content type | Output |
| --- | --- |
| Text | Headings, paragraphs, lists, and quotes, with hierarchy preserved |
| Metadata | `<title>`, `<meta description>`, OG tags, canonical URL, lang |
| Images | Downloaded to `public/images/` with original filenames. Alt text cataloged |
| Links | Internal + external, with anchor text and destination URL |
| PDFs & assets | Downloaded to `public/assets/`. Filenames and original URLs cataloged |
| Forms | Field names, types, labels, validation rules, action URLs |
| Navigation | Menu structure, link hierarchy, active states |
| Structured data | JSON-LD, microdata, schema.org markup |
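
As a rough illustration of what a per-page record could look like in the TypeScript output format, here is a minimal sketch. The skill does not specify its output schema, so every name below (`PageContent`, `ExtractedLink`, `ExtractedImage`, `classifyLink`, `imageLocalPath`) is hypothetical. Only the categories themselves and the `public/images/` path come from the list above.

```typescript
// Hypothetical shapes for one extracted page. Field names are illustrative
// assumptions, not the skill's actual schema.
interface ExtractedLink {
  anchorText: string;
  url: string; // absolute destination URL
  kind: "internal" | "external";
}

interface ExtractedImage {
  originalUrl: string;
  localPath: string; // public/images/<original filename>
  alt: string | null;
}

interface PageContent {
  url: string;
  metadata: {
    title: string | null;
    description: string | null;
    canonical: string | null;
    lang: string | null;
  };
  headings: { level: number; text: string }[];
  images: ExtractedImage[];
  links: ExtractedLink[];
}

// Resolve an href against the page URL and label it internal or external
// by comparing hosts (an assumed rule; subdomains count as external here).
function classifyLink(href: string, anchorText: string, pageUrl: string): ExtractedLink {
  const resolved = new URL(href, pageUrl);
  const kind = resolved.host === new URL(pageUrl).host ? "internal" : "external";
  return { anchorText, url: resolved.href, kind };
}

// Map an image URL to a local path that keeps the original filename.
// The query string is dropped because only the URL pathname is used.
function imageLocalPath(originalUrl: string): string {
  const pathname = new URL(originalUrl).pathname;
  const filename = pathname.substring(pathname.lastIndexOf("/") + 1);
  return `public/images/${filename}`;
}

// Example record for a made-up page on example.com.
const samplePage: PageContent = {
  url: "https://example.com/index.html",
  metadata: { title: "Home", description: null, canonical: null, lang: "en" },
  headings: [{ level: 1, text: "Welcome" }],
  images: [
    {
      originalUrl: "https://example.com/img/logo.png",
      localPath: imageLocalPath("https://example.com/img/logo.png"),
      alt: "Company logo",
    },
  ],
  links: [classifyLink("/about", "About", "https://example.com/index.html")],
};

console.log(JSON.stringify(samplePage, null, 2));
```

Keeping both the original URL and the local path on each image makes it possible to trace a downloaded file back to its source page, which matches the "cataloged" wording in the table.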
From saccoai/agent-skills. First seen: Feb 27, 2026.