HTML Parsing Strategies for html-to-markdown
Overview
The html-to-markdown project uses two complementary HTML parsers to handle different conversion scenarios and performance requirements:
- astral-tl (primary, lightweight): Fast, incremental parser via the tl crate
- html5ever (robust fallback): Spec-compliant HTML5 parser for edge cases and malformed HTML
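The primary/fallback arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `Backend` and `parse_with_fallback` are hypothetical names, the closures stand in for the real tl and html5ever calls, and the condition that triggers the fallback (an error from the primary parser) is assumed rather than taken from the source.

```rust
// Hypothetical sketch: neither `Backend` nor `parse_with_fallback` comes from
// html-to-markdown; the closures stand in for the real tl and html5ever calls.

/// Records which parser produced the result.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Tl,
    Html5ever,
}

/// Try the fast primary parser first; use the robust fallback only if it fails.
fn parse_with_fallback<T, E>(
    html: &str,
    primary: impl Fn(&str) -> Result<T, E>,
    fallback: impl Fn(&str) -> Result<T, E>,
) -> Result<(Backend, T), E> {
    match primary(html) {
        Ok(doc) => Ok((Backend::Tl, doc)),
        Err(_) => fallback(html).map(|doc| (Backend::Html5ever, doc)),
    }
}

fn main() {
    // Stand-in parsers. The "primary" rejects input whose angle brackets do not
    // balance (an assumed failure condition); the "fallback" accepts everything.
    let primary = |s: &str| {
        if s.matches('<').count() == s.matches('>').count() {
            Ok(s.len())
        } else {
            Err("unbalanced angle brackets")
        }
    };
    let fallback = |s: &str| Ok::<usize, &str>(s.len());

    let (backend, _) = parse_with_fallback("<p>ok</p>", primary, fallback).unwrap();
    println!("{:?}", backend); // prints "Tl"

    let (backend, _) = parse_with_fallback("<p>broken <b", primary, fallback).unwrap();
    println!("{:?}", backend); // prints "Html5ever"
}
```

Returning the `Backend` alongside the parsed value keeps the choice visible to callers, which is useful when measuring how often the slower path is taken.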
Parser Trade-offs and Selection
astral-tl (tl crate)
Strengths:
- Ultra-fast parsing performance with minimal memory overhead
- Ideal for streaming/large document processing
- Lightweight footprint in binary distributions
- Fast DOM traversal without full tree reconstruction
- Direct integration in core conversion pipeline