regex-vs-llm-structured-text

Summary

A hybrid regex-and-LLM framework for parsing structured text. It keeps costs down by handling 95–98% of cases with regex and reserving LLM calls for the remaining edge cases.

  • Combines regex extraction with confidence scoring: low-confidence items are flagged and only those are validated with an LLM, cutting LLM calls by ~95% compared with an all-LLM approach
  • Includes production-ready Python patterns for regex parsing, confidence scoring, and hybrid pipeline orchestration, with real metrics from a 410-item quiz-parsing example
  • Best suited for structured, repeating text patterns like quizzes, forms, invoices, and documents where deterministic extraction is possible
  • Emphasizes test-driven development, immutable data structures, and metric logging to track pipeline health and identify when regex thresholds degrade
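The pipeline described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the quiz layout (a numbered question, `A)` to `D)` choices, an `Answer:` line), the `ParsedItem` dataclass, the penalty weights, the 0.8 threshold, and the `llm_validate` callback are all assumptions made for the example. It needs Python 3.9 or later for the `tuple[...]` annotations.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical quiz layout: "1. Question", "A) choice" lines, "Answer: B".
HEADER_RE = re.compile(r"^(\d+)\.\s+(.+)$")
CHOICE_RE = re.compile(r"^([A-D])\)\s+(.+)$")
ANSWER_RE = re.compile(r"^Answer:\s*([A-D])$")


@dataclass(frozen=True)  # immutable record, as the summary recommends
class ParsedItem:
    number: int
    question: str
    choices: tuple[tuple[str, str], ...]
    answer: Optional[str]
    confidence: float


def score(question: str, choices: tuple, answer: Optional[str]) -> float:
    """Illustrative heuristic: start at 1.0 and subtract assumed penalties."""
    confidence = 1.0
    if len(choices) != 4:
        confidence -= 0.4
    if answer is None or answer not in {label for label, _ in choices}:
        confidence -= 0.4
    if len(question) < 10:
        confidence -= 0.2
    return max(0.0, confidence)


def parse_block(block: str) -> Optional[ParsedItem]:
    """Parse one blank-line-separated block; None if the header is missing."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    if not lines:
        return None
    header = HEADER_RE.match(lines[0])
    if header is None:
        return None
    choices = []
    answer = None
    for ln in lines[1:]:
        if m := CHOICE_RE.match(ln):
            choices.append((m.group(1), m.group(2)))
        elif m := ANSWER_RE.match(ln):
            answer = m.group(1)
    frozen_choices = tuple(choices)
    question = header.group(2)
    return ParsedItem(
        number=int(header.group(1)),
        question=question,
        choices=frozen_choices,
        answer=answer,
        confidence=score(question, frozen_choices, answer),
    )


def run_pipeline(
    text: str,
    llm_validate: Callable[[str], Optional[ParsedItem]],
    threshold: float = 0.8,
) -> tuple[tuple[ParsedItem, ...], int]:
    """Regex first; call llm_validate only for unparsed or low-confidence blocks."""
    results = []
    llm_calls = 0
    for block in re.split(r"\n\s*\n", text.strip()):
        item = parse_block(block)
        if item is None or item.confidence < threshold:
            llm_calls += 1
            corrected = llm_validate(block)
            if corrected is not None:
                item = corrected
        if item is not None:
            results.append(item)
    return tuple(results), llm_calls
```

Passing the LLM step in as a callback keeps the regex path testable without network access: a test can supply a stub and assert on the returned `llm_calls` count, which is also the number worth logging to notice when the regex patterns start to degrade.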
SKILL.md

Regex vs LLM for Structured Text Parsing

A practical decision framework for parsing structured text (quizzes, forms, invoices, documents). The key insight: regex handles 95-98% of cases cheaply and deterministically. Reserve expensive LLM calls for the remaining edge cases.
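To make the cost argument concrete, the arithmetic can be sketched as below. The function name `estimated_llm_calls` is hypothetical, and the figures it produces are plain arithmetic on the coverage range stated above, not measured results.

```python
import math


def estimated_llm_calls(total_items: int, regex_coverage: float) -> int:
    """Items left for the LLM if regex handles `regex_coverage` of them.

    Rounds the regex-handled count down, so the LLM estimate is conservative.
    """
    if not 0.0 <= regex_coverage <= 1.0:
        raise ValueError("regex_coverage must be between 0 and 1")
    return total_items - math.floor(total_items * regex_coverage)


# Example: a 410-item document at the two ends of the stated coverage range.
low_coverage_calls = estimated_llm_calls(410, 0.95)
high_coverage_calls = estimated_llm_calls(410, 0.98)
```

An all-LLM approach would make one call per item (410 here), so even at the low end of the range the hybrid approach makes only a small fraction of those calls.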

When to Activate

  • Parsing structured text with repeating patterns (questions, forms, tables)
  • Deciding between regex and LLM for text extraction
  • Building hybrid pipelines that combine both approaches
  • Optimizing cost/accuracy tradeoffs in text processing

Decision Framework

Is the text format consistent and repeating?
├── Yes (>90% follows a pattern) → Start with Regex
│   ├── Regex handles 95%+ → Done, no LLM needed
│   └── Regex handles <95% → Add LLM for edge cases only
└── No (free-form, highly variable) → Use LLM directly
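The tree above can also be written as a small function. This is only a sketch: the name `choose_strategy`, its parameters, and the string return values are assumptions for illustration, while the 90% and 95% cut-offs come from the tree itself.

```python
from typing import Optional


def choose_strategy(
    pattern_fraction: float,
    regex_coverage: Optional[float] = None,
) -> str:
    """Return "llm", "regex", or "hybrid" following the decision tree.

    pattern_fraction: share of the text that follows a repeating pattern.
    regex_coverage: share of items the regex parses correctly, once measured.
    """
    if pattern_fraction <= 0.90:
        return "llm"  # free-form or highly variable text
    if regex_coverage is None:
        return "regex"  # start with regex, then measure coverage
    if regex_coverage >= 0.95:
        return "regex"  # done, no LLM needed
    return "hybrid"  # add an LLM for the edge cases only
```

For example, `choose_strategy(0.97, 0.92)` returns `"hybrid"`: the format is consistent, but regex alone falls short of the 95% bar.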

Installs: 3.5K
GitHub Stars: 179.7K
First Seen: Feb 17, 2026