ai-matching-records

Build an AI Record Matcher

Match and deduplicate records across datasets with DSPy in three stages: blocking to narrow the candidate pairs, pairwise LM scoring to judge each candidate pair, and transitive closure to group all matches.
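
The three stages can be sketched in plain Python to show how they fit together. This is a minimal illustration, not the skill's actual implementation: `block_key`, `candidate_pairs`, `email_scorer`, and `group_matches` are hypothetical names, the blocking key (first three letters of the name) is an arbitrary choice, and the scorer is a deterministic stand-in for the place where an LM-backed judgment (for example a DSPy module) would go.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first three letters of the lowercased name.
    return record["name"].strip().lower()[:3]


def candidate_pairs(records):
    # Blocking: only records that share a key are ever compared,
    # which avoids scoring all n*(n-1)/2 pairs.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


def email_scorer(a, b):
    # Stand-in for pairwise LM scoring (assumption: an LM call would go here).
    return a["email"].lower() == b["email"].lower()


def group_matches(records, is_match=email_scorer):
    # Transitive closure via union-find: if A matches B and B matches C,
    # all three land in one group even if A and C were never judged a match.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in candidate_pairs(records):
        if is_match(records[i], records[j]):
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

Calling `group_matches(records)` returns lists of record indices, one list per matched group. Records that match nothing come back as single-element groups, which makes it easy to separate duplicates from unique records afterwards.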

Step 1: Understand the matching task

Ask the user:

  1. What records are you matching? (contacts, companies, tickets, products, etc.)
  2. Which fields matter? (name, email, phone, address, description, etc.)
  3. How many records? (100s vs millions changes blocking strategy significantly)
  4. What defines a match? (exact same entity, or "close enough to merge"?)
  5. What to do with matches? (deduplicate, merge fields, link IDs, flag for review)

When NOT to use AI matching

  • Single-field exact match — if email == email or id == id covers your case, use SQL JOIN or a hash lookup. No LM needed.
  • Clean data with unique identifiers — if records already have a shared key (user_id, EIN, ISBN), join on it directly.
  • Small datasets where manual review is faster — under 50 records, a human can review pairs in minutes.
  • Simple fuzzy string matching covers it — tools like rapidfuzz or fuzzywuzzy handle typos and abbreviations cheaply. Add an LM only when semantic understanding is needed ("IBM" = "International Business Machines").
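
The checks above can be run as a cheap pre-filter before any LM is involved. The sketch below is an assumption-laden illustration: `cheap_match` is a hypothetical helper, the `email` and `name` field names and the 0.85 threshold are arbitrary, and the standard library's `difflib.SequenceMatcher` stands in for a dedicated fuzzy-matching library such as rapidfuzz.

```python
from difflib import SequenceMatcher


def cheap_match(a, b, threshold=0.85):
    """Classify a pair as 'exact', 'fuzzy', or 'needs_lm' without calling an LM."""
    # Exact match on a shared identifier: no LM needed.
    email_a = (a.get("email") or "").lower()
    email_b = (b.get("email") or "").lower()
    if email_a and email_a == email_b:
        return "exact"

    # Character-level similarity catches typos and small spelling variations.
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if ratio >= threshold:
        return "fuzzy"

    # Anything else may need semantic understanding to resolve.
    return "needs_lm"
```

A typo such as "Acme Corporaton" against "Acme Corporation" clears the fuzzy threshold, while "IBM" against "International Business Machines" shares almost no characters and falls through to `needs_lm`. If most pairs in a sample resolve as `exact` or `fuzzy`, that suggests an LM adds little for that dataset.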