ai-matching-records

Build an AI Record Matcher

Match and deduplicate records across datasets with DSPy in three stages: blocking to narrow the candidate pairs, pairwise LM scoring to judge each candidate, and transitive closure to group all matches into entities.
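To make the three stages concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the skill's implementation: `block_key` is a hypothetical blocking rule, and `is_match` is a stand-in for the pairwise LM scorer (in a real pipeline that call would go to a DSPy module). The transitive closure is done with a small union-find.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    # Hypothetical blocking rule: first three letters of the lowercased name.
    return record["name"].lower()[:3]


def candidate_pairs(records):
    # Blocking: only records that share a block key are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    for ids in blocks.values():
        yield from combinations(ids, 2)


def is_match(a, b):
    # Stand-in for the pairwise LM scorer (an assumption, not DSPy code):
    # two records match when they share an email or a phone number.
    return a["email"] == b["email"] or a["phone"] == b["phone"]


def group_matches(records):
    # Transitive closure via union-find: if A~B and B~C, then A, B, C
    # end up in one group even when A and C never matched directly.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in candidate_pairs(records):
        if is_match(records[i], records[j]):
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

In this sketch, records 0 and 2 of a group may never match each other directly; they are still merged if both match record 1, which is the behavior transitive closure is meant to provide.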

Step 1: Understand the matching task

Ask the user:

What records are you matching? (contacts, companies, tickets, products, etc.)
Which fields matter? (name, email, phone, address, description, etc.)
How many records? (hundreds vs. millions changes the blocking strategy significantly)
What defines a match? (exact same entity, or "close enough to merge"?)
What should happen to matches? (deduplicate, merge fields, link IDs, flag for review)
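The record-count question matters because the number of possible pairs grows quadratically. A quick calculation, using a hypothetical helper `pair_count`, shows why comparing every pair is workable for hundreds of records but not for millions:

```python
from math import comb


def pair_count(n):
    # Number of unordered pairs among n records: n * (n - 1) / 2.
    return comb(n, 2)


print(pair_count(500))        # 124750
print(pair_count(1_000_000))  # 499999500000
```

At the larger scale, one LM call per pair is not realistic, which is why blocking is needed to cut the candidate set first.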

When NOT to use AI matching

Single-field exact match: if comparing one field (a.email == b.email, or a.id == b.id) covers your case, use a SQL JOIN or a hash lookup. No LM needed.
Clean data with unique identifiers: if records already share a key (user_id, EIN, ISBN), join on it directly.
Small datasets where manual review is faster: under 50 records, a human can review the pairs in minutes.
Simple fuzzy string matching covers it: tools like rapidfuzz or fuzzywuzzy (now published as thefuzz) handle typos and small spelling variations cheaply. Add an LM only when semantic understanding is needed ("IBM" = "International Business Machines").
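One way to apply the checks above is to try the cheap tests first and escalate to an LM only for pairs they cannot settle. The sketch below is an assumption-laden illustration: `cheap_match`, its field names, and the 0.9 threshold are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for a dedicated fuzzy matcher such as rapidfuzz.

```python
from difflib import SequenceMatcher


def cheap_match(a, b, key="email", fuzzy_field="name", threshold=0.9):
    """Return 'exact', 'fuzzy', or 'needs_lm' for a pair of records."""
    # 1. Exact match on a shared, non-empty identifier.
    if a.get(key) and a.get(key) == b.get(key):
        return "exact"
    # 2. Cheap fuzzy comparison on a text field (similarity in [0, 1]).
    similarity = SequenceMatcher(
        None, a[fuzzy_field].lower(), b[fuzzy_field].lower()
    ).ratio()
    if similarity >= threshold:
        return "fuzzy"
    # 3. Neither check was conclusive; only now is an LM call worth paying for.
    return "needs_lm"
```

With these settings, a pair like "Jonathan Smith" and "Jonathon Smith" is caught by the fuzzy step, while "IBM" and "International Business Machines" falls through to `needs_lm`, since string similarity alone cannot connect them.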

Related skills

More from lebsral/dspy-programming-not-prompting-lms-skills

ai-switching-models

ai-stopping-hallucinations

ai-do

ai-reasoning

ai-building-chatbots

ai-improving-accuracy