postgres-hybrid-text-search
Hybrid Text Search
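Hybrid search combines keyword search (BM25) with semantic search (vector embeddings) to get the best of both: exact keyword matching and meaning-based retrieval. Use Reciprocal Rank Fusion (RRF) to merge results from both methods into a single ranked list. Because RRF uses only each result's rank position, not its raw score, BM25 scores and vector distances never need to be normalized onto a common scale.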
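As a rough illustration of the fusion step, here is a minimal Python sketch of RRF over two ranked lists of chunk IDs. It assumes each search has already returned its IDs best-first. The IDs are hypothetical, and k = 60 is the smoothing constant commonly used with RRF (treat it as a tunable assumption). Each chunk's fused score is the sum of 1 / (k + rank) over the lists it appears in, so chunks that rank well in both searches rise to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first. A chunk at 1-based rank r in a list
    contributes 1 / (k + r) to its fused score; contributions are summed
    across lists. A chunk missing from a list gets nothing from that list.
    """
    scores = defaultdict(float)
    for ranked_ids in result_lists:
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by each search, best match first.
bm25_ids = [12, 7, 3, 41]
vector_ids = [7, 19, 12, 5]

fused = reciprocal_rank_fusion([bm25_ids, vector_ids])
print([chunk_id for chunk_id, _ in fused])  # → [7, 12, 19, 3, 5, 41]
```

In a real setup the two lists would come from a BM25 query through pg_textsearch and a nearest-neighbor query through pgvector, each typically limited to a top-N cutoff; the fusion arithmetic is the same whether it runs in application code or in SQL.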
This guide covers combining pg_textsearch (BM25) with pgvector, so both extensions must be installed. For high-volume setups, filtering, or advanced pgvector tuning (binary quantization, HNSW parameters), see the pgvector-semantic-search skill.
pg_textsearch is a new, fully open-source BM25 text search extension for PostgreSQL, available both hosted on Tiger Cloud and for self-managed deployments. It provides true BM25 ranking, which often improves relevance over PostgreSQL's built-in ts_rank and can perform better at scale. Note: pg_textsearch is currently in prerelease and not yet recommended for production use. It supports PostgreSQL 17 and 18.
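To make "BM25 ranking" concrete, the sketch below scores a few documents with the textbook Okapi BM25 formula in plain Python. It is not pg_textsearch's implementation: the whitespace tokenizer, the non-negative IDF variant, and the defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration. The two ideas that separate BM25 from raw term counting are visible in the code: k1 limits how much repeated occurrences of a term can add (term-frequency saturation), and b adjusts scores for document length relative to the average.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer for illustration: lowercase, split on whitespace.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 saturates term frequency; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "postgres query optimization tips",
    "slow query slow query slow query",
    "vector embeddings for semantic search",
]
print(bm25_scores("slow query", docs))
```

Running it ranks the second document first because it contains both query terms, but its three occurrences of each term earn well under three times the credit of a single occurrence; the third document, which shares no terms with the query, scores zero.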
When to Use Hybrid Search
- Use hybrid search when queries mix specific terms (product names, codes, proper nouns) with conceptual intent
- Use semantic search alone when meaning matters more than exact wording (e.g., "how to fix slow queries" should match "query optimization")
- Use keyword search alone when exact matches are critical (e.g., error codes, SKUs, legal citations)
Hybrid search typically improves recall over either method alone, at the cost of running two queries plus a fusion step.
Data Preparation
Chunk your documents into smaller pieces (typically 500–1000 tokens) and store each chunk with its embedding. Both BM25 and semantic search operate on the same chunks, which keeps fusion simple: both result lists rank the same rows, so RRF can merge them by chunk ID.
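A minimal chunking sketch in Python, assuming whitespace-separated words as a rough stand-in for model tokens. A word often maps to more than one token, so the hypothetical 400-word default typically lands near the low end of the 500–1000 token range; use your embedding model's tokenizer if you need exact sizes. The overlap between consecutive chunks is an optional extra not described above: it keeps text that straddles a boundary retrievable from either side. The function name and defaults are illustrative, not part of either extension.

```python
def chunk_words(text, max_words=400, overlap=50):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that content near a
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


document = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(document)
# Each chunk would be stored as one row alongside its embedding, so the
# BM25 index and the vector index see exactly the same units.
for i, chunk in enumerate(chunks):
    parts = chunk.split()
    print(i, len(parts), parts[0], parts[-1])
# → 0 400 w0 w399
# → 1 400 w350 w749
# → 2 300 w700 w999
```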