rag-implementer
# RAG Implementer
Build production-ready retrieval-augmented generation systems. RAG = Retrieval + Context Assembly + Generation. Use RAG when LLMs need access to fresh, domain-specific, or proprietary knowledge not in their training data. Do not use RAG when simpler alternatives (FAQ pages, keyword search, semantic search) suffice. For KB architecture selection and governance, use the knowledge-base-manager skill. For knowledge graph implementation, use the knowledge-graph-builder skill.
## Overview
Before building RAG, validate the need: try FAQ pages, keyword search, a concierge MVP, or simple semantic search first. Only proceed with RAG when you have 50k+ documents, validated user demand, and a $200-500/month budget. RAG systems range from Naive (prototype) through Advanced (production) to Modular (enterprise), with each tier adding complexity and cost.
The RAG pipeline has three core stages. First, retrieval finds relevant documents using hybrid search (semantic + keyword). Second, context assembly ranks, deduplicates, and compresses retrieved chunks into an optimal prompt. Third, generation produces a grounded response with source attribution. Each stage has distinct failure modes: retrieval can miss relevant documents (low recall), context assembly can overwhelm the model (lost in the middle), and generation can hallucinate despite good context (low faithfulness).
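The three stages above can be sketched end to end. This is a minimal toy illustration, not the skill's implementation: `Chunk`, `retrieve`, `assemble_context`, and `generate` are hypothetical names, keyword overlap stands in for real hybrid search, and the generation step is a placeholder for an LLM call.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str

def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Stage 1: rank chunks by naive keyword overlap (stand-in for hybrid search)."""
    terms = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(terms & set(c.text.lower().split())),
                    reverse=True)
    return scored[:k]

def assemble_context(chunks: list[Chunk], max_chars: int = 500) -> str:
    """Stage 2: deduplicate and pack chunks into a bounded, source-tagged context."""
    seen: set[str] = set()
    parts: list[str] = []
    used = 0
    for c in chunks:
        if c.text in seen or used + len(c.text) > max_chars:
            continue  # skip duplicates and chunks that would overflow the budget
        seen.add(c.text)
        parts.append(f"[{c.doc_id}] {c.text}")
        used += len(c.text)
    return "\n".join(parts)

def generate(query: str, context: str) -> str:
    """Stage 3: placeholder for an LLM call that grounds the answer in context."""
    return f"Answer to {query!r} based on:\n{context}"

chunks = [
    Chunk("kb-1", "RAG retrieves documents before generation."),
    Chunk("kb-2", "Hybrid search combines dense and sparse retrieval."),
    Chunk("kb-2", "Hybrid search combines dense and sparse retrieval."),  # duplicate
]
ctx = assemble_context(retrieve("what is hybrid search retrieval", chunks))
print(generate("what is hybrid search retrieval", ctx))
```

Note that the duplicate chunk is dropped during assembly and each context line carries its source ID, which is what enables attribution in the generation stage.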
Modern RAG extends beyond basic vector similarity. Hybrid search combining dense embeddings with sparse BM25 is now the baseline. Re-ranking with cross-encoders improves precision after initial retrieval. Contextual chunking and late chunking preserve document-level semantics that fixed-size chunking loses. GraphRAG enables multi-hop reasoning over entity relationships by building knowledge graphs from documents. Proposition chunking breaks documents into atomic facts for precise retrieval of individual claims.
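A common way to combine the dense and sparse rankings mentioned above is reciprocal rank fusion (RRF), which needs only the two ranked lists, not comparable scores. A minimal sketch (the document IDs are illustrative):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists (e.g. dense embedding and BM25 results) into one ordering.

    Each document scores sum(1 / (k + rank)) across the lists it appears in;
    k=60 is the constant commonly used with RRF.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # semantic (embedding) ranking
sparse = ["d1", "d4", "d3"]  # keyword (BM25) ranking
print(reciprocal_rank_fusion([dense, sparse]))  # → ['d1', 'd3', 'd4', 'd2']
```

Documents that rank well in both lists (here `d1`) rise to the top, which is why RRF is a robust default before an optional cross-encoder re-ranking pass.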
Choose techniques based on your query complexity and document structure. Start with hybrid search and re-ranking as the foundation, then layer contextual chunking, GraphRAG, or query expansion as needed. Measure everything: Precision@K, Recall@K, faithfulness, and end-to-end latency. The difference between a good and bad chunking strategy alone can create a 9% gap in recall performance.
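The retrieval metrics named above are simple to compute from a ranked result list and a labeled relevant set; a minimal sketch with illustrative document IDs:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

retrieved = ["d1", "d5", "d2", "d7"]
relevant = {"d1", "d2", "d3"}
print(precision_at_k(retrieved, relevant, 4))  # → 0.5 (2 of 4 retrieved are relevant)
print(recall_at_k(retrieved, relevant, 4))     # 2 of 3 relevant were found
```

Tracking these per query class (alongside faithfulness and latency) is what makes chunking or retrieval changes comparable rather than anecdotal.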
## Quick Reference
| Phase | Goal | Key Actions |
|---|---|---|
| 1. Knowledge Base Design | Structured knowledge foundation | Map sources, define chunking, add metadata |