rag-architect
Installation
SKILL.md
RAG Architect
The agent designs, implements, and optimizes production-grade Retrieval-Augmented Generation pipelines, covering the full lifecycle from document chunking through evaluation.
Workflow
- Analyse corpus -- Profile the document collection: count, average length, format mix (PDF, HTML, Markdown), language(s), and domain. Validate that sample documents are accessible before proceeding.
- Select chunking strategy -- Choose from the Chunking Strategy Matrix based on corpus characteristics. Set chunk size, overlap, and boundary rules. Run a test split on 100 sample documents.
- Choose embedding model -- Select an embedding model from the Embedding Model table based on domain, latency budget, and cost constraints. Verify dimension compatibility with the target vector database.
- Select vector database -- Pick a vector store from the Vector Database Comparison based on scale, query patterns, and operational requirements.
- Design retrieval pipeline -- Configure retrieval strategy (dense, sparse, or hybrid). Add reranking if precision requirements exceed 0.85. Set the top-K parameter and similarity threshold.
- Implement query transformations -- If query-document style mismatch exists, enable HyDE. If queries are ambiguous, enable multi-query generation. Validate each transformation improves retrieval metrics on a held-out set.
- Configure guardrails -- Enable PII detection, toxicity filtering, hallucination detection, and source attribution. Set confidence score thresholds.
- Evaluate end-to-end -- Run the RAGAS evaluation framework. Verify faithfulness > 0.90, context relevance > 0.80, answer relevance > 0.85. Iterate on weak components.
Chunking Strategy Matrix
| Strategy | Best For | Chunk Size | Overlap | Pros | Cons |
|---|---|---|---|---|---|
| Fixed-size (token) | Uniform docs, consistent sizing | 512-2048 tokens | 10-20% | Predictable, simple | Breaks semantic units |
Related skills