ChromaDB
Overview
ChromaDB is an open-source vector database for storing, searching, and managing embeddings. It provides a simple API for document ingestion, semantic similarity search, and metadata filtering, supporting both Python and JavaScript/TypeScript clients with embedded, server, and cloud deployment options.
Instructions
- When initializing, use `get_or_create_collection` for idempotent collection setup, choose `PersistentClient` for development and `HttpClient` for production server connections.
- When adding documents, batch `add()` calls in chunks of 5,000 documents, always store source metadata (filename, URL, page number) for RAG citations, and use `upsert()` for incremental updates to avoid duplicates.
- When querying, use `collection.query(query_texts=..., n_results=...)` for text-based search, combine metadata `where` filters to narrow results before semantic search, and set `n_results` based on the LLM's context window (5-10 for most RAG pipelines).
- When choosing embeddings, use the default Sentence Transformers for local development without API keys, OpenAI or Cohere embedding functions for production, or pass pre-computed vectors directly.
- When filtering metadata, use operators like `$eq`, `$gt`, `$in` with `$and`/`$or` logical operators, and combine with `where_document` for content-based filtering alongside semantic similarity.
- When deploying, use the embedded `PersistentClient` for single-node applications, Docker for server mode, or Chroma Cloud for managed hosting with multi-tenancy support.
- When tuning performance, configure HNSW parameters (`hnsw:M`, `hnsw:construction_ef`, `hnsw:search_ef`) for the quality-speed tradeoff and choose `cosine` distance for normalized embeddings (OpenAI, Cohere).
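The ingestion and filtering guidance above can be sketched as two small helpers. This is a minimal sketch under stated assumptions, not part of Chroma itself: `upsert_in_batches` and `build_where` are hypothetical helper names, the `source` and `page` metadata keys are illustrative, and `collection` is assumed to be any object exposing the `upsert(ids=..., documents=..., metadatas=...)` method of a Chroma collection. The batch size of 5,000 is taken from the guidance above.

```python
BATCH_SIZE = 5_000  # chunk size suggested in the instructions above


def upsert_in_batches(collection, ids, documents, metadatas, size=BATCH_SIZE):
    """Upsert records in chunks of at most `size`; return the number of calls.

    `collection` is assumed to behave like a Chroma collection, i.e. to
    accept upsert(ids=..., documents=..., metadatas=...).
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError("ids, documents and metadatas must have equal length")

    calls = 0
    for start in range(0, len(ids), size):
        end = start + size
        # upsert() rather than add(), so re-running ingestion does not
        # create duplicate records for ids that already exist.
        collection.upsert(
            ids=ids[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end],
        )
        calls += 1
    return calls


def build_where(sources, after_page):
    """Build a metadata filter: source is one of `sources` AND page > after_page.

    The keys "source" and "page" are hypothetical; use whatever metadata
    keys were stored at ingestion time.
    """
    return {
        "$and": [
            {"source": {"$in": list(sources)}},
            {"page": {"$gt": after_page}},
        ]
    }
```

With a real collection, for example one returned by `client.get_or_create_collection(...)`, the filter would be passed as the `where` argument of `collection.query(...)` alongside `query_texts` and `n_results`.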
Examples
Example 1: Build a document Q&A pipeline
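One way such a pipeline might look, as a minimal sketch rather than a definitive implementation: retrieve the top passages with `collection.query`, then assemble a prompt that carries each passage's source for citation. The helper names `retrieve`, `build_prompt`, and `demo`, the collection name `docs`, the storage path, the sample document, and the `source` metadata key are all assumptions made for illustration. The final call to an LLM is left out because it depends on the provider.

```python
def retrieve(collection, question, n_results=5):
    """Query by text and flatten the result into a list of passages.

    Chroma returns one inner list per query text, so index 0 selects the
    results for our single question.
    """
    res = collection.query(query_texts=[question], n_results=n_results)
    docs = res["documents"][0]
    metas = res["metadatas"][0]
    return [
        {"text": doc, "source": (meta or {}).get("source", "unknown")}
        for doc, meta in zip(docs, metas)
    ]


def build_prompt(question, passages):
    """Format retrieved passages as numbered, source-tagged context."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered context and cite the sources.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def demo(path="./chroma_data"):
    """Wire the helpers to a local Chroma store (hypothetical path and data)."""
    import chromadb  # third-party; imported lazily so the helpers stay dependency-free

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(name="docs")
    collection.upsert(
        ids=["intro-1"],
        documents=["Chroma stores embeddings alongside documents and metadata."],
        metadatas=[{"source": "intro.md"}],
    )
    question = "What does Chroma store?"
    return build_prompt(question, retrieve(collection, question, n_results=1))
```

Calling `demo()` requires the `chromadb` package and uses the default embedding function; the returned string is the prompt that would be sent to the language model of choice.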
Related skills