ChromaDB

Overview

ChromaDB is an open-source vector database for storing, searching, and managing embeddings. It provides a simple API for document ingestion, semantic similarity search, and metadata filtering, supporting both Python and JavaScript/TypeScript clients with embedded, server, and cloud deployment options.

Instructions

When initializing, use get_or_create_collection for idempotent collection setup, and choose PersistentClient for local development and HttpClient for connecting to a production server.
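When adding documents, batch add() calls in chunks of 5,000 documents (Chroma rejects batches larger than its per-call limit), always store source metadata (filename, URL, page number) for RAG citations, and use upsert() with stable IDs for incremental updates so that re-ingested documents update their existing entries instead of creating duplicates.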
When querying, use collection.query(query_texts=..., n_results=...) for text-based search, add a metadata where filter to narrow the candidate set before similarity ranking, and set n_results to fit the LLM's context window (5-10 results for most RAG pipelines).
When choosing embeddings, use the default embedding function (the all-MiniLM-L6-v2 Sentence Transformers model, which runs locally without API keys) for development, the OpenAI or Cohere embedding functions for production, or pass pre-computed vectors directly.
When filtering metadata, use comparison operators such as $eq, $gt, and $in, combine conditions with the $and/$or logical operators, and add a where_document filter (for example $contains) for content-based filtering alongside semantic similarity.
When deploying, use the embedded PersistentClient for single-node applications, Docker for server mode, or Chroma Cloud for managed hosting with multi-tenancy support.
When tuning performance, configure the HNSW parameters (hnsw:M, hnsw:construction_ef, hnsw:search_ef) to balance recall against speed, and set hnsw:space to cosine for normalized embeddings such as those from OpenAI and Cohere (the default space is l2).
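
A minimal sketch of the initialization advice, assuming the choice between PersistentClient and HttpClient is driven by environment variables. CHROMA_HOST, CHROMA_PORT and CHROMA_PATH are hypothetical names picked for this example (not variables Chroma itself reads), and ./chroma_db and port 8000 are placeholder defaults. The config logic is plain Python; chromadb is only imported when open_collection() is actually called.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClientConfig:
    """Which Chroma client to build and with what arguments."""
    kind: str                   # "persistent" or "http"
    path: Optional[str] = None  # on-disk directory for PersistentClient
    host: Optional[str] = None  # server host for HttpClient
    port: Optional[int] = None  # server port for HttpClient


def resolve_client_config(env: Mapping[str, str]) -> ClientConfig:
    """Use HttpClient when a server host is configured, else PersistentClient.

    CHROMA_HOST / CHROMA_PORT / CHROMA_PATH are hypothetical variable names
    chosen for this sketch.
    """
    host = env.get("CHROMA_HOST")
    if host:
        return ClientConfig(kind="http", host=host, port=int(env.get("CHROMA_PORT", "8000")))
    return ClientConfig(kind="persistent", path=env.get("CHROMA_PATH", "./chroma_db"))


def open_collection(name: str, env: Mapping[str, str] = os.environ):
    """Build the client, then get or create the collection idempotently."""
    import chromadb  # imported lazily so the config logic runs without chromadb installed

    cfg = resolve_client_config(env)
    if cfg.kind == "http":
        client = chromadb.HttpClient(host=cfg.host, port=cfg.port)
    else:
        client = chromadb.PersistentClient(path=cfg.path)
    # Safe to run on every start-up: returns the existing collection if it exists.
    return client.get_or_create_collection(name=name)
```

Keeping the decision in a pure function makes it easy to test without a running server; the same collection name works unchanged in both modes.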
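
The batching and upsert advice for adding documents can be sketched as follows. The helper names, the source#index ID scheme, and the source/page metadata keys are illustrative assumptions; collection stands for a Chroma collection, whose upsert() accepts the ids, documents and metadatas keyword arguments used here.

```python
import hashlib
from typing import Any, Dict, Iterator, Sequence, Tuple

BATCH_SIZE = 5_000  # per-call chunk size; Chroma rejects batches over its limit


def stable_id(source: str, chunk_index: int) -> str:
    """Deterministic ID from source and chunk position (hypothetical scheme).

    Re-ingesting the same chunk yields the same ID, so upsert() updates the
    existing record instead of adding a second copy.
    """
    return hashlib.sha256(f"{source}#{chunk_index}".encode("utf-8")).hexdigest()[:32]


def batches(n: int, size: int = BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) slice bounds covering range(n) in chunks of `size`."""
    for start in range(0, n, size):
        yield start, min(start + size, n)


def upsert_in_batches(collection: Any, ids: Sequence[str], documents: Sequence[str],
                      metadatas: Sequence[Dict[str, Any]], size: int = BATCH_SIZE) -> int:
    """Call collection.upsert() once per chunk and return the number of calls."""
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError("ids, documents and metadatas must have the same length")
    calls = 0
    for start, end in batches(len(ids), size):
        collection.upsert(
            ids=list(ids[start:end]),
            documents=list(documents[start:end]),
            metadatas=list(metadatas[start:end]),  # keep filename/URL/page for citations
        )
        calls += 1
    return calls
```

Because the IDs depend only on the source and chunk position, running the same ingestion twice leaves the collection size unchanged; if chunking changes between runs, stale chunks with old indices would still need to be deleted separately.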
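
For the querying step, collection.query() returns a dict of nested lists, with one inner list per query text. The sketch below uses a hypothetical format_context() helper, with a character budget standing in for a real token count, to turn the hits of a single query into a numbered, cited context block for an LLM prompt. It assumes each chunk's metadata carries source and page keys, as recommended for RAG citations.

```python
from typing import Any, Dict, List

# Typical call (requires chromadb and an existing collection):
# results = collection.query(query_texts=["How do refunds work?"], n_results=5)


def format_context(results: Dict[str, Any], max_chars: int = 4_000) -> str:
    """Format the first query's hits as numbered, cited passages.

    Index [0] selects the hits for a single-query call. Stops adding hits
    once the next one would exceed max_chars (a crude token-budget proxy).
    """
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    parts: List[str] = []
    used = 0
    for rank, (doc, meta, dist) in enumerate(zip(docs, metas, dists), start=1):
        meta = meta or {}  # a record stored without metadata comes back as None
        cite = f"{meta.get('source', 'unknown')}, p. {meta.get('page', '?')}"
        entry = f"[{rank}] ({cite}; distance={dist:.3f})\n{doc}"
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)
```

Lower distances mean closer matches, so the ranks follow Chroma's ordering; the budget check means a large n_results degrades gracefully instead of overflowing the prompt.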
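
When embeddings come from an external model and are passed in directly, it helps to validate them before calling collection.add(embeddings=...), since a Chroma collection fixes its dimensionality on the first insert and rejects vectors of any other length. The hypothetical prepare_embeddings() helper below checks dimensions and finiteness and optionally scales each vector to unit length; it assumes vectors arrive as plain Python lists of numbers.

```python
import math
from typing import List, Sequence


def prepare_embeddings(vectors: Sequence[Sequence[float]], normalize: bool = True) -> List[List[float]]:
    """Validate pre-computed vectors and optionally L2-normalize them.

    Raises ValueError on a dimension mismatch, a NaN/infinite component,
    or (when normalizing) a zero vector.
    """
    if not vectors:
        return []
    dim = len(vectors[0])
    out: List[List[float]] = []
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {dim}")
        values = [float(x) for x in vec]
        if not all(math.isfinite(x) for x in values):
            raise ValueError(f"vector {i} contains NaN or infinity")
        if normalize:
            norm = math.sqrt(sum(x * x for x in values))
            if norm == 0.0:
                raise ValueError(f"vector {i} has zero norm")
            values = [x / norm for x in values]
        out.append(values)
    return out
```

Query vectors should go through the same helper; with unit-length vectors on both sides, l2 and cosine distances produce the same ranking.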
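
Metadata filters are plain nested dicts, so they can be built programmatically. The hypothetical build_where() helper below maps keyword arguments to $eq, $in, or an explicit operator dict, and only wraps clauses in $and when there are two or more, because Chroma rejects an $and/$or list with fewer than two conditions.

```python
from typing import Any, Dict, List, Optional


def build_where(**conditions: Any) -> Optional[Dict[str, Any]]:
    """Build a Chroma metadata `where` filter from keyword conditions.

    Scalar -> {"field": {"$eq": value}}; list/tuple -> {"$in": [...]};
    dict -> passed through as an explicit operator, e.g. {"$gt": 10}.
    Returns None (no filter) when no conditions are given.
    """
    clauses: List[Dict[str, Any]] = []
    for field, value in conditions.items():
        if isinstance(value, dict):
            clauses.append({field: value})
        elif isinstance(value, (list, tuple)):
            clauses.append({field: {"$in": list(value)}})
        else:
            clauses.append({field: {"$eq": value}})
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]  # a lone clause must not be wrapped in $and
    return {"$and": clauses}
```

The result combines with content filtering in one call, for example collection.query(query_texts=["refund policy"], n_results=5, where=build_where(source="handbook.pdf", page={"$gt": 10}), where_document={"$contains": "refund"}).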
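
To make the HNSW and distance-space advice concrete, the sketch below pairs an example collection metadata dict with the two distance formulas Chroma reports. The parameter values are placeholders rather than recommendations, and the hnsw: metadata-key style follows the text above; recent Chroma releases may also accept these settings through a separate configuration argument, so check the documentation for the installed version.

```python
import math
from typing import Dict, Sequence, Union

# Placeholder values; tune against your own recall and latency measurements.
HNSW_METADATA: Dict[str, Union[str, int]] = {
    "hnsw:space": "cosine",       # distance function; Chroma's default is "l2"
    "hnsw:M": 32,                 # links per graph node: higher = better recall, more memory
    "hnsw:construction_ef": 200,  # build-time candidate list: higher = better graph, slower inserts
    "hnsw:search_ef": 100,        # query-time candidate list: higher = better recall, slower queries
}


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the distance used for hnsw:space="cosine"."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the distance used for hnsw:space="l2"."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

For unit-length vectors, squared_l2(a, b) equals 2 * cosine_distance(a, b), so both spaces rank neighbours identically and differ only in scale. Pass the dict when creating the collection, e.g. client.get_or_create_collection(name="docs", metadata=HNSW_METADATA), since the distance space is fixed at creation time.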
