ChromaDB

ChromaDB

What is ChromaDB?

ChromaDB is an open-source vector database used to store and retrieve embeddings (numerical vectors created by LLMs).

It is commonly used in RAG (Retrieval-Augmented Generation).

In simple terms:

When you break documents into chunks and convert them into embeddings, you need a place to store them.
ChromaDB is where you store,search , and retrieve these embeddings.


Why do we need ChromaDB?

LLMs cannot “remember” large documents.
So we do this:

  1. Convert text → embeddings

  2. Store embeddings in ChromaDB

  3. When a user asks a question, we find similar embeddings

  4. Send the relevant chunks back to the LLM

This makes the AI:

  • Accurate

  • Context-aware

  • Domain-aligned


W here is ChromaDB used?

  • Chatbots

  • Multi-agent systems

  • RAG pipelines

  • Document Q&A

  • Code assistants

  • Knowledge bases


🟧 ChromaDB Features

  • Fully open-source

  • Fast local search

  • Simple Python API

  • Works with:

    • LangChain

    • LangGraph

    • LlamaIndex

  • Good for small to medium productions

  • Persistent or in-memory modes


Examples (Very Simple)

Store & query embeddings in Python:

import chromadb
chroma = chromadb.Client()

collection = chroma.create_collection("docs")
collection.add(
    documents=["Hello world"],
    ids=["1"]
)

collection.query(query_texts=["world"])
  

Alternatives to ChromaDB (Vector Databases)

Here are the most popular vector DBs used in AI/RAG:


1. Pinecone

  • Fully managed cloud vector DB

  • Very fast and scalable

  • Used in enterprise RAG systems

  • Integrates with LangChain, LlamaIndex, OpenAI, etc.

Example Use Case:

Chatbot with millions of documents.


2. Weaviate

  • Open-source + cloud option

  • Hybrid search (vector + keyword)

  • Schema-based

Example Use Case:

Semantic search for product catalogs.


3. FAISS (Meta)

  • A library , not a full database

  • Extremely fast similarity search

  • Runs locally

  • No persistence unless manually handled

Example Use Case:

Local vector search in small RAG apps.


4. Milvus

  • Distributed, scalable vector database

  • Great for big data or millions of vectors

  • Cloud + open-source

Example Use Case:

Large-scale AI search engines.


5. Elasticsearch + KNN

  • Traditional search engine

  • Now supports vector search

  • Good mix of keyword + semantic

Example Use Case:

Enterprise search with logs + documents.


6. Qdrant

  • Open-source

  • Very fast ANN (approx nearest neighbor)

  • Strong community

  • Cloud and self-hosted options

Example Use Case:

Chatbots, RAG, multi-agent AI memory.


7. Redis Vector Store

  • Redis now supports vector search

  • Good for:

    • Caching

    • Real-time apps

    • Session memory

Example Use Case:

AI agents with short-term memory.


Summary Table

image.png