Building RAG Applications with LangChain and Vector Databases

Learn how to build production-ready Retrieval-Augmented Generation systems using LangChain, ChromaDB, and OpenAI embeddings for accurate, grounded AI responses.

Tags: LangChain · RAG · AI · Vector DB

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines the power of large language models with external knowledge retrieval. Instead of relying solely on the model's training data, RAG fetches relevant documents at query time and uses them as context.

This approach solves two critical problems:

  1. Hallucination — grounding responses in real data
  2. Stale knowledge — accessing up-to-date information beyond the training cutoff
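Under the hood, retrieval usually works by embedding both the query and the documents as vectors and ranking documents by similarity. A minimal cosine-similarity sketch (the function name is illustrative, not from any library):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Vector stores use this (or an equivalent metric) to rank
// document embeddings against a query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0; the top-scoring documents become the context passed to the model.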

Architecture Overview

A typical RAG pipeline consists of three stages:

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { ChromaClient } from "chromadb";
 
// 1. Embed your documents
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});
 
// 2. Store in a vector database
const client = new ChromaClient();
const collection = await client.getOrCreateCollection({
  name: "knowledge-base",
});
 
// 3. Query and generate
async function askQuestion(query: string): Promise<string> {
  const queryEmbedding = await embeddings.embedQuery(query);
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5,
  });
 
  // Chroma may return nulls for missing documents; drop them
  const context = results.documents.flat().filter(Boolean).join("\n\n");
 
  const llm = new ChatOpenAI({ model: "gpt-4o" });
  const response = await llm.invoke(
    `Context:\n${context}\n\nQuestion: ${query}`
  );
 
  return response.content as string;
}

Chunking Strategies

How you split your documents matters enormously for retrieval quality:

  Strategy         Best For           Chunk Size
  Fixed-size       Simple documents   500-1000 tokens
  Recursive        Structured text    500-800 tokens
  Semantic         Complex docs       Variable
  Sentence-based   Q&A systems        3-5 sentences
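To make the fixed-size strategy concrete, here is a minimal splitter with overlap. It counts characters rather than tokens as a simplifying assumption; a real pipeline would use a tokenizer, and the function name is illustrative:

```typescript
// Fixed-size chunking with overlap. Overlap lets a fact that
// straddles a chunk boundary still appear whole in one chunk.
// Characters stand in for tokens here for simplicity.
function splitFixedSize(
  text: string,
  chunkSize: number,
  overlap: number
): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```

The overlap parameter is the main knob: larger overlap improves recall at chunk boundaries but inflates storage and embedding cost.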

Key Takeaways

  • Start with recursive character splitting — it works well for most cases
  • Use hybrid search (keyword + semantic) for better recall
  • Always include a reranking step before feeding context to the LLM
  • Monitor your pipeline with tools like LangSmith for observability
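On the hybrid-search point: one common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). A sketch, assuming each input list is already ordered best-first (the k = 60 constant is the conventional default, and the function name is illustrative):

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).
// Documents ranked highly by either retriever float to the top,
// without needing the two scoring scales to be comparable.
function reciprocalRankFusion(
  rankings: string[][], // each list: doc IDs, best-first
  k = 60
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.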

RAG is not a silver bullet, but when implemented correctly, it dramatically improves the reliability and accuracy of your AI applications.