Building RAG Applications with LangChain and Vector Databases

Learn how to build production-ready Retrieval-Augmented Generation systems using LangChain, ChromaDB, and OpenAI embeddings for accurate, grounded AI responses.

Tags: LangChain · RAG · AI · Vector DB

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines the power of large language models with external knowledge retrieval. Instead of relying solely on the model's training data, RAG fetches relevant documents at query time and uses them as context.

This approach solves two critical problems:

  1. Hallucination — grounding responses in real data
  2. Stale knowledge — accessing up-to-date information beyond the training cutoff
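Under the hood, retrieval usually works by embedding both the query and the documents as vectors and ranking documents by similarity. A minimal cosine-similarity sketch (the function name is illustrative, not from any library):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Vector stores use this (or an equivalent metric) to rank
// document embeddings against a query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0; the top-scoring documents become the context passed to the model.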

Architecture Overview

A typical RAG pipeline consists of three stages:

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { ChromaClient } from "chromadb";
 
// 1. Embed your documents
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});
 
// 2. Store in a vector database
const client = new ChromaClient();
const collection = await client.getOrCreateCollection({
  name: "knowledge-base",
});
 
// 3. Query and generate
async function askQuestion(query: string): Promise<string> {
  const queryEmbedding = await embeddings.embedQuery(query);
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5,
  });
 
  // Chroma may return nulls for missing documents; drop them
  const context = results.documents.flat().filter(Boolean).join("\n\n");
 
  const llm = new ChatOpenAI({ model: "gpt-4o" });
  const response = await llm.invoke(
    `Context:\n${context}\n\nQuestion: ${query}`
  );
 
  return response.content as string;
}

Chunking Strategies

How you split your documents matters enormously for retrieval quality:

  Strategy         Best For           Chunk Size
  Fixed-size       Simple documents   500-1000 tokens
  Recursive        Structured text    500-800 tokens
  Semantic         Complex docs       Variable
  Sentence-based   Q&A systems        3-5 sentences
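To make the fixed-size strategy concrete, here is a minimal splitter with overlap. It counts characters rather than tokens as a simplifying assumption; a real pipeline would use a tokenizer, and the function name is illustrative:

```typescript
// Fixed-size chunking with overlap. Overlap lets a fact that
// straddles a chunk boundary still appear whole in one chunk.
// Characters stand in for tokens here for simplicity.
function splitFixedSize(
  text: string,
  chunkSize: number,
  overlap: number
): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```

The overlap parameter is the main knob: larger overlap improves recall at chunk boundaries but inflates storage and embedding cost.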

Key Takeaways

  • Start with recursive character splitting — it works well for most cases
  • Use hybrid search (keyword + semantic) for better recall
  • Always include a reranking step before feeding context to the LLM
  • Monitor your pipeline with tools like LangSmith for observability
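On the hybrid-search point: one common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). A sketch, assuming each input list is already ordered best-first (the k = 60 constant is the conventional default, and the function name is illustrative):

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).
// Documents ranked highly by either retriever float to the top,
// without needing the two scoring scales to be comparable.
function reciprocalRankFusion(
  rankings: string[][], // each list: doc IDs, best-first
  k = 60
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.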

RAG is not a silver bullet, but when implemented correctly, it dramatically improves the reliability and accuracy of your AI applications.