How to Build a RAG App with LangChain and Supabase in 2026
Learn how to build a production-ready Retrieval-Augmented Generation (RAG) application using LangChain for orchestration and Supabase for vector storage. Step-by-step architecture guide with code examples.
Retrieval-Augmented Generation (RAG) is the most practical pattern for building AI applications that need access to private or up-to-date data. Instead of fine-tuning a model, you retrieve relevant context at query time and feed it to the LLM. In this guide, we'll build a full RAG pipeline using LangChain for orchestration and Supabase as our vector store.
1. What is RAG and Why Does It Matter?
Large Language Models are trained on static datasets. They don't know about your company docs, your product updates, or anything after their training cutoff. RAG solves this by:
- Indexing: Converting your documents into vector embeddings and storing them in a database.
- Retrieving: When a user asks a question, finding the most semantically similar documents.
- Generating: Passing the retrieved context to the LLM alongside the question to produce an accurate, grounded answer.
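These three steps can be sketched end to end with a toy in-memory index. The `embed` function here is a hypothetical stand-in for a real embedding model — it just counts letter frequencies so the example runs without any API calls — but the index → retrieve → assemble flow is the same shape as the real pipeline:

```typescript
// Toy RAG flow: index → retrieve → assemble context, all in memory.
// `embed` is a stand-in for a real embedding model (e.g. OpenAI's);
// it counts letter frequencies so the example is runnable offline.

type Doc = { content: string; embedding: number[] };

function embed(text: string): number[] {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97; // 'a' → 0 … 'z' → 25
    if (i >= 0 && i < 26) vec[i] += 1;
  }
  return vec;
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const normB = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (normA * normB);
}

// 1. Indexing: embed each document once and store the vectors.
const corpus = ["reset your password in settings", "billing happens monthly"];
const index: Doc[] = corpus.map((content) => ({ content, embedding: embed(content) }));

// 2. Retrieving: embed the question and rank documents by similarity.
function retrieve(question: string, k: number): Doc[] {
  const q = embed(question);
  return [...index]
    .sort((x, y) => cosineSimilarity(q, y.embedding) - cosineSimilarity(q, x.embedding))
    .slice(0, k);
}

// 3. Generating: in a real app, context + question go to the LLM.
const context = retrieve("how do I reset my password?", 1)
  .map((d) => d.content)
  .join("\n");
```

Swap the toy `embed` for a real embedding model and the array for a vector database, and this is the entire pattern.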
2. Architecture Overview
Our stack consists of four layers:
| Layer | Tool | Purpose |
|---|---|---|
| Orchestration | LangChain | Chain management, document loading, text splitting |
| Embeddings | OpenAI text-embedding-3-small | Convert text to 1536-dim vectors |
| Vector Store | Supabase + pgvector | Store and query embeddings with SQL |
| LLM | GPT-4o | Generate answers from retrieved context |
3. Setting Up Supabase as a Vector Store
Supabase ships pgvector as a built-in extension. Enable it, then create a documents table and a similarity search function:

```sql
-- Enable the vector extension
create extension if not exists vector;

-- Create documents table
create table documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

-- Create similarity search function
create function match_documents (
  query_embedding vector(1536),
  match_count int default 5
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
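The `<=>` operator is pgvector's cosine distance, so `1 - (embedding <=> query_embedding)` converts it to cosine similarity. A TypeScript sketch of the same arithmetic, for intuition about what the function ranks by:

```typescript
// What `1 - (a <=> b)` computes: pgvector's <=> is cosine distance,
// i.e. 1 minus cosine similarity, so the SQL above returns similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const similarity = (a: number[], b: number[]) => 1 - cosineDistance(a, b);

// Identical vectors → similarity 1; orthogonal vectors → similarity 0.
```

Rows come back ordered by ascending distance, i.e. descending similarity, which is exactly what a retriever wants.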
4. Indexing Documents with LangChain
Use LangChain's document loaders and text splitters to chunk and embed your data:
```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { OpenAIEmbeddings } from "@langchain/openai";
import { createClient } from "@supabase/supabase-js";

// Use the service role key server-side so indexing can write past RLS.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_KEY!
);

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000, // measured in characters, not tokens
  chunkOverlap: 200,
});

// rawText is your document text, loaded elsewhere (e.g. via a LangChain document loader).
const docs = await splitter.createDocuments([rawText]);

await SupabaseVectorStore.fromDocuments(docs, new OpenAIEmbeddings(), {
  client: supabase,
  tableName: "documents",
  queryName: "match_documents", // the SQL function created above
});
```
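To see how `chunkSize` and `chunkOverlap` interact, here is a simplified fixed-window chunker. This is not what `RecursiveCharacterTextSplitter` actually does — the real splitter prefers paragraph, sentence, and word boundaries before falling back to characters — but the window arithmetic is the same idea:

```typescript
// Simplified fixed-window chunker illustrating chunkSize/chunkOverlap.
// The real RecursiveCharacterTextSplitter is smarter: it tries to break
// on paragraph, sentence, and word boundaries before cutting mid-word.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) throw new Error("overlap must be smaller than chunk size");
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // each chunk starts `step` chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With chunkSize 1000 and chunkOverlap 200, consecutive chunks share
// 200 characters, so text cut at one boundary still appears whole in
// the neighboring chunk.
const chunks = chunkText("a".repeat(2500), 1000, 200);
```

The overlap is what protects you from a key sentence being split across two chunks and retrieved by neither.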
5. Querying with RAG
At query time, embed the user's question, retrieve matching chunks, and pass them to the LLM:
```typescript
import { ChatOpenAI } from "@langchain/openai";
import { RetrievalQAChain } from "langchain/chains";

const vectorStore = new SupabaseVectorStore(new OpenAIEmbeddings(), {
  client: supabase,
  tableName: "documents",
  queryName: "match_documents",
});

const llm = new ChatOpenAI({ model: "gpt-4o" });

const chain = RetrievalQAChain.fromLLM(
  llm,
  vectorStore.asRetriever({ k: 5 })
);

const response = await chain.invoke({
  query: "How do I set up authentication?",
});
// The generated answer is in response.text.
```
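Under the hood, the chain stuffs the retrieved chunks into a grounded prompt. A minimal sketch of that assembly step — the template wording below is illustrative, not LangChain's exact default prompt:

```typescript
// Assemble a grounded prompt from retrieved chunks. Telling the LLM to
// answer only from the supplied context is what keeps RAG answers
// verifiable against your documents.
function buildRagPrompt(question: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    "Answer the question using only the context below.",
    'If the context does not contain the answer, say "I don\'t know."',
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildRagPrompt("How do I set up authentication?", [
  "Enable email auth in the Supabase dashboard.",
  "Add the anon key to your client config.",
]);
```

Numbering the chunks (`[1]`, `[2]`, …) also lets you prompt the model to cite which chunk each claim came from.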
6. Production Considerations
- Chunk size matters: Too small loses context; too large dilutes relevance. 500-1000 tokens is a good starting point — but note that RecursiveCharacterTextSplitter's chunkSize is measured in characters, so 1000 characters is only roughly 250 tokens.
- Use Row Level Security: Supabase RLS lets you scope vector searches per user.
- Cache embeddings: Don't re-embed unchanged documents on every deployment.
- Monitor with LangSmith: Trace every chain execution to debug retrieval quality.