How to Build a RAG App with LangChain and Supabase in 2026
Learn how to build a production-ready Retrieval-Augmented Generation (RAG) application using LangChain for orchestration and Supabase for vector storage. Step-by-step architecture guide with code examples.
Retrieval-Augmented Generation (RAG) is the most practical pattern for building AI applications that need access to private or up-to-date data. Instead of fine-tuning a model, you retrieve relevant context at query time and feed it to the LLM. In this guide, we'll build a full RAG pipeline using LangChain for orchestration and Supabase as our vector store.
1. What is RAG and Why Does It Matter?
Large Language Models are trained on static datasets. They don't know about your company docs, your product updates, or anything after their training cutoff. RAG solves this by:
- Indexing: Converting your documents into vector embeddings and storing them in a database.
- Retrieving: When a user asks a question, finding the most semantically similar documents.
- Generating: Passing the retrieved context to the LLM alongside the question to produce an accurate, grounded answer.
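These three steps can be sketched end to end with a toy in-memory index. The `embed` function here is a hypothetical stand-in for a real embedding model — it just counts letter frequencies so the example runs without any API calls — but the index → retrieve → assemble flow is the same shape as the real pipeline:

```typescript
// Toy RAG flow: index → retrieve → assemble context, all in memory.
// `embed` is a stand-in for a real embedding model (e.g. OpenAI's);
// it counts letter frequencies so the example is runnable offline.

type Doc = { content: string; embedding: number[] };

function embed(text: string): number[] {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97; // 'a' → 0 … 'z' → 25
    if (i >= 0 && i < 26) vec[i] += 1;
  }
  return vec;
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const normB = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (normA * normB);
}

// 1. Indexing: embed each document once and store the vectors.
const corpus = ["reset your password in settings", "billing happens monthly"];
const index: Doc[] = corpus.map((content) => ({ content, embedding: embed(content) }));

// 2. Retrieving: embed the question and rank documents by similarity.
function retrieve(question: string, k: number): Doc[] {
  const q = embed(question);
  return [...index]
    .sort((x, y) => cosineSimilarity(q, y.embedding) - cosineSimilarity(q, x.embedding))
    .slice(0, k);
}

// 3. Generating: in a real app, context + question go to the LLM.
const context = retrieve("how do I reset my password?", 1)
  .map((d) => d.content)
  .join("\n");
```

Swap the toy `embed` for a real embedding model and the array for a vector database, and this is the entire pattern.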
2. Architecture Overview
Our stack consists of four layers:
| Layer | Tool | Purpose |
|---|---|---|
| Orchestration | LangChain | Chain management, document loading, text splitting |
| Embeddings | OpenAI text-embedding-3-small | Convert text to 1536-dim vectors |
| Vector Store | Supabase + pgvector | Store and query embeddings with SQL |
| LLM | GPT-4o | Generate answers from retrieved context |
3. Setting Up Supabase as a Vector Store
Supabase ships pgvector as a built-in extension. Enable it, then create a documents table and a similarity search function:

```sql
-- Enable the vector extension
create extension if not exists vector;

-- Create documents table
create table documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

-- Create similarity search function
create function match_documents (
  query_embedding vector(1536),
  match_count int default 5
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
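The `<=>` operator is pgvector's cosine distance, so `1 - (embedding <=> query_embedding)` converts it to cosine similarity. A TypeScript sketch of the same arithmetic, for intuition about what the function ranks by:

```typescript
// What `1 - (a <=> b)` computes: pgvector's <=> is cosine distance,
// i.e. 1 minus cosine similarity, so the SQL above returns similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const similarity = (a: number[], b: number[]) => 1 - cosineDistance(a, b);

// Identical vectors → similarity 1; orthogonal vectors → similarity 0.
```

Rows come back ordered by ascending distance, i.e. descending similarity, which is exactly what a retriever wants.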
4. Indexing Documents with LangChain
Use LangChain's document loaders and text splitters to chunk and embed your data:
```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { OpenAIEmbeddings } from "@langchain/openai";
import { createClient } from "@supabase/supabase-js";

// Use the service role key server-side so indexing can write past RLS.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_KEY!
);

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000, // measured in characters, not tokens
  chunkOverlap: 200,
});

// rawText is your document text, loaded elsewhere (e.g. via a LangChain document loader).
const docs = await splitter.createDocuments([rawText]);

await SupabaseVectorStore.fromDocuments(docs, new OpenAIEmbeddings(), {
  client: supabase,
  tableName: "documents",
  queryName: "match_documents", // the SQL function created above
});
```
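To see how `chunkSize` and `chunkOverlap` interact, here is a simplified fixed-window chunker. This is not what `RecursiveCharacterTextSplitter` actually does — the real splitter prefers paragraph, sentence, and word boundaries before falling back to characters — but the window arithmetic is the same idea:

```typescript
// Simplified fixed-window chunker illustrating chunkSize/chunkOverlap.
// The real RecursiveCharacterTextSplitter is smarter: it tries to break
// on paragraph, sentence, and word boundaries before cutting mid-word.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) throw new Error("overlap must be smaller than chunk size");
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // each chunk starts `step` chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With chunkSize 1000 and chunkOverlap 200, consecutive chunks share
// 200 characters, so text cut at one boundary still appears whole in
// the neighboring chunk.
const chunks = chunkText("a".repeat(2500), 1000, 200);
```

The overlap is what protects you from a key sentence being split across two chunks and retrieved by neither.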
5. Querying with RAG
At query time, embed the user's question, retrieve matching chunks, and pass them to the LLM:
```typescript
import { ChatOpenAI } from "@langchain/openai";
import { RetrievalQAChain } from "langchain/chains";

const vectorStore = new SupabaseVectorStore(new OpenAIEmbeddings(), {
  client: supabase,
  tableName: "documents",
  queryName: "match_documents",
});

const llm = new ChatOpenAI({ model: "gpt-4o" });

const chain = RetrievalQAChain.fromLLM(
  llm,
  vectorStore.asRetriever({ k: 5 })
);

const response = await chain.invoke({
  query: "How do I set up authentication?",
});
// The generated answer is in response.text.
```
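Under the hood, the chain stuffs the retrieved chunks into a grounded prompt. A minimal sketch of that assembly step — the template wording below is illustrative, not LangChain's exact default prompt:

```typescript
// Assemble a grounded prompt from retrieved chunks. Telling the LLM to
// answer only from the supplied context is what keeps RAG answers
// verifiable against your documents.
function buildRagPrompt(question: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    "Answer the question using only the context below.",
    'If the context does not contain the answer, say "I don\'t know."',
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildRagPrompt("How do I set up authentication?", [
  "Enable email auth in the Supabase dashboard.",
  "Add the anon key to your client config.",
]);
```

Numbering the chunks (`[1]`, `[2]`, …) also lets you prompt the model to cite which chunk each claim came from.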
6. Production Considerations
- Chunk size matters: Too small loses context; too large dilutes relevance. 500-1000 tokens is a good starting point — but note that RecursiveCharacterTextSplitter's chunkSize is measured in characters, so 1000 characters is only roughly 250 tokens.
- Use Row Level Security: Supabase RLS lets you scope vector searches per user.
- Cache embeddings: Don't re-embed unchanged documents on every deployment.
- Monitor with LangSmith: Trace every chain execution to debug retrieval quality.