FastRAG Documentation

Everything you need to set up, run, and deploy FastRAG.

Overview

FastRAG is a production-ready RAG (Retrieval-Augmented Generation) starter kit built with Next.js, LangChain, Pinecone, and OpenAI. It saves you 40+ hours of setting up vector ingestion pipelines, handling streaming responses, and managing context windows.

Next.js 16+ · LangChain (latest) · Pinecone v3 · OpenAI GPT-3.5
What you get: Multi-file PDF ingestion, URL scraping with Puppeteer, streaming chat with citations, mobile UI, and full source code you actually understand.

Prerequisites

Make sure you have accounts and API keys ready before starting. All services have free tiers.

Node.js v18+ (Required)

Required to run Next.js locally.

OpenAI API Key (Required)

Used for embeddings and chat completions. Requires $5 credit added to the account — a new API key alone isn't enough.

Pinecone API Key (Required)

Vector database for storing and querying embeddings. The free Starter plan is sufficient.

Browserless.io Token (Optional)

Headless browser for URL scraping. Only needed if you use the web scraping feature.

Installation

1. Clone or unzip the project

If you have GitHub repo access:

bash
git clone fastrag.git
cd fastrag

Or unzip the downloaded file and open the folder in your terminal.

2. Install dependencies

bash
npm install

If you see peer dependency warnings due to rapid LangChain updates, run:

bash
npm install --legacy-peer-deps

Environment Setup

Rename .env.example to .env.local and fill in your keys:

.env.local
# OpenAI — platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...

# Pinecone — app.pinecone.io
PINECONE_API_KEY=pc-sk-...

# Must match the index name you create in Pinecone
PINECONE_INDEX=fast-rag

# Optional: only needed for URL scraping
BROWSERLESS_TOKEN=your-token-here
OPENAI_API_KEY (Required)

Powers both text-embedding-3-small (for ingestion) and GPT-3.5-turbo (for chat).

PINECONE_API_KEY (Required)

Used to upsert and query your vector index.

PINECONE_INDEX (Required)

Must exactly match the index name in your Pinecone dashboard. Case-sensitive — 'fast-rag' ≠ 'Fast-RAG'.

BROWSERLESS_TOKEN (Optional)

Powers the headless Chromium instance for scraping JS-rendered sites. Skip if you don't use URL ingestion.

Pinecone Setup

Critical: This is the most common setup mistake. Using the wrong settings will crash the app immediately.

1. Go to app.pinecone.io and sign in.

2. Click "Create Index".

3. Use these exact settings:

   Name: fast-rag (must match PINECONE_INDEX in .env.local)
   Dimensions: 1024 (⚠️ do NOT use the default 1536)
   Metric: Cosine
   Cloud: AWS — us-east-1 (recommended)

4. Click Create and wait ~30 seconds for the index to initialize.

Why 1024 dimensions? FastRAG uses text-embedding-3-small at 1024 dims instead of the default 1536. This cuts Pinecone storage costs by ~33% with negligible quality loss.

Running Locally

bash
npm run dev

Open http://localhost:3000 in your browser.

Quick test: Upload a small PDF (<1MB), wait for ingestion, then ask a question about it. If you get a cited answer — everything works.

Architecture

FastRAG is a standard two-phase RAG pipeline:

Ingest (once per document):
  PDF / URL → parse → chunk → embed → Pinecone

Retrieval (per question):
  User question → embed → query → top-k chunks
  System prompt + chunks → GPT-3.5 → streamed answer
pages/api/ingest.js

PDF uploads, chunking, vector upsert

pages/api/ingest-url.js

URL scraping, Puppeteer, vectorize

pages/api/chat.js

Retrieve chunks, stream GPT response

PDF Ingestion

Handled by pages/api/ingest.js. Supports multiple files simultaneously.

01 Form Parsing: formidable handles the multipart upload and exposes file paths.
02 Loading: LangChain's PDFLoader extracts raw text from each file.
03 Splitting: RecursiveCharacterTextSplitter cuts text into 1000-char chunks with 200-char overlap. The overlap preserves sentence context at boundaries.
04 Embedding: text-embedding-3-small converts each chunk to a 1024-dim vector.
05 Storage: vectors are upserted to Pinecone under a 'global' namespace so all docs are searched together.

URL Ingestion

Handled by pages/api/ingest-url.js. Paste any URL to scrape, clean, and vectorize it.

01 Headless Browser: puppeteer-core connects to Browserless.io, a remote Chromium instance that renders JavaScript, so it works on React and Next.js sites.
02 Extraction: pulls the full body text after JS execution completes.
03 Metadata: tags each vector with the source URL so the AI can cite it in responses.
Puppeteer scrapes a single page, not an entire site. It will not follow links or crawl multiple pages automatically.

Chat & Retrieval

Handled by pages/api/chat.js.

01 Embed Question: the user's message is converted to a vector using the same model as ingestion.
02 Pinecone Query: top matching chunks are retrieved via similarity search (top-4 by default).
03 Prompt Construction: retrieved chunks are injected into a system prompt with instructions to cite the source.
04 Streaming: GPT-3.5-turbo streams the response via LangChainAdapter and the Vercel AI SDK.

Frontend

Lives in pages/index.js. A single-page chat interface with two modes.

File Upload Mode
Drag-and-drop or click to upload PDFs. Multiple files supported. Triggers /api/ingest.
URL Mode
Paste any URL to scrape and ingest. Triggers /api/ingest-url.
Chat Interface
useChat from ai/react manages the full streaming lifecycle.
Mobile Ready
Fully responsive. Works natively on iOS and Android browsers.

Deploy to Vercel

FastRAG is optimized for Vercel. Deployment takes about 5 minutes.

1. Push your code to GitHub

bash
git init && git add .
git commit -m "initial"
git remote add origin https://github.com/you/fastrag.git
git push -u origin main
2. Import to Vercel

Go to vercel.com/new, import your repo, select Next.js as the framework.

3. Add all environment variables

In Vercel project settings → Environment Variables, add all keys from your .env.local:

OPENAI_API_KEY
PINECONE_API_KEY
PINECONE_INDEX
BROWSERLESS_TOKEN
4. Click Deploy — your app goes live in ~2 minutes

Vercel's free Hobby plan has a 10-second function timeout. Large PDFs or slow scraping jobs may time out. Upgrade to Pro for a 60-second limit.

Troubleshooting

429 — "You exceeded your current quota"

Cause: OpenAI accounts require pre-paid credits. A new API key alone isn't enough.

Fix: Go to platform.openai.com/settings/organization/billing and add $5. May take 5–10 minutes to activate.

"Vector dimension 1536 does not match index 1024"

Cause: Your Pinecone index was created with default settings (1536 dims).

Fix: Delete the index and recreate it with Dimensions: 1024. See Pinecone Setup above.

"PineconeNotFoundError: 404"

Cause: PINECONE_INDEX env var doesn't match the index name in your dashboard.

Fix: Check .env.local — value must match exactly, including case.

Scraping returns empty content

Cause: Site may be heavily client-side rendered, behind auth, or blocking scrapers.

Fix: Try a different URL. Docs sites and blogs work best. Paywalled pages won't work.

FAQ

Q: Do I need a paid Pinecone plan?

A: No. Free Starter plan supports 1 index and up to 100K vectors — plenty for development and small projects.

Q: Can I swap OpenAI for another model?

A: Yes. Chat and embedding logic is in pages/api/chat.js and ingest.js. Swap to any LangChain-compatible provider — Anthropic, Mistral, Cohere, etc.

Q: Can I use this commercially?

A: Yes. MIT license. Build products on top of FastRAG and sell them.

Q: What file types are supported besides PDF?

A: Currently only PDF. Extend ingest.js with LangChain's other loaders to support .txt, .docx, or .md.

Q: How do I clear all uploaded documents?

A: In Pinecone dashboard, go to your index → Namespaces → delete the 'global' namespace.

Q: Will I get future updates?

A: Yes. All buyers get lifetime access to the private GitHub repo. v1.3 is current.

Ready to ship?

Get the full source code and start building your AI app this weekend.

Get FastRAG — $29 →

One-time · MIT License · Lifetime updates