FastRAG Documentation

Everything you need to set up, run, and deploy FastRAG.

Overview

FastRAG is a production-ready RAG (Retrieval-Augmented Generation) starter kit built with Next.js, LangChain, Pinecone, and OpenAI. It saves you 40+ hours of setting up vector ingestion pipelines, handling streaming responses, and managing context windows.

Next.js 16+ · LangChain (latest) · Pinecone v3 · OpenAI GPT-3.5
What you get: Multi-file PDF ingestion, URL scraping with Puppeteer, streaming chat with citations, mobile UI, and full source code you actually understand.

Prerequisites

Make sure you have accounts and API keys ready before starting. All services have free tiers.

Node.js v18+ (Required)

Required to run Next.js locally.

OpenAI API Key (Required)

Used for embeddings and chat completions. Requires $5 credit added to the account — a new API key alone isn't enough.

Pinecone API Key (Required)

Vector database for storing and querying embeddings. The free Starter plan is sufficient.

Browserless.io Token (Optional)

Headless browser for URL scraping. Only needed if you use the web scraping feature.

Installation

1. Clone or unzip the project

If you have GitHub repo access:

bash
git clone fastrag.git
cd fastrag

Or unzip the downloaded file and open the folder in your terminal.

2. Install dependencies

bash
npm install

If you see peer dependency warnings due to rapid LangChain updates, run:

bash
npm install --legacy-peer-deps

Environment Setup

Rename .env.example to .env.local and fill in your keys:

.env.local
# OpenAI — platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...

# Pinecone — app.pinecone.io
PINECONE_API_KEY=pc-sk-...

# Must match the index name you create in Pinecone
PINECONE_INDEX=fast-rag

# Optional: only needed for URL scraping
BROWSERLESS_TOKEN=your-token-here
OPENAI_API_KEY (Required)

Powers both text-embedding-3-small (for ingestion) and GPT-3.5-turbo (for chat).

PINECONE_API_KEY (Required)

Used to upsert and query your vector index.

PINECONE_INDEX (Required)

Must exactly match the index name in your Pinecone dashboard. Case-sensitive — 'fast-rag' ≠ 'Fast-RAG'.

BROWSERLESS_TOKEN (Optional)

Powers the headless Chromium instance for scraping JS-rendered sites. Skip if you don't use URL ingestion.

Pinecone Setup

Critical: This is the most common setup mistake. Using the wrong settings will crash the app immediately.

1. Go to app.pinecone.io and sign in.

2. Click "Create Index".

3. Use these exact settings:

   Name: fast-rag (must match PINECONE_INDEX in .env.local)
   Dimensions: 1024 (⚠️ do NOT use the default 1536)
   Metric: Cosine
   Cloud: AWS — us-east-1 (recommended)

4. Click Create and wait ~30 seconds for the index to initialize.

Why 1024 dimensions? FastRAG uses text-embedding-3-small at 1024 dims instead of the default 1536. This cuts Pinecone storage costs by ~33% with negligible quality loss.

Running Locally

bash
npm run dev

Open http://localhost:3000 in your browser.

Quick test: Upload a small PDF (<1MB), wait for ingestion, then ask a question about it. If you get a cited answer — everything works.

Architecture

FastRAG is a standard two-phase RAG pipeline:

Ingest (once per document):
  PDF / URL → parse → chunk → embed → Pinecone

Retrieval (per question):
  User question → embed → query → top-k chunks
  System prompt + chunks → GPT-3.5 → streamed answer
pages/api/ingest.js

PDF uploads, chunking, vector upsert

pages/api/ingest-url.js

URL scraping, Puppeteer, vectorize

pages/api/chat.js

Retrieve chunks, stream GPT response

PDF Ingestion

Handled by pages/api/ingest.js. Supports multiple files simultaneously.

01 Form Parsing: formidable handles the multipart upload and exposes file paths.
02 Loading: LangChain's PDFLoader extracts raw text from each file.
03 Splitting: RecursiveCharacterTextSplitter cuts text into 1000-char chunks with 200-char overlap. The overlap preserves sentence context at boundaries.
04 Embedding: text-embedding-3-small converts each chunk to a 1024-dim vector.
05 Storage: vectors are upserted to Pinecone under a 'global' namespace so all docs are searched together.

URL Ingestion

Handled by pages/api/ingest-url.js. Paste any URL to scrape, clean, and vectorize it.

01 Headless Browser: puppeteer-core connects to Browserless.io, a remote Chromium instance that renders JavaScript, so it works on React and Next.js sites.
02 Extraction: pulls the full body text after JS execution completes.
03 Metadata: tags each vector with the source URL so the AI can cite it in responses.
Puppeteer scrapes a single page, not an entire site. It will not follow links or crawl multiple pages automatically.

Chat & Retrieval

Handled by pages/api/chat.js.

01 Embed Question: the user's message is converted to a vector using the same model as ingestion.
02 Pinecone Query: top matching chunks are retrieved via similarity search (top-4 by default).
03 Prompt Construction: retrieved chunks are injected into a system prompt with instructions to cite the source.
04 Streaming: GPT-3.5-turbo streams the response via LangChainAdapter and the Vercel AI SDK.

Frontend

Lives in pages/index.js. A single-page chat interface with two modes.

File Upload Mode
Drag-and-drop or click to upload PDFs. Multiple files supported. Triggers /api/ingest.
URL Mode
Paste any URL to scrape and ingest. Triggers /api/ingest-url.
Chat Interface
useChat from ai/react manages the full streaming lifecycle.
Mobile Ready
Fully responsive. Works natively on iOS and Android browsers.

Deploy to Vercel

FastRAG is optimized for Vercel. Deployment takes about 5 minutes.

1. Push your code to GitHub

bash
git init && git add .
git commit -m "initial"
git remote add origin https://github.com/you/fastrag.git
git push -u origin main
2. Import to Vercel

Go to vercel.com/new, import your repo, select Next.js as the framework.

3. Add all environment variables

In Vercel project settings → Environment Variables, add all keys from your .env.local:

OPENAI_API_KEY
PINECONE_API_KEY
PINECONE_INDEX
BROWSERLESS_TOKEN
4. Click Deploy — your app goes live in ~2 minutes

Vercel's free Hobby plan has a 10-second function timeout. Large PDFs or slow scraping jobs may time out. Upgrade to Pro for a 60-second limit.

Troubleshooting

429 — "You exceeded your current quota"

Cause: OpenAI accounts require pre-paid credits. A new API key alone isn't enough.

Fix: Go to platform.openai.com/settings/organization/billing and add $5. May take 5–10 minutes to activate.

"Vector dimension 1536 does not match index 1024"

Cause: Your Pinecone index was created with default settings (1536 dims).

Fix: Delete the index and recreate it with Dimensions: 1024. See Pinecone Setup above.

"PineconeNotFoundError: 404"

Cause: PINECONE_INDEX env var doesn't match the index name in your dashboard.

Fix: Check .env.local — value must match exactly, including case.

Scraping returns empty content

Cause: Site may be heavily client-side rendered, behind auth, or blocking scrapers.

Fix: Try a different URL. Docs sites and blogs work best. Paywalled pages won't work.

FAQ

Q: Do I need a paid Pinecone plan?

A: No. Free Starter plan supports 1 index and up to 100K vectors — plenty for development and small projects.

Q: Can I swap OpenAI for another model?

A: Yes. Chat and embedding logic is in pages/api/chat.js and ingest.js. Swap to any LangChain-compatible provider — Anthropic, Mistral, Cohere, etc.

Q: Can I use this commercially?

A: Yes. MIT license. Build products on top of FastRAG and sell them.

Q: What file types are supported besides PDF?

A: Currently only PDF. Extend ingest.js with LangChain's other loaders to support .txt, .docx, or .md.

Q: How do I clear all uploaded documents?

A: In Pinecone dashboard, go to your index → Namespaces → delete the 'global' namespace.

Q: Will I get future updates?

A: Yes. All buyers get lifetime access to the private GitHub repo. v1.3 is current.

Ready to ship?

Get the full source code and start building your AI app this weekend.

Get FastRAG — $29 →

One-time · MIT License · Lifetime updates