Open Source RAG Engine

Real-time RAG

for AI Agents & Developers

Sub-200ms responses. 3 endpoints. Upload docs, search, and get answers.

v0.4.0 on GitHub
<200ms
Time to First Token
94.5%
Pass Rate
3
Endpoints
18+
File Formats

Works with

Python
TypeScript
cURL
Claude Code
OpenClaw
Milvus · Jina AI · Cerebras · Groq · Rust
Use Cases

Built for Low Latency

Every millisecond counts. RustyRAG is purpose-built for applications where retrieval speed directly impacts user experience.

Voice AI & Conversational Agents

Sub-200ms TTFT enables natural voice interactions. Ground your voice assistant with real-time document retrieval without perceptible delay.

< 200ms critical

AI Agents & Agentic Workflows

Give your autonomous agents a reliable, fast knowledge backend. Low latency means agents can reason and retrieve in tight loops without bottlenecks.

Tool calling

Legal & Compliance

Analyze contracts, filings, and regulatory documents with table-aware extraction. Accurate retrieval across thousands of PDFs with source citations.

Table extraction

Gaming & Interactive Experiences

Dynamic NPC dialogue, in-game lore retrieval, and real-time quest assistance. Low latency keeps players immersed without breaking flow.

Real-time

Financial Services

Earnings reports, SEC filings, market research — retrieve and synthesize financial data at speed. Compliance-ready with source attribution.

Multi-format

Healthcare & Life Sciences

Clinical guidelines, research papers, and patient documentation. Fast retrieval supports time-critical decision making with cited sources.

Citation-backed
Features

Everything You Need

A complete RAG pipeline in a single API. No orchestration complexity, no glue code — just fast, accurate retrieval-augmented generation.

Sub-200ms TTFT
Built on Rust with Tokio async runtime. Lightning-fast time to first token for real-time applications.
Hybrid Search
Dense HNSW + BM25 sparse search with Reciprocal Rank Fusion. Optional cross-encoder reranking.
Multi-Format Ingestion
18+ file formats including PDF, Office docs, images, Markdown, LaTeX, and more. Text extraction with table preservation, OCR, and AI-powered image descriptions.
SSE Streaming
Server-Sent Events streaming with source citations. Real-time token delivery for conversational UIs.
Production Ready
Single binary deployment with Docker Compose. Built-in eval framework with 3,045-question benchmark.
Groq & Cerebras Powered
Built on the lowest-latency LLM inference providers available. Groq LPUs and Cerebras wafer-scale chips deliver token speeds no one else can match.
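The Reciprocal Rank Fusion step in hybrid search is simple enough to sketch directly; a minimal Python version (the constant k=60 comes from the original RRF paper and is not necessarily the value RustyRAG uses; the document ids are illustrative):

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion (RRF).

    Each document earns 1 / (k + rank) for every list it appears in;
    summing across lists rewards documents that rank well in both the
    dense (HNSW) and sparse (BM25) result sets.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # HNSW (dense) ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # BM25 (sparse) ranking
fused = rrf_fuse([dense, sparse])     # doc_b wins: strong in both lists
```

The optional cross-encoder reranker then rescores the top fused chunks against the query before generation.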
Benchmarks

Benchmarked at Scale

Evaluated on 3,045 questions from Open RAG Bench across 1,000 academic PDFs (57,347 chunks). LLM-as-judge evaluation using Cerebras qwen-3-235b.

With Reranker
Recommended
Jina Reranker v3 cross-encoder enabled
94.5%
Pass Rate
279ms
Avg TTFT
Failed: 167 / 3,024
Total Time: 883ms
Without Reranker
Fastest
Pure hybrid HNSW + BM25 search with RRF
91.6%
Pass Rate
181ms
Avg TTFT
Failed: 253 / 3,020
Total Time: 511ms
Integration

3 Steps.
That's It.

1
Get your API key
Sign up and generate your key in seconds.
2
Upload your documents
PDF, Office docs, images, Markdown, LaTeX — one endpoint, 18+ formats.
3
Search & get answers
Sub-200ms responses with source citations.
# Upload a document
curl -X POST https://api.rustyrag.ai/v1/upload \
  -H "Authorization: Bearer rr_sk_..." \
  -F "file=@report.pdf"

# Search and answer
curl -X POST https://api.rustyrag.ai/v1/answer \
  -H "Authorization: Bearer rr_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the main finding?", "model": "qwen-3-235b-a22b-instruct-2507", "provider": "cerebras"}'
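Until the official SDKs ship, the same calls work from Python; a minimal sketch with requests (the endpoint paths, Bearer header, and /v1/answer JSON body mirror the cURL example above; the /v1/search field names and the response shapes are assumptions, not documented behavior):

```python
import requests

API = "https://api.rustyrag.ai/v1"

def headers(key):
    # Same Bearer scheme as the cURL example
    return {"Authorization": f"Bearer {key}"}

def upload(path, key):
    # POST /v1/upload with a multipart "file" field
    with open(path, "rb") as f:
        resp = requests.post(f"{API}/upload", headers=headers(key),
                             files={"file": f})
    resp.raise_for_status()
    return resp.json()

def answer(message, key, model="qwen-3-235b-a22b-instruct-2507",
           provider="cerebras"):
    # POST /v1/answer with the same JSON body as the cURL example
    body = {"message": message, "model": model, "provider": provider}
    resp = requests.post(f"{API}/answer", headers=headers(key), json=body)
    resp.raise_for_status()
    return resp.json()

def search(query, key, top_k=5):
    # POST /v1/search -- "query" and "top_k" are assumed field names
    resp = requests.post(f"{API}/search", headers=headers(key),
                         json={"query": query, "top_k": top_k})
    resp.raise_for_status()
    return resp.json()
```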
Coming soon
Python SDK · TypeScript SDK · OpenClaw Integration
API

3 Endpoints. That's All.

No complex orchestration. Upload, search, and answer — each with full Swagger documentation.

Upload Documents
POST /v1/upload
Upload and index files. Text is extracted from documents, images (via OCR + AI vision), markup, and more. Automatic chunking, embedding, and indexing.
PDF · DOCX · PPTX · XLSX · Images · Markdown · LaTeX · HTML · ZIP
Semantic Search
POST /v1/search
Hybrid dense + sparse search with optional reranking. Returns the most relevant document chunks with similarity scores.
Hybrid HNSW · BM25 · RRF · Reranker
Search & Answer
POST /v1/answer
Full RAG pipeline — retrieves relevant context and generates an answer with source citations. JSON or SSE streaming.
SSE Stream · Citations · Groq & Cerebras
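Consuming the /v1/answer SSE stream reduces to reading `data:` lines as they arrive; a minimal parser sketch in Python (the token-event JSON shape and the `[DONE]` sentinel are assumptions, not the documented wire format):

```python
import json

def sse_tokens(raw):
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue                       # skip blank keep-alives/comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":            # assumed end-of-stream sentinel
            return
        yield json.loads(payload)

# Hypothetical stream with an assumed {"token": ...} event shape:
sample = (
    'data: {"token": "The"}\n\n'
    'data: {"token": " main"}\n\n'
    "data: [DONE]\n\n"
)
tokens = [event["token"] for event in sse_tokens(sample)]  # ["The", " main"]
```

In production you would read the response body incrementally rather than from a string, but the framing logic is the same.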

Supported File Formats

Every format is converted to text for search and retrieval. No images or binary data are stored; only the extracted text content is kept.

Documents
.pdf, .docx, .pptx, .xlsx

Full text with table structure preserved. Headers, paragraphs, lists, and table cells are extracted as structured text.

Images
.png, .jpeg, .jpg, .tiff, .bmp, .webp

OCR extracts visible text and numbers. AI vision model generates a description of charts, diagrams, and visual content.

Markup
.html, .xhtml, .md, .adoc, .tex

Rendered text content is extracted. HTML tags, Markdown syntax, and LaTeX commands are stripped — only the readable text remains.

Plain Text
.txt, .csv, .vtt, .zip

Raw text content as-is. CSV cells are preserved. WebVTT subtitle text is extracted. ZIP archives process all supported files inside.

Pricing

Simple, Transparent Pricing

Open source forever. Use our hosted version for zero-ops, or go enterprise for dedicated global deployments.

Open Source
Free
$0 forever
Self-host on your own infrastructure. Full source code, no limits.
  • Full source code access
  • All 3 endpoints
  • Unlimited documents
  • Deploy anywhere
  • Community support
  • Docker Compose ready
View on GitHub
Most Popular
Cloud
$100/mo
Managed infrastructure. Sub-200ms latency in US. Scale up or down anytime.
  • 5K queries/week
  • 50K chunks
  • 10 GB storage
  • 2 GB max file size
  • Unlimited uploads
  • Sub-200ms latency (US)
  • Managed infrastructure
  • Priority email support
  • 99.9% uptime SLA
Custom
Enterprise
Custom
Dedicated instances deployed anywhere in the world, tailored to your needs.
  • Dedicated infrastructure
  • Global deployment
  • Custom SLA & latency targets
  • On-prem or private cloud
  • Dedicated support engineer
  • Custom model selection
  • SSO & audit logs
  • Volume discounts

Cloud Usage Tiers

Start at $100/mo. Scale up or down between tiers anytime.

All tiers include sub-200ms latency, 99.9% SLA, and priority support. Need more? Contact us for Enterprise.

Team

Built by AI Engineers

A small team obsessed with speed and simplicity.

Ignas Vaitukaitis

CEO, AI Agent Engineer

8 years in software engineering and enterprise ERP systems.

Miguel de Frias

CTO, AI Agent Engineer

M.Sc. Computer Science at UnB.

Start Building with Sub-200ms RAG

Join the fastest RAG API. Open source, self-hostable, and production-ready.