Open Source RAG Engine

Real-time RAG

for AI Agents & Developers

Sub-200ms responses. 3 endpoints. Upload docs, search, and get answers.

v0.4.0 on GitHub
<200ms
Time to First Token
94.5%
Pass Rate
3
Endpoints
18+
File Formats

Works with

Python
TypeScript
cURL
Claude Code
OpenClaw
Milvus · Jina AI · Cerebras · Groq · Rust
Use Cases

Built for Low Latency

Every millisecond counts. RustyRAG is purpose-built for applications where retrieval speed directly impacts user experience.

Voice AI & Conversational Agents

Sub-200ms TTFT enables natural voice interactions. Ground your voice assistant with real-time document retrieval without perceptible delay.

< 200ms critical

AI Agents & Agentic Workflows

Give your autonomous agents a reliable, fast knowledge backend. Low latency means agents can reason and retrieve in tight loops without bottlenecks.

Tool calling

Legal & Compliance

Analyze contracts, filings, and regulatory documents with table-aware extraction. Accurate retrieval across thousands of PDFs with source citations.

Table extraction

Gaming & Interactive Experiences

Dynamic NPC dialogue, in-game lore retrieval, and real-time quest assistance. Low latency keeps players immersed without breaking flow.

Real-time

Financial Services

Earnings reports, SEC filings, market research — retrieve and synthesize financial data at speed. Compliance-ready with source attribution.

Multi-format

Healthcare & Life Sciences

Clinical guidelines, research papers, and patient documentation. Fast retrieval supports time-critical decision making with cited sources.

Citation-backed
Features

Everything You Need

A complete RAG pipeline in a single API. No orchestration complexity, no glue code — just fast, accurate retrieval-augmented generation.

Sub-200ms TTFT
Built on Rust with Tokio async runtime. Lightning-fast time to first token for real-time applications.
Hybrid Search
Dense HNSW + BM25 sparse search with Reciprocal Rank Fusion. Optional cross-encoder reranking.
Multi-Format Ingestion
18+ file formats including PDF, Office docs, images, Markdown, LaTeX, and more. Text extraction with table preservation, OCR, and AI-powered image descriptions.
SSE Streaming
Server-Sent Events streaming with source citations. Real-time token delivery for conversational UIs.
Production Ready
Single binary deployment with Docker Compose. Built-in eval framework with 3,045-question benchmark.
Groq & Cerebras Powered
Built on the lowest-latency LLM inference providers available. Groq LPUs and Cerebras wafer-scale chips deliver token speeds no one else can match.
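The Reciprocal Rank Fusion step in hybrid search is simple enough to sketch directly; a minimal Python version (the constant k=60 comes from the original RRF paper and is not necessarily the value RustyRAG uses; the document ids are illustrative):

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion (RRF).

    Each document earns 1 / (k + rank) for every list it appears in;
    summing across lists rewards documents that rank well in both the
    dense (HNSW) and sparse (BM25) result sets.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # HNSW (dense) ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # BM25 (sparse) ranking
fused = rrf_fuse([dense, sparse])     # doc_b wins: strong in both lists
```

The optional cross-encoder reranker then rescores the top fused chunks against the query before generation.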
Benchmarks

Benchmarked at Scale

Evaluated on 3,045 questions from Open RAG Bench across 1,000 academic PDFs (57,347 chunks). LLM-as-judge evaluation using Cerebras qwen-3-235b.

With Reranker
Recommended
Jina Reranker v3 cross-encoder enabled
94.5%
Pass Rate
279ms
Avg TTFT
Failed: 167 / 3,024
Total Time: 883ms
Without Reranker
Fastest
Pure hybrid HNSW + BM25 search with RRF
91.6%
Pass Rate
181ms
Avg TTFT
Failed: 253 / 3,020
Total Time: 511ms
Integration

3 Steps.
That's It.

1
Get your API key
Sign up and generate your key in seconds.
2
Upload your documents
PDF, Office docs, images, Markdown, LaTeX — one endpoint, 18+ formats.
3
Search & get answers
Sub-200ms responses with source citations.
# Upload a document
curl -X POST https://api.rustyrag.ai/v1/upload \
  -H "Authorization: Bearer rr_sk_..." \
  -F "file=@report.pdf"

# Search and answer
curl -X POST https://api.rustyrag.ai/v1/answer \
  -H "Authorization: Bearer rr_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the main finding?", "model": "qwen-3-235b-a22b-instruct-2507", "provider": "cerebras"}'
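Until the official SDKs ship, the same calls work from Python; a minimal sketch with requests (the endpoint paths, Bearer header, and /v1/answer JSON body mirror the cURL example above; the /v1/search field names and the response shapes are assumptions, not documented behavior):

```python
import requests

API = "https://api.rustyrag.ai/v1"

def headers(key):
    # Same Bearer scheme as the cURL example
    return {"Authorization": f"Bearer {key}"}

def upload(path, key):
    # POST /v1/upload with a multipart "file" field
    with open(path, "rb") as f:
        resp = requests.post(f"{API}/upload", headers=headers(key),
                             files={"file": f})
    resp.raise_for_status()
    return resp.json()

def answer(message, key, model="qwen-3-235b-a22b-instruct-2507",
           provider="cerebras"):
    # POST /v1/answer with the same JSON body as the cURL example
    body = {"message": message, "model": model, "provider": provider}
    resp = requests.post(f"{API}/answer", headers=headers(key), json=body)
    resp.raise_for_status()
    return resp.json()

def search(query, key, top_k=5):
    # POST /v1/search -- "query" and "top_k" are assumed field names
    resp = requests.post(f"{API}/search", headers=headers(key),
                         json={"query": query, "top_k": top_k})
    resp.raise_for_status()
    return resp.json()
```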
Coming soon
Python SDK · TypeScript SDK · OpenClaw Integration
API

3 Endpoints. That's All.

No complex orchestration. Upload, search, and answer — each with full Swagger documentation.

Upload Documents
POST /v1/upload
Upload and index files. Text is extracted from documents, images (via OCR + AI vision), markup, and more. Automatic chunking, embedding, and indexing.
PDF · DOCX · PPTX · XLSX · Images · Markdown · LaTeX · HTML · ZIP
Semantic Search
POST /v1/search
Hybrid dense + sparse search with optional reranking. Returns the most relevant document chunks with similarity scores.
Hybrid HNSW · BM25 · RRF · Reranker
Search & Answer
POST /v1/answer
Full RAG pipeline — retrieves relevant context and generates an answer with source citations. JSON or SSE streaming.
SSE Stream · Citations · Groq & Cerebras
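Consuming the /v1/answer SSE stream reduces to reading `data:` lines as they arrive; a minimal parser sketch in Python (the token-event JSON shape and the `[DONE]` sentinel are assumptions, not the documented wire format):

```python
import json

def sse_tokens(raw):
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue                       # skip blank keep-alives/comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":            # assumed end-of-stream sentinel
            return
        yield json.loads(payload)

# Hypothetical stream with an assumed {"token": ...} event shape:
sample = (
    'data: {"token": "The"}\n\n'
    'data: {"token": " main"}\n\n'
    "data: [DONE]\n\n"
)
tokens = [event["token"] for event in sse_tokens(sample)]  # ["The", " main"]
```

In production you would read the response body incrementally rather than from a string, but the framing logic is the same.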

Supported File Formats

Every format is converted to text for search and retrieval. No images or binary data are stored; only the extracted text content is kept.

Documents
.pdf, .docx, .pptx, .xlsx

Full text with table structure preserved. Headers, paragraphs, lists, and table cells are extracted as structured text.

Images
.png, .jpeg, .jpg, .tiff, .bmp, .webp

OCR extracts visible text and numbers. AI vision model generates a description of charts, diagrams, and visual content.

Markup
.html, .xhtml, .md, .adoc, .tex

Rendered text content is extracted. HTML tags, Markdown syntax, and LaTeX commands are stripped — only the readable text remains.

Plain Text
.txt, .csv, .vtt, .zip

Raw text content as-is. CSV cells are preserved. WebVTT subtitle text is extracted. ZIP archives process all supported files inside.

Pricing

Simple, Transparent Pricing

Open source forever. Use our hosted version for zero-ops, or go enterprise for dedicated global deployments.

Open Source
Free
$0 forever
Self-host on your own infrastructure. Full source code, no limits.
  • Full source code access
  • All 3 endpoints
  • Unlimited documents
  • Deploy anywhere
  • Community support
  • Docker Compose ready
View on GitHub
Most Popular
Cloud
$100/mo
Managed infrastructure. Sub-200ms latency in US. Scale up or down anytime.
  • 5K queries/week
  • 50K chunks
  • 10 GB storage
  • 2 GB max file size
  • Unlimited uploads
  • Sub-200ms latency (US)
  • Managed infrastructure
  • Priority email support
  • 99.9% uptime SLA
Custom
Enterprise
Custom
Dedicated instances deployed anywhere in the world, tailored to your needs.
  • Dedicated infrastructure
  • Global deployment
  • Custom SLA & latency targets
  • On-prem or private cloud
  • Dedicated support engineer
  • Custom model selection
  • SSO & audit logs
  • Volume discounts

Cloud Usage Tiers

Start at $100/mo. Scale up or down between tiers anytime.

All tiers include sub-200ms latency, 99.9% SLA, and priority support. Need more? Contact us for Enterprise.

Team

Built by AI Engineers

A small team obsessed with speed and simplicity.

Ignas Vaitukaitis

CEO, AI Agent Engineer

8 years in software engineering and enterprise ERP systems.

Miguel de Frias

CTO, AI Agent Engineer

M.Sc. Computer Science at UnB.

Start Building with Sub-200ms RAG

Join the fastest RAG API. Open source, self-hostable, and production-ready.