Open Source RAG Engine

Low Latency RAG API

for AI Agents & Developers

Sub-200ms responses. 3 endpoints. Upload docs, search, and get answers.

v0.4.0 on GitHub
<200ms
Time to First Token
94.5%
Pass Rate
3
Endpoints
7+
File Formats

Works with

Python
TypeScript
cURL
Claude Code
OpenClaw
Milvus · Jina AI · Cerebras · Groq · Rust
Use Cases

Built for Low Latency

Every millisecond counts. RustyRAG is purpose-built for applications where retrieval speed directly impacts user experience.

Voice AI & Conversational Agents

Sub-200ms TTFT enables natural voice interactions. Ground your voice assistant with real-time document retrieval without perceptible delay.

< 200ms critical

AI Agents & Agentic Workflows

Give your autonomous agents a reliable, fast knowledge backend. Low latency means agents can reason and retrieve in tight loops without bottlenecks.

Tool calling

Legal & Compliance

Analyze contracts, filings, and regulatory documents with table-aware extraction. Accurate retrieval across thousands of PDFs with source citations.

Table extraction

Gaming & Interactive Experiences

Dynamic NPC dialogue, in-game lore retrieval, and real-time quest assistance. Low latency keeps players immersed without breaking flow.

Real-time

Financial Services

Earnings reports, SEC filings, market research — retrieve and synthesize financial data at speed. Compliance-ready with source attribution.

Multi-format

Healthcare & Life Sciences

Clinical guidelines, research papers, and patient documentation. Fast retrieval supports time-critical decision making with cited sources.

Citation-backed
Features

Everything You Need

A complete RAG pipeline in a single API. No orchestration complexity, no glue code — just fast, accurate retrieval-augmented generation.

Sub-200ms TTFT
Built on Rust with Tokio async runtime. Lightning-fast time to first token for real-time applications.
Hybrid Search
Dense HNSW + BM25 sparse search with Reciprocal Rank Fusion. Optional cross-encoder reranking.
Multi-Format Ingestion
PDF, DOCX, PPTX, XLSX, HTML, TXT, and ZIP archives. Table extraction, OCR, and image understanding.
SSE Streaming
Server-Sent Events streaming with source citations. Real-time token delivery for conversational UIs.
Production Ready
Single binary deployment with Docker Compose. Built-in eval framework with 3,045-question benchmark.
Groq & Cerebras Powered
Built on the lowest-latency LLM inference providers available. Groq LPU and Cerebras wafer-scale chips deliver token speeds no one else can match.
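The hybrid search feature above fuses the dense (HNSW) and sparse (BM25) rankings with Reciprocal Rank Fusion. A minimal sketch of how RRF merges two ranked lists — the function name, example document IDs, and the conventional k=60 constant are illustrative, not RustyRAG internals:

```python
# Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
# Documents that rank well in *both* lists float to the top.

def rrf_fuse(rankings, k=60):
    """Merge ranked result lists into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # HNSW (dense vector) ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # BM25 (sparse keyword) ranking
fused = rrf_fuse([dense, sparse])     # doc_b wins: strong in both lists
```

An optional cross-encoder reranker can then rescore the fused top-k for the accuracy numbers shown in the benchmarks below.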
Benchmarks

Benchmarked at Scale

Evaluated on 3,045 questions from Open RAG Bench across 1,000 academic PDFs (57,347 chunks). LLM-as-judge evaluation using Cerebras qwen-3-235b.

With Reranker
Recommended
Jina Reranker v3 cross-encoder enabled
94.5%
Pass Rate
279ms
Avg TTFT
Failed: 167 / 3,024
Total Time: 883ms
Without Reranker
Fastest
Pure hybrid HNSW + BM25 search with RRF
91.6%
Pass Rate
181ms
Avg TTFT
Failed: 253 / 3,020
Total Time: 511ms
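The pass rates above follow directly from the reported failure counts; a quick arithmetic check:

```python
# Pass rate = (total - failed) / total, as a percentage rounded to one decimal.
def pass_rate(failed: int, total: int) -> float:
    return round(100 * (total - failed) / total, 1)

with_reranker = pass_rate(167, 3024)     # -> 94.5
without_reranker = pass_rate(253, 3020)  # -> 91.6
```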
Integration

3 Steps.
That's It.

1
Get your API key
Sign up and generate your key in seconds.
2
Upload your documents
PDF, DOCX, PPTX, XLSX — one endpoint, any format.
3
Search & get answers
Sub-200ms responses with source citations.
terminal
# Upload a document
curl -X POST https://api.rustyrag.ai/api/v1/documents/upload \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@report.pdf"

# Search and answer
curl -X POST https://api.rustyrag.ai/api/v1/chat-rag \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main finding?"}'
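Until the SDKs ship, the same calls work from any HTTP client. A standard-library Python sketch of the second call — the path and `query` field mirror the curl example above; swap in your own API key before sending:

```python
# Build the "search and answer" request with only the standard library.
import json
import urllib.request

BASE = "https://api.rustyrag.ai/api/v1"

def ask(api_key: str, query: str) -> urllib.request.Request:
    """Build the POST request for /chat-rag; send it with urlopen()."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/chat-rag",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = ask("your-api-key", "What is the main finding?")
# answer = json.load(urllib.request.urlopen(req))  # performs the network call
```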
Coming April 2026
Python SDK · TypeScript SDK · Claude Code · OpenClaw Integration
API

3 Endpoints. That's All.

No complex orchestration. Upload, search, and answer — each with full Swagger documentation.

Upload Documents
POST /api/v1/documents/upload
Upload and index files in any supported format. PDF, DOCX, PPTX, XLSX, HTML, TXT, or ZIP archives. Automatic chunking, embedding, and indexing.
PDF · DOCX · PPTX · XLSX · HTML · TXT · ZIP
Semantic Search
POST /api/v1/documents/search
Hybrid dense + sparse search with optional reranking. Returns the most relevant document chunks with similarity scores.
Hybrid HNSW · BM25 · RRF · Reranker
Search & Answer
POST /api/v1/chat-rag/stream
Full RAG pipeline — retrieves relevant context and generates an answer with source citations. SSE streaming support.
SSE Stream · Citations · Groq & Cerebras
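The streaming endpoint delivers tokens as Server-Sent Events: newline-delimited `data:` lines per the SSE spec. A minimal consumer sketch — the JSON payload shape and the `[DONE]` terminator are assumptions for illustration, not a documented RustyRAG contract:

```python
# Minimal SSE line parser: yield the payload of every `data:` line.
def iter_sse_data(lines):
    """Extract `data:` payloads from a stream of SSE text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Example: a captured stream fed through the parser.
stream = [
    'data: {"token": "The"}\n',
    "\n",
    'data: {"token": " answer"}\n',
    "\n",
    "data: [DONE]\n",
]
tokens = list(iter_sse_data(stream))
```

In a real client the `lines` iterable would come from the streaming HTTP response body of `POST /api/v1/chat-rag/stream`.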
Pricing

Simple, Transparent Pricing

Open source forever. Use our hosted version for zero-ops, or go enterprise for dedicated global deployments.

Open Source
Free
$0 forever
Self-host on your own infrastructure. Full source code, no limits.
  • Full source code access
  • All 3 endpoints
  • Unlimited documents
  • Deploy anywhere
  • Community support
  • Docker Compose ready
View on GitHub
Most Popular
Cloud
$100/mo
Managed infrastructure. Sub-200ms latency in US. Scale up or down anytime.
  • 1K queries/day
  • 10 collections, 100K chunks
  • 2GB document storage
  • Sub-200ms latency (US)
  • 5 concurrent requests
  • Managed infrastructure
  • Priority email support
  • 99.9% uptime SLA
Custom
Enterprise
Custom
Dedicated instances deployed anywhere in the world, tailored to your needs.
  • Dedicated infrastructure
  • Global deployment
  • Custom SLA & latency targets
  • On-prem or private cloud
  • Dedicated support engineer
  • Custom model selection
  • SSO & audit logs
  • Volume discounts

Cloud Usage Tiers

Start at $100/mo. Scale up or down anytime with the tier selector above.

All tiers include sub-200ms latency, 99.9% SLA, and priority support. Need more? Contact us for Enterprise.

Team

Built by AI Engineers

A small team obsessed with speed and simplicity.

Ignas Vaitukaitis

CEO, AI Agent Engineer

8 years in software engineering and enterprise ERP systems.

Miguel de Frias

CTO, AI Agent Engineer

M.Sc. Computer Science at UnB.

Start Building with Sub-200ms RAG

Join the fastest RAG API. Open source, self-hostable, and production-ready.