Fastest RAG API · cited · streaming · rerank by default

The Realtime Context Engine
for AI Agents & Apps.

Turn your data into real-time answers in the cloud, ready for any AI agent or app to use.

Built for production: Rust · Milvus · Cerebras · Groq · Jina AI
Integration: TS · PY · MCP · REST
Connectors: GDrive · OneDrive · SharePoint
@alex
Why is RustyRAG so fast?

Three reasons. The whole stack is written in Rust, so there is no Python overhead. We run inference on Cerebras and Groq, the fastest LLM hosts in the world.

And retrieval, reranking, and citations all run in parallel. The result is cited answers in 300 ms, with rerank on by default.

150ms ttft · 300ms total

Answers are grounded in your uploaded docs. Verify before acting on critical decisions.

Capabilities · 01 to 05

Five layers of a production RAG.
One platform.

From document upload to cited answer. Everything you would otherwise stitch together from five different services.

01 · Realtime retrieval

Cited answers in 300 milliseconds.

Hybrid semantic and keyword search across millions of chunks. Streaming first. Fast enough for voice agents and live chat.

RustyRAG: 300 ms
Vendor A: 920 ms
Vendor B: 1.34 s
02 · Connectors & ingestion

Bring any source. Tables, scans, and the messy stuff too.

Uploads, Google Drive, OneDrive, SharePoint, and web crawl. Auto-refresh when documents change. OCR for scans, layout parsing for PDFs and tables.

PDF · DOCX · GDrive · OneDrive · SharePoint · Web · Upload · More
03 · Shared collections

Collections, shared across your team.

Organize docs into collections. Share with a teammate or the whole org. Role-based access. Revoke in one click.

legal-q4 · 2,418 pages · acme-prod · shared
3 editors · 12 readers
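
Role-based sharing with one-click revoke can be pictured with a small model. This is an illustrative sketch only: the `Role` names, the `ALLOWED` permission map, and the `Collection` class are assumptions, not RustyRAG's actual access model or API.

```python
from enum import Enum


class Role(Enum):
    READER = "reader"
    EDITOR = "editor"


# Actions each role may perform (assumed for illustration).
ALLOWED = {Role.READER: {"read"}, Role.EDITOR: {"read", "write"}}


class Collection:
    def __init__(self, name: str) -> None:
        self.name = name
        self.members: dict[str, Role] = {}

    def share(self, user: str, role: Role) -> None:
        self.members[user] = role

    def revoke(self, user: str) -> None:
        self.members.pop(user, None)  # no-op if the user had no access

    def can(self, user: str, action: str) -> bool:
        role = self.members.get(user)
        return role is not None and action in ALLOWED[role]


legal = Collection("legal-q4")
legal.share("ew", Role.EDITOR)
legal.share("jd", Role.READER)
print(legal.can("ew", "write"), legal.can("jd", "write"))  # True False
legal.revoke("ew")
print(legal.can("ew", "read"))  # False
```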
04 · Integration

One API. Four ways in.

TypeScript and Python SDKs. A native MCP server for agents. And a clean REST API you can call from anything that speaks HTTP.

// install: npm i @rustyrag/node
import { RustyRAG } from "@rustyrag/node";

const rag = new RustyRAG({ apiKey: process.env.RR_KEY });
const out = await rag.answer({ query: "refund policy?" });

# install: pip install rustyrag
import os

from rustyrag import RustyRAG

rag = RustyRAG(api_key=os.environ["RR_KEY"])
out = rag.answer(query="refund policy?")

claude_desktop_config.json: drop into any MCP host
{
  "mcpServers": {
    "rustyrag": {
      "command": "npx",
      "args": ["-y", "@rustyrag/mcp"],
      "env": { "RR_KEY": "<your_key>" }
    }
  }
}

# OpenClaw: REST, curl, anything HTTP
curl https://api.rustyrag.ai/v1/answer \
  -H "Authorization: Bearer $RR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"refund policy?"}'
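
The same REST call can be made without an SDK. A minimal sketch using only Python's standard library, assuming the endpoint, headers, and `query` field shown in the curl example; the helper name `build_answer_request` is hypothetical, and the response shape is not documented on this page, so the sketch stops at building the request.

```python
import json
import urllib.request

API_URL = "https://api.rustyrag.ai/v1/answer"  # endpoint taken from the curl example


def build_answer_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one question."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` with your key read from the `RR_KEY` environment variable, and read the body from the returned response.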
05 · Observability

Trace every query. Audit every answer.

Per-workspace usage. Cost attribution. Query replays for debugging and audit.

Queries / min: 12,408 (+18.4% · 24h)
Hit @ 5: 94.2% (+2.1pp)
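
Hit @ 5 is usually read as the share of queries for which at least one relevant chunk appears in the top five results. A small sketch of that calculation, assuming this common definition (the page does not state how the dashboard computes it); `hit_at_k` and the sample IDs are illustrative.

```python
def hit_at_k(ranked_ids: list[list[str]], relevant_ids: list[set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant ID in the top k results."""
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant.intersection(ranked[:k])
    )
    return hits / len(ranked_ids)


# Two toy queries: the first has a relevant chunk at rank 2, the second has none.
results = [["c7", "c2", "c9"], ["c1", "c4", "c5"]]
relevant = [{"c2"}, {"c8"}]
print(hit_at_k(results, relevant))  # 0.5
```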
Integration · under 10 minutes

If your team can call an API, you can ship this today.

Push a document. Call the API. Stream cited answers back. No infrastructure to manage. No retrieval pipeline to tune.

01 · Get an API key. Takes seconds.
02 · Push your documents. Uploads, Google Drive, OneDrive, SharePoint, or your web crawl.
03 · Ask questions. Stream cited answers from your app, agent, or workflow.
agent.ts · typescript
// Realtime, cited answers from your docs
import { RustyRAG } from "@rustyrag/node";

const client = new RustyRAG({ apiKey: process.env.RR_KEY });

// Upload once
const up = await client.upload("policy.pdf");
await client.waitForUpload(up.task_id);

// Stream a cited answer
const stream = await client.answer("refund policy?", { stream: true });

for await (const c of stream) process.stdout.write(c.text);

// → 150ms TTFT · 300ms total · cited
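
The TTFT and total figures are two different measurements: time until the first streamed chunk arrives, and time until the stream ends. A minimal Python sketch of timing both over any iterator of chunks; `measure_stream` and `fake_stream` are hypothetical stand-ins with made-up text and delays, not part of the RustyRAG SDK.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> tuple[str, float, float]:
    """Consume a stream; return (text, ttft_seconds, total_seconds)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start  # first chunk arrived
        parts.append(chunk)
    total = time.perf_counter() - start
    # An empty stream has no first token; report total for both.
    return "".join(parts), (ttft if ttft is not None else total), total


def fake_stream() -> Iterator[str]:
    """Stand-in for a streamed answer: three chunks with small delays."""
    for piece in ["Refunds ", "within ", "30 days."]:
        time.sleep(0.01)
        yield piece


text, ttft, total = measure_stream(fake_stream())
print(text, f"ttft={ttft * 1000:.0f}ms total={total * 1000:.0f}ms")
```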
Architecture

From raw document to cited answer.
Three stages.

Stage 01 · Ingest

Read everything.
Lose nothing.

Parsers for PDFs, scans, tables, forms, code, and the web. Layout-aware chunking. All metadata preserved.

PDF → chunk_01 · chunk_02 · chunk_03 · chunk_04 → indexed
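
A rough sketch of what chunking with preserved metadata can look like. This is a simplified paragraph-based splitter, not RustyRAG's parser; the `Chunk` fields, the `max_chars` budget, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def _make(text: str, source: str, index: int) -> Chunk:
    # chunk_01, chunk_02, ... mirrors the labels in the diagram above
    return Chunk(f"chunk_{index + 1:02d}", text, {"source": source, "index": index})


def chunk_document(text: str, source: str, max_chars: int = 200) -> list[Chunk]:
    """Pack whole paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    chunks: list[Chunk] = []
    buffer = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{buffer}\n\n{para}" if buffer else para
        if buffer and len(candidate) > max_chars:
            chunks.append(_make(buffer, source, len(chunks)))  # flush, start a new chunk
            buffer = para
        else:
            buffer = candidate
    if buffer:
        chunks.append(_make(buffer, source, len(chunks)))
    return chunks


doc = "Refund policy.\n\nItems can be returned.\n\nContact support."
for c in chunk_document(doc, "policy.pdf", max_chars=40):
    print(c.chunk_id, c.metadata["source"], repr(c.text))
```

Splitting on paragraph boundaries keeps sentences intact; a layout-aware parser would also use headings, tables, and page structure.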
Stage 02 · Index

Hybrid index.
Dense and lexical.

Vectors for meaning. BM25 for exact terms. Auto-refreshing, tenant-isolated, encrypted at rest.

vector space
dense: 1536d · cosine
bm25: sparse · keyword
metadata: filterable
acl: tenant-isolated
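
One common way to merge a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below shows that technique as an illustration; the page does not say which fusion method RustyRAG uses, and the constant `k = 60` and the chunk IDs are conventional or made-up values.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["c3", "c1", "c2"]  # order from vector similarity
bm25 = ["c1", "c4", "c3"]   # order from keyword match
print(reciprocal_rank_fusion([dense, bm25]))  # ['c1', 'c3', 'c4', 'c2']
```

Chunks that rank well in both lists rise to the top, which is why `c1` and `c3` beat chunks found by only one retriever.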
Stage 03 · Retrieve

Stream answers.
Cite the source.

Hybrid recall. Cross-encoder rerank. Deduped and cited. Streamed in under 300 ms at p50.

query → recall (top 50) → rerank (cross-encoder) → cite + stream (SSE)
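
The retrieve stage can be pictured as a small pipeline. This is a toy sketch under stated assumptions: the `Hit` type and `retrieve` function are hypothetical, a word-overlap score stands in for a real cross-encoder, and deduplication is by exact text match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    source: str
    text: str


def overlap_score(query: str, text: str) -> float:
    """Placeholder for a cross-encoder: share of query words present in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, candidates: list[Hit], recall_k: int = 50, top_n: int = 3) -> list[dict]:
    # 1. Recall: keep the first recall_k candidates from the hybrid index.
    recalled = candidates[:recall_k]
    # 2. Rerank: score each candidate against the query, best first.
    ranked = sorted(recalled, key=lambda h: overlap_score(query, h.text), reverse=True)
    # 3. Dedupe identical passages, then attach a citation to each survivor.
    seen: set[str] = set()
    cited: list[dict] = []
    for hit in ranked:
        if hit.text in seen:
            continue
        seen.add(hit.text)
        cited.append({"text": hit.text, "citation": f"{hit.source}#{hit.chunk_id}"})
    return cited[:top_n]


hits = [
    Hit("chunk_01", "policy.pdf", "shipping takes five days"),
    Hit("chunk_02", "policy.pdf", "refund policy lasts thirty days"),
    Hit("chunk_03", "faq.pdf", "refund policy lasts thirty days"),
]
for item in retrieve("refund policy", hits):
    print(item["citation"], "->", item["text"])
```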
Team · the founders

Built by builders.
Shipping in production.

Ignas Vaitukaitis

Founder & CEO

8 years developing enterprise software, the last 4 of them full time on AI agents and RAG applications. Deployed 20+ projects to production for enterprise clients. From Lithuania, based in Rio de Janeiro.

Miguel de Frias

Founder & CTO

4 years building AI solutions for clients across Latin America, the US, and Europe, with deep focus on agentic systems and RAG. Bachelor's in Software Engineering from the University of Brasília, currently pursuing a Master's in Computer Science. Based in Brasília.

Benchmarks · Internal · 10k docs

Faster than anything you can stitch together.

End-to-end retrieval. 10k document corpus.

Internal benchmark across financial, legal, and PubMed documents. Lower is faster. Lower is cheaper.

                     RustyRAG   Vendor A   Vendor B   DIY (pgvector + LLM)
Latency p50          300 ms     680 ms     940 ms     1.25 s
Cost / 1k queries    $0.18      $0.46      $0.59      $0.80
Pricing · usage-based

Pricing that scales with you.
Same product, top to bottom.

Top-up
Try it without a subscription
$5 once
One-time pack. No card-on-file, no auto-renew.
  • 500 answers
  • 250 pages
  • No web search
  • 1 collection
Starter
First feature in prod
$49/mo
When one feature ships to real users.
  • 7,000 answers / mo
  • 5,000 pages
  • No web search
  • 1 collection
Pro · Popular
Growing AI products
$199/mo
Where most teams settle in.
  • 27,000 answers / mo
  • 25,000 pages
  • 3,000 web searches / mo
  • 10 collections
Scale
Real customer traffic
$499/mo
When usage compounds and uptime matters.
  • 60,000 answers / mo
  • 100,000 pages
  • 8,000 web searches / mo
  • 20 collections
Business
Serious volume
$999/mo
For teams running it everywhere.
  • 120,000 answers / mo
  • 300,000 pages
  • 15,000 web searches / mo
  • 50 collections
Included in every plan
  • Unlimited retrieval
  • 99.9% uptime SLA
  • Hybrid search and rerank
  • Web search under 100 ms where the plan includes it, not a hosted Google call
  • Streaming with citations
  • MCP, REST, and SDKs
  • All connectors

Enterprise · custom

On-prem deployment, dedicated infrastructure, custom SLA, security review, white-glove migration.

Talk to sales

1 page is about 3,000 characters · monthly cap, rolls over once · cancel anytime
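
The page unit and the plan numbers make for simple arithmetic. A small sketch, using the 3,000-characters-per-page figure and the Starter and Pro plan numbers from above; rounding partial pages up is an assumption, since the page does not say how fractions are counted.

```python
import math

CHARS_PER_PAGE = 3_000  # "1 page is about 3,000 characters"


def pages_for(char_count: int) -> int:
    """Estimate pages used by a document, rounding partial pages up (assumed)."""
    return math.ceil(char_count / CHARS_PER_PAGE)


def price_per_answer(monthly_price: float, answers_per_month: int) -> float:
    """Effective price of one answer if the full monthly quota is used."""
    return monthly_price / answers_per_month


print(pages_for(10_000))                         # 4
print(round(price_per_answer(49, 7_000), 4))     # 0.007  (Starter)
print(round(price_per_answer(199, 27_000), 4))   # 0.0074 (Pro)
```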

FAQ · the common ones

Questions, answered.

We solve the entire pipeline for you. Ingestion, parsing, hybrid retrieval, reranking, citations, and observability all in one API. You skip the five-service stitch and ship features instead of plumbing.

Encrypted at rest and in transit. Each customer gets a tenant-isolated index. Enterprise customers can run RustyRAG on their own infrastructure. We never train on your data.

We run inference on Groq and Cerebras, the two fastest LLM hosts on the planet. Both serve a range of open models including Llama, Qwen, GPT-OSS, and Mixtral. You pick which one per request. Native MCP support means any MCP-compatible agent can call RustyRAG directly.

Fast enough for live voice and chat: p50 under 300 ms, p95 under 600 ms. The API is streaming-first, and voice agents are one of the main use cases we optimize for.
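
p50 and p95 are percentiles over per-request latencies: half of requests finish within the p50 value and 95% within the p95 value. A minimal sketch using the nearest-rank method on made-up samples; the numbers are illustrative, not RustyRAG measurements.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


latencies_ms = [210, 250, 260, 280, 290, 300, 310, 340, 420, 580]  # illustrative only
print(percentile(latencies_ms, 50))  # 290
print(percentile(latencies_ms, 95))  # 580
```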

Plan limits are soft caps. We notify you well before you hit the limit. Overage is metered at a clean per-unit rate. No surprise bills.

Migrating from another vendor is supported. We have migration tooling for the major vendors. Business and Enterprise plans include white-glove migration support.

Ship the AI feature on your roadmap.
This quarter.

Production-ready in an afternoon. Book a 20-minute call and we will get you running on your own documents.

Book a demo