Cited answers in 300 milliseconds.
Hybrid semantic and keyword search across millions of chunks. Streaming first. Fast enough for voice agents and live chat.
Turn your data into real-time answers in the cloud, ready for any AI agent or app to use.
Three reasons. First, the whole stack is written in Rust, so there is no Python overhead. Second, we run inference on Cerebras and Groq, the fastest LLM hosts in the world.
Third, retrieval, reranking, and citations all run in parallel. The result is cited answers in under 300 ms at p50, with rerank on by default.
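To see why running stages in parallel helps, here is a minimal Python sketch. It is an illustration only: the production stack is Rust, and the two lookup functions, their 50 ms simulated delays, and the document IDs are all hypothetical stand-ins.

```python
import asyncio
import time

# Hypothetical stand-ins for two independent lookups.
# The sleeps simulate I/O latency; they are not real measurements.
async def vector_search(query: str) -> list[str]:
    await asyncio.sleep(0.05)
    return ["doc-a", "doc-b"]

async def keyword_search(query: str) -> list[str]:
    await asyncio.sleep(0.05)
    return ["doc-b", "doc-c"]

async def retrieve(query: str) -> list[str]:
    # gather() runs both coroutines concurrently, so the wait is roughly
    # the slower of the two rather than their sum.
    vector_hits, keyword_hits = await asyncio.gather(
        vector_search(query), keyword_search(query)
    )
    # Merge in order and drop duplicate IDs.
    return list(dict.fromkeys(vector_hits + keyword_hits))

def timed_retrieve(query: str) -> tuple[list[str], float]:
    start = time.perf_counter()
    hits = asyncio.run(retrieve(query))
    return hits, time.perf_counter() - start
```

Calling `timed_retrieve("refund policy?")` returns the merged hits and the elapsed time, which should sit near one simulated delay instead of two.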
Answers are grounded in your uploaded docs. Verify before acting on critical decisions.
From document upload to cited answer. Everything you would otherwise stitch together from five different services.
Uploads, Google Drive, OneDrive, SharePoint, and web crawl. Auto-refresh when documents change. OCR for scans, layout parsing for PDFs and tables.
Organize docs into collections. Share with a teammate or the whole org. Role-based access. Revoke in one click.
TypeScript and Python SDKs. A native MCP server for agents. And a clean REST API you can call from anything that speaks HTTP.
// install: npm i @rustyrag/node
import { RustyRAG } from "@rustyrag/node";
const rag = new RustyRAG({ apiKey: process.env.RR_KEY });
const out = await rag.answer({ query: "refund policy?" });
# install: pip install rustyrag
import os
from rustyrag import RustyRAG
rag = RustyRAG(api_key=os.environ["RR_KEY"])
out = rag.answer(query="refund policy?")
// claude_desktop_config.json: drop into any MCP host
{
  "mcpServers": {
    "rustyrag": {
      "command": "npx",
      "args": ["-y", "@rustyrag/mcp"],
      "env": { "RR_KEY": "<your_key>" }
    }
  }
}
# OpenClaw: REST, curl, anything HTTP
curl https://api.rustyrag.ai/v1/answer \
  -H "Authorization: Bearer $RR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"refund policy?"}'
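The same REST call can be made without an SDK. The sketch below builds the request with the Python standard library only. The endpoint, headers, and body mirror the curl example; the helper name `build_answer_request` is hypothetical, and the response format is not documented on this page, so the sketch does not parse a response.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://api.rustyrag.ai/v1/answer"

def build_answer_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen(...)` and read the body; how you decode that body depends on the response schema, which is an open assumption here.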
Per-workspace usage. Cost attribution. Query replays for debugging and audit.
Push a document. Call the API. Stream cited answers back. No infrastructure to manage. No retrieval pipeline to tune.
// Realtime, cited answers from your docs
import { RustyRAG } from "@rustyrag/node";

const client = new RustyRAG({ apiKey: process.env.RR_KEY });

// Upload once
const up = await client.upload("policy.pdf");
await client.waitForUpload(up.task_id);

// Stream a cited answer
const stream = await client.answer("refund policy?", { stream: true });
for await (const c of stream) process.stdout.write(c.text);
// → 150ms TTFT · 300ms total · cited
Parsers for PDFs, scans, tables, forms, code, and the web. Layout-aware chunking. All metadata preserved.
Vectors for meaning. BM25 for exact terms. Auto-refreshing, tenant-isolated, encrypted at rest.
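One common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). This page does not say which fusion method RustyRAG uses, so treat the sketch below as an illustration of the general idea. The function name, the constant `k=60`, and the sample IDs are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from the two retrievers.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Here `b` and `a` appear in both lists, so they outrank `d` and `c`, which each appear in only one.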
Hybrid recall. Cross-encoder rerank. Deduped and cited. Streamed in under 300 ms at p50.
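A rough sketch of what "deduped and cited" can mean in practice: drop chunks whose text repeats after normalization, then number the survivors so an answer can point back to them. The chunk shape (`text`, `source`), the normalization rule, and the `[n]` marker style are all assumptions for illustration, not the product's actual behavior.

```python
import hashlib

def dedupe_and_cite(chunks):
    """chunks: list of dicts with 'text' and 'source', best match first."""
    seen = set()
    cited = []
    for chunk in chunks:
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(chunk["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Number citations in the order the unique chunks appear.
        cited.append({"marker": f"[{len(cited) + 1}]", **chunk})
    return cited

# Hypothetical reranked chunks; the second repeats the first.
sample = [
    {"text": "Refunds within 30 days.", "source": "policy.pdf"},
    {"text": "refunds  within 30 DAYS.", "source": "faq.md"},
    {"text": "Contact support.", "source": "faq.md"},
]
citations = dedupe_and_cite(sample)
```

The first copy of a repeated chunk wins, so the highest-ranked source keeps the citation.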


On-prem deployment, dedicated infrastructure, custom SLA, security review, white-glove migration.
1 page is about 3,000 characters · monthly cap, rolls over once · cancel anytime
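As a quick way to estimate usage from the 3,000-character figure above, here is a small sketch. Rounding partial pages up is an assumption; the page only says a page is "about" 3,000 characters, and the function name is hypothetical.

```python
import math

CHARS_PER_PAGE = 3000  # "1 page is about 3,000 characters"

def estimate_pages(char_count: int) -> int:
    """Estimate billable pages, rounding partial pages up (an assumption)."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return math.ceil(char_count / CHARS_PER_PAGE)
```

For example, a 7,500-character document comes out to 3 pages under this rounding rule.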
We solve the entire pipeline for you. Ingestion, parsing, hybrid retrieval, reranking, citations, and observability all in one API. You skip the five-service stitch and ship features instead of plumbing.
Encrypted at rest and in transit. Each customer gets a tenant-isolated index. Enterprise customers can run RustyRAG on their own infrastructure. We never train on your data.
We run inference on Groq and Cerebras, the two fastest LLM hosts on the planet. Both serve a range of open models including Llama, Qwen, GPT-OSS, and Mixtral. You pick which one per request. Native MCP support means any MCP-compatible agent can call RustyRAG directly.
Yes. p50 under 300 ms, p95 under 600 ms. The API is streaming-first and built for live voice and chat. Voice agents are one of the main use cases we optimize for.
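If you want to check these targets against your own traffic, the sketch below computes p50 and p95 from a list of measured latencies using the Python standard library. The function names are hypothetical, and the interpolation method (`inclusive`) is one reasonable choice among several.

```python
import statistics

def latency_percentiles(samples_ms):
    """Return (p50, p95) in milliseconds from at least two latency samples."""
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]

def meets_target(samples_ms, p50_max_ms=300.0, p95_max_ms=600.0):
    """Check samples against the stated targets: p50 < 300 ms, p95 < 600 ms."""
    p50, p95 = latency_percentiles(samples_ms)
    return p50 < p50_max_ms and p95 < p95_max_ms
```

Feed it end-to-end timings collected on your side of the network, since client-observed latency includes round-trip time that server-side numbers may not.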
Soft caps. We notify you well before you hit the limit. Overage is metered at a clean per-unit rate. No surprise bills.
Yes. We have migration tooling for the major vendors. Business and Enterprise plans include white-glove migration support.
Production-ready in an afternoon. Book a 20-minute call and we will get you running on your own documents.