retab

Ship next-gen document AI

Complete developer platform and SDK for shipping state-of-the-art document processing in the age of LLMs.

Doc processing is a nightmare.

Every developer who's tried it knows the pain. Here's the inevitable progression:

Week 1

"I'll just use Tesseract OCR"

Works on test docs. Production hits: rotated text, spanning tables, messy handwriting.

Week 3

"I'll build a proper pipeline"

Now expert in: PDF libs, image deskewing, table detection, OCR confidence, why .xlsx files are zipped XML.

Week 6

"Prompt engineering can't be that hard..."

84% accuracy! Until you realize that's 16 wrong extractions per 100 docs. So now you're blindly changing field prompts and hoping for the best.

Week 10

"I need consensus, evals..."

Building: multi-model consensus, confidence scoring, eval frameworks, labeling tools. You're not shipping features anymore.

01
Define Schema
Natural Language to Structured Schema

Get a schema by uploading sample documents and describing what you want to extract.

Natural Language to Schema

Get a first version of your extraction schema by simply describing what you want to extract in natural language. No complex configuration required.

Build or Reuse Schemas

Create schemas from scratch or reuse existing ones shared by collaborators. Start with proven templates and customize to your needs.

Structured Field Definitions

Define your schema with fields, detailed descriptions, and constraints. Create the perfect structure for your data extraction needs.
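
To make "fields, descriptions, and constraints" concrete, here is a minimal sketch in plain Python. The invoice fields, the dict layout, and the `check_record` helper are all hypothetical and chosen for illustration; this is not Retab's actual schema format.

```python
import re

# Hypothetical invoice schema: each field has a type, a description
# (the extraction hint), and optional constraints.
INVOICE_SCHEMA = {
    "invoice_number": {
        "type": str,
        "description": "Identifier printed near the top of the invoice.",
        "pattern": r"^INV-\d+$",
    },
    "total_amount": {
        "type": float,
        "description": "Grand total including tax, in the invoice currency.",
        "minimum": 0.0,
    },
    "currency": {
        "type": str,
        "description": "ISO 4217 currency code.",
        "enum": ["USD", "EUR", "GBP"],
    },
}


def check_record(record, schema):
    """Return a list of human-readable constraint violations."""
    errors = []
    for name, spec in schema.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            errors.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "pattern" in spec and not re.match(spec["pattern"], value):
            errors.append(f"{name}: does not match {spec['pattern']}")
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: below minimum {spec['minimum']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: not one of {spec['enum']}")
    return errors
```

Calling `check_record(extracted, INVOICE_SCHEMA)` on an extraction result returns an empty list when every constraint holds, which makes it easy to flag bad records before they reach downstream systems.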

Easy to implement.

Only a few lines of code.

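from retab import Retab

# Create a client with your API key.
client = Retab(api_key="YOUR_API_KEY")

# Open the document in binary mode and submit it to an existing processor.
with open("document.pdf", "rb") as f:
    result = client.processors.submit(
        processor_id="PROCESSOR_ID",
        document=f,
    )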

See for yourself

Experience our document AI platform in action. See how easy it is to extract data from any document.

Interactive demo • No signup required

Why developers choose Retab

We handle the annoying parts so you can focus on building great products.

Built-in preprocessing

We handle the edge cases so you don't have to: orientation correction, table conversion, and OCR enhancement run before extraction, no matter the file.

Automatic dataset labeling

Multiple models label your docs automatically. You only review the few cells where models disagree in a simple table UI.
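
The idea of reviewing only the cells where models disagree can be sketched with a simple majority vote. This is an illustrative sketch, not Retab's implementation: the `consensus` function is hypothetical, and it assumes each model returns a flat dict of hashable values.

```python
from collections import Counter


def consensus(extractions):
    """Merge per-model extractions field by field.

    `extractions` is a list of dicts, one per model. Returns
    (merged, needs_review): the most common value for each field, and
    the names of fields where the models did not all agree.
    """
    merged = {}
    needs_review = []
    fields = sorted({name for result in extractions for name in result})
    for name in fields:
        # A model that omitted the field votes for None.
        votes = Counter(result.get(name) for result in extractions)
        value, count = votes.most_common(1)[0]
        merged[name] = value
        if count < len(extractions):
            needs_review.append(name)
    return merged, needs_review
```

With three models where one reads a total differently, `merged` keeps the majority value and `needs_review` contains only that one field, so a human checks one cell instead of the whole row.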

Vibe-update your schema

Edit descriptions in natural language. Changes propagate instantly, and versioning means you never break old integrations.

Interactive prompt testing

Make a tweak, hit "evaluate," instantly see if you improved accuracy against ground truth before deploying.
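
Under the hood, an "evaluate" step boils down to comparing predictions with labeled ground truth. Here is a minimal sketch assuming exact-match scoring per field; the `field_accuracy` function is hypothetical, and a real evaluation may use fuzzier comparisons.

```python
def field_accuracy(predictions, ground_truth):
    """Per-field share of documents where the prediction equals the label.

    Both arguments are lists of dicts aligned by document.
    """
    totals = {}
    correct = {}
    for predicted, expected in zip(predictions, ground_truth):
        for name, value in expected.items():
            totals[name] = totals.get(name, 0) + 1
            if predicted.get(name) == value:
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / totals[name] for name in totals}
```

Running it once before a prompt tweak and once after shows, field by field, whether the change helped or hurt.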

Auto-model routing

Continuous benchmarking picks the best model for each document based on your accuracy and latency goals.
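
One simple way to think about routing on accuracy and latency goals: keep the models that meet both, then take the fastest. The sketch below assumes that rule; the `Benchmark` type, the `pick_model` function, and the model names in the usage note are hypothetical, and the actual routing logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    model: str
    accuracy: float   # share of fields correct on the eval set, 0 to 1
    latency_s: float  # median seconds per document


def pick_model(benchmarks, min_accuracy, max_latency_s):
    """Return the fastest model meeting both goals, or None if none does."""
    eligible = [
        b for b in benchmarks
        if b.accuracy >= min_accuracy and b.latency_s <= max_latency_s
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda b: b.latency_s).model
```

For example, given made-up benchmarks for a "model-large" (0.97 accuracy, 8 s) and a "model-small" (0.91 accuracy, 2 s), a 0.95 accuracy goal selects the large model, while a 0.90 goal selects the small one.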

One-click deployment

Many automation triggers: email forwarding, web UI, Outlook plugin, simple API. Data flows to Google Sheets, Excel, or webhooks instantly.

Built for every industry

Trusted by teams across industries to automate their most complex document workflows.

Healthcare

Automate medical document processing with precision. Extract data from patient forms, insurance claims, and clinical notes while maintaining HIPAA compliance.

Patient intake forms and medical histories
Insurance claims and prior authorizations
Lab reports and diagnostic imaging
Prescriptions and treatment plans

Get started for free

No credit card required. No commitment.

Free

Unlimited platform access. Perfect for trying out Retab.

$0/mo

1,000 credits/mo included

  • Schema Designer & Reusable Schemas
  • Drag‑and‑Drop Ground Truth Table
  • Source Highlights & Reasoning Traces
  • Prompt Iteration + Field‑Level Reasoning
  • Multi‑LLM Consensus
  • Advanced Automations: Email, Outlook, API, Webhooks
  • Team Management & Role-based Access Control
  • Community Support
  • Continuous Model Selection & Auto‑Routing

How credits work

Service | Description | Credits
Preprocessing | Document optimization: orientation correction, table conversion, OCR enhancement | 1/page
Auto-large extraction | Premium AI models for complex documents and maximum accuracy (GPT-4.1, Gemini 2.5 Pro; automatically selected for best performance) | 2/page
Auto-small extraction | Fast, cost-effective extraction for simpler documents (GPT-4.1 Mini, Gemini Flash; automatically selected for best performance) | 1/page
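
To estimate usage from the per-page prices listed above, here is a small sketch. It assumes preprocessing and extraction are billed separately and simply add up, which the pricing does not state explicitly; the `estimate_credits` function and the tier keys are hypothetical.

```python
# Per-page credit prices copied from the pricing above.
PREPROCESSING = 1
EXTRACTION = {"auto-large": 2, "auto-small": 1}


def estimate_credits(pages, tier, preprocess=True):
    """Estimate credits for one document.

    Assumption: preprocessing and extraction are charged separately
    and their per-page prices add up.
    """
    per_page = EXTRACTION[tier] + (PREPROCESSING if preprocess else 0)
    return pages * per_page
```

Under that assumption, a 10-page document on auto-large with preprocessing costs 30 credits, and the 1000 free monthly credits cover 333 such pages.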

Enterprise-grade security

Industry-leading document processing without compromising trust.

Zero Data Retention

All documents and extracted data can be automatically purged based on your requirements.

Enterprise Level Reliability

Built for enterprise workloads with guaranteed uptime and robust infrastructure.

SOC2 Pending

SOC 2 certification is in progress. Our security framework is built to comply with industry standards.

High Rate Limits

Flexible, high-capacity rate limits designed to handle peak enterprise demands.

GDPR Compliant

Full compliance with European data protection regulations for secure document processing.

1-on-1 Support & SLAs

Premium support with custom Service Level Agreements for your business needs.