Introducing the Retab CLI
May 14, 2026 • Sacha Ichbiah (Founder)
TL;DR — Document automation has lived in notebooks and dashboards for too long. Today we're shipping retab, a single-binary CLI that brings parsing, structured extraction, and workflow execution into your terminal — and into your repo, where the rest of your engineering work already lives.
Reliable document automation is engineering work. The data shape matters, the prompts matter, the schemas matter, the regression suite matters. But the tooling has lagged: parsing happens in a Jupyter notebook, extraction in a half-written Python script, classification behind a custom prompt, and orchestration in yet another framework. The Retab dashboard fixes some of that, but iterating on schemas and workflows through a GUI doesn't survive contact with serious engineering — there's no version control, no diff, no PR review, no CI gate.
The CLI changes the surface area. Same primitives, same API — but expressible as code, runnable from your terminal, committable to git.
Parse and extract local files in one line
The two most-used operations in Retab — turning a document into LLM-ready markdown, and turning a document into typed JSON — are now one command each.
$ retab parses create --file invoice.pdf
$ retab extractions create --file invoice.pdf --json-schema-file schema.json
Point them at a local file, a URL, or a file already uploaded to your Retab workspace. Output streams to stdout as structured JSON, ready to inspect with jq or feed into the next step of your build.
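Both commands take a JSON Schema file. The snippet below is a minimal sketch of what schema.json might contain for the invoice examples in this post. It assumes a plain JSON Schema object whose three field names match the ones the CSV pipeline later in this post reads. The types, descriptions, and required list are illustrative assumptions, not a schema Retab ships.

```shell
# Hypothetical schema.json for the invoice examples in this post.
# Field names match the CSV pipeline (invoice_number, total, vendor);
# types and descriptions are illustrative assumptions.
cat > schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice identifier as printed on the document"
    },
    "total": {
      "type": "number",
      "description": "Total amount due"
    },
    "vendor": {
      "type": "string",
      "description": "Name of the company that issued the invoice"
    }
  },
  "required": ["invoice_number", "total", "vendor"]
}
EOF
```

Pass the file to the extraction command with --json-schema-file schema.json as shown above. Because it is an ordinary file, it can be committed and reviewed like any other source file. Check the Retab documentation for any schema constraints beyond standard JSON Schema.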
Tighten the iteration loop on schemas and workflows
Tight feedback loops are what separate prototypes from production systems. retab schemas generate infers a JSON Schema from a handful of sample documents — useful when you don't have one written by hand yet. retab workflows tests execute --workflow-id <wf> re-runs a workflow's regression suite against pinned fixtures and reports drift. Both run in your terminal, both emit structured JSON you can diff, version, and commit — meaning the schemas and tests behind your document workflows become first-class artefacts in your repo, not screenshots passed around on Slack.
Compose with Unix pipes and CI
Every CLI command emits structured output, so you can wire Retab into anything a Unix pipeline can reach. Here's a one-liner that runs structured extraction over up to 100 PDFs in your workspace (raise --limit for more) and exports the results as CSV:
$ retab files list --limit 100 --mime-type application/pdf | \
jq -r '.data[].id' | \
xargs -I {} retab extractions create \
--file-id {} \
--json-schema-file schema.json | \
jq -r '[.invoice_number, .total, .vendor] | @csv'
The same primitives let you ship document workflows the way you ship API code: run retab workflows tests execute on every pull request, fail the build if extraction regresses, gate deploys on accuracy. Document automation stops being a notebook artefact and starts being software.
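A CI gate can be a thin wrapper around the test command. The function below is a minimal sketch under one loudly stated assumption: that retab workflows tests execute exits with a non-zero status when the regression suite fails. This post does not document exit codes, so confirm that behaviour first. The function name gate_workflow, the RETAB override variable, and the report filename are all hypothetical.

```shell
# Hypothetical CI gate around `retab workflows tests execute`.
# ASSUMPTION: the command exits non-zero when the regression suite fails.
# Exit codes are not documented in this post; verify before relying on this.

# Command to invoke; overridable (e.g. with a stub) so the gate can be dry-run.
RETAB="${RETAB:-retab}"

gate_workflow() {
  workflow_id="$1"
  report="${2:-retab-test-report.json}"

  # Save the structured JSON output so CI can keep it as a build artefact.
  if "$RETAB" workflows tests execute --workflow-id "$workflow_id" > "$report"; then
    echo "retab: workflow $workflow_id passed"
  else
    echo "retab: workflow $workflow_id regressed, see $report" >&2
    return 1
  fi
}
```

In a pull-request job you might call gate_workflow "$RETAB_WORKFLOW_ID" as the step's last command (or under set -e), where RETAB_WORKFLOW_ID is a hypothetical CI variable. The non-zero return then fails the build, and the saved report shows what drifted.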
Getting started
The Retab CLI runs on macOS, Linux, and Windows. On macOS and Linux, install with:
curl -fsSL https://retab.com/install.sh | sh
Then run retab auth login to connect to your workspace, and run retab with no arguments to see the full command tree.
We're just getting started — there's a lot more coming. Feedback and feature requests are welcome on GitHub.