
Leaderboard

Our document extraction benchmark for frontier models


| Rank | Model              | Platform   | Accuracy |
|------|--------------------|------------|----------|
| 🥇 1 | retab-large        | Retab      | 97.2%    |
| 🥈 2 | Extend             | Extend     | 91.4%    |
| 🥉 3 | Landing            | Landing    | 89.2%    |
| 4    | LlamaParse         | LlamaIndex | 87.8%    |
| 5    | retab-small        | Retab      | 79.3%    |
| 6    | Reducto            | Reducto    | 63.5%    |
| 7    | retab-micro        | Retab      | 58.2%    |
| 8    | Mistral Large 3    | Mistral    | 52.7%    |
| 9    | Mistral Medium 3.2 | Mistral    | 41.8%    |

Methodology

Our benchmark is built on a curated dataset of proprietary documents provided by partner customers. These documents span a wide range of industries and use cases, including invoices, contracts, financial reports, and technical documentation.

Each document has been manually annotated by domain experts to establish ground truth for structured data extraction. We evaluate each model by running it on identical extraction tasks with the same prompts, then comparing its output against these human-verified annotations.
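To make the scoring step concrete, here is a minimal sketch of comparing a model's extracted fields against a human-verified annotation. The flat-dictionary format, field names, and exact-match criterion are illustrative assumptions, not Retab's actual schema or scoring rule.

```python
# Hypothetical scoring sketch: fraction of ground-truth fields the
# model extracted exactly. Field names and the exact-match rule are
# assumptions for illustration only.

def field_accuracy(extracted: dict, ground_truth: dict) -> float:
    """Return the share of ground-truth fields matched exactly."""
    if not ground_truth:
        return 1.0
    correct = sum(
        1 for key, expected in ground_truth.items()
        if extracted.get(key) == expected
    )
    return correct / len(ground_truth)

# Example: the model gets 2 of 3 annotated fields right.
annotation = {"invoice_number": "INV-001", "total": "1250.00", "currency": "EUR"}
prediction = {"invoice_number": "INV-001", "total": "1250.00", "currency": "USD"}
print(f"{field_accuracy(prediction, annotation):.1%}")
```

In practice a production benchmark would also handle nested structures, type normalization (dates, amounts), and partial credit, but the core idea is the same per-field comparison aggregated across documents.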

Private Benchmark

The dataset is not publicly available to prevent model overfitting and ensure fair evaluation of real-world performance.

Real-World Data

Documents reflect actual production scenarios, providing meaningful accuracy metrics for enterprise extraction tasks.

Run evaluations on frontier AI capabilities

If you'd like to add your model to this leaderboard or a future version, please contact support@retab.com.
