Leaderboard
Our extraction benchmark for frontier models
Extract API
| Rank | Model | Platform | Accuracy |
|---|---|---|---|
| 🥇 1 | retab-large | Retab | 97.2% |
| 🥈 2 | Extend | Extend | 91.4% |
| 🥉 3 | Landing | Landing | 89.2% |
| 4 | LlamaParse | LlamaIndex | 87.8% |
| 5 | retab-small | Retab | 79.3% |
| 6 | Reducto | Reducto | 63.5% |
| 7 | retab-micro | Retab | 58.2% |
| 8 | Mistral Large 3 | Mistral | 52.7% |
| 9 | Mistral Medium 3.2 | Mistral | 41.8% |
Methodology
Our benchmark is built on a curated dataset of proprietary documents provided by partner customers. These documents span a wide range of industries and use cases, including invoices, contracts, financial reports, and technical documentation.
Each document has been manually annotated by domain experts to establish ground truth for structured data extraction. We evaluate each model by running it through the same extraction tasks with identical prompts, then comparing its output against these human-verified annotations.
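To make the comparison step concrete, here is a minimal sketch of how field-level accuracy could be computed against human-verified annotations. The directory layout, field names, and exact-match normalization are illustrative assumptions, not the benchmark's actual scoring code.

```python
# Illustrative sketch of field-level accuracy scoring.
# File layout and normalization rules are assumptions for illustration only.
import json
from pathlib import Path


def normalize(value) -> str:
    """Normalize a value so trivial formatting differences don't count as errors."""
    return str(value).strip().lower()


def field_accuracy(prediction: dict, ground_truth: dict) -> float:
    """Fraction of annotated fields the model extracted correctly."""
    if not ground_truth:
        return 0.0
    correct = sum(
        1
        for field, expected in ground_truth.items()
        if normalize(prediction.get(field, "")) == normalize(expected)
    )
    return correct / len(ground_truth)


def benchmark_accuracy(predictions_dir: Path, annotations_dir: Path) -> float:
    """Average field-level accuracy over every annotated document."""
    scores = []
    for annotation_path in sorted(annotations_dir.glob("*.json")):
        ground_truth = json.loads(annotation_path.read_text())
        prediction_path = predictions_dir / annotation_path.name
        prediction = (
            json.loads(prediction_path.read_text()) if prediction_path.exists() else {}
        )
        scores.append(field_accuracy(prediction, ground_truth))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Example: score one model's outputs against the shared annotation set.
    accuracy = benchmark_accuracy(Path("outputs/retab-large"), Path("annotations"))
    print(f"Accuracy: {accuracy:.1%}")
```

In practice, scoring can be stricter or looser than exact string matching (for example, numeric tolerance on amounts or date parsing), but the overall shape of the evaluation is the same: identical inputs for every model, one human-verified answer key.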
Private Benchmark
The dataset is not publicly available to prevent model overfitting and ensure fair evaluation of real-world performance.
Real-World Data
Documents reflect actual production scenarios, providing meaningful accuracy metrics for enterprise extraction tasks.
Run evaluations on frontier AI capabilities
If you'd like to add your model to this leaderboard or a future version, please contact support@retab.com.
Evaluate your Model
