TL;DR
Everyone dreams of hitting 100% accuracy when processing documents with AI.
Retab’s confidence scores help you quantify trust, iterate fast on your extraction prompt and schema with reasoning fields, and reach 98%+ accuracy in minutes.
How? Set n_consensus=5 to spot low-confidence fields, drop a single json_schema_extra={"X-ReasoningPrompt": "…"} into any Pydantic model, and watch shaky fields jump from 0.6 → 1.0 confidence.
Works with any best-in-class model; no extra infra.
Why do LLM pipelines still miss fields in 2025?
The answer is simple: hallucinations.
The models’ imprecisions can take various forms:
An incorrect or even imaginary value is returned.
A field is omitted.
An output requires inference that is not done natively (“Convert °C to °F”).
etc.
LLM hallucinations force teams to double-check the outputs, which often ends up being a major time sink.
Human-in-the-loop vs. out-of-the-loop has long been debated. At Retab, as we move toward fully automated workflows, we aim to gradually remove the human from the loop by introducing probability-weighted answers, combined with “Reasoning” powered by chain-of-thought prompting (for compatible models) to boost output accuracy.
How Retab’s Likelihood Engine works
Here is the simple approach:
Use n-consensus
Retab queries the model n times and ensembles the outputs, dampening randomness and returning a full probability distribution (see our blog post on k-LLMs).
Inspect field‑level likelihood scores
Every key comes back with a 0‑1 score, perfect for thresholds like if score < 0.9 (a toy sketch follows these steps).
Add a Reasoning trace (optional)
Add X‑ReasoningPrompt to any field to capture the model’s step‑by‑step logic, stored separately from the payload.
This takes advantage of chain-of-thought (see OpenAI’s article on Structured Outputs) to drastically increase the model’s accuracy.
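To build intuition for those field-level scores, here is a toy sketch in plain Python. It is not Retab’s actual engine (which returns a full probability distribution); it simply treats a field’s likelihood as the agreement rate across n passes:

from collections import Counter

def field_likelihood(outputs: list[dict], key: str) -> float:
    # Fraction of passes that agree with the most common value for this key.
    values = [str(o.get(key)) for o in outputs]
    _, count = Counter(values).most_common(1)[0]
    return count / len(values)

# Five consensus passes over the same document, disagreeing once on "status".
passes = [
    {"status": "Paid"},
    {"status": "Paid"},
    {"status": "Unpaid"},
    {"status": "Paid"},
    {"status": "Paid"},
]
print(field_likelihood(passes, "status"))  # 0.8 -> below a 0.9 threshold, route to review

In practice you never compute this yourself; Retab returns the scores directly, as the examples below show.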
Hands-on: Upgrade your schema in 2 lines
Example 1
from pydantic import BaseModel, Field
from retab import Retab

class Invoice(BaseModel):
    date: str
    invoice_number: str
    total: str
    status: str = Field(
        ...,
        description="Invoice Status: Blank, Paid or Unpaid",
        json_schema_extra={
            "X-ReasoningPrompt": (
                "If Status is blank, state that explicitly; otherwise return Paid or Unpaid."
            )
        },
    )

client = Retab()
resp = client.documents.extract(
    documents=["invoice.jpg"],
    json_schema=Invoice.model_json_schema(),
    model="gpt-4o-mini",
    n_consensus=5,  # five passes, ensembled into field-level likelihoods
)
print(resp.likelihoods["status"])  # 1.0
print(resp.reasoning["status"])    # "Field absent → mark Blank"

Result: Status jumps from 0.6 → 1.0 likelihood without adding brittle regexes.
Example 2 – Unit conversion on the fly
Same trick, different field:
class TemperatureReport(BaseModel):
    location: str
    temperature: float = Field(
        ...,
        description="Temperature in Fahrenheit",
        json_schema_extra={
            "X-ReasoningPrompt": (
                "If the reported unit is Celsius, convert to Fahrenheit."
            )
        },
    )
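The extraction call mirrors Example 1 (the filename below is just illustrative):

resp = client.documents.extract(
    documents=["weather_report.pdf"],  # illustrative filename
    json_schema=TemperatureReport.model_json_schema(),
    model="gpt-4o-mini",
    n_consensus=5,
)
print(resp.likelihoods["temperature"])
print(resp.reasoning["temperature"])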
The raw file said 22 °C; Retab returned 71.6 °F (22 × 9/5 + 32) with 1.0 likelihood.
Should you enable Reasoning?
Enable it when…
You need human auditability (finance, healthcare).
You’re training reviewers/operators.
Downstream rules depend on high precision.
Skip it when…
Latency is a hard SLA (< 300 ms).
The field is a direct text copy (“Invoice #”).
You already surface bounding-box citations only.
Implementation checklist (How-To)
Schema first: declare every key explicitly.
Start with n_consensus=5; tune down after monitoring.
Set thresholds (score < 0.9) to auto-route low-confidence docs (see the sketch after this checklist).
Store reasoning in a separate column for analytics.
Surface it: overlay bounding boxes + reasoning bubbles on Retab’s platform.
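As a minimal sketch of item 3, assuming resp is an extraction response like the ones above (the threshold and queue names are placeholders):

THRESHOLD = 0.9

def route(resp) -> str:
    # Fields whose consensus score falls below the threshold.
    low_confidence = {k: v for k, v in resp.likelihoods.items() if v < THRESHOLD}
    if low_confidence:
        return "review_queue"  # hand off to a human, e.g. with the shaky fields attached
    return "auto_approve"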
FAQ
How many models are supported?
Retab is compatible with your preferred model providers: OpenAI, Anthropic, Grok, DeepSeek, and more.
Is n‑consensus expensive?
Each extra pass costs one model call, so n_consensus=5 means five calls per document; most teams settle on 3–5.
Conclusion
Trust is the new accuracy: Likelihood scores quantify it, while Reasoning fields justify it.
One-liner upgrade: a tiny json_schema_extra unlocks explainability without extra infrastructure.
Teams win across the stack: devs debug faster, operators review less, and end-users see why a value was extracted.
Don't hesitate to reach out on X or Discord if you have any questions or feedback!
