TL;DR
We wired up a lightweight Python workflow that monitors the SEC’s live feed for new 8-K filings, fetches the full Inline XBRL, strips out noise while keeping every field, and runs it through a Retab extractor.
In seconds, we get structured, audit-ready JSON with key facts like company name, CIK, state of incorporation, securities registered, and filing provisions.
Perfect for:
Financial monitoring — track corporate events as they hit EDGAR
Compliance — auto-route filings with specific triggers (e.g., M&A, executive change)
Research — feed structured data directly into your models or dashboards
Introduction
8-K filings are the SEC’s way of telling the market, “Something material just happened.”
They can cover:
Mergers & acquisitions
Executive appointments & departures
Material agreements
Earnings releases
Legal proceedings
If you’re monitoring these in real time, every minute between “filed” and “parsed” counts.
Manually downloading filings or scraping HTML tables slows you down.
We have seen clients implement zero-click pipelines: a new 8-K hits the SEC, and JSON lands in their system.
The workflow
Listen to the SEC RSS feed
We use `feedparser` to pull the latest 8-K entries from EDGAR’s XBRL feed.
Each entry contains metadata + a link to the filing index.
Resolve the actual instance document
The filing’s “MetaLinks.json” tells us where the Inline XBRL lives.
We pick the primary `.htm` document: no guessing, no brittle scraping.
Parse Inline XBRL without losing data
Inline XBRL (`ix:*` tags) contains structured facts, but naïve parsers often throw them away.
Run through Retab’s structured extraction
Our SEC schema defines keys like:
{ "form_type": "8-K", "report_date": "...", "company_name": "...", "state_of_incorporation": "...", "securities_registered": [...] }
Retab validates against the schema, fills fields, and gives field-level confidence scores.
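To make the Inline XBRL parsing step more concrete, here is a minimal sketch of one way to flatten a filing to text while keeping every named `ix:*` fact. It is not the demo's actual parser: the class `IxbrlFlattener`, the function `flatten_ixbrl`, and the `[concept: value]` output format are all assumptions made for illustration, and only the standard library is used.

```python
from html.parser import HTMLParser


class IxbrlFlattener(HTMLParser):
    """Collect visible text and wrap each named ix:* fact as [concept: value]."""

    SKIPPED = {"script", "style"}  # non-content elements we drop entirely

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0
        self._open_facts = []  # one bool per open ix:* element: did we emit "["?

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so ix:nonNumeric arrives as ix:nonnumeric
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag.startswith("ix:"):
            name = dict(attrs).get("name")
            self._open_facts.append(bool(name))
            if name:
                self.parts.append(f"[{name}: ")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag.startswith("ix:") and self._open_facts:
            if self._open_facts.pop():
                self.parts.append("]")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def flatten_ixbrl(html: str) -> str:
    """Hypothetical helper: return whitespace-normalized text with facts annotated."""
    parser = IxbrlFlattener()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = (
    '<p>Registrant: <ix:nonNumeric name="dei:EntityRegistrantName" '
    'contextRef="c1">SOUTHWEST GAS HOLDINGS, INC.</ix:nonNumeric></p>'
)
print(flatten_ixbrl(sample))
```

Keeping the concept name next to each value means the extractor sees both the narrative and the tagged facts, so nothing tagged in the filing is silently dropped.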
Example output
Here’s Southwest Gas Holdings, Inc.’s 8-K filed Aug 6, 2025:
{
  "form_type": "8-K",
  "report_date": "2025-08-06",
  "company_name": "SOUTHWEST GAS HOLDINGS, INC.",
  "state_of_incorporation": "Delaware",
  "commission_file_number": "001-37976",
  "irs_employer_id_number": "81-3881866",
  "principal_office_address": "8360 S. Durango Drive Post Office Box 98510 Las Vegas, Nevada",
  "principal_office_zip": "89193-8510",
  "registrant_phone": "(702) 876-7237",
  "filing_provisions": {
    "written_communications_rule_425": false,
    "soliciting_material_rule_14a_12": false,
    "pre_commencement_communications_rule_14d_2b": false,
    "pre_commencement_communications_rule_13e_4c": false
  },
  "emerging_growth_company": false,
  "egc_transition_period_election": false,
  "securities_registered": [
    {
      "title_of_class": "Southwest Gas Holdings, Inc. Common Stock, $1 Par Value",
      "trading_symbol": "SWX",
      "exchange_name": "New York Stock Exchange"
    }
  ]
}
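A record like the one above can also be sanity-checked on your side before it reaches a dashboard or model. The sketch below is a downstream check written with the standard library only; it is not Retab's own validation, and the field list in `REQUIRED_FIELDS` and the helper name `check_record` are assumptions chosen for illustration.

```python
from datetime import date

# Assumed subset of the schema: field name -> expected Python type
REQUIRED_FIELDS = {
    "form_type": str,
    "report_date": str,
    "company_name": str,
    "state_of_incorporation": str,
    "securities_registered": list,
}


def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            got = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {got}")
    # Dates should parse as ISO 8601 so downstream sorting and joins behave
    if isinstance(record.get("report_date"), str):
        try:
            date.fromisoformat(record["report_date"])
        except ValueError:
            problems.append("report_date: not an ISO date (YYYY-MM-DD)")
    return problems


record = {
    "form_type": "8-K",
    "report_date": "2025-08-06",
    "company_name": "SOUTHWEST GAS HOLDINGS, INC.",
    "state_of_incorporation": "Delaware",
    "securities_registered": [{"trading_symbol": "SWX"}],
}
print(check_record(record))
```

Returning a list of problems, rather than raising on the first one, makes it easier to log everything that is wrong with a filing in one pass.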
Why Retab works well here
This isn’t just regex scraping. With Retab you get:
Schema validation — missing fields or wrong types fail fast
Confidence scores — auto-route low-confidence filings for review
k-LLM consensus — boost accuracy on messy filings by combining multiple model outputs
Reasoning fields — capture the model’s thought process for auditability
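To illustrate the idea behind consensus and confidence-based routing, here is a generic majority-vote sketch. The post does not describe how Retab computes consensus or confidence internally, so this is not its algorithm: the functions `consensus` and `needs_review`, the agreement ratio used as a stand-in for confidence, and the 0.6 threshold are all assumptions.

```python
import json
from collections import Counter


def consensus(outputs: list[dict]) -> tuple[dict, dict]:
    """Majority-vote each field across k outputs; return (values, agreement ratios)."""
    merged, agreement = {}, {}
    fields = {key for out in outputs for key in out}
    for field in sorted(fields):
        # Serialize values so lists and dicts can be counted like scalars
        votes = Counter(json.dumps(out.get(field), sort_keys=True) for out in outputs)
        winner, count = votes.most_common(1)[0]
        merged[field] = json.loads(winner)
        agreement[field] = count / len(outputs)
    return merged, agreement


def needs_review(agreement: dict, threshold: float = 0.6) -> list[str]:
    """List the fields whose agreement ratio falls below the (assumed) threshold."""
    return sorted(field for field, score in agreement.items() if score < threshold)


# Three hypothetical model outputs for the same filing
outputs = [
    {"state_of_incorporation": "Delaware", "trading_symbol": "SWX"},
    {"state_of_incorporation": "Delaware", "trading_symbol": "SWX"},
    {"state_of_incorporation": "Nevada", "trading_symbol": "SWX"},
]
merged, agreement = consensus(outputs)
print(merged, needs_review(agreement))
```

A filing with any field returned by `needs_review` could then be sent to a human queue, while the rest flows straight through.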
Code snippet
You can find a Python demo file here.
In essence:
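from pathlib import Path
from retab import Retab

# The four helpers below come from the demo file
entry = get_latest_8k_entry()               # newest 8-K entry from the feed
inst_url = resolve_instance_url(entry)      # primary .htm via MetaLinks.json
raw_html = download_raw_instance(inst_url)
narrative = parse_ixbrl(raw_html)           # text with ix:* facts preserved

# Save the parsed text so the path passed as `document` exists
# (assumes parse_ixbrl returns a string)
Path("narrative.txt").write_text(narrative, encoding="utf-8")

client = Retab()
completion = client.deployments.extract(
    project_id="proj_XXXX",
    iteration_id="base-configuration",
    document="narrative.txt",
)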
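The helpers in the snippet are not shown in the post. As one possible shape for the instance-resolution step, the sketch below picks the primary `.htm` from an already-downloaded MetaLinks.json. It assumes the file has a top-level `instance` object whose keys are instance file names (possibly several separated by spaces); the function names and the example URL are hypothetical, and the signature differs from the demo's `resolve_instance_url(entry)`.

```python
from urllib.parse import urljoin


def pick_primary_instance(metalinks: dict) -> str:
    """Return the first .htm file name listed under the assumed 'instance' key."""
    for key in metalinks.get("instance", {}):
        for name in key.split():
            if name.lower().endswith((".htm", ".html")):
                return name
    raise ValueError("no .htm instance document listed in MetaLinks.json")


def resolve_instance_url_sketch(filing_dir_url: str, metalinks: dict) -> str:
    """Join the filing directory URL with the primary instance file name."""
    # urljoin drops the last path segment unless the base ends with a slash
    base = filing_dir_url if filing_dir_url.endswith("/") else filing_dir_url + "/"
    return urljoin(base, pick_primary_instance(metalinks))


# Made-up directory and file name, for illustration only
metalinks = {"instance": {"example-20250806.htm": {}}}
print(resolve_instance_url_sketch("https://example.test/filing", metalinks))
```

Raising when no `.htm` is listed keeps the pipeline from silently extracting from the wrong document.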
Closing thoughts
Automated SEC monitoring isn’t just about speed — it’s about trust.
With field-level likelihoods, reasoning traces, and consensus voting, you can go from “pretty sure” to “production-grade” extraction.
If you want to try the pipeline or adapt it to your use case, check out our Platform, Documentation, and the related Notebook on GitHub.
Don't hesitate to reach out on X or Discord if you have any questions or feedback!