Retab
From Filings to Facts in Seconds—Automating SEC 8-K Monitoring with Retab
Victor Soto

TL;DR

We wired up a lightweight Python workflow that monitors the SEC’s live feed for new 8-K filings, fetches the full Inline XBRL, strips out noise while keeping every field, and runs it through a Retab extractor.

In seconds, we get structured, audit-ready JSON with key facts like company name, CIK, state of incorporation, securities registered, and filing provisions.

Perfect for:

  • Financial monitoring — track corporate events as they hit EDGAR

  • Compliance — auto-route filings with specific triggers (e.g., M&A, executive change)

  • Research — feed structured data directly into your models or dashboards


Introduction

8-K filings are the SEC’s way of telling the market, “Something material just happened.”

They can cover:

  • Mergers & acquisitions

  • Executive appointments & departures

  • Material agreements

  • Earnings releases

  • Legal proceedings

If you’re monitoring these in real time, every minute between “filed” and “parsed” counts.

Manually downloading filings or scraping HTML tables slows you down.

We've seen clients build zero-click pipelines: a new 8-K hits the SEC, and JSON lands in their system.


The workflow

  1. Listen to the SEC RSS feed
    We use `feedparser` to pull the latest 8-K entries from EDGAR's XBRL RSS feed.
    Each entry contains metadata plus a link to the filing index.

  2. Resolve the actual instance document
    The filing’s “MetaLinks.json” tells us where the Inline XBRL lives.
    We pick the primary `.htm` document — no guessing, no brittle scraping.

  3. Parse Inline XBRL without losing data
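    Inline XBRL embeds structured facts in `ix:*` tags inside the HTML, but naïve HTML-to-text parsers often throw them away.
    We strip the noise but keep every tagged fact alongside the narrative text.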

  4. Run through Retab’s structured extraction
    Our SEC schema defines keys like:

    {
      "form_type": "8-K",
      "report_date": "...",
      "company_name": "...",
      "state_of_incorporation": "...",
      "securities_registered": [...]
    }

    Retab validates against the schema, fills fields, and gives field-level confidence scores.
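The first two steps can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the demo file's code. It swaps `feedparser` for `xml.etree.ElementTree` so it runs without dependencies. It also assumes the following:

- Each RSS `<item>` carries its form type in an `edgar:formType` element or in its `<description>`.
- `<link>` points at the filing's `-index.htm` page.
- `MetaLinks.json` sits in the same folder and has an `instance` object that lists the Inline XBRL files under `dts.inline.local`.

All function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical helpers; the demo's own versions may differ.


def form_type(item: ET.Element) -> str:
    """Return the form type of an RSS <item>, ignoring XML namespaces."""
    for el in item.iter():
        # Matches edgar:formType whatever its namespace URI is
        if el.tag.endswith("}formType") and el.text:
            return el.text.strip()
    desc = item.findtext("description")
    return desc.strip() if desc else ""


def latest_8k_index_urls(rss_xml: str, seen: set[str]) -> list[str]:
    """Return filing-index links for 8-K items not processed before."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if form_type(item) == "8-K" and link and link not in seen:
            seen.add(link)  # remember it so the next poll skips it
            urls.append(link)
    return urls


def metalinks_url(index_url: str) -> str:
    """MetaLinks.json is assumed to live next to the -index.htm page."""
    return index_url.rsplit("/", 1)[0] + "/MetaLinks.json"


def primary_instance_url(index_url: str, metalinks_json: str) -> str:
    """Pick the first Inline XBRL .htm document listed in MetaLinks.json."""
    meta = json.loads(metalinks_json)
    folder = index_url.rsplit("/", 1)[0]
    for key, info in meta["instance"].items():
        # Prefer dts.inline.local; fall back to the (space-separated) key
        local = info.get("dts", {}).get("inline", {}).get("local") or key.split()
        for name in local:
            if name.endswith(".htm"):
                return f"{folder}/{name}"
    raise ValueError("no Inline XBRL .htm document found in MetaLinks.json")
```

In a live loop, you would download the feed and each `MetaLinks.json` yourself, for example with `urllib.request`. Send a descriptive `User-Agent` header, since the SEC asks automated clients to declare one. Keep `seen` between polls so each filing is processed only once.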


Example output

Here’s Southwest Gas Holdings, Inc.’s 8-K filed Aug 6, 2025:

{
  "form_type": "8-K",
  "report_date": "2025-08-06",
  "company_name": "SOUTHWEST GAS HOLDINGS, INC.",
  "state_of_incorporation": "Delaware",
  "commission_file_number": "001-37976",
  "irs_employer_id_number": "81-3881866",
  "principal_office_address": "8360 S. Durango Drive Post Office Box 98510 Las Vegas, Nevada",
  "principal_office_zip": "89193-8510",
  "registrant_phone": "(702) 876-7237",
  "filing_provisions": {
    "written_communications_rule_425": false,
    "soliciting_material_rule_14a_12": false,
    "pre_commencement_communications_rule_14d_2b": false,
    "pre_commencement_communications_rule_13e_4c": false
  },
  "emerging_growth_company": false,
  "egc_transition_period_election": false,
  "securities_registered": [
    {
      "title_of_class": "Southwest Gas Holdings, Inc. Common Stock, $1 Par Value",
      "trading_symbol": "SWX",
      "exchange_name": "New York Stock Exchange"
    }
  ]
}
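Because the output is plain JSON, downstream routing is ordinary code. Below is a minimal sketch of the compliance use case from the TL;DR: it flags filings whose cover-page checkboxes are ticked. The queue names and the `route_filing` function are hypothetical.

Routing on `written_communications_rule_425` is a reasonable M&A trigger, since Rule 425 covers communications made in connection with business combinations.

```python
# Hypothetical mapping from cover-page checkboxes to review queues;
# adapt the queue names to your own routing.
TRIGGER_QUEUES = {
    "written_communications_rule_425": "m_and_a",  # business combination communications
    "soliciting_material_rule_14a_12": "proxy",  # proxy soliciting material
    "pre_commencement_communications_rule_14d_2b": "tender_offer",
    "pre_commencement_communications_rule_13e_4c": "issuer_tender_offer",
}


def route_filing(extraction: dict) -> list[str]:
    """Return the sorted review queues triggered by an extracted 8-K."""
    provisions = extraction.get("filing_provisions", {})
    queues = {
        TRIGGER_QUEUES[key]
        for key, ticked in provisions.items()
        if ticked and key in TRIGGER_QUEUES
    }
    if extraction.get("emerging_growth_company"):
        queues.add("egc_review")
    return sorted(queues)
```

In the Southwest Gas output above, every checkbox and `emerging_growth_company` are `false`. `route_filing` therefore returns an empty list, and the filing needs no special handling.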

Why Retab works well here

This isn’t just regex scraping. With Retab you get:

  • Schema validation — missing fields or wrong types fail fast

  • Confidence scores — auto-route low-confidence filings for review

  • k-LLM consensus — boost accuracy on messy filings by combining multiple model outputs

  • Reasoning fields — capture the model’s thought process for auditability
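To make the consensus and confidence ideas concrete, here is a minimal sketch of one way to combine several model outputs. It takes a per-field majority vote, uses each field's agreement ratio as its confidence score, and sends low-agreement fields to review.

This is an illustrative technique, not Retab's k-LLM implementation. The function names and the 0.75 threshold are assumptions.

```python
import json
from collections import Counter

# Illustrative majority vote, not Retab's k-LLM implementation.


def consensus(outputs: list[dict]) -> tuple[dict, dict]:
    """Per-field majority vote over several extractions.

    Returns (values, confidence), where confidence is the share of
    outputs that agree with the winning value for each field. A field
    missing from an output counts as a vote for None; ties go to the
    value seen first.
    """
    values, confidence = {}, {}
    fields = {key for out in outputs for key in out}
    for field in sorted(fields):
        # Serialize values so lists and dicts can be counted as votes
        votes = Counter(json.dumps(out.get(field), sort_keys=True) for out in outputs)
        winner, count = votes.most_common(1)[0]
        values[field] = json.loads(winner)
        confidence[field] = count / len(outputs)
    return values, confidence


def needs_review(confidence: dict, threshold: float = 0.75) -> list[str]:
    """Return the fields whose agreement falls below the threshold."""
    return sorted(field for field, score in confidence.items() if score < threshold)
```

Consider three outputs where one model reads the state of incorporation as Nevada. `state_of_incorporation` still resolves to Delaware, but with agreement of 2/3, so it lands in `needs_review`, while the unanimous fields pass.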


Code snippet

You can find a Python demo file here.

In essence:

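from pathlib import Path

from retab import Retab

# get_latest_8k_entry, resolve_instance_url, download_raw_instance and
# parse_ixbrl are helpers defined in the demo file.
entry = get_latest_8k_entry()              # newest 8-K from the EDGAR RSS feed
inst_url = resolve_instance_url(entry)     # primary Inline XBRL .htm via MetaLinks.json
raw_html = download_raw_instance(inst_url)
narrative = parse_ixbrl(raw_html)          # text with the ix:* facts preserved

# Save the parsed text so the extractor reads this filing, not a stale file
Path("narrative.txt").write_text(narrative, encoding="utf-8")

client = Retab()

completion = client.deployments.extract(
    project_id="proj_XXXX",
    iteration_id="base-configuration",
    document="narrative.txt",
)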
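The demo's `parse_ixbrl` is not reproduced here. As a hypothetical stand-in, the stdlib sketch below shows the idea from step 3:

- It drops `<script>` and `<style>` content and keeps the visible text.
- It tags every `ix:nonNumeric` or `ix:nonFraction` fact with its concept `name`, so the extractor still sees the fact.
- Inside the `ix:header` block, it keeps only fact values, such as hidden `dei:*` facts, and skips contexts and units.

It assumes facts are marked up as those two elements with a `name` attribute. It also ignores details such as `ix:continuation`, scaling, and sign.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the demo's parse_ixbrl, not its actual code.
FACT_TAGS = {"ix:nonnumeric", "ix:nonfraction"}  # HTMLParser lowercases tag names
SKIP_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "br", "tr", "td", "li", "table"}  # emit a space around these


class IXBRLTextParser(HTMLParser):
    """Collect visible text, tagging Inline XBRL facts with their concept name."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # inside <script>/<style>
        self.header_depth = 0  # inside <ix:header> (hidden facts, contexts, units)
        self.fact_names: list[str] = []  # stack of currently open facts

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "ix:header":
            self.header_depth += 1
        elif tag in FACT_TAGS:
            name = dict(attrs).get("name") or ""
            self.fact_names.append(name)
            self.parts.append(f" [{name}: ")
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "ix:header" and self.header_depth:
            self.header_depth -= 1
        elif tag in FACT_TAGS and self.fact_names:
            self.fact_names.pop()
            self.parts.append("] ")
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self.skip_depth:
            return
        # In the header, keep only fact values, not contexts or units
        if self.header_depth and not self.fact_names:
            return
        self.parts.append(data)


def parse_ixbrl(raw_html: str) -> str:
    """Return visible text with every ix fact tagged as [name: value]."""
    parser = IXBRLTextParser()
    parser.feed(raw_html)
    parser.close()
    return " ".join("".join(parser.parts).split())  # collapse whitespace
```

A fact such as `<ix:nonNumeric name="dei:EntityIncorporationStateCountryCode">Delaware</ix:nonNumeric>` comes out as `[dei:EntityIncorporationStateCountryCode: Delaware]`. The value stays in place in the running text, and the label tells the extractor exactly which field it is.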

Closing thoughts

Automated SEC monitoring isn’t just about speed — it’s about trust.

With field-level likelihoods, reasoning traces, and consensus voting, you can go from “pretty sure” to “production-grade” extraction.

If you want to try the pipeline or adapt it to your use case, check out our Platform, Documentation, and the related Notebook on GitHub.

Don't hesitate to reach out on X or Discord if you have any questions or feedback!

retab.com