Insights

Where governed AI changes the economics of insurance operations

Three use cases. Measurable outcomes. Every claim backed by evidence.

UC1

FNOL Web Forms

Structured extraction from first-notice packets — policy identifiers, claimant data, injury details — without an LLM call. 95%+ field accuracy on digital documents.

2,300 submissions/mo
95%+ non-LLM accuracy
$0.00015 per document
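The non-LLM extraction described above can be pictured as a keyword-anchored regex pass. The sketch below is illustrative only — the patterns and field names are assumptions for demonstration, not the production rule set:

```python
import re

# Illustrative patterns -- anchor on the printed field label,
# capture the value that follows. Not the production rule set.
FNOL_PATTERNS = {
    "policy_number": re.compile(r"Policy\s*(?:No\.?|Number)[.:\s]+([A-Z0-9-]+)"),
    "claim_number":  re.compile(r"Claim\s*(?:No\.?|Number)[.:\s]+([A-Z0-9]+)"),
    "date_of_birth": re.compile(r"Date\s+of\s+Birth[.:\s]+(\d{4}-\d{2}-\d{2})"),
}

def extract_fields(text: str) -> dict:
    """Run each deterministic pattern; a missing field stays None, never guessed."""
    out = {}
    for field, pattern in FNOL_PATTERNS.items():
        m = pattern.search(text)
        out[field] = m.group(1) if m else None
    return out
```

Because every value is captured by a named pattern, each extraction carries its own provenance: the pattern that matched and the position it matched at.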
UC2

Police Report Extraction

VIN, plate, DOB, DL number, citations — extracted deterministically from 50-state form variations. Scanned documents handled via T1+T2.5 stack with DPI-resilient table recognition.

88 reports/day
80–88% Cat 1–2 accuracy
0 external API calls
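Deterministic extraction works for identifiers with a fixed grammar. A VIN, for example, is exactly 17 characters drawn from A–H, J–N, P, R–Z, and 0–9 (I, O, and Q are excluded by the standard to avoid confusion with 1 and 0), which a single pattern can enforce — a minimal sketch, not the production stack:

```python
import re

# VINs: 17 characters, letters I, O, Q excluded by the standard.
VIN_RE = re.compile(r"\b([A-HJ-NPR-Z0-9]{17})\b")

def extract_vin(text: str):
    """Return the first VIN-shaped token, or None if no token qualifies."""
    m = VIN_RE.search(text)
    return m.group(1) if m else None
```

The same fixed-grammar property holds for plates, DL numbers, and DOBs, which is why these fields never need an LLM regardless of state-form layout.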
UC3

IA Report Intelligence

Coverage values, settlement amounts, and reserve flags extracted with full provenance. Subrogation analysis — the one genuinely inferential field — handled by a scoped local inference call, never an external API.

55 reports/day
1 field requires LLM
T4 carrier-premise only
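A scoped local inference call of the kind described — one field, one excerpt, nothing leaving the carrier premise — could look like the following sketch against a local Ollama endpoint. The model name and prompt wording are illustrative assumptions:

```python
import json
import urllib.request

# Local Ollama endpoint inside the carrier VPC -- no external API.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_subrogation_request(report_excerpt: str) -> dict:
    """Build the scoped prompt for the one inferential field.

    Only the relevant excerpt is sent, and only to a local model.
    Model name is an illustrative assumption.
    """
    return {
        "model": "llama3.1:8b",
        "prompt": (
            "Based only on the excerpt below, answer YES or NO: "
            "does this adjuster report indicate subrogation potential?\n\n"
            + report_excerpt
        ),
        "stream": False,
    }

def call_local_llm(payload: dict) -> str:
    """POST the payload to the local endpoint and return the model's text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The key design point is the scoping: the model never sees the full document, only the excerpt relevant to the single inferential field.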
Evidence Lab

Benchmarks that can be replicated

Every finding links to runnable code. Results are reproduced deterministically — not from a single run.

Case Study #001 Published

The Extraction Intelligence Benchmark

Zero-LLM vs. Grounded LLM vs. Single-Prompt on a WC FNOL

Three extraction architectures tested on a 7-page NY WC FNOL with a dual-employer layout trap. The architecture with character-interval grounding hallucinated the claimant's name and date of injury. The zero-LLM approach extracted 19 of 19 deterministic fields correctly with full document provenance.

19/19 Cat 1–2 correct, zero LLM
3 hallucinations, "grounded" approach
$0 API cost per document (Approach C)
Case Study #001 · WC FNOL Document Extraction · May 2026

The Extraction Intelligence Benchmark

A controlled comparison of three extraction architectures on a single high-complexity WC FNOL reveals that the approach marketed for its audit trail hallucinated the claimant's first name, last name, and date of injury. The zero-LLM approach extracted every deterministic field correctly and is the only architecture defensible under regulatory examination.

19/19 · Deterministic fields correct · Approach C, zero API calls
3 · Grounded hallucinations · Approach B, 101 intervals cited
86% · Fields non-LLM addressable · Only Cat 4 inference requires LLM
10× · Latency penalty, grounded LLM · 42.7s vs. 4.3s (single-prompt)
Exhibit 1 — Three architectures under test

Metric                         A: Groq llama-3.3-70b (T5)  B: LangExtract · Gemini (T5)  C: Docling + Regex (T0+T2)
External API calls             1                           7                             0
Latency                        4.3s                        42.7s                         8.2s
Fields extracted (of 36)       36                          36                            29 (Cat 1–3 only)
Cat 1–2 deterministic fields   19/19                       15/19                         19/19
Hallucinations                 0                           3 ▲                           0
Document-provenance grounded   0 values                    101 intervals †               29/29
Deterministic (zero variance)  No                          No                            Yes
Cost per document              ~$0.04–0.06                 ~$0.28–0.42                   ~$0.00015
PHI-safe, data posture B/C     No                          No                            Yes

▲ Approach B grounding intervals point to real text from wrong entities — see Finding 1.  † 29/36 fields for Approach C = Cat 1–3; the 7 missing Cat 4 inference fields were left empty, not hallucinated.

Finding 1

Grounding intervals do not guarantee correct field attribution

LangExtract returned a character-interval citation for every extracted value — the feature distinguishing its audit architecture from a standard LLM call. Three of those intervals pointed to real text at the cited position. The text belonged to the wrong entity or the wrong date context.

A grounding interval proves a string exists in the document. It does not prove the string is the correct value for the correct field. Under NYDFS Part 216 examination, this distinction is the difference between passing and failing provenance review.
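A toy example makes the distinction concrete. The interval below cites real text at the cited position — the grounding claim holds — yet the string sits in the employer section, not the claimant's. The document text is invented for illustration:

```python
import re

DOC = (
    "Employer Information\n"
    "Client: Franklin Logistics Inc., 400 Dock Rd\n"
    "Employee Information\n"
    "First Name: Terrence   Last Name: Jackson\n"
)

m = re.search(r"Franklin", DOC)
start, end = m.span()

# The grounding claim holds: the interval points at real source text.
assert DOC[start:end] == "Franklin"

# But the span ends before the Employee Information header even begins --
# it cites the shipper, not the claimant.
assert end < DOC.index("Employee Information")
```

Both assertions pass: a citation can be positionally valid and semantically wrong at the same time.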

Exhibit 2 — Three grounded hallucinations (Approach B)

Field           Extracted   Cited source text                 Actual source
first_name      "Franklin"  "Franklin Logistics Inc…"         Third-party shipper — not the claimant
last_name       "Mr."       "Dear Mr. Johnson,"               Broker salutation — not a surname
date_of_injury  "March 4"   "filed March 4 by prior counsel"  Attorney filing date — injury was March 18
Exhibit 3 — Category 1 & 2 field results: 19 deterministic fields

Field                  Ground truth             Approach B hallucination
policy_number          AP-2026-WC-9214
claim_number           WCH250721001
employer_fein          47-2381094
naics_code             561320
date_of_injury         2026-03-18               ✗ "March 4"
date_reported          2026-03-28
date_of_birth          1988-07-15
ssn_last4              4721
hourly_rate            $28.50
avg_weekly_wage        $1,140.00
reporting_delay_days   10
attorney_contact_date  2026-03-21
first_name             Terrence                 ✗ "Franklin"
last_name              Jackson                  ✗ "Mr."
employer_name          Apex Staffing Solutions
body_part_primary      Lower back / lumbar
injury_mechanism       Lifting / exertion
occupation_class       Warehouse / labor
state_of_injury        NY

Score — Category 1 & 2: A 19/19 · B 15/19 · C 19/19

✗ rows mark fields where Approach B returned a grounded hallucination. Approach C Cat 4 fields (claim type, RTW status, attorney flag, same body part, delay flag) were left empty by design — not hallucinated.

Finding 2

Section-aware extraction makes the error category structurally unreachable

The document contains three business entities — Apex Staffing, Excel Manufacturing, and Franklin Logistics — before the Employee Information section that contains the claimant's name. Approach B scanned the full document; Approach C partitioned it.

Each regex pattern runs only against its assigned section pool. The first_name pattern sees only text under the Employee Information header. "Franklin" exists only in the Employer Information pool. The two pools never intersect. The attribution error is not a probability to manage — it is a structural impossibility.

Section partitioning (Python)

import re

SECTION_MAP = {
    "employer": re.compile(r"employer\s+information", re.I),
    "employee": re.compile(r"(?:injured\s+)?employee\s+information", re.I),
    "injury":   re.compile(r"(?:injury|incident)\s+(?:information|details)", re.I),
}

# first_name runs against the "employee" pool only.
# "Franklin" exists in the "employer" pool only.
# The pools never intersect, so the attribution error is unreachable.
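Applied end to end, the partitioning might look like the following self-contained sketch with a toy document. The header set, helper function, and field pattern are illustrative, not the production implementation:

```python
import re

# Illustrative section headers and field pattern.
SECTIONS = {
    "employer": re.compile(r"employer\s+information", re.I),
    "employee": re.compile(r"(?:injured\s+)?employee\s+information", re.I),
}
FIRST_NAME = re.compile(r"First\s+Name[.:\s]+([A-Z][a-z]+)\b")

def partition(text: str) -> dict:
    """Split the document into per-section text pools by header position."""
    hits = sorted(
        (m.start(), name)
        for name, pat in SECTIONS.items()
        for m in [pat.search(text)]
        if m
    )
    pools = {}
    for i, (start, name) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        pools[name] = text[start:end]
    return pools

doc = (
    "Employer Information\nClient: Franklin Logistics Inc.\n"
    "Employee Information\nFirst Name: Terrence\n"
)
pools = partition(doc)
m = FIRST_NAME.search(pools["employee"])
```

Here `m.group(1)` is "Terrence", and "Franklin" never appears in the employee pool that the first_name pattern sees — the wrong-entity match has no text to match against.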
Finding 3 — Audit defensibility

Only one architecture passes regulatory examination

When an examiner asks "where did this value come from?", the three architectures answer differently:

A — Groq: "The language model extracted policy number AP-2026-WC-9214 from the document. Confidence: high." Result: no provenance.

B — LangExtract: "The value 'Franklin' was extracted from characters 412–419, which reads 'Franklin'." Result: misleading — the interval cites the wrong entity.

C — Docling+Regex: "First name matched by First\s+Name[.:\s]+([A-Z][a-z]+)\b in the Employee Information section. Deterministic. Reproducible on every run." Result: passes examination.
Implications

1. Field classification precedes tool selection. The category of a field — deterministic, categorical, verbatim, or inferential — determines the correct extraction tier. Applying an LLM to Cat 1 deterministic fields introduces hallucination risk on the most auditable class of data in a claim file.

2. Grounding intervals are necessary but not sufficient for audit defensibility. A character offset that cites real source text does not prove correct entity attribution on complex, multi-entity documents. Section-aware deterministic extraction provides a stronger correctness guarantee for Cat 1–2 fields.

3. The LLM-required zone is 11–14% of this document's field set. Five of 36 fields require genuine inference: claim type, RTW status, attorney flag, same-body-part comparison, and delay threshold interpretation. The optimal architecture is not LLM vs. non-LLM — it is governing which fields go to which tier.

4. PHI data posture is a tier-selection constraint, not a post-design concern. For carriers with posture B (carrier-cloud) or C (air-gapped), external APIs are not available for PHI documents regardless of accuracy results. T0+T2 for Cat 1–3 combined with T4 local inference (Ollama, carrier VPC) for Cat 4 is the only architecture that satisfies these constraints end-to-end.
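The tier-governance idea in points 1 and 4 reduces to a routing table. The sketch below uses hypothetical field names; category labels follow the case study and tier names (T0/T2/T4) mirror the architecture described above:

```python
# Illustrative category assignments -- field names are hypothetical.
# Cat 1-3: deterministic / categorical / verbatim. Cat 4: inferential.
FIELD_CATEGORY = {
    "policy_number": 1,
    "state_of_injury": 2,
    "subrogation_flag": 4,
}

def route(field: str) -> str:
    """Route a field to its extraction tier by category, never by default."""
    cat = FIELD_CATEGORY[field]
    if cat <= 3:
        return "T0+T2 deterministic extraction"  # regex over section pools
    return "T4 local inference"                  # scoped local LLM, carrier VPC
```

The governance guarantee is the routing itself: a Cat 1 field can never reach an LLM, and a Cat 4 field can never reach an external API.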

This benchmark is anchored to an active enterprise POC covering 79,908 annual documents across three use cases. Phase 1 recommendation: run Approach C on 50 real labeled documents before any GPU or LLM infrastructure investment.

Discuss the extraction architecture →