Sonomos — Use AI Without Exposing Confidential Data

Short answer: Insurers can use ChatGPT, Claude, and Gemini in 2026, but the regulatory ground has moved enough that the question is no longer "can we?" but "how do we prove we did it correctly?" The NAIC Model Bulletin on the Use of AI Systems by Insurers (adopted December 2023) has been adopted in some form by more than 25 states; Colorado Reg 10-1-1 imposes specific quantitative testing requirements on life insurance underwriting; NY DFS Insurance Circular Letter No. 7 (2024) sets governance and bias-testing expectations; HIPAA still controls health insurance workflows; GLBA still controls all nonpublic personal information (NPI); and the EU AI Act classifies risk assessment and pricing for life and health insurance as high-risk under Annex III. The fastest way to fail an examination in 2026 is to permit applicant or insured personal data into a consumer-tier AI tool without a documented governance framework and a technical control to keep the data out in the first place. This guide walks through what each authority actually requires and the controls that hold up under examination.

What the NAIC Model Bulletin actually requires

The NAIC Model Bulletin on the Use of AI Systems by Insurers — adopted in December 2023 and based on the AI Principles the NAIC published in 2020 — is not law. It is a model that state insurance departments adopt, modify, or reject. By April 2026, more than 25 states have adopted the bulletin substantively, including (in alphabetical order) Alaska, Connecticut, Illinois, Iowa, Kentucky, Maine, Maryland, Minnesota, Nevada, New Hampshire, Pennsylvania, Rhode Island, Vermont, Washington, and others. Several states (notably Colorado and New York) have layered their own additional requirements on top.

The bulletin organizes obligations around five themes derived from the NAIC's AI Principles:

Fair and ethical — outcomes that comply with applicable laws on unfair trade practices and unfair discrimination.
Accountable — responsible internal governance, with senior management ownership and board-level visibility for material AI uses.
Compliant — alignment with all applicable laws, including state insurance code, federal civil rights statutes, GLBA, HIPAA, and ADA.
Transparent — documentation, explanation, and the ability to respond to consumer inquiries about AI-influenced decisions.
Secure, safe, and robust — testing, monitoring, and an information-security posture appropriate to the risk.

In practice, the bulletin translates into seven concrete artifacts an examination will ask for:

A written AI Systems Program describing governance, scope, and the inventory of AI uses.
An AI inventory with each system, its purpose, the data inputs, the consumer-impact assessment, and the risk tier.
Vendor due diligence documentation for any third-party AI components, including LLM providers when used in workflows that affect insurance practices.
Pre-deployment testing — bias, accuracy, robustness — with documented results.
Ongoing monitoring of the AI system in production, including drift detection.
Consumer disclosure processes that explain AI involvement when required and provide a path to appeal.
Senior-management and board reporting on material AI matters at a defined cadence.

For an insurer using ChatGPT or Claude in any consumer-facing pipeline (underwriting, claims handling, marketing, customer service routing, fraud-flagging), every one of these artifacts has to exist and be defensible.
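
As a rough illustration of what one inventory entry might hold, the sketch below models a record with the fields the list above names (system, purpose, data inputs, consumer-impact assessment, risk tier) and reports which ones are still empty. The class name, field names, tier labels, and the completeness check are hypothetical; neither the NAIC bulletin nor any state prescribes a schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RiskTier(Enum):
    # Hypothetical tier labels; the bulletin does not define tier names.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class AIInventoryEntry:
    system: str                       # the AI system, e.g. an LLM product and tier
    purpose: str                      # the insurance practice it supports
    data_inputs: List[str] = field(default_factory=list)
    consumer_impact: str = ""         # summary of the consumer-impact assessment
    risk_tier: Optional[RiskTier] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        gaps = []
        if not self.system.strip():
            gaps.append("system")
        if not self.purpose.strip():
            gaps.append("purpose")
        if not self.data_inputs:
            gaps.append("data_inputs")
        if not self.consumer_impact.strip():
            gaps.append("consumer_impact")
        if self.risk_tier is None:
            gaps.append("risk_tier")
        return gaps
```

Running `missing_fields()` across the whole inventory before an examination is a cheap way to find entries that name a system but never received an impact assessment or a tier.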

Colorado Reg 10-1-1: the deepest state-level requirements

Colorado's 3 CCR 702-10, particularly Reg 10-1-1 ("Governance and Risk Management Framework Requirements for Life Insurers' Use of External Consumer Data and Information Sources, Algorithms, and Predictive Models"), goes further than the NAIC bulletin in two ways.

First, it requires life insurers using "external consumer data and information sources" (ECDIS) — which the regulation defines broadly to include credit history, social media, geolocation, biometric, education, and similar non-traditional inputs — to maintain a written governance and risk-management framework specifically for that use, with board-level approval and annual attestation.

Second, it requires quantitative testing for unfair discrimination by race, color, national or ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression. The testing methodology must be documented, the results must be reviewed by a designated officer, and the insurer must take "reasonable steps" to mitigate disparities the testing reveals.

A second Colorado regulation — Reg 10-1-2 — specifically addresses algorithmic and AI-system testing in life insurance, with quantitative thresholds and specific test designs. The thresholds are unusually concrete for a state insurance regulation; most other states stop at "test for bias" without specifying methodology.

For LLM use in Colorado, the practical implication is that any workflow where an LLM influences a life-insurance decision — even indirectly, e.g., as a research tool for an underwriter — needs to be in the governance framework, on the inventory, and (depending on materiality) in the quantitative testing program.

New York DFS Insurance Circular Letter No. 7 (2024)

New York's Department of Financial Services issued Insurance Circular Letter No. 7 in 2024 setting the DFS's expectations for AI in insurance underwriting and pricing. The circular layers on top of New York's existing anti-discrimination provisions in the Insurance Law and on DFS's pre-existing Insurance Reg 187 (best-interest standard for life insurance and annuity sales).

Key expectations from Circular Letter No. 7:

Justifiability — the use of AI must be tied to a permissible insurance practice; AI is not a category exception.
Disparate-impact testing — quantitative analysis required, with documentation; testing must extend to proxy variables that may correlate with protected classes.
Model risk management — written policies, model inventory, validation, ongoing monitoring.
Vendor accountability — the insurer is responsible for vendor models; "we use a third-party LLM" is not a defense.
Consumer transparency — disclosure of AI use to applicants on request, sufficient to permit a meaningful appeal.

The proxy-variable requirement is the one most commonly underestimated. An LLM that doesn't directly receive race or zip-code data can still produce outputs whose distribution correlates with race because the input text contains proxies (occupation, education, vocabulary). DFS expects testing to reach those proxies.
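
One way to picture outcome-level proxy testing: join the LLM's outputs to protected-class labels that are held separately for testing only (the model never sees them), then compare how often each group is flagged. The sketch below does that with a simple rate ratio. The function names and the choice to compare each group against the lowest non-zero rate are illustrative assumptions, not a methodology drawn from Circular Letter No. 7.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def flag_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """records: (group_label, was_flagged) pairs. Returns the flag rate per group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in records:
        totals[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def rate_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each group's flag rate to the lowest non-zero rate."""
    positive = [rate for rate in rates.values() if rate > 0]
    if not positive:
        # Nothing was flagged for any group, so there is no disparity to report.
        return {group: 1.0 for group in rates}
    baseline = min(positive)
    return {group: rate / baseline for group, rate in rates.items()}
```

A ratio well above 1.0 for a group the model never saw labeled is the signal that some input text is acting as a proxy; which ratio counts as a finding is a policy decision for the testing program, not something this sketch decides.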

The federal layer: HIPAA, GLBA, ADA

State insurance regulation is the foreground; federal law provides the floor.

HIPAA governs health insurers as covered entities and business associates. Sending PHI to a consumer-tier ChatGPT is an impermissible disclosure regardless of state insurance law. The insurer needs a BAA with the provider (Microsoft, OpenAI, Anthropic, or Google) on the appropriate enterprise tier and a technical control to keep PHI out of unsanctioned tools.
GLBA applies to all insurers. The McCarran-Ferguson Act leaves insurance regulation largely to the states, so GLBA reaches insurers through state implementation. The Safeguards Rule, the Privacy Rule, and the Section 502 disclosure restrictions all attach to nonpublic personal information (NPI). Sending applicant NPI to a vendor without a written contract restricting use is a classic 502(b)(2) breach.
ADA Title III prohibits discrimination based on disability in places of public accommodation. The DOJ has signaled in recent guidance that AI-driven decision systems used by insurers (and other public-accommodation entities) may be analyzed for disability discrimination, including through proxy variables.
FCRA applies when AI inputs include consumer reports — credit, employment screening, investigative consumer reports. The "adverse action" disclosure requirements attach to AI-influenced adverse decisions.

EU AI Act: insurance is high-risk under Annex III

For European insurers — and for US insurers offering insurance to EU residents — the EU AI Act's Annex III classifies as high-risk any AI system used for "risk assessment and pricing in relation to natural persons in the case of life and health insurance." Most underwriting and pricing AI in life and health insurance falls into this classification.

High-risk obligations under the AI Act (Articles 9-15) include:

Risk-management system across the AI lifecycle.
Data governance — relevance, representativeness, error checking, bias examination.
Technical documentation that lets a regulator reconstruct the system's design.
Logging — automatic recording of operating events.
Transparency to deployers and end-users.
Human oversight — designed-in, not bolted on.
Robustness, accuracy, and cybersecurity commensurate with the use.

The Act also imposes general-purpose AI obligations on the underlying LLM provider (OpenAI, Anthropic, Google), but the deployer — the insurer — bears the high-risk obligations for the application of the model.
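
For the logging item in the list above, a minimal sketch of an append-only event record for each LLM call might look like the following. The function name, the field names, the choice to store a SHA-256 digest in place of the prompt text, and the JSON Lines format are all assumptions for illustration; the Act does not prescribe a log schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import IO


def log_llm_event(stream: IO[str], workflow: str, model: str,
                  prompt: str, reviewer: str) -> dict:
    """Write one JSON line describing an LLM call and return the event."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "workflow": workflow,
        "model": model,
        # Store a digest, not the prompt, so the log itself holds no personal data.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
        "human_reviewer": reviewer,
    }
    stream.write(json.dumps(event) + "\n")
    return event
```

Hashing the prompt keeps the log useful for reconstructing which input produced which decision without turning the log into a second copy of applicant data.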

What goes wrong in real insurance workflows

Across the deployments we've seen and the recent enforcement record, four patterns recur:

Consumer ChatGPT in an underwriter's workflow. An underwriter pastes an application's free-text fields into ChatGPT to "summarize and flag concerns." The applicant's NPI now sits in OpenAI's logs without a GLBA-eligible contract, often with training enabled by default. This is a 502(b)(2) breach and — depending on the state — a violation of the AI bulletin's secure-and-robust principle.
Free-tier Claude in claims triage. A claims handler asks Claude to summarize a long set of medical records. PHI flows through Anthropic's consumer infrastructure; the insurer has no BAA. HIPAA breach plus, in most states, a violation of the AI bulletin's compliant principle.
AI-generated denial letters that don't pass disparate-impact testing. An insurer trains internal staff to use ChatGPT to draft denial-of-coverage letters. The drafts inherit subtle word choices that correlate with applicant demographics, producing measurable disparate impact downstream. NY DFS Circular 7 and Colorado 10-1-1 quantitative testing would have caught this; nobody ran the test.
Vendor LLMs subprocessed inside a third-party platform. A claims-management vendor adds a "smart summary" feature powered by OpenAI's API. The insurer has a contract with the vendor; the vendor has a contract with OpenAI. The chain of GLBA / HIPAA pass-through is incomplete, and the insurer is the regulated entity.

Control layers that hold up under examination

A defensible 2026 insurance AI deployment is a stack:

Governance layer (NAIC bulletin requirement)

AI Systems Program document signed at the senior level, with board reporting.
AI inventory that lists every LLM-influenced workflow and assigns a risk tier.
Vendor due diligence files for OpenAI, Anthropic, Google, Microsoft, and any subprocessing platform — including a copy of each provider's security attestations (SOC 2, ISO 27001), DPAs, BAAs, and (where available) enterprise terms.
Pre-deployment and ongoing testing programs, with quantitative bias and disparate-impact analysis where state law requires it.
Consumer-disclosure processes with a tested appeal path.

Contract layer

Enterprise tier for any LLM that touches NPI, PHI, or applicant data — ChatGPT Enterprise, Claude for Work, Gemini for Workspace, or Microsoft 365 Copilot under the appropriate plan.
DPA / BAA signed and retained.
Sub-processor lists monitored.
EU data residency elected for European-applicant data.
Right-to-audit provisions or third-party audit-report rights.

Tier-and-settings layer

No personal-account use by underwriters, claims handlers, or customer-service staff. This is policy, training, and an enforcement mechanism — not just a memo.
Training disabled on every sanctioned tier.
Retention configured as low as the workload allows. Zero Data Retention for high-sensitivity API workloads.
Memory features off unless individually evaluated.

Data-minimization layer (where the breach prevention happens)

This is where Locke sits. Pasting an applicant's free-text health-history disclosure into a sanctioned ChatGPT Enterprise account is contractually permitted but operationally undesirable: the more PHI / NPI that crosses the wire, the more data is in scope for breach analysis if a future incident occurs. Local-first redaction in the browser detects names, account numbers, dates of birth, medical record numbers, ICD-10 codes, National Provider Identifier numbers, drug names, diagnosis terms, and similar entities and replaces them with reversible tokens before the prompt leaves the device.

For insurance specifically, this layer is the difference between "we have a BAA" and "the BAA never had to be tested" — and on examination, regulators consistently prefer the latter.
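
The general idea of reversible tokenization can be sketched in a few lines. This is not Locke's implementation, which the text does not describe; the patterns, token format, and function names below are hypothetical, and two regular expressions stand in for what would need to be a much broader entity detector.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only: US-style SSNs and MM/DD/YYYY dates.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace matches with tokens; return the redacted text and the token map."""
    token_to_value: Dict[str, str] = {}
    value_to_token: Dict[str, str] = {}

    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            if value not in value_to_token:
                # The same value always maps to the same token within one prompt.
                token = f"[{label}_{len(value_to_token) + 1}]"
                value_to_token[value] = token
                token_to_value[token] = value
            return value_to_token[value]

        text = pattern.sub(substitute, text)
    return text, token_to_value


def restore(text: str, token_to_value: Dict[str, str]) -> str:
    """Swap tokens in a model response back to the original values."""
    for token, value in token_to_value.items():
        text = text.replace(token, value)
    return text
```

The token map stays on the device; only the redacted text goes into the prompt, and `restore` is applied to the model's response locally.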

Testing layer (Colorado, NY, EU AI Act requirement)

Pre-deployment bias testing with a documented methodology, sample size, and protected-class breakdown.
Proxy-variable analysis — explicitly required by NY DFS Circular 7 — that examines whether the LLM's outputs correlate with protected classes via non-explicit inputs.
Production drift monitoring with thresholds and an incident-response process when thresholds are crossed.
Annual board attestation as Colorado requires for life insurers.
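
As one hedged example of drift monitoring with a threshold, the sketch below computes a population stability index (PSI) between a baseline distribution of LLM output categories and a production window. PSI is a common model-monitoring statistic swapped in here for illustration; none of the authorities above names it, and the 0.2 alert threshold is a rule-of-thumb assumption, not a regulatory figure.

```python
import math
from collections import Counter
from typing import Iterable


def psi(baseline: Iterable[str], production: Iterable[str],
        epsilon: float = 1e-6) -> float:
    """Population stability index over categorical outputs."""
    base_counts = Counter(baseline)
    prod_counts = Counter(production)
    base_total = sum(base_counts.values())
    prod_total = sum(prod_counts.values())
    if base_total == 0 or prod_total == 0:
        raise ValueError("both samples must be non-empty")

    score = 0.0
    for category in set(base_counts) | set(prod_counts):
        # epsilon keeps the log defined when a category is absent on one side
        p = max(base_counts[category] / base_total, epsilon)
        q = max(prod_counts[category] / prod_total, epsilon)
        score += (q - p) * math.log(q / p)
    return score


ALERT_THRESHOLD = 0.2  # assumed rule of thumb, not taken from any regulation


def drift_alert(baseline: Iterable[str], production: Iterable[str],
                threshold: float = ALERT_THRESHOLD) -> bool:
    """True when the production window has drifted past the threshold."""
    return psi(baseline, production) > threshold
```

An alert here would feed the incident-response process; because upstream model updates can shift LLM outputs without any change on the insurer's side, the baseline should be re-established whenever the provider changes the model version.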

Frequently asked questions

Does the NAIC Model Bulletin apply to claims handling, not just underwriting?

Yes. The bulletin scope is "use of AI systems by insurers," which the NAIC explicitly extends to "regulated insurance practices" — underwriting, rating, pricing, marketing, claims handling, fraud detection, and customer service. An LLM used to summarize claim documents or to draft adjuster correspondence falls within the bulletin if a state has adopted it. The risk tier may be lower than for underwriting, but the inventory and governance obligations still attach.

Can we use ChatGPT for general-purpose tasks if no insurance data is in the prompt?

Yes. The NAIC bulletin and state regulations apply to AI systems used in connection with regulated insurance practices. Using ChatGPT to help an HR team draft a job posting or to summarize a vendor's product brief does not implicate the insurance regulatory regime. The line moves the moment a prompt touches applicant or insured data, claims content, or rating factors.

What does Colorado Reg 10-1-1 mean for a national life insurer that does not write business in Colorado?

It applies to life insurers writing in Colorado. A national life insurer that does not write Colorado business does not need to comply with Reg 10-1-1 specifically. But Colorado's framework has been influential — Connecticut, Maryland, and others have proposed or adopted similar quantitative testing requirements, and the NAIC's own work (the Special Committee on Race in Insurance, the Big Data and Artificial Intelligence Working Group) has consistently moved toward Colorado-style methodology. National insurers that meet Colorado's bar tend to find themselves ahead of the next state to adopt.

How do we satisfy disparate-impact testing for an LLM used in underwriting?

The methodology is specific to the use case, but the broad shape is consistent: identify the decision the LLM influences, identify the protected classes relevant under federal civil rights statutes (race, color, national origin, religion, sex, sexual orientation, gender identity, disability, age, marital status, genetic information) and any state-specific protected classes, sample inputs that vary the protected-class proxy variables, run the LLM on each sample, and analyze outputs for distributional disparities. NY DFS Circular 7 specifically requires examination of proxy variables that may correlate with protected classes (name, geography, occupation, vocabulary) even when the protected class itself is not in the input. Document the methodology, the sample, and the results; refresh annually or on material change.
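
The "vary the proxy variables, run the LLM, analyze outputs" steps can be sketched as a paired test: hold everything in the prompt constant except one proxy field and compare the scores by group. In this sketch `score_fn` is a stand-in for whatever wraps the LLM call and returns a numeric output; it, the template, the proxy values, and the deliberately biased stub are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List


def paired_proxy_test(score_fn: Callable[[str], float], template: str,
                      proxy_values: Dict[str, List[str]]) -> Dict[str, float]:
    """Mean score per group label, varying only the {proxy} slot in the template."""
    results = {}
    for group, values in proxy_values.items():
        scores = [score_fn(template.format(proxy=value)) for value in values]
        results[group] = mean(scores)
    return results


def max_gap(results: Dict[str, float]) -> float:
    """Largest difference in mean score between any two groups."""
    return max(results.values()) - min(results.values())


# Stub standing in for an LLM-backed scorer; deliberately biased on occupation
# so the test has something to detect.
def stub_score(prompt: str) -> float:
    return 0.7 if "warehouse" in prompt else 0.4


template = "Applicant occupation: {proxy}. Non-smoker, age 40."
groups = {
    "group_1": ["warehouse picker", "warehouse loader"],
    "group_2": ["attorney", "architect"],
}
results = paired_proxy_test(stub_score, template, groups)
gap = max_gap(results)
```

Because the prompts are identical apart from the proxy slot, any gap is attributable to the proxy itself. A real program would use many more samples per group and a statistical test, since LLM outputs vary from run to run.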

Is an AI-generated denial letter a "decision" or just a draft?

Both the NAIC bulletin and NY DFS Circular 7 treat the question functionally rather than formally: if the letter materially affects an insurance decision and is sent to the consumer, the AI use is in scope, regardless of whether a human "approved" the draft. Human-in-the-loop review is required, and it has to be meaningful: the reviewer needs the authority and the time to override the AI. Pro-forma sign-off does not move the AI use out of scope.

Do we need a separate program for LLMs versus traditional ML?

Most insurers find that one AI Systems Program covers both, with risk-tier-specific procedures. The NAIC bulletin is technology-neutral; what matters is the function and the consumer impact, not whether the model is a gradient-boosted decision tree or a large language model. That said, LLMs introduce specific risks — prompt injection, hallucinated identifiers, drift driven by upstream model updates — that the program should call out explicitly.

The bottom line

The fastest way for an insurer to get into trouble in 2026 is to assume that AI is a productivity tool the law hasn't caught up with. It has: through the NAIC Model Bulletin, through state-specific regulations like Colorado 10-1-1 and NY DFS Circular 7, through HIPAA and GLBA, and through the EU AI Act's high-risk classification of risk assessment and pricing in life and health insurance.

The insurers that will navigate this best aren't the ones who buy the most sophisticated AI tools. They're the ones who treat AI like every other regulated process: governed, inventoried, tested, contracted, and — crucially — kept away from applicant and insured data unless the data has been pseudonymized first. Put a technical control between the underwriter's keyboard and the prompt, and most of the regulatory burden becomes a paperwork exercise instead of a breach analysis.

AI in Insurance 2026: NAIC Model Bulletin, Colorado Reg 10-1-1, and ChatGPT for Underwriting