Sonomos — Use AI Without Exposing Confidential Data

When you paste a contract, a customer email, or a patient note into ChatGPT, Claude, or Gemini, that text leaves your device. Once it leaves, you no longer control where it is logged, who can review it, or whether it ends up shaping a future model. For most professionals, that is the single biggest privacy risk introduced by generative AI — and the one most often overlooked.

This 2026 guide explains, in plain English, how to keep sensitive data out of large language models (LLMs) without giving up the productivity gains AI provides. It covers the policies that actually matter, the settings to change today, and the workflow patterns that prevent leaks before they happen.

What counts as "sensitive data" in an AI context?

Sensitive data is any information that could harm a person, a customer, or your organization if it were exposed. In the context of AI prompts, the most commonly leaked categories are:

Personally identifiable information (PII): names, addresses, phone numbers, government IDs, dates of birth.
Protected health information (PHI): anything covered by HIPAA in the United States or equivalent regimes elsewhere.
Financial data: account numbers, card numbers, balances, tax filings.
Trade secrets and source code: proprietary algorithms, internal architecture, unreleased product plans.
Privileged communications: legal advice, M&A documents, HR investigations.
Authentication material: passwords, API keys, session tokens, recovery phrases.

If a prompt contains any of the above, treat it as a regulated workload — even if the AI tool is "free."
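As a rough illustration of how a prompt could be screened for some of these categories before it is sent, here is a minimal Python sketch. The pattern set, the category names, and the function names are assumptions made for this example; they cover only a few easy-to-match formats and will miss most real-world PII, such as names and free-text health details.

```python
import re

# Illustrative patterns only (assumed for this sketch); real detectors need far broader coverage.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_categories(prompt: str) -> set[str]:
    """Return the names of the sensitive categories detected in `prompt`."""
    found = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(prompt):
            # Skip digit runs that only look like card numbers.
            if name == "card_number" and not luhn_valid(match.group()):
                continue
            found.add(name)
    return found


if __name__ == "__main__":
    prompt = "Refund jane.roe@example.com, card 4111 1111 1111 1111."
    print(sorted(find_categories(prompt)))
```

A non-empty result is a signal to stop and redact; an empty result is not proof that the prompt is clean.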

How LLM providers actually use your prompts in 2026

The answer depends on which product you use and which tier you pay for. As of April 2026, the publicly documented defaults look like this:

OpenAI ChatGPT: Free and Plus prompts may be used for model training unless you opt out in Data Controls. ChatGPT Team, Enterprise, and API traffic are excluded from training by default.
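Anthropic Claude: The API and Claude for Work exclude customer prompts from training by default. On consumer plans (Free, Pro, and Max), a 2025 update to the consumer terms added a model-improvement setting, so conversations may be used for training when that setting is on. Safety teams may still review flagged conversations on any plan.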
Google Gemini: Free-tier conversations can be reviewed by humans and used to improve the product. Workspace and Vertex AI traffic are governed by enterprise terms.
Microsoft Copilot: Enterprise Data Protection (formerly Commercial Data Protection) prevents prompts from being used for training when the user is signed in with an eligible work or school account.

These policies change. The safer assumption is that anything you send may be retained, may be reviewed by a human, and may be subpoenaed.

Step 1: Change the settings that exist today

Before installing anything new, harden the tools you already use. The settings below take less than five minutes total.

ChatGPT

Open Settings → Data Controls.
Turn off Improve the model for everyone.
Use Temporary Chat for conversations you do not need to keep; temporary chats stay out of your history and, per OpenAI's documentation, are not used for training.
For Team and Enterprise plans, confirm with your admin that Workspace data sharing is disabled.

Claude

Open Settings → Privacy and check the Help improve Claude setting. On consumer plans (Free, Pro, and Max), conversations may be used for training when it is on, so turn it off if you do not want that.
For Claude for Work, ask your administrator for the SOC 2 Type II report and confirm the data retention window.

Gemini

Open Activity → Gemini Apps Activity.
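Turn activity off, or set auto-delete to 3 months (the shortest available). Even with activity off, Google states that conversations are kept for up to 72 hours to operate the service.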
Delete past activity if it contains sensitive prompts.
For Workspace, verify your admin has enabled the Gemini for Workspace data protection terms.

These changes reduce your exposure but do not eliminate it. They cannot help if the prompt itself contains regulated data.

Step 2: Redact sensitive content before it leaves your device

The most reliable defense is to never send the sensitive bits in the first place. There are three ways to do this:

Manual redaction. Replace names, IDs, and identifiers with placeholders before pasting. Reliable but slow and easy to forget under deadline pressure.
Synthetic substitution. Swap real values with realistic but fake ones (e.g., "Jane Doe" → "Mira Patel"). Preserves the model's ability to reason about the structure of the document.
Local-first redaction tools. Browser extensions that detect sensitive entities client-side and mask them automatically before the prompt is submitted.
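Synthetic substitution can be sketched in a few lines of Python. This is an illustrative example, not a production tool: the `FAKE_NAMES` pool and the `substitute_names` function are hypothetical, and the caller is assumed to already know which real names appear in the text (automatic name detection is a much harder problem).

```python
import re

# Hypothetical pool of synthetic names; extend it to suit your documents.
FAKE_NAMES = ["Mira Patel", "Tomas Lind", "Aiko Brandt"]


def substitute_names(text: str, real_names: list[str]) -> tuple[str, dict[str, str]]:
    """Swap each known real name for a synthetic one, consistently across the text."""
    unique = list(dict.fromkeys(real_names))  # de-duplicate, keep order
    if not unique:
        return text, {}
    if len(unique) > len(FAKE_NAMES):
        raise ValueError("not enough synthetic names in the pool")
    mapping = dict(zip(unique, FAKE_NAMES))
    # Longest names first so a longer name wins over a shorter one it contains.
    alternatives = "|".join(re.escape(n) for n in sorted(unique, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    # A single pass avoids re-replacing text that was already substituted.
    return pattern.sub(lambda m: mapping[m.group()], text), mapping


if __name__ == "__main__":
    original = "Dana Whitfield emailed Omar Reyes. Dana Whitfield then called."
    safe_text, mapping = substitute_names(original, ["Dana Whitfield", "Omar Reyes"])
    print(safe_text)
```

Keep the returned mapping on your own machine; it is the only way to translate the model's answer back to the real people involved.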

Sonomos Locke is in the third category: it runs entirely in your browser, identifies categories like names, financial accounts, and health terms, and replaces them with reversible tokens so the model still produces a useful response. Because detection happens before the network request, the unmasked text never reaches OpenAI, Anthropic, Google, or any third-party server — including ours.
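To make the idea of reversible tokens concrete, here is a minimal Python sketch of masking and unmasking. It is not Locke's implementation; the token format (`[EMAIL_1]`), the two patterns, and the function names are assumptions chosen for illustration, and a real tool would detect far more entity types.

```python
import re

# Illustrative patterns only (assumed for this sketch).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with tokens; return masked text and a token-to-value map."""
    token_for: dict[str, str] = {}  # original value -> token
    value_for: dict[str, str] = {}  # token -> original value
    counts: dict[str, int] = {}

    def make_replacer(label: str):
        def replacer(match: re.Match) -> str:
            value = match.group()
            if value not in token_for:  # repeated values reuse the same token
                counts[label] = counts.get(label, 0) + 1
                token = f"[{label}_{counts[label]}]"
                token_for[value] = token
                value_for[token] = value
            return token_for[value]
        return replacer

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, value_for


def unmask(text: str, value_for: dict[str, str]) -> str:
    """Restore original values in one pass; unknown tokens are left untouched."""
    return re.sub(r"\[[A-Z]+_\d+\]", lambda m: value_for.get(m.group(), m.group()), text)


if __name__ == "__main__":
    masked, tokens = mask("Email ana@example.org about SSN 123-45-6789.")
    print(masked)  # this is the only text that would be sent to the model
    print(unmask("Draft sent to [EMAIL_1].", tokens))
```

The token map never leaves the local process in this sketch, which is what makes the substitution reversible only on your side.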

Step 3: Adopt a "minimum necessary" prompting habit

Borrowing from HIPAA's Minimum Necessary Standard, ask yourself before every prompt:

Does the model need this name to answer the question? Usually no — "the customer" works.
Does it need the full document, or only the paragraph in question?
Does it need the real numbers, or rounded approximations?

Most useful AI tasks — drafting, summarizing, brainstorming, refactoring — work just as well on redacted or synthetic input. Reserve raw data for tools you control end-to-end.
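The last two questions can be partly automated. The sketch below, in Python, keeps only the paragraphs that mention a keyword and replaces dollar amounts with rounded approximations. The function names, the blank-line paragraph convention, and the rounding rule are assumptions made for this example.

```python
import re


def relevant_paragraphs(document: str, keywords: list[str]) -> str:
    """Keep only the paragraphs (separated by blank lines) that mention a keyword."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    wanted = [k.lower() for k in keywords]
    kept = [p for p in paragraphs if any(k in p.lower() for k in wanted)]
    return "\n\n".join(kept)


def round_amounts(text: str, nearest: int = 1000) -> str:
    """Replace dollar amounts with approximations rounded to `nearest`."""
    def approximate(match: re.Match) -> str:
        value = float(match.group(1).replace(",", ""))
        rounded = round(value / nearest) * nearest
        return f"about ${rounded:,.0f}"

    return re.sub(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)", approximate, text)


if __name__ == "__main__":
    contract = (
        "Clause 1: Term is two years.\n\n"
        "Clause 2: The fee is $48,213.77 per year.\n\n"
        "Clause 3: Governing law is Delaware."
    )
    excerpt = relevant_paragraphs(contract, ["fee"])
    print(round_amounts(excerpt))
```

Keyword matching is a blunt filter, so skim the excerpt before sending it; the point is to start from less text, not to trust the filter blindly.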

Step 4: Separate what AI sees from what AI returns

A surprising amount of leakage happens on the way out of the model. If you ask ChatGPT to "rewrite this email" and the original you pasted included the customer's medical history, that history can reappear in the response on your screen, where any browser extension, screen recorder, or screen-share session can capture it.

Two practices help:

Round-trip redaction. Mask data on the way in and unmask it locally on the way out. Locke does this automatically using the reversible tokens it inserted.
Output review. Read the response before you copy it back into a system of record. AI assistants regularly hallucinate identifiers; copying without review can introduce new PII problems.
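Part of output review can be assisted by a simple check: flag any identifier in the response that did not appear in the prompt, since it may be hallucinated or may belong to someone else. The Python sketch below assumes three illustrative patterns and hypothetical function names; it supplements reading the response and does not replace it.

```python
import re

# Illustrative identifier formats only (assumed for this sketch).
IDENTIFIER_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # US SSN format
    re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),      # US phone format
]


def find_identifiers(text: str) -> set[str]:
    """Collect every substring that matches one of the identifier patterns."""
    return {m.group() for p in IDENTIFIER_PATTERNS for m in p.finditer(text)}


def unexpected_identifiers(prompt: str, response: str) -> set[str]:
    """Return identifiers present in the response but absent from the prompt."""
    return find_identifiers(response) - find_identifiers(prompt)


if __name__ == "__main__":
    prompt = "Rewrite this note to sam@example.com about the renewal."
    response = "Hi Sam (sam@example.com), call 555-010-1234 for details."
    print(unexpected_identifiers(prompt, response))
```

Anything this returns deserves a second look before the text goes into a system of record.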

Frequently asked questions

Is it safe to paste my own personal data into ChatGPT?

Pasting your own data is your decision to make, but understand the trade-off: free-tier prompts can be retained, reviewed, and (unless you opt out) used to refine future models. For one-off tasks involving low-risk personal information, that may be acceptable. For anything you would not want to read on the front page of a newspaper, redact first.

Does using the API instead of the web interface protect me?

Partially. Most major providers exclude API traffic from training by default and offer enterprise-grade data processing agreements. The API does not, however, prevent the contents of your prompts from being logged for abuse monitoring, typically for 30 days. If those logs would expose regulated data, redact before sending.

What about "incognito" or "temporary" chat modes?

Temporary Chat in ChatGPT, incognito chats in Claude, and similar modes keep the conversation out of your history and (per the providers' terms) out of training data. They do not stop the prompt from reaching the provider's servers, where it may still be retained for a limited period for safety review. Treat them as "do-not-store" rather than "do-not-send."

Can my employer see my AI prompts?

If you use a managed browser, a corporate VPN, or a workspace account, assume yes. DLP (data loss prevention) products increasingly inspect AI-bound traffic. This is one more reason to redact at the source: the cleaner your prompts, the smaller the audit surface.

Do I need a different tool for each AI?

No. Local-first privacy tools like Locke work across ChatGPT, Claude, Gemini, Copilot, Perplexity, and most chat-style interfaces because they intercept the prompt in the browser, not at the provider's servers. One install covers all of them.

The bottom line

Protecting sensitive data in 2026 is not about avoiding AI — the productivity gains are too large to ignore. It is about giving each model the minimum necessary information and keeping the rest under your control. Tighten the provider settings you already have, build a redaction habit, and use a local-first tool for the moments when habit slips.

Your thoughts, your clients' data, and your company's secrets do not have to leave your device for AI to be useful. Treat that as the new default.

How to Protect Sensitive Data When Using ChatGPT, Claude, and Gemini (2026 Guide)