    12 min read
    shadow ai
    chatgpt data leak
    pii exposure
    data loss prevention
    enterprise ai security
    hipaa
    gdpr
    ccpa
    generative ai risk
    data exfiltration

    Can Your Employees Accidentally Leak PII to ChatGPT? Yes — and Here's What It's Costing You

    Team Sonomos

    ChatGPT now has 800 million weekly active users. That's double what it had in February 2025. It's deployed by 92 percent of Fortune 500 companies. And according to LayerX Security's 2025 Enterprise GenAI Report, 45 percent of enterprise employees now actively use generative AI platforms — with 77 percent of them copying and pasting data into their chatbot queries. More than half of those paste events contain corporate information. Twenty-two percent include PII or payment card data.

    The employees doing this aren't negligent. They're trying to be productive. A paralegal pastes case details to draft a motion. A doctor enters patient information to write an insurance appeal. An analyst feeds financial records into ChatGPT for pattern analysis. The intent is efficiency. The effect is uncontrolled data exfiltration.

    IBM's Cost of a Data Breach Report 2025 put numbers on the damage: shadow AI, the use of AI tools without employer approval or oversight, was a factor in 20 percent of all data breaches, adding $670,000 to average breach costs. When shadow AI was involved, customer PII was compromised in 65 percent of cases, significantly higher than the 53 percent global baseline. And 97 percent of organizations that reported an AI-related security incident lacked proper AI access controls.

    This isn't a hypothetical risk. It's happening at scale, right now, in your browser tabs.

    How Data Actually Leaks to AI Tools

    The mechanics are simple, which is exactly why they're dangerous.

    When an employee pastes text into ChatGPT's consumer interface or a personal account, three things can happen — none of them good for your compliance posture.

    First, the data may enter the training pipeline. For free-tier and Plus users who haven't opted out, OpenAI's documentation states that conversations may be used to improve future models. Even when users turn training off, deleted conversations and temporary chats are retained for up to 30 days for safety monitoring. For regulated industries subject to HIPAA, GDPR, SOX, or state privacy laws, 30 days of retention in a third-party system with no Business Associate Agreement and no data processing agreement creates compliance exposure the moment the data is submitted.

    Second, the data is accessible to OpenAI personnel. Conversations may be reviewed by employees and authorized vendors for safety and quality purposes. Your client's Social Security number, your patient's diagnosis, your firm's strategic plan — once pasted, the data has an audience you didn't authorize and can't audit.

    Third, and most critically, the data leaves your perimeter permanently. Unlike a misdirected email that you can recall or a misplaced laptop that you can wipe, data submitted to an AI tool is gone. You cannot retrieve it. You cannot confirm what was done with it. You cannot produce forensic evidence of containment for a regulator or insurance carrier.

    The Scale of the Problem in 2026

    The numbers have only gotten worse since 2023, when Samsung engineers leaked semiconductor source code and internal meeting notes to ChatGPT in three separate incidents, all within 20 days of the company allowing employee access. Those incidents were the canary. The mine is now full of gas.

    Cyberhaven's analysis of 1.6 million workers found that 8.6 percent of employees have pasted company data into ChatGPT, with 11 percent of all pasted content classified as confidential. The most common categories are sensitive internal documents, source code, and client data. Fewer than 1 percent of employees (0.9 percent) are responsible for 80 percent of these data egress events — meaning a handful of power users generate the bulk of your exposure.
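
    Finding your own power users is a ranking exercise over whatever per-user paste counts your monitoring produces. A minimal Python sketch, assuming you already have those counts in a dictionary; the function name `heavy_hitters` and the sample data are hypothetical:

```python
def heavy_hitters(events_per_user: dict[str, int], share: float = 0.8) -> list[str]:
    """Smallest set of users, heaviest first, whose events reach `share` of the total."""
    target = share * sum(events_per_user.values())
    selected: list[str] = []
    running = 0
    # Rank users by event count, largest first, and stop once the target is reached.
    for user, count in sorted(events_per_user.items(), key=lambda kv: kv[1], reverse=True):
        if running >= target:
            break
        selected.append(user)
        running += count
    return selected


# Hypothetical paste counts: one power user and 99 occasional pasters.
events = {"analyst-07": 800}
events.update({f"user-{i:02d}": 2 for i in range(99)})

heavy = heavy_hitters(events)
print(heavy)                                       # ['analyst-07']
print(f"{len(heavy) / len(events):.1%} of users")  # 1.0% of users
```

    Sorting heaviest-first and stopping once the running total reaches the target share yields the smallest group of users behind that share, which is where review and coaching effort pays off first.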

    LayerX's 2025 data paints the current picture. Among employees who paste data into generative AI tools, the average is 6.8 paste events per day. Of those, more than half (3.8 pastes) include sensitive corporate information. For a company with 10,000 employees, where 45 percent use AI tools and 18 percent of the workforce pastes data into them, that's roughly 1,800 employees generating over 6,800 sensitive data paste events every single day. Most of them invisible to IT.
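
    The estimate multiplies headcount by the share of staff who paste into AI tools and by the 3.8 sensitive pastes each of them makes per day. A minimal sketch (the function name is hypothetical) that makes the base of the 18 percent explicit, since applying it to the whole workforce or only to the 45 percent who use AI tools changes the answer:

```python
def sensitive_pastes_per_day(headcount: int, paster_share: float,
                             sensitive_per_paster: float = 3.8) -> tuple[int, int]:
    """Return (employees who paste, sensitive paste events per day), rounded."""
    pasters = headcount * paster_share
    return round(pasters), round(pasters * sensitive_per_paster)


# Reading 1: 18 percent of the whole 10,000-person workforce pastes data.
print(sensitive_pastes_per_day(10_000, 0.18))         # (1800, 6840)
# Reading 2: 18 percent of the 45 percent who use AI tools paste data.
print(sensitive_pastes_per_day(10_000, 0.45 * 0.18))  # (810, 3078)
```

    Under either reading, a 10,000-person company is looking at thousands of sensitive pastes every day.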

    The invisibility is structural. Eighty-two percent of data pasted into GenAI tools comes from personal, unmanaged accounts — employees using personal browsers or personal ChatGPT accounts that sit entirely outside corporate identity systems. And 67 percent of all AI usage occurs through non-corporate accounts. Your enterprise SSO, your managed browser policies, your network-level DLP — none of them can see this traffic. It's not shadow IT in the traditional sense. It's data moving from one browser tab to another, on the same device, through a channel your security stack was never designed to monitor.

    Making the visibility problem worse: nearly half of employees hide their AI usage from managers, primarily because they worry it will be perceived as cutting corners. The people most actively using AI are the ones least likely to tell you about it.

    Why Traditional DLP Doesn't Catch This

    Data loss prevention tools were built for a different threat model. They watch for file uploads across network boundaries, scan email attachments, monitor USB transfers, and flag credit card numbers matching known regex patterns. None of that helps when the exfiltration vector is a copy-paste into a browser text field.

    The failure points are specific. Traditional DLP has no visibility into clipboard activity. An employee copying a paragraph of client information and pasting it into ChatGPT doesn't trigger any file transfer event because there is no file. The data moves through the clipboard into a browser text field, and when it is finally sent, it travels inside an encrypted HTTPS request to a domain the network already allows. Perimeter-based tools cannot read that content without TLS inspection.

    Pattern matching doesn't help for most sensitive content. A Social Security number has a recognizable format. A client's name, diagnosis, case strategy, or proprietary algorithm does not. DLP built around regex patterns can catch structured PII but misses the vast majority of the sensitive corporate content that constitutes the real risk.
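
    To see the limitation concretely, here is a minimal sketch of regex-style detection, assuming illustrative patterns for US Social Security numbers and payment card numbers (with a Luhn checksum to weed out random digit runs). These are not any vendor's actual rules. It flags both structured identifiers in the sample and says nothing about the diagnosis or the settlement plan sitting next to them:

```python
import re

# Illustrative patterns only: US SSNs in ddd-dd-dddd form, and 13-19 digit
# card numbers optionally separated by spaces or hyphens.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(candidate: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_findings(text: str) -> list[str]:
    hits = [f"SSN:{m.group()}" for m in SSN.finditer(text)]
    hits += [f"CARD:{m.group()}" for m in CARD.finditer(text) if luhn_ok(m.group())]
    return hits


sample = ("Pt SSN 123-45-6789, card 4111 1111 1111 1111. Dx: early-onset "
          "Parkinson's. Client will settle before the March filing.")
print(regex_findings(sample))
# ['SSN:123-45-6789', 'CARD:4111 1111 1111 1111']
```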

    And even when organizations deploy browser-level controls, personal accounts render them moot. If an employee opens a personal Chrome profile or an incognito window and navigates to chatgpt.com, the corporate browser extension never loads. The managed device policy doesn't apply. The CASB has nothing to inspect. The data leaves cleanly.

    LayerX's report concluded that generative AI tools have become the single largest uncontrolled channel for corporate data exfiltration, responsible for 32 percent of all unauthorized data movement — surpassing shadow SaaS and unmanaged file sharing.

    The Compliance Exposure Is Industry-Specific — and Severe

    The regulatory consequences vary by industry, but the common thread is that no framework was designed to accommodate "the employee pasted it into ChatGPT by accident."

    Healthcare: HIPAA

    A doctor pasting a patient's name and diagnosis into ChatGPT to draft an insurance appeal letter has made an impermissible disclosure of protected health information to a third party. No Business Associate Agreement exists between the provider's organization and OpenAI for consumer ChatGPT accounts. The disclosure itself is the violation, and everything after it compounds the exposure: the data has left the covered entity's control, it's being retained by a third party for up to 30 days, and it may be reviewed by OpenAI personnel. Penalties range from $141 to $71,162 per violation, with annual caps exceeding $2 million for repeated violations in the same category.

    Legal: Attorney-Client Privilege

    The ABA's Formal Opinion 512 (July 2024), the association's first formal ethics opinion on lawyers' use of generative AI, addresses this directly. Under Model Rule 1.6, lawyers must keep confidential all information relating to client representation, and must make reasonable efforts to prevent inadvertent disclosure. The opinion specifically warns that "self-learning GAI tools" create risks that confidential information input into the tool may surface in later outputs, and requires informed client consent before entering confidential information into such tools. Multiple state bars, including California, Florida, New York, and Pennsylvania, have issued guidance warning against inputting confidential client information into public AI tools. An attorney who pastes case strategy into ChatGPT without client consent risks breaching the duty of confidentiality and hands opposing counsel an argument that privilege has been waived.

    Financial Services: GLBA and SOX

    Entering client financial data into consumer AI tools can violate the Gramm-Leach-Bliley Act's safeguards requirements. The fundamental obligation, maintaining a comprehensive security program to protect customer information, assumes the institution controls where that data goes. When an analyst pastes client portfolio details into a consumer AI tool, the institution has lost control of data it was legally obligated to protect. The audit trail is gone. The data residency is unknown. And no amount of after-the-fact incident response can reconstruct what the AI tool did with the information.

    Insurance

    Cyber insurers are developing AI-specific coverage exclusions, and IBM's finding that 97 percent of organizations reporting an AI-related security incident lacked proper AI access controls is exactly the kind of evidence a carrier can point to when contesting a claim. A shadow AI breach with no monitoring, no policy, and no detection in place is the textbook scenario for coverage denial. Organizations that can't demonstrate AI governance measures (approved tool lists, monitoring capabilities, training documentation) may find their policies don't cover the incident that matters most.

    What Companies Have Tried — and Why It Hasn't Worked

    The corporate response has generally fallen into three camps, and none of them fully solves the problem.

    Blocking is the most aggressive response. Major financial institutions, including JPMorgan Chase, have blocked or restricted employee access to ChatGPT. The problem: blocking drives usage to personal devices. An employee who can't access ChatGPT on their work laptop opens it on their phone, or on the personal laptop sitting next to the corporate one, and pastes the same data. The leak still happens. You just lost visibility into it.

    Restricting is the middle ground. Samsung, after the 2023 incidents, limited ChatGPT prompts to 1,024 bytes and built internal AI tools for sensitive work. Their coding assistant is now used by 60 percent of their DX Division developers. This is a stronger approach, but it requires the engineering resources of a Samsung — and it still doesn't solve the problem of employees who circumvent restrictions by switching to personal accounts.
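
    A byte cap like Samsung's is easy to enforce, but two details matter. How Samsung measured the limit is not described here, so the sketch below simply assumes it applies to the UTF-8 encoding of the prompt; under that assumption a Korean-language prompt hits the cap at roughly a third of the characters an English one can use. And a size limit bounds how much leaves per prompt, not what leaves: a Social Security number is 11 bytes.

```python
LIMIT_BYTES = 1024  # Samsung's reported per-prompt cap


def within_limit(prompt: str, limit: int = LIMIT_BYTES) -> bool:
    """Assumes the cap is measured on the UTF-8 encoding of the prompt."""
    return len(prompt.encode("utf-8")) <= limit


english = "a" * 1000  # ASCII: 1 byte per character
korean = "가" * 400    # Hangul syllables: 3 bytes per character in UTF-8
ssn = "123-45-6789"

print(within_limit(english), len(english.encode("utf-8")))  # True 1000
print(within_limit(korean), len(korean.encode("utf-8")))    # False 1200
print(within_limit(ssn), len(ssn.encode("utf-8")))          # True 11
```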

    Educating is the most common and least effective response. Amazon, Walmart, and most enterprises have issued employee guidelines about AI data handling. But 83 percent of organizations lack technical controls to detect or prevent employees from uploading confidential data to AI platforms. They rely on training sessions, warning emails, or nothing at all. And when nearly half of employees actively hide their AI usage, education without enforcement is an awareness campaign, not a security control.

    The common failure across all three approaches: none of them intervene at the actual point of data exposure — the moment an employee pastes sensitive content into a browser text field.

    The Missing Layer: Detection at the Point of Paste

    The gap in every approach above is the same: no visibility or control at the moment data moves from the employee's clipboard into the AI tool's input field. Network controls can't see it. Endpoint agents weren't built for it. Browser policies don't cover personal accounts.

    Effective protection requires four capabilities working together. First, real-time content inspection that scans text as it enters any browser field — not after it's been transmitted, but before the employee hits send. Second, context-aware detection that goes beyond regex patterns to identify sensitive data types relevant to your organization, including unstructured content like client names, case details, and strategic plans. Third, user-facing alerts that catch mistakes before submission — a visible warning that says "this paste contains PII" at the moment it matters. And fourth, audit logging that creates the compliance trail regulators and insurers expect to see.
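
    The four capabilities can be sketched in a few dozen lines of Python. This is a hypothetical illustration, not Sonomos's implementation: the regexes, the `SENSITIVE_TERMS` watch list (a crude stand-in for real context-aware detection, which needs more than string matching), and the red/amber/green thresholds are all assumptions. In a browser these functions would run inside a paste handler before submission; here they are plain functions so the logic is easy to follow.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Hypothetical detectors: two structured patterns plus an org-specific watch
# list standing in for context-aware detection.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SENSITIVE_TERMS = {"project falcon", "acme holdings"}


def inspect_text(text: str) -> list[str]:
    """Content inspection: list the kinds of sensitive data present."""
    findings = []
    if SSN.search(text):
        findings.append("ssn")
    if EMAIL.search(text):
        findings.append("email")
    lowered = text.lower()
    findings += [f"term:{t}" for t in sorted(SENSITIVE_TERMS) if t in lowered]
    return findings


def alert_level(findings: list[str]) -> str:
    """User-facing alert: red for SSNs, amber for anything else found, else green."""
    if "ssn" in findings:
        return "red"
    return "amber" if findings else "green"


def audit_record(text: str, destination: str, findings: list[str]) -> str:
    """Audit log entry that stores a hash of the content, never the content itself."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "destination": destination,
        "level": alert_level(findings),
        "findings": findings,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    })


paste = "Summarize the Project Falcon memo for jane@example.com"
found = inspect_text(paste)
print(alert_level(found), found)  # amber ['email', 'term:project falcon']
print(audit_record(paste, "chatgpt.com", found))
```

    The audit record keeps a SHA-256 hash of the pasted text rather than the text itself, so the log can show that an event happened and match it to a known document without becoming one more copy of the data it is meant to protect.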

    This is what Sonomos does. Our feature, Dagger, monitors text inputs, clipboard operations, and browser field content in real time, entirely on-device, across every application — ChatGPT, Claude, Gemini, email, document editors — and flags sensitive data with a traffic-light alert system before it leaves the device. And our feature, Cloak, goes further: when Dagger identifies PII or sensitive content, Cloak automatically replaces it with synthetic placeholders, preserving the structure and utility of the AI query while ensuring the actual data never reaches an external service. The employee's prompt still works. The AI's response is still useful. But the client's Social Security number, the patient's diagnosis, and the deal terms stay on the device.
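
    The placeholder idea can be illustrated with a minimal, hypothetical Python sketch; it is not how Cloak is implemented. It assumes two illustrative detectors (SSN and email), swaps each distinct value for a numbered bracketed token, keeps the token-to-value mapping in local memory, and restores the real values in the AI's response. A production tool might substitute realistic synthetic values instead of bracketed tokens.

```python
import re
from itertools import count

# Illustrative detectors; a real tool would use many more.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Swap each distinct sensitive value for a token; return masked text and the vault."""
    vault: dict[str, str] = {}  # token -> original value, kept on the device
    for kind, pattern in PATTERNS.items():
        numbers = count(1)
        tokens: dict[str, str] = {}  # original value -> token, so repeats share one token

        def replace(match: re.Match) -> str:
            value = match.group()
            if value not in tokens:
                tokens[value] = f"[{kind}_{next(numbers)}]"
                vault[tokens[value]] = value
            return tokens[value]

        text = pattern.sub(replace, text)
    return text, vault


def unmask(text: str, vault: dict[str, str]) -> str:
    """Restore original values in the AI's response, locally."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text


prompt = "Email jane@example.com that SSN 123-45-6789 is verified; cc jane@example.com."
masked, vault = mask(prompt)
print(masked)  # Email [EMAIL_1] that SSN [SSN_1] is verified; cc [EMAIL_1].
reply = "Done. I drafted a note to [EMAIL_1] confirming [SSN_1]."
print(unmask(reply, vault))
# Done. I drafted a note to jane@example.com confirming 123-45-6789.
```

    Reusing one token per distinct value keeps the prompt internally consistent, so the model can still reason about "the same person" across sentences, and the closing bracket keeps `[SSN_1]` from matching inside `[SSN_10]` during restoration.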

    No cloud processing. No third-party data access. No new exfiltration vector introduced by the security tool itself.

    Practical Steps — Starting Today

    This week: Audit which AI tools employees are accessing. Check browser history and network logs, but recognize that personal account usage won't appear in corporate logs; that's the gap you need to close. Issue clear guidance on which data categories must never enter any AI tool, with specific examples. For any ChatGPT accounts used for work, turn off the "Improve the model for everyone" data control immediately; business tiers such as ChatGPT Enterprise exclude business data from training by default.
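
    If your proxy or DNS logs can be exported as text, a first-pass audit takes a few lines of Python. This sketch assumes a hypothetical whitespace-separated `timestamp user host` format and a short, illustrative domain list; adapt both to your own logs. It will only ever see traffic that crosses corporate infrastructure.

```python
from collections import Counter

# Illustrative watch list; extend it with the tools that matter to you.
AI_DOMAINS = ("chatgpt.com", "chat.openai.com", "claude.ai", "gemini.google.com")


def is_ai_host(host: str) -> bool:
    """True for a listed domain or any subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in AI_DOMAINS)


def ai_usage(log_lines) -> Counter:
    """Count requests per (user, host) in hypothetical 'timestamp user host' lines."""
    usage: Counter = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        _, user, host = parts[:3]
        if is_ai_host(host):
            usage[(user, host.lower())] += 1
    return usage


logs = [
    "2026-02-03T09:14:02Z alice chatgpt.com",
    "2026-02-03T09:15:40Z alice chatgpt.com",
    "2026-02-03T09:20:11Z bob claude.ai",
    "2026-02-03T09:21:00Z bob intranet.example.com",
    "2026-02-03T09:22:30Z carol notchatgpt.com",
]
print(ai_usage(logs).most_common())
# [(('alice', 'chatgpt.com'), 2), (('bob', 'claude.ai'), 1)]
```

    Matching on the exact domain or a dot-prefixed suffix avoids false hits on look-alike hosts such as `notchatgpt.com`.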

    This month: Implement browser-level monitoring that covers AI tool interactions, including paste events. Add AI data handling to security awareness training with concrete scenarios — "don't paste client SSNs into ChatGPT" lands better than abstract policy language. Review your cyber insurance policy for AI-related exclusions before you need to file a claim.

    This quarter: Evaluate enterprise AI tiers — ChatGPT Enterprise, Claude for Enterprise, Gemini for Workspace — that offer contractual guarantees against training data use and provide admin-level audit trails. Deploy detection tools that catch sensitive data at the point of entry, before transmission. Establish incident response procedures specifically for AI data exposure, including notification timelines and regulatory reporting obligations.
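
    For the incident response piece, it helps to compute hard deadlines the moment an exposure is discovered. A minimal sketch covering two regimes the article mentions: GDPR Article 33, which requires notifying the supervisory authority within 72 hours of becoming aware of a personal data breach where feasible, and the HIPAA Breach Notification Rule, which requires notifying affected individuals no later than 60 calendar days after discovery. Both rules also require notice without undue delay, so these are outer limits, not targets; other laws and contracts impose their own clocks, and counsel should confirm how each one is counted.

```python
from datetime import datetime, timedelta, timezone

# Outer limits only; both rules also require notice without undue delay.
DEADLINES = {
    "GDPR Art. 33 (supervisory authority)": timedelta(hours=72),
    "HIPAA (affected individuals)": timedelta(days=60),
}


def notification_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Latest permissible notification time under each rule, from the discovery time."""
    return {rule: discovered_at + window for rule, window in DEADLINES.items()}


discovered = datetime(2026, 2, 2, 15, 30, tzinfo=timezone.utc)
for rule, due in notification_deadlines(discovered).items():
    print(f"{rule}: {due:%Y-%m-%d %H:%M} UTC")
# GDPR Art. 33 (supervisory authority): 2026-02-05 15:30 UTC
# HIPAA (affected individuals): 2026-04-03 15:30 UTC
```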

    Ongoing: The productivity benefits of AI are real and significant. Outright bans push usage underground and sacrifice competitive advantage. The organizations that get this right will be the ones that enable AI adoption while maintaining visibility and control over the data flowing into those tools. The ones that don't will discover the leak in a breach notification, a regulatory inquiry, or an insurance claim denial.

    Your employees are already using AI. The question is whether you can see what they're sharing.



    Last updated: February 2026.
