Sonomos — Use AI Without Exposing Confidential Data

PHI vs PII vs Personal Data: A Plain-English Compliance Glossary for 2026

Short answer: PII (personally identifiable information), PHI (protected health information), and personal data are overlapping but legally distinct concepts. PII is a US privacy term covering data that identifies a person. PHI is the HIPAA-specific subset of PII tied to health care. Personal data is the broader GDPR concept that covers anything relating to an identified or identifiable person, and it reaches data that PII often does not. Choosing the right framework matters because each one has different rules, scope, and penalties.

This guide explains each term in plain English, shows how they overlap and diverge, and gives a quick-reference table for compliance, security, and product teams in 2026.

Personally Identifiable Information (PII)

Definition. PII is any data that can be used, alone or in combination with other information, to identify a specific individual. The term comes from US federal privacy guidance (notably NIST SP 800-122) and appears in dozens of state laws, breach-notification statutes, and contracts.

Two categories. NIST and most regulators recognize:

Direct identifiers — full name, government ID, passport number, email address, phone number, biometric template, account number tied to a person.
Indirect identifiers — date of birth, ZIP code, employer, IP address, device fingerprint, rare medical condition, time-and-location pairs.
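
For a data inventory, the two categories can be captured as a simple lookup. The following is a minimal Python sketch: the field names (`full_name`, `zip_code`, and so on) and the `classify_field` and `group_fields` helpers are hypothetical, chosen only to mirror the examples above, and a real schema would need its own mapping.

```python
from enum import Enum


class IdentifierKind(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    UNCLASSIFIED = "unclassified"


# Hypothetical field names mirroring the examples in the two lists above.
DIRECT_FIELDS = frozenset({
    "full_name", "government_id", "passport_number", "email",
    "phone", "biometric_template", "account_number",
})
INDIRECT_FIELDS = frozenset({
    "date_of_birth", "zip_code", "employer", "ip_address",
    "device_fingerprint", "rare_medical_condition", "time_and_location",
})


def classify_field(field_name: str) -> IdentifierKind:
    """Return the identifier category for one field name (case-insensitive)."""
    key = field_name.strip().lower()
    if key in DIRECT_FIELDS:
        return IdentifierKind.DIRECT
    if key in INDIRECT_FIELDS:
        return IdentifierKind.INDIRECT
    return IdentifierKind.UNCLASSIFIED


def group_fields(field_names: list[str]) -> dict[IdentifierKind, list[str]]:
    """Group a schema's field names by identifier category."""
    groups: dict[IdentifierKind, list[str]] = {kind: [] for kind in IdentifierKind}
    for name in field_names:
        groups[classify_field(name)].append(name)
    return groups
```

Fields that land in `UNCLASSIFIED` are the ones worth a manual review, since indirect identifiers often only become identifying in combination.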

Where it appears legally. "PII" itself is rarely a regulated category. Most US laws use more specific terms: Personal Information and Sensitive Personal Information (SPI) in California's CCPA/CPRA, and "sensitive data" in several other state privacy laws. PII is the umbrella term that crosses regimes; the specific obligation depends on which regime applies.

Rule of thumb. If a piece of data can identify a person, treat it as PII for security and operational purposes. The legal label may be different (PHI, Personal Data, NPI), but the operational handling is similar.

Protected Health Information (PHI)

Definition. PHI is the regulated subset of PII created, received, maintained, or transmitted by a covered entity (or a business associate on its behalf) that relates to:

The past, present, or future physical or mental health or condition of an individual;
The provision of health care to an individual; or
The past, present, or future payment for the provision of health care.

Key rules. HIPAA's Privacy Rule (45 CFR Part 164, Subpart E) and Security Rule (45 CFR Part 164, Subpart C). The Office for Civil Rights (OCR) at HHS enforces both.

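The 18 identifiers. Under HIPAA's Safe Harbor de-identification method, data is no longer PHI once all 18 identifier categories are removed and the covered entity has no actual knowledge that the remaining information could identify an individual. The categories are: names; geographic subdivisions smaller than a state; dates more granular than year; phone numbers; fax numbers; email addresses; SSNs; medical record numbers (MRNs); health plan beneficiary numbers; account numbers; certificate / license numbers; vehicle identifiers; device identifiers; URLs; IP addresses; biometric identifiers; full-face photos; and any other unique identifying number, characteristic, or code.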
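
To make the mechanics concrete, here is a minimal Python sketch of Safe Harbor-style field stripping on a structured record. The field names and the `strip_identifiers` helper are hypothetical, and the sketch covers structured fields only: it does not scan free text, and it cannot establish the "no actual knowledge" condition, so it should be read as an illustration rather than a complete de-identification routine.

```python
from datetime import date
from typing import Any

# Hypothetical field names standing in for Safe Harbor identifier categories.
DROP_FIELDS = frozenset({
    "name", "street_address", "city", "zip_code", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id",
    "photo", "other_unique_id",
})

# Hypothetical date fields: these are reduced to the year rather than kept whole.
DATE_FIELDS = frozenset({"birth_date", "admission_date", "discharge_date"})


def strip_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Drop listed identifier fields and generalize date fields to the year."""
    cleaned: dict[str, Any] = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue
        if key in DATE_FIELDS:
            if isinstance(value, date):
                cleaned[f"{key}_year"] = value.year
            # Values that are not date objects are dropped, never passed through.
            continue
        cleaned[key] = value
    return cleaned
```

For example, a record with `name`, `zip_code`, `state`, `birth_date`, and `diagnosis` comes back with only `state`, `birth_date_year`, and `diagnosis`. Unlisted fields pass through unchanged, which is exactly where real-world attempts tend to leak identifiers.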

Special note on AI. Sending PHI to an AI provider without a Business Associate Agreement (BAA) is an unpermitted disclosure under the Privacy Rule. Even with a BAA, configuration matters: zero retention, encryption, access controls, and audit logging all factor into Security Rule compliance. (Our HIPAA + ChatGPT post goes deep on this.)

Personal Data (GDPR)

Definition. Article 4(1) of the GDPR: "any information relating to an identified or identifiable natural person." The Court of Justice of the European Union (CJEU) has interpreted this broadly.

Key consequence. "Personal data" is wider than US PII. Examples that may not be classic PII but are personal data under GDPR:

An IP address (when it can be linked, even indirectly, to a natural person).
A pseudonymized identifier where the controller (or anyone reasonably likely to access the data) holds the key.
An employee ID alone, in many circumstances.
A photograph of a person, even without a name.
Voice recordings.
Online identifiers, advertising IDs, cookies in some configurations.
Inferences and AI-generated profiles about a person.

Special category data (Article 9). A higher-protection subset covering racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, genetic data, biometric data used for unique identification, health data, and data about sex life or sexual orientation. Roughly analogous to, but not identical with, "sensitive personal information" in several US state laws.

Rule of thumb. If your processing reaches an individual in the EU or UK, assume personal data, evaluate against GDPR or UK GDPR, and document the basis.

Other related terms you'll see

Sensitive Personal Information (SPI): California (CPRA) and several other US states. Includes specific categories such as government IDs, account credentials, precise geolocation, racial/ethnic origin, sexual orientation, mail/email/text content, genetic and biometric data, and health information. Triggers heightened consumer rights.
Nonpublic Personal Information (NPI): Gramm-Leach-Bliley Act. Financial-services term covering customer information collected by a financial institution that is not publicly available.
Cardholder Data (CHD) / Sensitive Authentication Data (SAD): PCI DSS. CHD covers the primary account number (PAN), cardholder name, expiration date, and service code; SAD covers full magnetic-stripe (track) data, card verification codes (CVV/CVC), and PINs.
Personal Identifier (CCPA / CPRA): broader than direct identifiers; includes online identifiers, IP addresses, account names, aliases, and similar.
Education Records / Personally Identifiable Information from Education Records: FERPA. Student records maintained by an educational agency or institution.
Pseudonymous Data / Pseudonymized Data: data where direct identifiers have been replaced with reversible tokens; still personal data under GDPR if the key exists.
Anonymous Data / Anonymized Data: data altered such that identification is no longer reasonably possible. Generally outside GDPR; the bar is high and contextual.

Quick-reference comparison table

| Concept | Source | Scope | Examples in scope | Examples out of scope |
| --- | --- | --- | --- | --- |
| PII | NIST 800-122 + general US usage | Anything that identifies a person directly or indirectly | Name, SSN, email, IP, MRN | Aggregated statistics with no individual link |
| PHI | HIPAA (US) | Health information held by a covered entity / BA, tied to an individual | Patient name + diagnosis; appointment schedule with patient name | Health information not tied to an individual; HIPAA-de-identified data |
| Personal Data | GDPR (EU/UK) | Any information relating to an identified or identifiable natural person | Name, IP, employee ID, photograph, online identifier, inference | Truly anonymous statistical data |
| Special Category Data | GDPR Art. 9 | Race, ethnicity, religion, sex life, biometric (for ID), genetic, health | Health record, religious affiliation | Generic demographic (age band) without identification |
| SPI | CPRA / state laws | Heightened-protection subset of personal information | Government ID, precise geolocation, account credentials | Public-record name |
| NPI | GLBA | Financial-services customer information not publicly available | Account balance, transaction history | Information already in the public record |
| CHD | PCI-DSS | Payment-card data | PAN, expiration, name | Tokenized card reference (varies) |

How they interact with AI

For most AI workflows in 2026, the practical question is not "which label" but "which obligations." A few common patterns:

A US healthcare team using ChatGPT on patient notes. PHI (HIPAA), almost certainly also PII, possibly Personal Data if the patient is in the EU. Need: BAA, configuration aligned with the Security Rule, plus GDPR if applicable.
An EU SaaS company using Claude on customer-support tickets. Personal Data (GDPR), possibly Special Category if the tickets include health/political/etc. info, possibly PII for US security purposes. Need: lawful basis, DPA, transfer mechanism, DPIA where applicable. (See our GDPR + AI guide.)
A US bank using Gemini on customer onboarding. NPI (GLBA), PII generally, possibly Personal Data if customers are in the EU. Need: Safeguards Rule compliance, FFIEC alignment, and DPAs. (See our financial services post.)
A US recruiter using ChatGPT on resumes. PII, often SPI under CPRA, possibly Personal Data if the candidate is in the EU. Need: enterprise tier with DPA, candidate notice under NYC Local Law 144 (LL144) and applicable state laws, and possibly a DPIA. (See our AI in hiring post.)

The constant across all of these: keep regulated data inside the right boundary, and use technical controls — local-first redaction at the browser, enterprise tiers with DPAs and zero retention — to make the safe path the easy path. Locke detects entity categories that map onto each of these concepts (names, MRNs, account numbers, biometric identifiers) and replaces them with reversible tokens before the prompt leaves the device, so the underlying regime questions become much smaller.
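
The idea of swapping detected entities for reversible tokens before a prompt leaves the device can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Locke's actual implementation: the two regex patterns (email addresses and US SSN-formatted numbers), the `[LABEL_n]` token format, and the `redact` / `restore` helpers are all hypothetical, and real detection needs far broader coverage than two patterns.

```python
import re

# Hypothetical detectors: an email pattern and a US SSN-formatted number pattern.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected entities with tokens; return the text and a token map."""
    token_to_value: dict[str, str] = {}
    value_to_token: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}

    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            key = (label, value)
            if key not in value_to_token:
                # First time this value is seen: mint a new numbered token.
                counters[label] = counters.get(label, 0) + 1
                token = f"[{label}_{counters[label]}]"
                value_to_token[key] = token
                token_to_value[token] = value
            return value_to_token[key]

        text = pattern.sub(substitute, text)
    return text, token_to_value


def restore(text: str, token_to_value: dict[str, str]) -> str:
    """Put the original values back into a response that contains tokens."""
    for token, value in token_to_value.items():
        text = text.replace(token, value)
    return text
```

The token map stays on the device: only the redacted string is sent to the AI provider, and `restore` is applied locally to the response. Repeated values reuse the same token so the model can still reason about "the same person" without seeing who it is.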

Frequently asked questions

Is an email address PII?

In the US, generally yes — it identifies a person. Under GDPR, it is personal data. Whether it is sensitive depends on the regime: a personal email is often treated as ordinary personal data; a corporate role mailbox may be treated differently in some contexts. Treat it as PII for security; check the specific law for legal classification.

Is an IP address personal data?

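Under GDPR, often yes. In Breyer v. Bundesrepublik Deutschland (Case C-582/14, 2016), decided under the GDPR's predecessor directive, the Court of Justice of the European Union held that dynamic IP addresses can be personal data when the controller has legal means, reasonably likely to be used, to combine them with subscriber information held by the internet service provider. In the US, IP addresses are commonly treated as PII in contracts and security policies and are listed as an identifier under CCPA/CPRA, though coverage under specific statutes, including breach-notification laws, varies.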

Does HIPAA's Safe Harbor really make data "not PHI"?

Yes. Safe Harbor de-identification under 45 CFR §164.514(b)(2) renders data not PHI under HIPAA. The bar is high: all 18 categories of identifiers must be removed, and the covered entity must have no actual knowledge that the remaining data could be used to re-identify an individual. Many real-world "de-identification" attempts fall short of Safe Harbor and qualify only as a limited data set (which still requires a Data Use Agreement) or, in some cases, remain PHI. Re-identification studies have shown that combining a few innocuous-looking attributes (ZIP, DOB, sex) is often enough to identify individuals.

Is pseudonymized data still personal data under GDPR?

Yes, if the controller (or anyone reasonably likely to access the data) holds the key. Pseudonymization is a technical and organizational measure encouraged by Article 32, but it does not move data outside the GDPR. Truly anonymous data (no reasonable means of re-identification) is outside the regulation, but the bar is high.
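
A short Python sketch shows why the key matters. It uses a keyed hash (HMAC-SHA256 from the standard library) as one possible pseudonymization technique; this is an assumption for illustration, not something GDPR prescribes, and the `pseudonymize` and `can_relink` helpers are hypothetical names.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a secret key."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def can_relink(identifier: str, pseudonym: str, key: bytes) -> bool:
    """Anyone holding the key can check whether a pseudonym belongs to a person."""
    return hmac.compare_digest(pseudonymize(identifier, key), pseudonym)
```

The same identifier and key always yield the same pseudonym, so whoever holds the key can tie a pseudonymized record back to a known person. That re-linking ability is why the data stays personal data while the key exists.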

What's the difference between SPI and Special Category Data?

SPI (sensitive personal information) is a CCPA/CPRA term covering specified high-protection categories under California law. Special Category Data is the GDPR Article 9 term covering the EU's high-protection categories. The lists overlap (race, religion, sex life, biometrics, health) but are not identical, and the legal consequences differ: CPRA gives consumers the right to limit the use and disclosure of their SPI, while GDPR requires an additional Article 9 condition before processing.

How do I know which concept applies to a given dataset?

Start with who the data relates to, where they are, and what the data describes. Health information held by a US covered entity is PHI under HIPAA. Any data relating to an identified or identifiable EU/UK person is personal data under GDPR. Financial-services customer information held by a US financial institution is NPI. Most data triggers more than one regime simultaneously; build your security and privacy program around the strictest applicable regime.
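
The triage above can be written down as a small decision function. The Python sketch below is a deliberate simplification: the `DatasetProfile` fields and the `applicable_concepts` helper are hypothetical, the rules only encode the sentences in this answer, and a real classification still needs legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical yes/no facts about a dataset, mirroring the questions above."""
    identifies_person: bool
    subject_in_eu_or_uk: bool = False
    health_related: bool = False
    held_by_covered_entity_or_ba: bool = False
    customer_info_held_by_us_financial_institution: bool = False


def applicable_concepts(profile: DatasetProfile) -> list[str]:
    """Return every concept the profile triggers; more than one is common."""
    if not profile.identifies_person:
        return []
    concepts = ["PII"]
    if profile.health_related and profile.held_by_covered_entity_or_ba:
        concepts.append("PHI (HIPAA)")
    if profile.subject_in_eu_or_uk:
        concepts.append("Personal Data (GDPR / UK GDPR)")
    if profile.customer_info_held_by_us_financial_institution:
        concepts.append("NPI (GLBA)")
    return concepts
```

For instance, patient notes held by a US covered entity about a patient located in the EU return PII, PHI, and Personal Data together, which is the multi-regime case the answer describes.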

Are AI inferences (e.g., "this user seems unhappy") personal data?

Under GDPR, yes — inferences relating to an identifiable person are personal data. Under HIPAA, an AI inference about an individual's health condition is PHI if held by a covered entity or its BA. Under CPRA, inferences are explicitly listed as personal information. Treat AI-generated profiles with the same care you would apply to underlying records.

A short reference for engineers and compliance teams

PII: practical, US-centric umbrella; use for security categorization.
PHI: HIPAA-specific subset; triggers BAA, Privacy Rule, Security Rule.
Personal Data: GDPR's broad concept; triggers lawful basis, DPA, transfer mechanism, DSAR handling.
Special Category Data: GDPR Article 9 sensitive subset; additional condition required.
SPI: CPRA sensitive subset; consumer right to limit use and disclosure.
NPI: GLBA financial-services subset; triggers the Safeguards Rule.

In every case, the operational defense is the same: choose enterprise-tier AI tools with DPAs and zero retention, deploy local-first redaction at the input layer, document your basis, and respect data-subject rights. The labels matter for legal classification; the controls matter for keeping data out of the wrong hands in the first place.

The bottom line

PII, PHI, and personal data are not interchangeable. Each comes with its own scope, regulator, and obligations, and most AI use cases trigger more than one at once. Get the concepts straight, design your program for the strictest applicable regime, and pair the legal map with technical controls that prevent regulated data from reaching tools you cannot fully control. That combination is what holds up under audits, breach investigations, and AI-specific regulatory inquiries in 2026.
