PII and PHI Redaction in AI Voice Agents: Detailed Guide for 2026

Date

Jun 13, 26

Reading Time

12 Minutes

PII vs PHI: The Definitions That Determine Your Compliance Regime

People throw PII and PHI around like they mean the same thing. They don't. And in the context of PHI and PII redaction in AI voice agents, that distinction changes everything: your compliance regime, your vendor contracts, your build requirements.

PII is any data that can identify a specific person. Name, phone number, email, SSN, card number, device ID. Governed by GDPR, CCPA, and whatever state privacy law applies to where your users are.

PHI is narrower. It's PII that relates to health status, healthcare treatment, or payment for care, AND is held by a HIPAA-covered entity or its business associates. That "and" is doing a lot of work.

Here's the overlap that trips most teams: a phone number is PII everywhere. But the moment your voice agent books a medical appointment on behalf of a covered entity and logs that number, it becomes PHI under HIPAA. Same data point. Completely different legal obligations.

This is why PII redaction rules can't be set by data type alone. They're set by context, by who holds the data, and what it was collected for.

	PII	PHI
Definition	Any data identifying a specific person	PII tied to health status, care, or payment
Who it governs	All businesses handling personal data	HIPAA-covered entities and business associates
Primary law	GDPR, CCPA, state privacy laws	HIPAA (US)
Redaction standard	Data minimization, proportionality	18 Safe Harbor identifiers or Expert Determination
Voice agent trigger	Any call logging personal data	Medical booking, insurance claims, clinical context
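The table's core point is that the same field can fall under different regimes. That is easiest to see when classification is written as a function of both the field and the call context. The sketch below is minimal and rests on two hypothetical per-call flags that your deployment would set. It only classifies identifiers, not health details such as diagnoses, and it uses a shortened identifier list rather than the full Safe Harbor set.

```python
from dataclasses import dataclass

# Hypothetical per-call context flags. In practice these come from how the
# agent is deployed (on whose behalf it acts) and what the call is about.
@dataclass(frozen=True)
class CallContext:
    for_covered_entity: bool  # operated by/for a HIPAA covered entity or business associate
    health_related: bool      # booking, treatment, claims, or other care context

# Simplified identifier list; the HIPAA Safe Harbor list has 18 entries.
IDENTIFIERS = {"name", "phone", "email", "ssn", "card_number", "device_id"}

def classify(field_type: str, ctx: CallContext) -> str:
    """Return "PHI", "PII", or "none" for an identifier captured on a call."""
    if field_type not in IDENTIFIERS:
        return "none"
    if ctx.for_covered_entity and ctx.health_related:
        return "PHI"
    return "PII"

retail_call = CallContext(for_covered_entity=False, health_related=False)
clinic_call = CallContext(for_covered_entity=True, health_related=True)
print(classify("phone", retail_call), classify("phone", clinic_call))  # PII PHI
```

The data type never changes between the two calls. Only the context does, and that alone moves the phone number from one compliance regime to another.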

For a full look at the security obligations that follow from this classification, see our guide on voice agent privacy and security.

Both are hard to protect in a web form. Voice agents are a different class of problem entirely.

Why Voice Agents Leak PII Differently Than Every Other Channel

Unlike AI chatbots, which receive PII through controlled input fields, voice agents get it as raw, unstructured speech. And that's what makes PII redaction so much harder here than in any other channel. No field validation. No character limits. A caller booking a dental appointment can volunteer their insurance ID, a child's SSN, and a home address in a single sentence, none of it prompted.

That's problem one. There are three more.

Second: real-time audio doesn't wait. With a form submission, you validate before writing to a database. With a voice stream, audio is already flowing to your ASR provider before you've finished processing the previous utterance. Across inbound and outbound voice flows, there's no natural pause-before-storage moment unless you build one in deliberately.

Third: callers volunteer data you never asked for. The agent asks for a date of birth. The caller gives the DOB, then their insurance member ID, then their mother's maiden name. All three land in your transcript and your LLM context without any prompting.

Fourth: the vendor chain multiplies everything. Twilio logs the call metadata. Deepgram gets the audio stream. Your LLM provider gets the full transcript in context. Your monitoring tool logs the trace. At the WebRTC and SIP transport layers, call metadata is logged before any redaction system can run. One piece of PHI hits four vendor environments in under two seconds.

This is why PHI and PII redaction in AI voice agents needs control at each stage of that chain. PII redaction that only covers the transcript is theater, not compliance.

Each of those four vectors maps to a specific stage in your pipeline. The next section names exactly where the exposure sits.

The Seven Places PII and PHI Actually Surface in a Voice Pipeline

Infographic showing the seven places PII and PHI surface in a voice agent pipeline: telephony capture, ASR transcription, LLM context window, function call outputs, call recordings, observability logs and OTel spans, and sub-processors and data residency.

The transcript isn't the whole risk. It's one of seven places where sensitive caller data surfaces in a standard voice pipeline, and most deployments I've seen cover two of them, maybe three.

Pipeline Stage	What Gets Exposed	Control Required
Telephony capture (Twilio, SIP)	Raw audio stream, call metadata: caller ID, timestamps, duration	Encrypted transport (TLS 1.2+), metadata field scoping
ASR / Transcription (Deepgram)	Full spoken transcript, speaker turn boundaries, disfluencies	Redact at the transcription layer before any write to storage
LLM context window	Full transcript passed as prompt; tool call outputs injected into context	Strip PII before model send; scope context to the minimum needed
Function calls and tool outputs	CRM records, EHR data, claims details, and account information returned by APIs	Sanitize API responses before logging; log only fields needed for audit
Call recordings and stored audio	Full dual-channel audio; voice biometrics (HIPAA identifier #16)	Audio masking at the recording layer, or delete after redacted transcript extraction
Logs, traces, observability (OTel spans)	Debug data, raw LLM prompts, latency traces with transcript fragments embedded	Custom span processors scrub attributes before export; log filters on application logs
Sub-processors and data residency	Vendor copies landing in regions that may violate cross-border transfer rules	BAAs and DPAs with every vendor in the chain; data residency controls

Walking through that table, the entries that actually get teams in trouble aren't the obvious ones. Teams think about telephony and transcription. The real problems show up in the observability layer.

Here's a real pattern from Hamming's analysis of 4M+ production voice calls. Three months ago, while debugging latency, an engineer added a raw LLM prompt to an OpenTelemetry span. Nobody removed it. The transcript viewer shows asterisks. The Datadog trace shows the full card number. This is "redaction theater": the UI looks compliant, the infrastructure isn't.

PII redaction that covers only the transcript protects one of those seven stages, roughly 15% of the actual exposure surface.

In healthcare voice deployments, function calls returning patient records or appointment history are the most common source of PHI entering the LLM context window unprompted. If your agent queries a voice AI knowledge base for clinical protocols or insurance SOPs, that API response carries structured PHI straight back through the pipeline. And agentic RAG systems that pull from policy documents or medical records introduce a document-level exposure risk at the tool output stage that almost nobody maps during the build.

So PHI and PII redaction in AI voice agents requires controls at all seven stages, not just the two that your users can see.

"The voice agent tells the caller it won't repeat their card number. The OpenTelemetry span stores it in six different ways." - Hamming, analysis of 4M+ production voice agent calls, 2025.

Knowing where the leak is gets you halfway. Each stage needs its own technique, and they're not interchangeable.

Redaction Techniques That Actually Hold at Each Stage

No single technique covers the full pipeline for PII redaction. Each stage demands its own approach. Use the wrong one in the wrong place, and you get either a compliance gap or a transcript so degraded it becomes useless for debugging.

Real-time vs Batch

	Real-time Redaction	Batch Redaction
When it runs	Before database write	After storage, on a schedule
Latency cost	10-50ms per transcript chunk	None at runtime
Compliance gap	None if configured correctly	PII exists in your database until the job runs
Accuracy risk	Entities split across chunk boundaries; mitigate with 2-3 chunk buffering	Full context = higher accuracy
Best for	Production voice agents	Supplementary archive verification pass

Real-time for production. Batch as a verification layer only. For teams already managing voice agent latency at the infrastructure level, the 10-50ms cost of real-time redaction is manageable with buffered chunk processing.
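The buffering idea works like this: hold each transcript chunk until a couple of later chunks have arrived. The sketch below is minimal and assumes transcript text arrives as string chunks. A toy regex for long digit runs stands in for the NER + regex detector, and matches are masked with asterisks rather than entity labels. The window of three and all names are illustrative.

```python
import re
from collections import deque
from typing import List, Optional

# Toy detector standing in for the NER + regex stage: flags 9+ digit runs,
# allowing single spaces or dashes between digits.
DIGIT_RUN = re.compile(r"\d(?:[ -]?\d){8,}")

class BufferedRedactor:
    """Hold back each transcript chunk until `window - 1` later chunks arrive,
    so entities split across chunk boundaries are still detected."""

    def __init__(self, window: int = 3):
        self.window = window
        self.chunks = deque()  # items: [text, bytearray mask]

    def _scan(self) -> None:
        # Run detection over the joined buffer and mark hits on every chunk.
        joined = " ".join(text for text, _ in self.chunks)
        spans = [m.span() for m in DIGIT_RUN.finditer(joined)]
        offset = 0
        for text, mask in self.chunks:
            for start, end in spans:
                for i in range(max(start, offset), min(end, offset + len(text))):
                    mask[i - offset] = 1
            offset += len(text) + 1  # +1 for the joining space

    @staticmethod
    def _render(text: str, mask: bytearray) -> str:
        return "".join("*" if hit else ch for ch, hit in zip(text, mask))

    def push(self, chunk: str) -> Optional[str]:
        """Add a chunk; return the oldest chunk, redacted, once it is safe to release."""
        self.chunks.append([chunk, bytearray(len(chunk))])
        self._scan()
        if len(self.chunks) < self.window:
            return None
        text, mask = self.chunks.popleft()
        return self._render(text, mask)

    def flush(self) -> List[str]:
        """Release everything still buffered (e.g. at the end of the call)."""
        out = [self._render(text, mask) for text, mask in self.chunks]
        self.chunks.clear()
        return out

r = BufferedRedactor(window=3)
for chunk in ["card is 4111 1111", "1111 1111 thanks", "bye"]:
    print(r.push(chunk))
print(r.flush())
```

Neither chunk alone contains enough digits to match, but the joined buffer does. Masks are stored per chunk, so a hit found while a chunk sits in the buffer stays applied after its neighbors have been released. Holding chunks back is where the latency cost comes from.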

Audio-level masking

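Transcript redaction doesn't touch recordings, so two things need handling at the audio layer. First, configure Twilio to suppress DTMF tones during IVR payment entry. PCI DSS prohibits storing card verification codes after authorization, so keypad tones that encode them must never land in a recording. It's an architectural control, entirely separate from transcript work.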

The HIPAA shortcut most teams don't know about: delete the original audio after extracting a redacted transcript. Stored recordings are HIPAA identifier #16, voice biometrics. Delete the audio, and that entire identifier exits your compliance scope.
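If you keep the recording instead of deleting it, audio masking means overwriting the sensitive segments with silence. The sketch below is minimal and makes several assumptions. Audio is mono, signed 16-bit PCM at 16 kHz, already loaded into a Python `array`. Each sensitive entity comes with (start, end) timestamps in seconds, for example derived from word-level timings returned by your ASR. The function name and inputs are illustrative, not any vendor's API. File I/O (for instance with the stdlib `wave` module) and per-channel handling are left out.

```python
from array import array

def mask_segments(samples: array, spans, sample_rate: int = 16_000) -> array:
    """Return a copy of 16-bit PCM `samples` with each (start_s, end_s) span silenced."""
    out = samples[:]  # slicing an array returns a new array; the original is untouched
    for start_s, end_s in spans:
        lo = max(0, int(start_s * sample_rate))
        hi = min(len(out), int(end_s * sample_rate))
        if hi > lo:
            out[lo:hi] = array(out.typecode, [0]) * (hi - lo)
    return out

# Two seconds of constant signal; silence 0.5 s to 1.0 s, where a card number was spoken.
pcm = array("h", [1000] * 32_000)
masked = mask_segments(pcm, [(0.5, 1.0)])
print(masked[7_999], masked[8_000], masked[15_999], masked[16_000])  # 1000 0 0 1000
```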

Transcript NER vs Regex

This is where I see the most mistakes in production deployments.

	Regex / Pattern Matching	ML-Based NER	Hybrid
Accuracy	60-80% F1	94-96% F1	92-95% F1
Speed	Under 5ms	10-50ms	15-60ms
Core weakness	Breaks in spoken language	Slower, higher cost	Added complexity
Best for	Structured data: card numbers, SSNs	Names, addresses, context-dependent PII	Production systems

Regex fails on voice. A caller who reads a card number aloud as "four, one, one, one..." produces words, not digits, so a 16-digit card pattern never matches. NER catches it. Production voice agents need both. For multilingual voice agents, NER accuracy also varies by language: English hits 94-96% F1, and models for lower-resource languages can drop well below that.
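The regex half of a hybrid detector can only work if spoken digits are treated as digits first. The sketch below shows that normalization step under a few assumptions. The ASR emits English digit words (including "oh" for zero) or numerals. Candidate runs are 13 to 19 digits and are filtered with a standard Luhn checksum to cut false positives. The function names are illustrative. Names, addresses, and other context-dependent PII still need the NER half, which this sketch does not attempt.

```python
import re

SPOKEN = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9"}
TOKEN = re.compile(r"[A-Za-z]+|\d+")
SEPARATOR = re.compile(r"[\s,.-]*")

def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum, used to cut false positives on long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def redact_spoken_cards(text: str) -> str:
    # Group consecutive digit tokens (spoken words or numerals) separated only
    # by spaces, commas, periods, or dashes into candidate runs.
    runs, current = [], []
    for m in TOKEN.finditer(text):
        digits = m.group() if m.group().isdigit() else SPOKEN.get(m.group().lower())
        joins = current and SEPARATOR.fullmatch(text[current[-1][1]:m.start()])
        if digits is not None and (not current or joins):
            current.append((m.start(), m.end(), digits))
            continue
        if current:
            runs.append(current)
        current = [(m.start(), m.end(), digits)] if digits is not None else []
    if current:
        runs.append(current)
    # Replace runs that look like a real card number: 13-19 digits, valid Luhn.
    out, last = [], 0
    for run in runs:
        number = "".join(d for _, _, d in run)
        if 13 <= len(number) <= 19 and luhn_ok(number):
            out.append(text[last:run[0][0]])
            out.append("[CREDIT_CARD]")
            last = run[-1][1]
    out.append(text[last:])
    return "".join(out)

utterance = "my card is four, one, one, one, " + ", ".join(["one"] * 12) + " thanks"
print(redact_spoken_cards(utterance))  # my card is [CREDIT_CARD] thanks
```

A short digit run such as a seven-digit phone number fails the length check and passes through untouched.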

Tokenization

Two options. Irreversible replaces sensitive data with a label like [CREDIT_CARD_1], which is standard for transcript storage. Reversible tokenization encrypts the original value and stores a lookup token in a secure vault, used when compliance officers or auditors need to recover a specific value for investigation.
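Here are the two tokenization modes side by side. This is a minimal sketch. The `Tokenizer` class and the token format are hypothetical. The in-memory dict stands in for a real vault, which would encrypt values at rest and restrict who can call detokenize.

```python
import secrets
from collections import defaultdict

class Tokenizer:
    """Illustrative only. The in-memory dict stands in for a real vault, which
    would encrypt values at rest and restrict who can detokenize."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._vault = {}

    def label(self, entity_type: str) -> str:
        """Irreversible: the value is discarded, only a numbered label remains."""
        self._counters[entity_type] += 1
        return f"[{entity_type}_{self._counters[entity_type]}]"

    def tokenize(self, entity_type: str, value: str) -> str:
        """Reversible: store the value in the vault, return an opaque token."""
        token = f"tok_{entity_type.lower()}_{secrets.token_hex(8)}"
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Recover the original value, e.g. for an auditor's investigation."""
        return self._vault[token]

t = Tokenizer()
print(t.label("CREDIT_CARD"))  # [CREDIT_CARD_1]
ssn_token = t.tokenize("SSN", "123-45-6789")
print(t.detokenize(ssn_token))  # 123-45-6789
```

The token is random, not derived from the value. Recovering the original therefore requires vault access, which is the property that makes reversible tokenization auditable.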

Model layer controls

Getting PHI and PII redaction in AI voice agents right means treating the LLM as a data path, not just a processing step. LLM selection for voice agents directly affects what data leaves your environment, which vendor agreements you need, and whether on-prem deployment is even viable for your compliance posture. Self-hosted ASR like Deepgram on-prem keeps audio within your environment. Cloud ASR requires a signed BAA for any HIPAA-covered entity.

And run your detection pipeline on agent outputs, not just caller inputs. An LLM that surfaces records through a tool call can echo PHI straight back in its own response without any prompting. How you structure voice AI prompting also controls how much raw conversation history the model receives per turn, which directly limits what PII redaction needs to catch at the model layer.
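Symmetric detection comes down to routing every turn, from either speaker, through the same redaction function before it is stored or passed on. In the minimal sketch below, an SSN-only regex stands in for the production detector, and `record_turn` is a hypothetical helper.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Placeholder detector (SSNs only); in production this is the same hybrid
    # NER + regex detector you run on caller speech.
    return SSN.sub("[SSN]", text)

def record_turn(history: list, speaker: str, text: str) -> str:
    """Redact a turn from either side before it is stored or passed on.

    Caller turns are redacted before they reach the LLM; agent turns are
    redacted before they reach TTS, the transcript, or the next prompt.
    """
    clean = redact(text)
    history.append({"speaker": speaker, "text": clean})
    return clean

history = []
record_turn(history, "caller", "Can you check my account?")
# The agent's reply echoes a value that came back from a tool call.
print(record_turn(history, "agent", "I see SSN 123-45-6789 on file."))  # I see SSN [SSN] on file.
```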

Transcribe first, then redact. The original unredacted transcript should never reach your database. If your pipeline stores first and runs a background redaction job, your compliance gap is exactly as wide as the time between those two events.
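One way to make "transcribe first, then redact" structural rather than a convention is to give the storage layer a type it can check. The sketch below is minimal. `RedactedTranscript`, `TranscriptStore`, and the SSN-only detector are hypothetical. A type check alone doesn't stop someone from constructing `RedactedTranscript` by hand, so it works best paired with code review or a lint rule.

```python
import re
from dataclasses import dataclass

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

@dataclass(frozen=True)
class RedactedTranscript:
    """Marker type: only redact() should construct this."""
    text: str

def redact(raw: str) -> RedactedTranscript:
    # Placeholder detector (SSNs only).
    return RedactedTranscript(SSN.sub("[SSN]", raw))

class TranscriptStore:
    """Hypothetical persistence layer that refuses raw strings outright."""

    def __init__(self):
        self.rows = []

    def save(self, transcript: RedactedTranscript) -> None:
        if not isinstance(transcript, RedactedTranscript):
            raise TypeError("refusing to persist an unredacted transcript")
        self.rows.append(transcript.text)

store = TranscriptStore()
store.save(redact("my SSN is 123-45-6789"))
print(store.rows)  # ['my SSN is [SSN]']
```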

Techniques cover the how. Before you build, you need to know which regulatory frameworks apply, because each one sets a different bar for what "redacted" legally means.

The Compliance Frameworks Your Voice Agent Answers To

One voice agent can sit under four compliance frameworks simultaneously. Take a healthcare insurer in Dubai processing US-based claims. HIPAA applies because it's health data touching a covered entity. GDPR kicks in if any of the callers are in the EU. UAE PDPL governs because processing happens on UAE soil. And PCI DSS applies the moment a caller reads out a card number for a premium payment. Four regimes, one call. This is exactly why PHI and PII redaction in AI voice agents can't be designed around the framework your legal team knows best.

Your redaction design has to satisfy the strictest framework you touch, not the most convenient one.

Framework	Covers	Max Penalty	Key Voice Agent Obligation
HIPAA (US)	PHI held by covered entities and business associates	$50K per violation, $1.5M annual cap (statutory figures, adjusted annually for inflation)	BAA with every vendor, 18 Safe Harbor identifiers removed, breach notification within 60 days
PCI DSS	Payment card data spoken on any call	Up to $100K/month in card brand fines	DTMF suppression during payment entry, PAN rendered unreadable wherever stored, no CVV storage after authorization
GDPR / UK GDPR	EU/UK personal data	4% of global revenue or €20M (£17.5M under UK GDPR), whichever is higher	Lawful basis for recording, right to erasure, irreversible anonymization standard
UAE PDPL + DIFC/ADGM	Personal data processed in the UAE	Varies by regime	Data localization, cross-border transfer controls, and separate DIFC/ADGM rules for financial zones
PIPEDA + Quebec Law 25	Canadian personal data	Up to CAD $25M (Quebec)	Express consent for recording, mandatory breach notification, and right to deletion

For a full breakdown of what it takes to build and operate a HIPAA-compliant voice agent, including BAA specifics and audit trail standards, see our dedicated guide.

Insurance voice agents face the full PCI DSS requirement the moment a caller reads out a card number for a premium payment or claim. Ecommerce voice agents hit the same trigger during any order confirmation or payment update call. And logistics voice agents operating across the UK, UAE, and EU face a three-framework compliance requirement on the same call when the cargo involves a cross-border shipment.

One practical note on HIPAA for 2026: the Security Rule update that HHS proposed in late 2024 has still not been finalized as of mid-2026. Build your PII redaction controls to the rule currently in force, and track the Federal Register for updates to the timeline. Don't engineer for a rule that isn't final yet.

GDPR: up to 4% of global annual revenue or €20M, whichever is higher. HIPAA: up to $1.5M per violation category per year, before inflation adjustments. CCPA: up to $750 per consumer per incident in statutory damages. PCI DSS: up to $100K per month in card brand fines. These are ceilings, not typical fines, but they are the exposure your voice agent operates under right now.
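The GDPR ceiling is computed from two numbers. Under Article 83(5), the upper tier is €20M or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. The 4% figure therefore only becomes the binding cap above €500M in turnover. The turnover figures below are made up for illustration, and the calculation says nothing about what a regulator would actually impose.

```python
def gdpr_max_fine_eur(annual_turnover_eur: int) -> int:
    """Upper GDPR fine tier: EUR 20M or 4% of worldwide annual turnover, whichever is higher."""
    return max(20_000_000, annual_turnover_eur * 4 // 100)

for turnover in (300_000_000, 500_000_000, 1_000_000_000):
    print(f"turnover EUR {turnover:,} -> ceiling EUR {gdpr_max_fine_eur(turnover):,}")
```

Integer arithmetic avoids floating-point rounding on large currency values.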

Frameworks tell you what you must achieve. The architecture section shows you how to build a system where compliance is structural rather than something you retrofit before an audit.

What a Redaction-Safe Voice Agent Pipeline Actually Looks Like

Most redaction failures aren't technique failures. They're placement failures. Redaction sits too late in the pipeline, and by the time it runs, the original data has already been written to two or three systems. Understanding the full AI voice stack is the prerequisite for knowing where each control actually belongs.

Here's how a properly sequenced pipeline looks, stage by stage.

Stage 1: Capture (Twilio/SIP).

Dual-channel recording separated at source. TLS 1.2+ transport. DTMF suppression active during any payment or authentication flow. Metadata scoped to the minimum required fields only.

Stage 2: ASR (Deepgram).

Audio streams to Deepgram under a signed BAA for HIPAA deployments, or self-hosted for full data isolation. 2-3 chunk boundary buffering before NER detection runs. Redaction executes before any transcript writing. Confidence threshold at 0.85 for production.
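Applying the 0.85 threshold is simple. The detail that bites is overlapping detections, where an NER pass and a regex pass (or two models) flag the same span. The sketch below is minimal and assumes detections arrive as (start, end, label, score) tuples in character offsets. That shape is an assumption for illustration, not any vendor's response format.

```python
from typing import Iterable, NamedTuple

class Entity(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str
    score: float

def apply_redactions(text: str, entities: Iterable[Entity], threshold: float = 0.85) -> str:
    """Replace entities scoring at or above `threshold` with [LABEL].

    Overlapping detections are merged; the merged span keeps the label of
    whichever detection starts first.
    """
    kept = sorted((e for e in entities if e.score >= threshold), key=lambda e: e.start)
    merged = []
    for e in kept:
        if merged and e.start < merged[-1].end:
            merged[-1] = merged[-1]._replace(end=max(merged[-1].end, e.end))
        else:
            merged.append(e)
    out, pos = [], 0
    for e in merged:
        out.append(text[pos:e.start])
        out.append(f"[{e.label}]")
        pos = e.end
    out.append(text[pos:])
    return "".join(out)

text = "I'm Jane Doe at 12 Elm St"
found = [Entity(4, 12, "NAME", 0.97), Entity(9, 12, "NAME", 0.90), Entity(16, 25, "ADDRESS", 0.62)]
print(apply_redactions(text, found))  # I'm [NAME] at 12 Elm St
```

Note that the address, detected at 0.62, passes through untouched at this threshold. That is the trade-off the threshold encodes: raising it reduces over-redaction, and it also lets more low-confidence PII through.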

Stage 3: Voice synthesis (ElevenLabs / Retell AI).

Output sanitization runs on the agent's text response before it reaches ElevenLabs. This captures PHI that surfaces via a tool call and would otherwise be spoken back to the caller verbatim.

Stage 4: LLM context.

Redacted transcript passed to the model. Tool call responses from CRM, EHR, and claims APIs are sanitized before injection into context. Minimum necessary context per turn.
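Sanitizing tool responses is most robust as an allowlist. The agent keeps only the fields it needs for the turn, so any field added to the upstream API later stays out of the context by default. In the minimal sketch below, the tool name, field names, and six-turn window are all hypothetical.

```python
# Hypothetical allowlist: for each tool, the only response fields the agent
# needs this turn. Anything not listed, including fields added to the API
# later, never reaches the model or the logs.
ALLOWED_FIELDS = {
    "get_appointment": {"appointment_id", "date", "time", "location"},
}

def sanitize_tool_output(tool: str, payload: dict) -> dict:
    allowed = ALLOWED_FIELDS.get(tool, set())  # unknown tools: default deny
    return {k: v for k, v in payload.items() if k in allowed}

def minimal_context(turns: list, max_turns: int = 6) -> list:
    """Pass only the most recent turns to the model."""
    return turns[-max_turns:] if max_turns > 0 else []

ehr_response = {
    "appointment_id": "A-1009", "date": "2026-07-01", "time": "09:30",
    "location": "Main St clinic", "patient_ssn": "123-45-6789", "diagnosis_code": "J45.909",
}
print(sanitize_tool_output("get_appointment", ehr_response))
```

The `max_turns > 0` guard matters because `turns[-0:]` in Python returns the whole list, not an empty one.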

Stage 5: Recording vault.

Store dual-channel audio with masking at sensitive segments, or delete original audio after extracting a redacted transcript. Deletion removes voice biometrics from the scope, which matters because stored recordings qualify as HIPAA identifier #16.

Stage 6: Observability.

Custom OTel span processors scrub attributes before export. No raw LLM prompts in any trace, ever. Log filters strip PII from all application output.
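The scrubbing logic can live in a plain function that your OpenTelemetry hook calls before export. That hook might be a custom span processor, as described above, or an exporter wrapper. This minimal sketch deliberately avoids SDK specifics, because how span attributes can be modified at each hook differs between OTel language SDKs. The attribute keys and patterns are hypothetical.

```python
import re

# Hypothetical attribute keys that carry raw conversation text. These are
# replaced wholesale rather than pattern-scrubbed.
DROP_KEYS = {"llm.prompt", "llm.completion", "transcript.raw"}

# Value-level patterns for anything that leaks into other string attributes.
VALUE_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), "[CREDIT_CARD]"),
]

def scrub_attributes(attrs: dict) -> dict:
    """Return a scrubbed copy of span attributes, safe to hand to an exporter."""
    clean = {}
    for key, value in attrs.items():
        if key in DROP_KEYS:
            clean[key] = "[REDACTED]"
            continue
        if isinstance(value, str):
            for pattern, label in VALUE_PATTERNS:
                value = pattern.sub(label, value)
        clean[key] = value
    return clean

span_attrs = {
    "llm.prompt": "Caller: my card is 4111 1111 1111 1111",
    "debug.note": "retrying card 4111-1111-1111-1111 for ssn 123-45-6789",
    "latency_ms": 412,
}
print(scrub_attributes(span_attrs))
```

Dropping known raw-text keys outright, instead of trusting patterns on them, is what closes the "raw LLM prompt in a span" gap described earlier.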

Stage 7: Audit trail.

Log entity types, timestamp, detection version, and the vendor that processed the data. Not the values themselves. This is what you show in a compliance audit.
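Such an audit record keeps the four fields and drops the values before anything is written. The sketch below is minimal and assumes the detector hands back (label, value) pairs. The record shape, detector version string, and function name are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class RedactionAuditEvent:
    call_id: str
    entity_types: tuple   # e.g. ("CREDIT_CARD", "SSN"); never the values
    detector_version: str
    vendor: str           # which vendor handled the data at this stage
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def audit_line(call_id: str, detections, detector_version: str, vendor: str) -> str:
    """Build one JSON audit line from (label, value) detections, dropping the values."""
    types = tuple(sorted({label for label, _value in detections}))
    return json.dumps(asdict(RedactionAuditEvent(call_id, types, detector_version, vendor)))

line = audit_line(
    "call-42",
    [("SSN", "123-45-6789"), ("CREDIT_CARD", "4111111111111111"), ("SSN", "987-65-4321")],
    detector_version="ner-2.3",
    vendor="Deepgram",
)
print(line)
```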

What stays in your environment versus what leaves:

Your environment: recordings, transcripts, audit logs, PII vault
Vendor environments: audio stream (Deepgram), text prompts (LLM provider), call metadata (Twilio)
BAA or DPA required for: Deepgram, your LLM provider, any vendor receiving PHI from regulated subjects

Relinns builds on Retell AI, Deepgram, ElevenLabs, and Twilio. The architecture above is how we deploy for clients handling PHI and PII in AI voice agents across healthcare, insurance, and financial services, while simultaneously complying with HIPAA, PCI DSS, and GDPR. PII redaction is built at ingress. Not patched in after a compliance review.

If you're starting from scratch, our full walkthrough on building an AI voice agent covers the infrastructure decisions alongside the redaction controls described here. This architecture also has to hold under production load. Our guide on scaling AI voice agents covers the decisions that keep growth from opening new compliance gaps.

Even well-designed architectures fail at predictable points. The next section covers the specific mistakes that show up in actual breach reports.

The Mistakes That Get Teams Fined and the One-Line Fix for Each

These aren't hypothetical risks. They're the patterns that show up in breach disclosures for teams that treated PHI and PII redaction in AI voice agents as a configuration task rather than an architecture decision.

Mistake	Why It Fails	Fix
Recording before redacting	Audio stored with PII intact; batch job creates a compliance gap	Redact at ingress. Never batch-only in production.
Trusting regex alone	Spoken language breaks patterns: "four, one, one, one."	Hybrid NER + regex, tested with voice-realistic inputs, not typed text
Logging the raw LLM prompt	OTel spans expose the full transcript, including PII	Custom span processors scrub attributes before export
No BAA with ASR vendor	HIPAA violation the moment PHI reaches the vendor	BAA signed before the first production call, not the first complaint
Ignoring the caller who reads out an unsolicited card number	Pipeline designed only for prompted data fields	Detection runs on all turns, all speaker channels, full audio
Keeping transcripts indefinitely	GDPR requires purpose-limited retention; HIPAA requires defined schedules	Set retention periods at build time. Automate deletion.
Treating platform-native redaction as complete	Platform tools cover the transcript viewer, not logs, traces, or analytics pipelines	Audit every data path, not just the one visible to users
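"Set retention periods at build time. Automate deletion." is easy to say and easy to let slip. The sketch below shows a scheduled purge check and is minimal. It assumes each stored record carries a type and a timezone-aware `created_at`. The schedule values are placeholders, not recommendations for any framework.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention schedule, fixed at build time. Values are illustrative.
RETENTION = {
    "transcript": timedelta(days=90),
    "audit_event": timedelta(days=365 * 6),
}

def expired_ids(records, now=None):
    """Yield ids of records past their type's retention period.

    Unknown record types raise instead of being kept forever by default.
    """
    now = now or datetime.now(timezone.utc)
    for rec in records:
        limit = RETENTION.get(rec["type"])
        if limit is None:
            raise ValueError(f"no retention period defined for {rec['type']!r}")
        if now - rec["created_at"] > limit:
            yield rec["id"]

today = datetime.now(timezone.utc)
sample = [{"id": "t-old", "type": "transcript", "created_at": today - timedelta(days=120)}]
print(list(expired_ids(sample)))  # ['t-old']
```

Raising on unknown types means a new data store can't ship without a retention period attached to it.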

Every mistake in that table is a PII redaction gap. In a regulated environment, any one of them is enough to trigger a finding.

The last one is the most underestimated. Platform-native PII redaction is marketed as a compliance feature. What it actually covers is the transcript viewer. OTel traces, debug logs, and analytics pipelines stay wide open, and that's where most actual breaches originate.

Regression testing your voice agent against synthetic PII datasets is how you validate the full pipeline before production. Not just the display layer.
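Here is what such a regression harness can look like. The sketch below is minimal and judges a case as caught when the card's surface form no longer appears in the redactor's output. The synthetic card generator (a "4" prefix plus a valid Luhn check digit) and the deliberately naive typed-text redactor are hypothetical. If you tested only typed inputs, this redactor would look perfect.

```python
import random
import re

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit to append to `partial`."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:  # doubled once the check digit is appended on the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return str((10 - total % 10) % 10)

def synthetic_card(rng: random.Random) -> str:
    # Hypothetical test-card generator: "4" prefix, 16 digits, valid Luhn digit.
    partial = "4" + "".join(rng.choice("0123456789") for _ in range(14))
    return partial + luhn_check_digit(partial)

def build_cases(n: int, seed: int = 7):
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        card = synthetic_card(rng)
        cases.append(("typed", card, f"my card is {card} thanks"))
        spoken = ", ".join(WORDS[int(d)] for d in card)
        cases.append(("spoken", spoken, f"my card is {spoken} thanks"))
    return cases

def recall_by_form(redactor, cases):
    """Share of cases, per surface form, where the card no longer appears."""
    hits, totals = {}, {}
    for form, surface, utterance in cases:
        totals[form] = totals.get(form, 0) + 1
        if surface not in redactor(utterance):
            hits[form] = hits.get(form, 0) + 1
    return {form: hits.get(form, 0) / totals[form] for form in totals}

def naive_redactor(text: str) -> str:
    # Typed-text regex only: the failure mode the mistakes table warns about.
    return re.sub(r"\b\d{16}\b", "[CREDIT_CARD]", text)

print(recall_by_form(naive_redactor, build_cases(50)))  # {'typed': 1.0, 'spoken': 0.0}
```

Reporting recall per surface form, rather than one blended number, is what exposes the gap. Swap in your production redactor and the same harness becomes a pre-release gate.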

And it's worth saying directly: AI voice agents and human agents carry fundamentally different compliance risk profiles. Human agents can be retrained after a policy change. A voice agent's exposure doesn't need a bad day to surface. It just needs a gap in the architecture.

The gap between a compliant voice agent and a non-compliant one isn't intent. It's whether the redaction controls are structural or added after the audit.

Ready to Build It Right?

Relinns is HIPAA compliant. We build AI voice agents for clients in healthcare, insurance, and financial services, where PHI and PII redaction in AI voice agents is required by vendor agreements and regulatory audits, not listed as features.

The architecture in this guide is how we handle PII redaction in production for clients operating under HIPAA, PCI DSS, GDPR, and UAE PDPL simultaneously.

Retrofitting compliance after launch costs more than building it in from the start. See how voice agent costs break down when compliance is part of the original build, not a later addition.

Compliance in voice AI is a custom AI development problem. There's no off-the-shelf configuration that covers your specific stack, your vendor chain, and your regulatory exposure simultaneously.

Book a technical consultation. We'll map your current pipeline against the seven exposure stages and show you exactly where the gaps are. Already running a voice agent in a regulated vertical and unsure if your current PII redaction holds up under HIPAA or GDPR scrutiny? Ask us for a compliance architecture review.

We build HIPAA-compliant voice agents with redaction built in from day one.
