Real-Time PII Detection API for AI Applications
PrivacyGuard scans every prompt for 17 PII entity types (emails, phone numbers, IBANs, U.S. SSNs, SWIFT codes, and regional identifiers like NIF, CPF, and CURP) and returns a redacted version your application can safely forward to the model. Mean server-side latency: 23.4 milliseconds. Every call ships with a report carrying a SHA-256 hash and a UUID, so any later audit can reproduce what was redacted and when.
What counts as PII in an LLM application?
Personally identifiable information (PII) is any data that can identify a specific person directly (an email, a phone number, a national identifier) or indirectly when combined with other data. In a large language model (LLM) application, the risk is that users routinely paste this information into prompts without realizing it: a customer-support assistant receives an email signature; a coding assistant receives a stack trace containing a real user record; a translation tool receives a contract with a tax ID.
Once that information enters the model the application loses control of it. The model provider may log it. The application itself may store the prompt for analytics or fine-tuning. The information may surface again in a completion to a different user. The window between user input and model invocation is the only point where redaction is straightforward — once the prompt has been processed there is no clean way to unmix it. A real-time PII detection API is the surgical instrument for that window.
The 17 PII entity types
PrivacyGuard recognizes 17 entity types covering the categories most production applications encounter:
- Contact data — email addresses, phone numbers in international and U.S. formats.
- Financial identifiers — IBAN account numbers with country prefix and check digits, SWIFT/BIC codes.
- National identifiers — U.S. Social Security numbers (SSN), Spanish NIF, Brazilian CPF, Mexican CURP, and additional regional formats published in the Integrity Report.
Each entity is matched by a dedicated pattern, not by a single generic regular expression. A one-size-fits-all approach breaks down on regional identifiers: an IBAN has structural check digits, a CPF has its own modulo-11 validation, a CURP follows a date-of-birth-plus-initials scheme. Treating them as first-class entities lets PrivacyGuard report each redaction with the correct semantic label, which downstream audit systems need when explaining what was redacted from a user prompt.
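To make that concrete, here is a minimal sketch of the CPF modulo-11 rule mentioned above: the last two digits are check digits computed from the first nine. This is illustrative only, not PrivacyGuard's internal matcher, and it is exactly the kind of arithmetic a generic regex cannot express.

```python
def cpf_check_digits_valid(cpf: str) -> bool:
    """Validate the two modulo-11 check digits of a Brazilian CPF."""
    digits = [int(c) for c in cpf if c.isdigit()]
    if len(digits) != 11:
        return False
    for position in (9, 10):
        # Weights run from (position + 1) down to 2 over the preceding digits.
        total = sum(d * w for d, w in zip(digits, range(position + 1, 1, -1)))
        remainder = total % 11
        expected = 0 if remainder < 2 else 11 - remainder
        if digits[position] != expected:
            return False
    return True

print(cpf_check_digits_valid("111.444.777-35"))  # True: check digits consistent
print(cpf_check_digits_valid("111.444.777-36"))  # False: last digit is wrong
```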
What the API returns
Every analysis with the privacy module enabled returns a privacy.entities array. Each entity records the type (EMAIL, PHONE, IBAN, SSN, etc.), the action taken (REDACTED), and the position offsets of the match in the original input as a [start, end] pair. Position offsets let your application overlay the redaction back onto its own UI if it needs to show users which part of their input was modified.
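If your UI needs to show users which spans were modified, the offsets are enough to rebuild the overlay. A minimal sketch, assuming end-exclusive (Python-slice-style) offsets as in the example response later in this section; the highlight function and marker characters are illustrative, not part of the API.

```python
def highlight(original: str, entities: list[dict]) -> str:
    """Wrap each detected span in markers so a UI can show what was redacted."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["position"][0]):
        start, end = ent["position"]
        out.append(original[cursor:start])
        out.append(f"«{original[start:end]}»")  # the span PrivacyGuard matched
        cursor = end
    out.append(original[cursor:])
    return "".join(out)

entities = [
    {"type": "EMAIL", "action": "REDACTED", "position": [12, 32]},
    {"type": "SSN", "action": "REDACTED", "position": [47, 58]},
]
print(highlight("My email is john.doe@example.com and my SSN is 123-45-6789.", entities))
# My email is «john.doe@example.com» and my SSN is «123-45-6789».
```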
When at least one PII entity is detected, the top-level verdict becomes ANONYMIZED and the response includes an anonymized_input field with placeholder tokens substituted for each match. Forwarding anonymized_input to your LLM removes the user's raw PII from the model's context without affecting the meaning of the prompt — a question containing an email address can still be answered without the model ever seeing the address.
Every analysis ships with a SHA-256 hash of the report contents, a UUID, and a UTC timestamp. That combination is sufficient to reconstruct what was redacted from any specific prompt months later.
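What your side needs to keep for that reconstruction is small. A minimal sketch of a local audit table, assuming SQLite; the table name and columns are illustrative, and only the uuid, sha256, timestamp_utc, and verdict fields come from the report.

```python
import sqlite3

def record_audit(db: sqlite3.Connection, report: dict) -> None:
    """Persist the fields needed to reproduce a redaction decision later."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS pii_audit ("
        "uuid TEXT PRIMARY KEY, sha256 TEXT, timestamp_utc TEXT, verdict TEXT)"
    )
    db.execute(
        "INSERT INTO pii_audit VALUES (?, ?, ?, ?)",
        (report["uuid"], report["sha256"], report["timestamp_utc"], report["verdict"]),
    )
    db.commit()

# Usage: record_audit(sqlite3.connect("audit.db"), response["report"])
```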
API request and response
The request body is identical to the integrity workflow except for the modules array, which contains privacy (alone, or combined with integrity to run both engines in one call). The options object accepts a language field — defaulting to auto — that selects the locale-specific PII patterns.
```bash
# Detect and redact PII using the privacy module.
curl -X POST https://api.zentricprotocol.com/v1/analyze \
  -H "Authorization: Bearer zp_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
        "input": "My email is john.doe@example.com and my SSN is 123-45-6789.",
        "modules": ["privacy"],
        "options": { "language": "auto" }
      }'
```
In the response the verdict is ANONYMIZED. The privacy.pii_detected flag is true and the privacy.entities array contains one record per match. The anonymized_input field replaces every detected entity with a [REDACTED] placeholder while preserving the rest of the prompt unchanged. Your application forwards anonymized_input to the LLM and never sends the raw value across the wire to the model.
```
# 200 OK: verdict ANONYMIZED with redacted output
{
  "status": "ok",
  "verdict": "ANONYMIZED",
  "report": {
    "report_id": "zp_4F0E670C1E27E4B4",
    "uuid": "a1c4…-…-…",
    "timestamp_utc": "2026-05-17T11:43:51.108Z",
    "sha256": "4f0e670c…",
    "verdict": "ANONYMIZED",
    "integrity": {
      "injection_detected": false,
      "signatures_matched": [],
      "confidence": 0.9998
    },
    "privacy": {
      "pii_detected": true,
      "entities": [
        { "type": "EMAIL", "action": "REDACTED", "position": [12, 32] },
        { "type": "SSN", "action": "REDACTED", "position": [47, 58] }
      ]
    },
    "compliance": {
      "gdpr_art30": true,
      "ccpa": true,
      "eu_ai_act_s52": true
    },
    "latency_ms": 22.7
  },
  "anonymized_input": "My email is [REDACTED] and my SSN is [REDACTED].",
  "latency_ms": 22.7
}
```
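Wiring this into an application looks roughly like the following. A minimal client sketch, assuming the third-party requests package; the endpoint, headers, and field names follow the examples above, while API_KEY and the final model call are placeholders for your own setup.

```python
import requests

API_URL = "https://api.zentricprotocol.com/v1/analyze"
API_KEY = "zp_live_..."  # replace with your key

def safe_prompt(user_input: str) -> str:
    """Return a version of user_input that is safe to forward to the model."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json={"input": user_input,
              "modules": ["privacy"],
              "options": {"language": "auto"}},
        timeout=5,
    )
    resp.raise_for_status()
    body = resp.json()
    # Forward the redacted text when PII was detected, the original otherwise.
    return body["anonymized_input"] if body["verdict"] == "ANONYMIZED" else user_input

# llm_client.complete(safe_prompt(user_text))  # hypothetical model call
```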
Why this matters for data protection
Data-protection programs measure two things: the surface area where personal data is processed, and the auditability of every processing decision. A prompt that contains a user's IBAN and is sent to an external LLM expands both surfaces — the IBAN is now processed by an additional vendor, and the decision to send it has no record. Redacting at the prompt layer reverses both effects.
PrivacyGuard's structured output — typed entity records with position offsets, a SHA-256 hash on the full report, and a UTC timestamp — gives a compliance team a defensible answer to "what personal data did this LLM see, when, and how was it minimized?" Combined with retention-window controls on your side, this is the foundation of GDPR-style data-minimization for AI applications.
Performance and pricing
Mean server-side latency is 23.4 milliseconds with P99 under 100 milliseconds, measured across one million simulated requests. The Free tier covers 2,000 PII analyses per month with no credit card. Growth at $499 per month raises the quota to 100,000 analyses. Enterprise at $2,500 per month removes the quota and adds EU data residency, dedicated SLA, and signed PDF integrity certificates suitable for vendor due-diligence packages.
Redact PII before it reaches your model
Start with 2,000 free PII analyses per month, no credit card. Same module, same precision, same signed audit report as paid tiers.