PII detection
Detect and redact personal data before the upstream model sees a single byte. Built-in regex detectors cover the high-frequency cases. Microsoft Presidio plugs in as a sidecar for higher recall. Custom rules let you teach the redactor about your own identifiers.
The threat model
PII makes its way into LLM prompts almost no matter what you do. A customer pastes a support ticket. An agent forwards an email thread. A chat user mentions their email or a credit card number while describing a problem. Once those bytes are in the model's context, you have lost control: they show up in logs, in vendor-side caches, in the model's response if it decides to repeat them back, and in any audit you do not own.
AdaptiveAPI's redactor runs before translation and before the upstream call. The model never sees the original spans. Since it never saw them, it cannot emit them.
How it fires
- Inbound request arrives. Body parsed.
- Detector runs over every translatable string field.
- Each match is replaced with an opaque substitute ([redacted-email], [redacted-card], etc.).
- Translation pipeline runs on the redacted text.
- Upstream receives the redacted, translated request.
- Response comes back, is translated, and is returned to the caller. The substitutes are kept verbatim. No de-redaction step. The original PII never leaves AdaptiveAPI's process memory.
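The steps above can be sketched in a few lines of Python. This is a minimal illustration, not AdaptiveAPI's implementation: the two patterns are deliberately simplified stand-ins for the built-in detectors, and `handle_request`, `translate_in`, `call_upstream`, and `translate_out` are hypothetical names.

```python
import re

# Simplified stand-in patterns; the real detectors are stricter.
DETECTORS = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[redacted-email]"),
    ("ssn-us", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[redacted-ssn]"),
]

def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace every detector match with its opaque substitute and count hits."""
    counts: dict[str, int] = {}
    for name, pattern, replacement in DETECTORS:
        text, n = pattern.subn(replacement, text)
        if n:
            counts[name] = n
    return text, counts

def handle_request(text, translate_in, call_upstream, translate_out):
    """Redact first, then translate and call upstream; nothing is de-redacted."""
    redacted, counts = redact(text)
    reply = call_upstream(translate_in(redacted))
    return translate_out(reply), counts
```

Because redaction happens before anything else, every later stage only ever receives the substitutes.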
Enabling it
Set redactPii: true on the route's proxy rule:
{
"redactPii": true
}
Or override per request with X-AdaptiveApi-Redact-Pii: true.
The header is stripped before the upstream call, so it never leaks into
the model context.
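A minimal sketch of the per-request override from a Python client. The URL and the body shape are placeholders invented for illustration; only the header name and value come from this page.

```python
import json
import urllib.request

def build_request(url: str, body: dict, redact_pii: bool) -> urllib.request.Request:
    """Build a POST request, opting in to PII redaction via the override header."""
    headers = {"Content-Type": "application/json"}
    if redact_pii:
        headers["X-AdaptiveApi-Redact-Pii"] = "true"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")

# Hypothetical proxy URL and payload; the request is built but not sent here.
req = build_request(
    "https://adaptiveapi.example.internal/v1/chat",
    {"text": "My email is jane@example.com"},
    redact_pii=True,
)
```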
Built-in detectors
The default regex set covers the high-frequency cases. All detectors are conservative: they match formats with strong structure, not free text.
| Detector | Match | Replacement |
|---|---|---|
| email | RFC-5322 simplified | [redacted-email] |
| creditcard | 13 to 19 digits, Luhn-validated | [redacted-card] |
| ssn-us | NNN-NN-NNNN with valid prefix | [redacted-ssn] |
| iban | Country-prefixed IBAN, mod-97 checksum | [redacted-iban] |
| ipv4 | Dotted quad, valid octets | [redacted-ip] |
| phone | E.164 plus common national formats | [redacted-phone] |
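The checksum steps are what keep the creditcard and iban detectors conservative: a digit run that fails the check is left alone. The sketch below shows the standard Luhn and IBAN mod-97 algorithms in Python. The function names are illustrative, and how AdaptiveAPI combines these checks with its regexes is not shown on this page.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check over 13 to 19 digits; separators such as spaces are ignored."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def iban_valid(iban: str) -> bool:
    """IBAN mod-97 check: move the first four characters to the end, map A-Z to 10-35."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1
```

For example, `luhn_valid("4111 1111 1111 1111")` passes, while the same number with its last digit changed does not.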
Higher recall with Presidio
For names, locations, organisations, driver licences, NHS numbers, and the long tail of locale-specific identifiers, Microsoft Presidio Analyzer plugs in as an HTTP sidecar.
PiiRedactor__Provider=presidio
PiiRedactor__Presidio__BaseUrl=http://presidio:5002
AdaptiveAPI sends the body to the analyzer, takes the returned span list, and applies replacements the same way the regex redactor does. If Presidio is unreachable, the redactor falls back to the regex set, so a temporary sidecar outage lowers recall to the built-in detectors but never turns redaction off.
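A sketch of the span-application and fallback logic in Python. It assumes spans shaped like Presidio Analyzer results (`entity_type`, `start`, `end`) that do not overlap. The label naming scheme and the `analyze` and `regex_redact` callables are hypothetical stand-ins for the sidecar call and the built-in redactor.

```python
def apply_spans(text: str, spans: list[dict]) -> str:
    """Replace each [start, end) span, right to left so earlier offsets stay valid.

    Assumes the spans do not overlap.
    """
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        # Assumed label scheme: PERSON -> [redacted-person]
        label = "[redacted-" + span["entity_type"].lower().replace("_", "-") + "]"
        text = text[: span["start"]] + label + text[span["end"]:]
    return text

def redact_with_fallback(text: str, analyze, regex_redact) -> str:
    """Use the sidecar when it answers; otherwise fall back to the regex redactor."""
    try:
        spans = analyze(text)
    except OSError:  # connection refused, timeout, DNS failure, ...
        return regex_redact(text)
    return apply_spans(text, spans)
```

Working from the last span to the first avoids recomputing offsets after each replacement changes the string length.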
Custom rules
Real systems carry identifiers the built-in detectors do not know about. Internal customer IDs. Product SKUs that look like names. Reservation codes, order numbers, internal ticket IDs. AdaptiveAPI lets you teach the redactor about them.
Shape of a custom rule
A custom rule has four parts:
- name. Human-readable label (customer-id, internal-ticket).
- pattern. A .NET regular expression. Anchored, with named groups if you want partial replacement.
- replacement. The opaque substitute ([redacted-customer-id]).
- flags. Optional. caseInsensitive, multiline, etc.
{
"name": "customer-id",
"pattern": "\\bCUST-\\d{6,8}\\b",
"replacement": "[redacted-customer-id]",
"flags": ["caseInsensitive"]
}
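To see how such a rule behaves, here is a rough Python approximation. Python's `re` module stands in for the .NET regex engine (the two agree on this pattern but differ in other features), and `FLAG_MAP` and `compile_rule` are hypothetical helpers, not part of AdaptiveAPI.

```python
import json
import re

# Assumed mapping from rule flags to Python regex flags.
FLAG_MAP = {"caseInsensitive": re.IGNORECASE, "multiline": re.MULTILINE}

def compile_rule(rule: dict):
    """Turn a rule object into a compiled pattern plus its replacement label."""
    flags = 0
    for name in rule.get("flags", []):
        flags |= FLAG_MAP[name]
    return re.compile(rule["pattern"], flags), rule["replacement"]

rule = json.loads(r'''
{
  "name": "customer-id",
  "pattern": "\\bCUST-\\d{6,8}\\b",
  "replacement": "[redacted-customer-id]",
  "flags": ["caseInsensitive"]
}
''')

pattern, replacement = compile_rule(rule)
# A callable replacement keeps the label literal (no backslash processing).
result = pattern.sub(lambda m: replacement, "Ticket for cust-1234567 and CUST-12 is open")
```

With the caseInsensitive flag, `cust-1234567` is replaced, while `CUST-12` is left alone because it has fewer than six digits.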
Where they go
Today, custom rules live in route configuration alongside the rest of the proxy rule. The admin UI is gaining a dedicated PII page that surfaces premade regex packs (US, EU, UK, financial, healthcare), the rule editor, and per-tenant rule libraries that bind to routes by reference.
Tip. Test custom rules against real-shaped data before binding them to a route. A loose pattern (\\d{6}) will redact phone numbers, postal codes, prices, and version strings along with the customer IDs you wanted. Anchor with surrounding context (\\bCUST-, \\bACME-) and validate with the admin UI's regex tester.
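A quick way to see the difference is to run both patterns over a handful of realistic strings. The sketch below uses Python's `re` as a stand-in for the .NET engine, and the sample strings are made up.

```python
import re

loose = re.compile(r"\d{6}")                # any six consecutive digits
anchored = re.compile(r"\bCUST-\d{6,8}\b")  # digits only after the CUST- prefix

samples = [
    "CUST-123456",
    "call 4155550100",
    "ZIP 941070",
    "build 20240115",
]

def hits(pattern, texts):
    """Return the samples in which the pattern finds at least one match."""
    return [t for t in texts if pattern.search(t)]

loose_hits = hits(loose, samples)        # all four samples
anchored_hits = hits(anchored, samples)  # only the customer ID
```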
What does not get redacted
The redactor only runs on translatable string fields. By design, it does not touch:
- JSON keys.
- Identifiers in the proxy-rule denylist (id, uuid, email as a key, etc.).
- URL paths and query strings on the wire.
- Headers, including Authorization.
- Content inside placeholder spans (code blocks, URLs, IDs already wrapped).
That keeps machine-readable parts of your payload working while still catching PII inside human-text content.
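The JSON side of this behavior can be sketched as a recursive walk that rewrites string values only. This is one interpretation for illustration: it treats every string value as translatable unless its key is denylisted, and `KEY_DENYLIST`, `redact_values`, and the simplified email pattern are assumptions rather than AdaptiveAPI's actual configuration.

```python
import re

KEY_DENYLIST = {"id", "uuid", "email"}  # assumed contents of the proxy-rule denylist
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact_text(text: str) -> str:
    return EMAIL.sub("[redacted-email]", text)

def redact_values(node, redact=redact_text):
    """Redact string values only; keys and denylisted fields pass through untouched."""
    if isinstance(node, dict):
        return {
            key: value if key in KEY_DENYLIST else redact_values(value, redact)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact_values(item, redact) for item in node]
    if isinstance(node, str):
        return redact(node)
    return node  # numbers, booleans, null
```

Under these assumptions, a value stored under an `email` key survives unchanged, while the same address inside free-text message content is replaced.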
Audit and metrics
Every redaction event is recorded as audit metadata: the detector that
fired, the count of matches, the replacement label. The original spans are
never logged. The Prometheus metric
adaptiveapi_pii_redactions_total is labelled by detector and
route, so dashboards can show which routes carry the most sensitive
traffic.
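As an illustration of what gets recorded, the sketch below models an audit record and a counter keyed by detector and route. A plain `collections.Counter` stands in for the Prometheus counter, and the `RedactionEvent` fields and `record` function are hypothetical. Note that no field holds the original text.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class RedactionEvent:
    """Audit metadata only: never the matched span itself."""
    route: str
    detector: str
    count: int
    replacement: str

# Stand-in for adaptiveapi_pii_redactions_total{detector, route}
redactions_total: Counter = Counter()

def record(route: str, counts: dict[str, int], labels: dict[str, str]) -> list[RedactionEvent]:
    """Bump the per-detector, per-route counter and emit one audit event per detector."""
    events = []
    for detector, n in counts.items():
        redactions_total[(detector, route)] += n
        events.append(RedactionEvent(route, detector, n, labels[detector]))
    return events
```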
Combining with style rules and glossaries
PII detection runs first, before glossary substitution and before style rules apply. That ordering matters: glossary terms cannot accidentally contain redacted spans, and style rules cannot leak the original PII into a custom instruction.