PII detection

Detect and redact personal data before the upstream model sees a single byte. Built-in regex detectors cover the high-frequency cases. Microsoft Presidio plugs in as a sidecar for higher recall. Custom rules let you teach the redactor about your own identifiers.

The threat model

PII makes its way into LLM prompts almost no matter what you do. A customer pastes a support ticket. An agent forwards an email thread. A chat user mentions their email or a credit card number while describing a problem. Once those bytes are in the model's context, you have lost control: they show up in logs, in vendor-side caches, in the model's response if it decides to repeat them back, and in any audit you do not own.

AdaptiveAPI's redactor runs before translation and before the upstream call. The model never sees the original spans. Since it never saw them, it cannot emit them.

How it fires

  1. Inbound request arrives. Body parsed.
  2. Detector runs over every translatable string field.
  3. Each match is replaced with an opaque substitute ([redacted-email], [redacted-card], etc.).
  4. Translation pipeline runs on the redacted text.
  5. Upstream receives the redacted, translated request.
  6. Response comes back, is translated, and is returned to the caller. The substitutes are kept verbatim. No de-redaction step. The original PII never leaves AdaptiveAPI's process memory.
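The ordering of the steps above can be sketched in a few lines of Python. This is an illustration only, not AdaptiveAPI's implementation: the function names, the single simplified email pattern, and the flat dictionary body are all assumptions made for the example.

```python
import re
from typing import Callable

# Simplified stand-in for the built-in email detector (not the real pattern).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace every email match with the opaque substitute."""
    return EMAIL.sub("[redacted-email]", text)


def handle_request(
    body: dict[str, str],
    translate: Callable[[str], str],
    call_upstream: Callable[[dict[str, str]], dict[str, str]],
    translate_back: Callable[[str], str],
) -> dict[str, str]:
    """Hypothetical request handler; every field is treated as translatable."""
    # Steps 2-3: redact each string field before anything else touches it.
    redacted = {key: redact(value) for key, value in body.items()}
    # Step 4: translation only ever sees redacted text.
    outbound = {key: translate(value) for key, value in redacted.items()}
    # Step 5: upstream receives the redacted, translated request.
    response = call_upstream(outbound)
    # Step 6: translate the response; substitutes stay as-is, no de-redaction.
    return {key: translate_back(value) for key, value in response.items()}
```

Because redaction happens before the first callable is invoked, neither the translator nor the upstream is ever handed the original string.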

Enabling it

Set redactPii: true on the route's proxy rule:

{
  "redactPii": true
}

Or override per request with X-AdaptiveApi-Redact-Pii: true. The header is stripped before the upstream call, so it never leaks into the model context.
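A minimal sketch of how the route setting and the per-request header might combine. The function name and the dictionary shapes are hypothetical. The source only describes sending the header with the value true, so the sketch ignores any other value rather than guessing what it should do.

```python
OVERRIDE_HEADER = "x-adaptiveapi-redact-pii"


def resolve_redaction(
    rule: dict, headers: dict[str, str]
) -> tuple[bool, dict[str, str]]:
    """Return (redaction enabled?, headers to forward upstream)."""
    enabled = bool(rule.get("redactPii", False))
    forwarded: dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() == OVERRIDE_HEADER:
            # The override header is consumed here and never forwarded.
            if value.strip().lower() == "true":
                enabled = True
        else:
            forwarded[name] = value
    return enabled, forwarded
```

For example, a route with redactPii set to false plus a request carrying the header yields redaction enabled and an upstream header set without the override.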

Built-in detectors

The default regex set covers the high-frequency cases. All detectors are conservative: they match formats with strong structure, not free text.

Detector    Match                                   Replacement
email       RFC-5322 simplified                     [redacted-email]
creditcard  13 to 19 digits, Luhn-validated         [redacted-card]
ssn-us      NNN-NN-NNNN with valid prefix           [redacted-ssn]
iban        Country-prefixed IBAN, mod-97 checksum  [redacted-iban]
ipv4        Dotted quad, valid octets               [redacted-ip]
phone       E.164 plus common national formats      [redacted-phone]
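To show what "strong structure" means in practice, here is a sketch of a Luhn-validated card detector. The candidate pattern, including its tolerance for spaces and hyphens between digits, is an assumption for the example and not the pattern AdaptiveAPI ships. The Luhn check itself is the standard algorithm.

```python
import re

# Assumed candidate shape: 13 to 19 digits, optionally separated by single
# spaces or hyphens. The real built-in pattern may differ.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Replace only the candidates that pass the checksum."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return "[redacted-card]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(replace, text)
```

The checksum is what keeps the detector conservative: a 16-digit order number that fails Luhn is left alone, while a well-known test card number such as 4111 1111 1111 1111 is replaced.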

Higher recall with Presidio

For names, locations, organisations, driver licences, NHS numbers, and the long tail of locale-specific identifiers, Microsoft Presidio Analyzer plugs in as an HTTP sidecar.

PiiRedactor__Provider=presidio
PiiRedactor__Presidio__BaseUrl=http://presidio:5002

AdaptiveAPI sends the body to the analyzer, takes the returned span list, and applies replacements in the same way the regex redactor does. If Presidio is unreachable, the redactor falls back to the regex set, so a temporary sidecar outage never disables redaction outright. During the fallback, coverage drops to the built-in detectors.
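The span-and-fallback flow could look roughly like this. The request and response shapes follow Presidio Analyzer's /analyze endpoint (a JSON body with text and language, and a list of results carrying entity_type, start, and end). Everything else is an assumption for illustration: the function names, the label format derived from the entity type, the single fallback pattern, and the premise that returned spans do not overlap.

```python
import json
import re
import urllib.request
from typing import Callable

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def presidio_analyze(text: str, base_url: str, timeout: float = 2.0) -> list[dict]:
    """POST the text to the Presidio Analyzer sidecar and return its spans."""
    request = urllib.request.Request(
        f"{base_url}/analyze",
        data=json.dumps({"text": text, "language": "en"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def apply_spans(text: str, spans: list[dict]) -> str:
    """Replace spans right to left so earlier offsets stay valid.

    Assumes spans do not overlap; the label format is hypothetical.
    """
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        label = f"[redacted-{span['entity_type'].lower()}]"
        text = text[: span["start"]] + label + text[span["end"]:]
    return text


def regex_fallback(text: str) -> str:
    """Stand-in for the built-in regex set (email only in this sketch)."""
    return EMAIL.sub("[redacted-email]", text)


def redact(text: str, analyze: Callable[[str], list[dict]]) -> str:
    try:
        spans = analyze(text)
    except (OSError, ValueError):
        # Network failure, timeout, or unparseable reply: keep redacting with regexes.
        return regex_fallback(text)
    return apply_spans(text, spans)
```

A caller would bind the sidecar address once, for example `redact(body, lambda t: presidio_analyze(t, "http://presidio:5002"))`. Passing the analyzer in as a callable keeps the fallback path easy to exercise without a running sidecar.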

Custom rules

Real systems carry identifiers the built-in detectors do not know about. Internal customer IDs. Product SKUs that look like names. Reservation codes, order numbers, internal ticket IDs. AdaptiveAPI lets you teach the redactor about them.

Shape of a custom rule

A custom rule has four parts (name, pattern, replacement, and flags):

{
  "name": "customer-id",
  "pattern": "\\bCUST-\\d{6,8}\\b",
  "replacement": "[redacted-customer-id]",
  "flags": ["caseInsensitive"]
}
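To see what the rule above does to a string, here is a small sketch that loads it and applies it with Python's re module. Python is only a stand-in: the source does not say which regex dialect AdaptiveAPI uses, and the mapping from the caseInsensitive flag to re.IGNORECASE is an assumption.

```python
import json
import re

RULE_JSON = r"""
{
  "name": "customer-id",
  "pattern": "\\bCUST-\\d{6,8}\\b",
  "replacement": "[redacted-customer-id]",
  "flags": ["caseInsensitive"]
}
"""

# Assumed mapping from rule flag names to Python regex flags.
FLAG_MAP = {"caseInsensitive": re.IGNORECASE}


def compile_rule(rule: dict) -> tuple[re.Pattern, str]:
    """Turn a rule dictionary into a compiled pattern and its replacement."""
    flags = 0
    for name in rule.get("flags", []):
        flags |= FLAG_MAP[name]
    return re.compile(rule["pattern"], flags), rule["replacement"]


def apply_rule(rule: dict, text: str) -> str:
    pattern, replacement = compile_rule(rule)
    # A function replacement inserts the label literally (no backreference parsing).
    return pattern.sub(lambda _match: replacement, text)


rule = json.loads(RULE_JSON)
print(apply_rule(rule, "Refund for cust-0012345 approved"))
```

Because the rule is case-insensitive, the lowercase cust-0012345 is redacted too. An ID with only five digits, such as CUST-12345, falls outside the {6,8} range and is left untouched.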

Where they go

Today, custom rules live in route configuration alongside the rest of the proxy rule. The admin UI is gaining a dedicated PII page that surfaces premade regex packs (US, EU, UK, financial, healthcare), the rule editor, and per-tenant rule libraries that bind to routes by reference.

Tip. Test custom rules against realistically shaped data before binding them to a route. A loose pattern such as \d{6} will redact fragments of phone numbers, postal codes, prices, and version strings along with the customer IDs you wanted. Anchor the pattern with surrounding context (\bCUST-, \bACME-) and validate it with the admin UI's regex tester. In JSON configuration each backslash is doubled, as in the rule above.
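The difference between a loose and an anchored pattern is easy to check offline before a rule goes anywhere near a route. The sample strings below are invented for the example, and Python's re module stands in for whatever regex engine the redactor uses.

```python
import re

LOOSE = re.compile(r"\d{6}")
ANCHORED = re.compile(r"\bCUST-\d{6,8}\b")

# Invented, realistically shaped samples; only the first is a customer ID.
SAMPLES = {
    "customer id": "Ticket for CUST-204981",
    "phone": "Call 02079460958 after 5pm",
    "postal code": "Ship to Bengaluru 560001",
    "build": "Running build 20240115",
}


def false_positives(pattern: re.Pattern) -> list[str]:
    """Labels of non-customer-ID samples that the pattern would redact."""
    return [
        label
        for label, text in SAMPLES.items()
        if label != "customer id" and pattern.search(text)
    ]


print("loose:", false_positives(LOOSE))
print("anchored:", false_positives(ANCHORED))
```

Both patterns find the customer ID, but only the loose one also fires on the phone number, the postal code, and the build string.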

What does not get redacted

The redactor only runs on translatable string fields. By design, it does not touch the machine-readable parts of your payload. That keeps them working while still catching PII inside human-text content.

Audit and metrics

Every redaction event is recorded as audit metadata: the detector that fired, the count of matches, the replacement label. The original spans are never logged. The Prometheus metric adaptiveapi_pii_redactions_total is labelled by detector and route, so dashboards can show which routes carry the most sensitive traffic.
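As an illustration of audit metadata that carries counts but no content, the sketch below records one event per detector that fired and tallies a counter keyed by detector and route. The event field names, the two simplified patterns, and the in-memory Counter standing in for the Prometheus metric are all assumptions for the example.

```python
import re
from collections import Counter

# Simplified stand-ins for two built-in detectors.
DETECTORS = {
    "email": (
        re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        "[redacted-email]",
    ),
    "ipv4": (
        re.compile(
            r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
        ),
        "[redacted-ip]",
    ),
}


def redact_with_audit(text: str, route: str) -> tuple[str, list[dict]]:
    """Redact the text and return audit events that never contain the match."""
    events = []
    for name, (pattern, label) in DETECTORS.items():
        text, count = pattern.subn(label, text)
        if count:
            events.append(
                {"route": route, "detector": name, "count": count, "replacement": label}
            )
    return text, events


def record(events: list[dict], totals: Counter) -> None:
    """Tally redactions by (detector, route), mirroring the metric's labels."""
    for event in events:
        totals[(event["detector"], event["route"])] += event["count"]
```

Since each event holds only the detector name, a count, and the replacement label, shipping it to a log or metrics backend does not reintroduce the data that was just removed.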

Combining with style rules and glossaries

PII detection runs first, before glossary substitution and before style rules apply. That ordering matters: glossary terms cannot accidentally contain redacted spans, and style rules cannot leak the original PII into a custom instruction.