How does PII redaction work in practice for LLM APIs?
Category: LLM Privacy & Compliance
Quick Answer
A PII detector identifies personal data (names, emails) and replaces each item with a placeholder like [PERSON_1]. The redacted prompt is sent to the LLM, and the placeholders in the response are swapped back to the original values before the user sees it. Use ML-based detection (e.g. Microsoft Presidio) rather than regex alone: regex misses misspellings and contextual PII.
Detailed Answer
The flow:
- User types: My name is John Smith, email [email protected]
- PII detector identifies: John Smith (PERSON), [email protected] (EMAIL)
- Redacted prompt sent to LLM: My name is [PERSON_1], email [EMAIL_1]
- LLM responds using the placeholders, so it never sees the real values
- De-redaction maps each placeholder back to its original value before the response is shown to the user
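The flow above can be sketched in a detector-agnostic way. This is a minimal illustration, not a production implementation: the `Span` type and the `redact` and `restore` helpers are hypothetical names introduced here, and the spans are hard-coded where a real detector's output would go (Presidio's analyzer results expose the same `entity_type`, `start`, and `end` fields). The LLM reply is a stand-in string.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected PII span (hypothetical type; any detector output with these fields works)."""
    entity_type: str
    start: int
    end: int


def redact(text: str, spans: list[Span]) -> tuple[str, dict[str, str]]:
    """Replace each span with a numbered placeholder; return the text and the mapping."""
    mapping: dict[str, str] = {}           # placeholder -> original value
    seen: dict[tuple[str, str], str] = {}  # (type, value) -> placeholder, so repeats share a token
    counters: dict[str, int] = {}
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # overlapping span: keep the earlier one
        value = text[span.start:span.end]
        key = (span.entity_type, value)
        if key not in seen:
            counters[span.entity_type] = counters.get(span.entity_type, 0) + 1
            placeholder = f"[{span.entity_type}_{counters[span.entity_type]}]"
            seen[key] = placeholder
            mapping[placeholder] = value
        pieces.append(text[cursor:span.start])
        pieces.append(seen[key])
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap known placeholders back in a single pass; unknown ones are left untouched."""
    return re.sub(r"\[[A-Z_]+_\d+\]", lambda m: mapping.get(m.group(0), m.group(0)), text)


prompt = "Call John Smith at 555-0123"
name, phone = "John Smith", "555-0123"
# Hard-coded spans standing in for detector output.
spans = [
    Span("PERSON", prompt.index(name), prompt.index(name) + len(name)),
    Span("PHONE_NUMBER", prompt.index(phone), prompt.index(phone) + len(phone)),
]

redacted, mapping = redact(prompt, spans)
reply = "I will call [PERSON_1] at [PHONE_NUMBER_1]."  # stand-in for the LLM response
print(redacted)
print(restore(reply, mapping))
```

The mapping holds the original values, so in this sketch it stays on the application side and is never sent along with the prompt.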
Code example with Microsoft Presidio:
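from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Detection and replacement are handled by two separate engines
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Call John Smith at 555-0123"

# Find PII spans in the text
results = analyzer.analyze(text=text, language="en")

# Replace each detected span. With no operators configured, Presidio substitutes
# the entity type in angle brackets (e.g. <PERSON>) rather than numbered placeholders.
redacted = anonymizer.anonymize(text=text, analyzer_results=results)

# The anonymizer returns a result object; the redacted string is in .text
print(redacted.text)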
Important: Use ML-based detection (not just regex). Regex misses misspelled names, non-standard formats, and contextual PII.
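To make the regex limitation concrete, here is a small sketch. The email pattern and the sample sentences are illustrative assumptions, not a recommended production pattern. It shows a regex catching a standard email address but finding nothing in a spelled-out address, a misspelled name, or a sentence that identifies someone only through context.

```python
import re

# A deliberately simple email pattern (illustrative only, not a complete validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

samples = [
    "Reach me at jane.doe@example.com",             # standard format
    "Reach me at jane dot doe at example dot com",  # non-standard format
    "My neighbour Jhon Smtih will pick it up",      # misspelled name, no fixed pattern
    "I am the only cardiologist in Smallville",     # contextual PII
]

# True where the regex found something to redact.
hits = [bool(EMAIL_RE.search(s)) for s in samples]

for sample, hit in zip(samples, hits):
    print(f"{'caught' if hit else 'missed'}: {sample}")
```

Only the first sample is caught. The other three carry no fixed pattern for a regex to match, so they need a detector that looks at context, such as a named-entity model.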

