A production-ready, reusable service for detecting and anonymizing Personally Identifiable Information (PII) using Microsoft Presidio. This service runs entirely locally using Docker Compose, with no external API dependencies.
This service provides two core capabilities:
- PII Detection (Analyzer): Identifies PII entities in text such as names, email addresses, phone numbers, credit cards, and more.
- PII Anonymization (Anonymizer): Replaces detected PII with anonymized placeholders or custom values.
The service consists of two separate containers that work together:
- presidio-analyzer: Detects PII entities in text
- presidio-anonymizer: Anonymizes detected PII entities
This separation allows for flexible deployment and independent scaling of each component.
- Docker Engine 20.10 or later
- Docker Compose v2.0 or later
- Clone or download this repository:
git clone <repository-url>
cd pii-service
- Copy the example environment file:
cp .env.example .env
- (Optional) Adjust settings in .env if needed. Defaults are secure and bind to localhost only.
- Start the services:
docker compose up -d
- Verify services are running:
docker compose ps
- Check service health:
docker compose logs -f
Run the test script to verify everything works:
./scripts/test.sh
Or test manually:
1. Analyze text for PII:
curl -X POST http://127.0.0.1:5002/analyze \
-H "Content-Type: application/json" \
-d '{
"text": "John Smith email john@acme.com",
"language": "en"
}'
2. Anonymize text:
curl -X POST http://127.0.0.1:5001/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "John Smith email john@acme.com",
"analyzer_results": [
{
"entity_type": "PERSON",
"start": 0,
"end": 10,
"score": 0.85
},
{
"entity_type": "EMAIL_ADDRESS",
"start": 17,
"end": 30,
"score": 0.99
}
],
"operators": {
"PERSON": {
"type": "replace",
"new_value": "{{PERSON}}"
},
"EMAIL_ADDRESS": {
"type": "replace",
"new_value": "{{EMAIL}}"
}
}
}'
URL: http://<bind_address>:<analyzer_port>/analyze
Method: POST
Request Body:
{
"text": "Your text to analyze",
"language": "en"
}
Response:
[
{
"entity_type": "PERSON",
"start": 0,
"end": 10,
"score": 0.85
},
{
"entity_type": "EMAIL_ADDRESS",
"start": 17,
"end": 30,
"score": 0.99
}
]
URL: http://<bind_address>:<anonymizer_port>/anonymize
Method: POST
Request Body:
{
"text": "Original text with PII",
"analyzer_results": [
{
"entity_type": "PERSON",
"start": 0,
"end": 10,
"score": 0.85
}
],
"operators": {
"PERSON": {
"type": "replace",
"new_value": "{{PERSON}}"
},
"EMAIL_ADDRESS": {
"type": "replace",
"new_value": "{{EMAIL}}"
}
}
}
Response:
{
"text": "{{PERSON}} email {{EMAIL}}",
"items": [
{
"operator": "replace",
"entity_type": "PERSON",
"start": 0,
"end": 10,
"text": "{{PERSON}}"
}
]
}
For backend applications, follow this pattern:
- Send text to Analyzer → Receive detected entities
- Send text + entities + operators to Anonymizer → Receive sanitized text
- Use sanitized text in your application
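To make the offsets in this pattern concrete: start and end are character positions in the original text, with end exclusive. The sketch below applies the replacement locally to the example used elsewhere in this document. It is an illustration only, not Presidio's implementation; the function name replace_spans and the fallback placeholder format are hypothetical choices made for this sketch.

```python
from typing import Dict, List


def replace_spans(text: str, results: List[dict], placeholders: Dict[str, str]) -> str:
    """Replace each detected span with its placeholder (illustration only)."""
    # Work from the last span to the first so earlier offsets stay valid
    # after the text changes length.
    for r in sorted(results, key=lambda r: r["start"], reverse=True):
        # Fallback format for unmapped entity types is an arbitrary choice here.
        placeholder = placeholders.get(r["entity_type"], "<" + r["entity_type"] + ">")
        text = text[: r["start"]] + placeholder + text[r["end"]:]
    return text


text = "John Smith email john@acme.com"
results = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 30, "score": 0.99},
]
placeholders = {"PERSON": "{{PERSON}}", "EMAIL_ADDRESS": "{{EMAIL}}"}

# Slicing with the offsets recovers the detected values.
print([text[r["start"]:r["end"]] for r in results])  # ['John Smith', 'john@acme.com']
print(replace_spans(text, results, placeholders))     # {{PERSON}} email {{EMAIL}}
```

Processing spans back to front avoids recomputing offsets after each substitution, which is why the order of the analyzer results does not matter in this sketch.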
Example workflow:
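import requests
user_input = "John Smith email john@acme.com"  # example input
# 1. Analyze: detect PII entities in the text
analyzer_response = requests.post(
    "http://127.0.0.1:5002/analyze",
    json={"text": user_input, "language": "en"},
    timeout=30,
)
analyzer_response.raise_for_status()  # fail early on HTTP errors
entities = analyzer_response.json()
# 2. Anonymize: replace the detected entities with placeholders
anonymizer_response = requests.post(
    "http://127.0.0.1:5001/anonymize",
    json={
        "text": user_input,
        "analyzer_results": entities,
        "operators": {
            "PERSON": {"type": "replace", "new_value": "{{PERSON}}"},
            "EMAIL_ADDRESS": {"type": "replace", "new_value": "{{EMAIL}}"},
        },
    },
    timeout=30,
)
anonymizer_response.raise_for_status()
sanitized_text = anonymizer_response.json()["text"]
The scripts/ directory contains convenience scripts: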
- scripts/up.sh: Start services in detached mode
- scripts/down.sh: Stop and remove services
- scripts/status.sh: Show service status and health
- scripts/test.sh: Run test requests against both services
Make scripts executable:
chmod +x scripts/*.sh
Edit .env to customize:
- PRESIDIO_IMAGE_TAG: Docker image tag (default: latest)
- PRESIDIO_BIND_ADDRESS: Network binding address (default: 127.0.0.1)
- PRESIDIO_ANALYZER_HOST_PORT: Analyzer port (default: 5002)
- PRESIDIO_ANONYMIZER_HOST_PORT: Anonymizer port (default: 5001)
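A client can derive its endpoint URLs from the same variable names instead of hard-coding ports. The following is a minimal sketch under two assumptions: the variables have been exported into the client's environment (Docker Compose reads .env, but a Python process does not do so on its own), and the helper name presidio_urls is hypothetical.

```python
import os
from typing import Mapping


def presidio_urls(env: Mapping[str, str] = os.environ) -> dict:
    """Build endpoint URLs from the .env variable names, using the documented defaults."""
    host = env.get("PRESIDIO_BIND_ADDRESS", "127.0.0.1")
    analyzer_port = env.get("PRESIDIO_ANALYZER_HOST_PORT", "5002")
    anonymizer_port = env.get("PRESIDIO_ANONYMIZER_HOST_PORT", "5001")
    return {
        "analyze": f"http://{host}:{analyzer_port}/analyze",
        "anonymize": f"http://{host}:{anonymizer_port}/anonymize",
    }


# With nothing set, the documented defaults apply.
print(presidio_urls({}))
```

Passing the environment as a mapping keeps the helper easy to test with a plain dict.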
- Default binding is 127.0.0.1 (localhost only). This is secure by default.
- Do NOT expose these ports externally without proper authentication
- If external access is required, use a reverse proxy (nginx, Traefik) with authentication
- Never bind to 0.0.0.0 in production without additional security measures
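A deployment script could guard against an accidental non-local binding before starting the services. This is a small sketch using Python's standard ipaddress module; the function name is_loopback_only is hypothetical, and treating a non-IP value as unsafe is a conservative choice made for this example.

```python
import ipaddress


def is_loopback_only(bind_address: str) -> bool:
    """Return True only if the value is a loopback IP literal such as 127.0.0.1."""
    try:
        return ipaddress.ip_address(bind_address).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): treat as not verified.
        return False


for candidate in ("127.0.0.1", "0.0.0.0"):
    print(candidate, is_loopback_only(candidate))
```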
Presidio services are CPU and memory intensive:
- Minimum: 2 CPU cores, 4GB RAM
- Recommended: 4 CPU cores, 8GB RAM
- Container limits: Default Docker limits apply (adjust in docker-compose.yml if needed)
View logs:
# All services
docker compose logs -f
# Specific service
docker compose logs -f presidio-analyzer
docker compose logs -f presidio-anonymizer
Both services include health checks that verify service availability:
- Health check endpoint: /health
- Check interval: 30 seconds
- Retries: 3
- Start period: 40 seconds (allows initial startup time)
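An application that depends on these services may want to wait for them from its own code. The sketch below polls the /health endpoint with the same retry count and interval as listed above, using only the standard library. The function names are hypothetical, the ports are the documented defaults, and treating any HTTP 200 as healthy is an assumption about the endpoint's response.

```python
import http.client
import time
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check up to `attempts` times, sleeping `interval` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


def check_services() -> None:
    """Report the health of both services on the default ports."""
    for name, port in (("analyzer", 5002), ("anonymizer", 5001)):
        url = f"http://127.0.0.1:{port}/health"
        ok = wait_until_healthy(lambda: is_healthy(url))
        print(name, "healthy" if ok else "unhealthy")
```

Calling check_services() after docker compose up -d blocks until each service responds or the retries are exhausted. The sleep function is injectable so the retry logic can be tested without real delays.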
Check health status:
docker compose ps
- Verify Docker and Docker Compose are installed and running
- Check port availability: lsof -i :5001 -i :5002
- Review logs: docker compose logs
- Wait 40-60 seconds after startup for services to initialize
- Check logs for errors: docker compose logs presidio-analyzer
- Verify network connectivity: docker network inspect pii-service_presidio-net
If the default ports are already in use:
- Change ports in the .env file
- Or stop conflicting services
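Where lsof is not available, a port conflict can also be detected from Python. This is a rough sketch that tries to bind a TCP socket to each default port on 127.0.0.1; the function name port_is_free is hypothetical. Run it while the services are stopped, since the running containers occupy these ports themselves.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


for port in (5001, 5002):
    print(port, "free" if port_is_free(port) else "in use")
```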
To stop and remove the services:
docker compose down
Or use the helper script:
./scripts/down.sh
This setup works on:
- Linux (amd64)
- macOS (Apple Silicon / arm64 and Intel)
- Windows (with WSL2 or Docker Desktop)
The official Presidio images support both amd64 and arm64 architectures automatically.
See LICENSE file for details.