PII Redaction Service

A production-ready, reusable service for detecting and anonymizing Personally Identifiable Information (PII) using Microsoft Presidio. This service runs entirely locally using Docker Compose, with no external API dependencies.

Overview

This service provides two core capabilities:

PII Detection (Analyzer): Identifies PII entities in text such as names, email addresses, phone numbers, credit cards, and more.
PII Anonymization (Anonymizer): Replaces detected PII with anonymized placeholders or custom values.

The service consists of two separate containers that work together:

presidio-analyzer: Detects PII entities in text
presidio-anonymizer: Anonymizes detected PII entities

This separation allows for flexible deployment and independent scaling of each component.

Quick Start

Prerequisites

Docker Engine 20.10 or later
Docker Compose v2.0 or later

Installation

Clone or download this repository:

git clone <repository-url>
cd pii-service

Copy the example environment file:
```
cp .env.example .env
```
(Optional) Adjust settings in .env if needed. Defaults are secure and bind to localhost only.
Start the services:
```
docker compose up -d
```
Verify services are running:
```
docker compose ps
```
Follow the logs to confirm both services started without errors (press Ctrl+C to stop following):
```
docker compose logs -f
```

Testing the Service

Run the test script to verify everything works:

./scripts/test.sh

Or test manually:

1. Analyze text for PII:

curl -X POST http://127.0.0.1:5002/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith email john@acme.com",
    "language": "en"
  }'

2. Anonymize text:

curl -X POST http://127.0.0.1:5001/anonymize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith email john@acme.com",
    "analyzer_results": [
      {
        "entity_type": "PERSON",
        "start": 0,
        "end": 10,
        "score": 0.85
      },
      {
        "entity_type": "EMAIL_ADDRESS",
        "start": 17,
        "end": 30,
        "score": 0.99
      }
    ],
    "operators": {
      "PERSON": {
        "type": "replace",
        "new_value": "{{PERSON}}"
      },
      "EMAIL_ADDRESS": {
        "type": "replace",
        "new_value": "{{EMAIL}}"
      }
    }
  }'

API Usage

Analyzer Endpoint

URL: http://<bind_address>:<analyzer_port>/analyze

Method: POST

Request Body:

{
  "text": "Your text to analyze",
  "language": "en"
}

Response:

[
  {
    "entity_type": "PERSON",
    "start": 0,
    "end": 10,
    "score": 0.85
  },
  {
    "entity_type": "EMAIL_ADDRESS",
    "start": 17,
    "end": 30,
    "score": 0.99
  }
]
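
The `start` and `end` values in the response can be read as zero-based character offsets into the submitted text, with `end` exclusive, which matches Python slicing. The sketch below runs offline against the sample response above and shows how a client might recover the matched substrings and drop low-confidence hits. The `extract_spans` helper and its `min_score` threshold are illustrative names, not part of Presidio.

```python
from typing import Dict, List


def extract_spans(text: str, results: List[Dict], min_score: float = 0.0) -> List[Dict]:
    """Attach the matched substring to each analyzer result at or above min_score."""
    spans = []
    for r in results:
        if r["score"] < min_score:
            continue
        # start is inclusive and end is exclusive, so a plain slice recovers the match
        spans.append({**r, "match": text[r["start"]:r["end"]]})
    return spans


sample_text = "John Smith email john@acme.com"
sample_results = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 30, "score": 0.99},
]

for span in extract_spans(sample_text, sample_results, min_score=0.5):
    print(span["entity_type"], repr(span["match"]))
```

Filtering on the client side keeps the request body unchanged; a suitable threshold depends on how costly a missed entity is compared with a false positive.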

Anonymizer Endpoint

URL: http://<bind_address>:<anonymizer_port>/anonymize

Method: POST

Request Body:

{
  "text": "Original text with PII",
  "analyzer_results": [
    {
      "entity_type": "PERSON",
      "start": 0,
      "end": 10,
      "score": 0.85
    }
  ],
  "operators": {
    "PERSON": {
      "type": "replace",
      "new_value": "{{PERSON}}"
    },
    "EMAIL_ADDRESS": {
      "type": "replace",
      "new_value": "{{EMAIL}}"
    }
  }
}

Response:

{
  "text": "{{PERSON}} email {{EMAIL}}",
  "items": [
    {
      "operator": "replace",
      "entity_type": "PERSON",
      "start": 0,
      "end": 10,
      "text": "{{PERSON}}"
    }
  ]
}
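
To make the effect of `replace` operators concrete, here is a small offline sketch that applies them to a string by hand. It is a local illustration only, not Presidio's implementation: it assumes the spans do not overlap, and the `apply_replace_operators` name is hypothetical. Working from the last span to the first keeps the earlier offsets valid while the text changes length.

```python
from typing import Dict, List


def apply_replace_operators(text: str, results: List[Dict], operators: Dict[str, Dict]) -> str:
    """Substitute each detected span with its operator's new_value (illustration only)."""
    # Process spans right to left so replacements do not shift offsets still to be used
    for r in sorted(results, key=lambda item: item["start"], reverse=True):
        op = operators.get(r["entity_type"])
        if op is None or op.get("type") != "replace":
            continue  # this sketch leaves spans without a replace operator untouched
        text = text[:r["start"]] + op["new_value"] + text[r["end"]:]
    return text


text = "John Smith email john@acme.com"
results = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 30, "score": 0.99},
]
operators = {
    "PERSON": {"type": "replace", "new_value": "{{PERSON}}"},
    "EMAIL_ADDRESS": {"type": "replace", "new_value": "{{EMAIL}}"},
}

print(apply_replace_operators(text, results, operators))
# prints: {{PERSON}} email {{EMAIL}}
```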

Recommended Integration Flow

For backend applications, follow this pattern:

Send text to Analyzer → Receive detected entities
Send text + entities + operators to Anonymizer → Receive sanitized text
Use sanitized text in your application

Example workflow:

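import requests

# Sample input; in an application this would be the untrusted user text
user_input = "John Smith email john@acme.com"

# 1. Analyze: detect PII entities in the raw text
analyzer_response = requests.post(
    "http://127.0.0.1:5002/analyze",
    json={"text": user_input, "language": "en"},
    timeout=30,
)
analyzer_response.raise_for_status()  # fail loudly on HTTP errors
entities = analyzer_response.json()

# 2. Anonymize: replace the detected spans using per-entity operators
anonymizer_response = requests.post(
    "http://127.0.0.1:5001/anonymize",
    json={
        "text": user_input,
        "analyzer_results": entities,
        "operators": {
            "PERSON": {"type": "replace", "new_value": "{{PERSON}}"},
            "EMAIL_ADDRESS": {"type": "replace", "new_value": "{{EMAIL}}"},
        },
    },
    timeout=30,
)
anonymizer_response.raise_for_status()
sanitized_text = anonymizer_response.json()["text"]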
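
The same flow can be wrapped in one function so application code makes a single call. The sketch below is one possible shape rather than a reference client: the `redact` and `post_json` names are hypothetical, the URLs assume the default ports, and it uses only the standard library. The transport is passed in as a parameter so the function can be exercised without running containers.

```python
import json
import urllib.request
from typing import Callable, Dict

# Assumed defaults; adjust if .env overrides the bind address or ports
ANALYZER_URL = "http://127.0.0.1:5002/analyze"
ANONYMIZER_URL = "http://127.0.0.1:5001/anonymize"


def post_json(url: str, payload: Dict, timeout: float = 30.0):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def redact(text: str, operators: Dict, post: Callable = post_json) -> str:
    """Run analyze then anonymize; return the text unchanged if nothing was detected."""
    entities = post(ANALYZER_URL, {"text": text, "language": "en"})
    if not entities:
        return text  # skip the second round trip when there is nothing to replace
    result = post(
        ANONYMIZER_URL,
        {"text": text, "analyzer_results": entities, "operators": operators},
    )
    return result["text"]
```

With the services running, `redact("John Smith email john@acme.com", {"PERSON": {"type": "replace", "new_value": "{{PERSON}}"}})` would send both requests in sequence. Letting HTTP errors propagate means a failed call cannot be mistaken for sanitized output.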

Helper Scripts

The scripts/ directory contains convenience scripts:

scripts/up.sh: Start services in detached mode
scripts/down.sh: Stop and remove services
scripts/status.sh: Show service status and health
scripts/test.sh: Run test requests against both services

Make scripts executable:

chmod +x scripts/*.sh

Configuration

Environment Variables

Edit .env to customize:

PRESIDIO_IMAGE_TAG: Docker image tag (default: latest)
PRESIDIO_BIND_ADDRESS: Network binding address (default: 127.0.0.1)
PRESIDIO_ANALYZER_HOST_PORT: Analyzer port (default: 5002)
PRESIDIO_ANONYMIZER_HOST_PORT: Anonymizer port (default: 5001)
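
A client can derive its endpoint URLs from the same variables so that it stays in step with the Compose configuration. The following is a minimal sketch, assuming the variables are exported in the client's own environment (Docker Compose reads `.env`, but a Python process does not load it automatically). The `endpoint_urls` helper is a hypothetical name.

```python
import os
from typing import Dict, Mapping

# Defaults mirror the values documented above
DEFAULTS = {
    "PRESIDIO_BIND_ADDRESS": "127.0.0.1",
    "PRESIDIO_ANALYZER_HOST_PORT": "5002",
    "PRESIDIO_ANONYMIZER_HOST_PORT": "5001",
}


def endpoint_urls(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the analyze and anonymize URLs, falling back to the documented defaults."""
    # Treat unset and empty variables the same way
    cfg = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    host = cfg["PRESIDIO_BIND_ADDRESS"]
    return {
        "analyze": f"http://{host}:{cfg['PRESIDIO_ANALYZER_HOST_PORT']}/analyze",
        "anonymize": f"http://{host}:{cfg['PRESIDIO_ANONYMIZER_HOST_PORT']}/anonymize",
    }


print(endpoint_urls({}))
```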

Security Best Practices

⚠️ Important Security Notes:

The default binding is 127.0.0.1 (localhost only), so the services are not reachable from other hosts
Do NOT expose these ports externally without proper authentication
If external access is required, use a reverse proxy (nginx, Traefik) with authentication
Never bind to 0.0.0.0 in production without additional security measures
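
A deployment script could guard against accidental exposure by checking the configured bind address before starting the stack. The sketch below uses the standard `ipaddress` module; the `is_loopback_only` name is hypothetical, and treating unresolved hostnames as unsafe is a deliberately conservative assumption.

```python
import ipaddress


def is_loopback_only(bind_address: str) -> bool:
    """Return True when the address is a loopback IP such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(bind_address).is_loopback
    except ValueError:
        # Hostnames (including "localhost") are not resolved here; treat them as unsafe
        return False


for candidate in ("127.0.0.1", "0.0.0.0", "192.168.1.10"):
    print(candidate, is_loopback_only(candidate))
```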

Resource Requirements

Presidio services are CPU and memory intensive:

Minimum: 2 CPU cores, 4GB RAM
Recommended: 4 CPU cores, 8GB RAM
Container limits: none are set explicitly, so Docker's defaults apply (adjust in docker-compose.yml if needed)

Logging

View logs:

# All services
docker compose logs -f

# Specific service
docker compose logs -f presidio-analyzer
docker compose logs -f presidio-anonymizer

Health Checks

Both services include health checks that verify service availability:

Health check endpoint: /health
Check interval: 30 seconds
Retries: 3
Start period: 40 seconds (allows initial startup time)

Check health status:

docker compose ps
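
Application code that starts alongside the containers may want to wait until the services respond before sending traffic. The following sketch polls a health URL with a bounded number of attempts. It assumes the `/health` path is reachable through the published host ports; the `wait_until_healthy` and `http_ok` names and the default attempt count and delay are illustrative choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all count as not ready
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe url up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no pause after the final failed attempt
    return False
```

With the default ports, `wait_until_healthy("http://127.0.0.1:5002/health")` would cover the 40-second start period with some margin.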

Troubleshooting

Services won't start

Verify Docker and Docker Compose are installed and running
Check port availability: lsof -i :5001 -i :5002
Review logs: docker compose logs
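
Where `lsof` is not available, port availability can also be checked from Python by attempting to bind the port. This is a minimal sketch, assuming the default host and ports; `port_is_free` is a hypothetical helper, and the result only reflects the moment of the check.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port
            return False
    return True


for port in (5001, 5002):
    print(port, "free" if port_is_free(port) else "in use")
```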

Health checks failing

Wait 40-60 seconds after startup for services to initialize
Check logs for errors: docker compose logs presidio-analyzer
Verify network connectivity: docker network inspect pii-service_presidio-net

Port already in use

Change ports in .env file
Or stop conflicting services

Stopping the Service

docker compose down

Or use the helper script:

./scripts/down.sh

Platform Support

This setup works on:

Linux (amd64)
macOS (Apple Silicon / arm64 and Intel)
Windows (with WSL2 or Docker Desktop)

The official Presidio images support both amd64 and arm64 architectures automatically.

License

See LICENSE file for details.
