Kolot-lu/pii-service

PII Redaction Service

A production-ready, reusable service for detecting and anonymizing Personally Identifiable Information (PII) using Microsoft Presidio. This service runs entirely locally using Docker Compose, with no external API dependencies.

Overview

This service provides two core capabilities:

  1. PII Detection (Analyzer): Identifies PII entities in text such as names, email addresses, phone numbers, credit cards, and more.
  2. PII Anonymization (Anonymizer): Replaces detected PII with anonymized placeholders or custom values.

The service consists of two separate containers that work together:

  • presidio-analyzer: Detects PII entities in text
  • presidio-anonymizer: Anonymizes detected PII entities

This separation allows for flexible deployment and independent scaling of each component.

Quick Start

Prerequisites

  • Docker Engine 20.10 or later
  • Docker Compose v2.0 or later

Installation

  1. Clone or download this repository:

    git clone <repository-url>
    cd pii-service

  2. Copy the example environment file:

    cp .env.example .env

  3. (Optional) Adjust settings in .env if needed. The defaults bind both services to 127.0.0.1, so they are reachable only from the local machine.

  4. Start the services:

    docker compose up -d

  5. Verify that both containers are running and report a healthy status:

    docker compose ps

  6. Follow the logs to confirm that both services started without errors (press Ctrl+C to stop following):

    docker compose logs -f

Testing the Service

Run the test script to verify everything works (if it is not executable yet, run chmod +x scripts/*.sh first):

./scripts/test.sh

Or test manually:

1. Analyze text for PII:

curl -X POST http://127.0.0.1:5002/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith email john@acme.com",
    "language": "en"
  }'

2. Anonymize text:

curl -X POST http://127.0.0.1:5001/anonymize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith email john@acme.com",
    "analyzer_results": [
      {
        "entity_type": "PERSON",
        "start": 0,
        "end": 10,
        "score": 0.85
      },
      {
        "entity_type": "EMAIL_ADDRESS",
        "start": 17,
        "end": 30,
        "score": 0.99
      }
    ],
    "operators": {
      "PERSON": {
        "type": "replace",
        "new_value": "{{PERSON}}"
      },
      "EMAIL_ADDRESS": {
        "type": "replace",
        "new_value": "{{EMAIL}}"
      }
    }
  }'

API Usage

Analyzer Endpoint

URL: http://<bind_address>:<analyzer_port>/analyze

Method: POST

Request Body:

{
  "text": "Your text to analyze",
  "language": "en"
}

Response:

[
  {
    "entity_type": "PERSON",
    "start": 0,
    "end": 10,
    "score": 0.85
  },
  {
    "entity_type": "EMAIL_ADDRESS",
    "start": 17,
    "end": 30,
    "score": 0.99
  }
]
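
The start and end values in the response are character offsets into the submitted text, with end exclusive, so they map directly onto Python slices. The following is a minimal sketch that checks this against the sample response above; the helper name extract_spans is hypothetical and not part of this repository.

```python
# Sample input and analyzer output copied from the example above.
text = "John Smith email john@acme.com"
results = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 30, "score": 0.99},
]


def extract_spans(text, results):
    """Return (entity_type, matched_text) pairs using slice semantics."""
    return [(r["entity_type"], text[r["start"]:r["end"]]) for r in results]


spans = extract_spans(text, results)
for entity_type, value in spans:
    print(f"{entity_type}: {value!r}")
```

Slicing the original text this way is a quick sanity check that the offsets you forward to the anonymizer still refer to the same string.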

Anonymizer Endpoint

URL: http://<bind_address>:<anonymizer_port>/anonymize

Method: POST

Request Body:

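{
  "text": "John Smith email john@acme.com",
  "analyzer_results": [
    {
      "entity_type": "PERSON",
      "start": 0,
      "end": 10,
      "score": 0.85
    },
    {
      "entity_type": "EMAIL_ADDRESS",
      "start": 17,
      "end": 30,
      "score": 0.99
    }
  ],
  "operators": {
    "PERSON": {
      "type": "replace",
      "new_value": "{{PERSON}}"
    },
    "EMAIL_ADDRESS": {
      "type": "replace",
      "new_value": "{{EMAIL}}"
    }
  }
}

Response:

{
  "text": "{{PERSON}} email {{EMAIL}}",
  "items": [
    {
      "operator": "replace",
      "entity_type": "PERSON",
      "start": 0,
      "end": 10,
      "text": "{{PERSON}}"
    },
    {
      "operator": "replace",
      "entity_type": "EMAIL_ADDRESS",
      "start": 17,
      "end": 26,
      "text": "{{EMAIL}}"
    }
  ]
}

The start and end offsets in items refer to positions in the anonymized text, not the original input. The order of the items may differ from what is shown here.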
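
To make the effect of the replace operator concrete, here is a small local sketch of the same transformation. It is an illustration only, not Presidio's implementation: the function name apply_replacements is hypothetical, and the sketch ignores overlapping spans and every operator type other than replace.

```python
def apply_replacements(text, analyzer_results, operators):
    """Substitute each detected span with its operator's new_value.

    Spans are processed from the end of the text towards the start so
    that the offsets of spans not yet processed remain valid.
    """
    out = text
    ordered = sorted(analyzer_results, key=lambda r: r["start"], reverse=True)
    for r in ordered:
        op = operators.get(r["entity_type"])
        if op is None or op.get("type") != "replace":
            continue  # this sketch only handles explicit "replace" operators
        out = out[:r["start"]] + op["new_value"] + out[r["end"]:]
    return out


sample_text = "John Smith email john@acme.com"
sample_results = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 30, "score": 0.99},
]
sample_operators = {
    "PERSON": {"type": "replace", "new_value": "{{PERSON}}"},
    "EMAIL_ADDRESS": {"type": "replace", "new_value": "{{EMAIL}}"},
}
sanitized = apply_replacements(sample_text, sample_results, sample_operators)
print(sanitized)
```

Working right to left matters because each replacement can change the length of the string; going left to right would shift every later offset.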

Recommended Integration Flow

For backend applications, follow this pattern:

  1. Send text to Analyzer → Receive detected entities
  2. Send text + entities + operators to Anonymizer → Receive sanitized text
  3. Use sanitized text in your application

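Example workflow (Python, using the requests library):

import requests

user_input = "John Smith email john@acme.com"  # example input

# 1. Analyze: detect PII entities in the text
analyzer_response = requests.post(
    "http://127.0.0.1:5002/analyze",
    json={"text": user_input, "language": "en"},
    timeout=10,
)
analyzer_response.raise_for_status()  # stop on HTTP errors
entities = analyzer_response.json()

# 2. Anonymize: pass the same text plus the detected entities
anonymizer_response = requests.post(
    "http://127.0.0.1:5001/anonymize",
    json={
        "text": user_input,
        "analyzer_results": entities,
        "operators": {
            "PERSON": {"type": "replace", "new_value": "{{PERSON}}"},
            "EMAIL_ADDRESS": {"type": "replace", "new_value": "{{EMAIL}}"},
        },
    },
    timeout=10,
)
anonymizer_response.raise_for_status()

# 3. Use the sanitized text in your application
sanitized_text = anonymizer_response.json()["text"]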
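
Between the two calls, a backend may want to drop low-confidence detections before forwarding them. The sketch below builds the anonymizer request body with a client-side score filter. The function name build_anonymize_payload and the min_score parameter are assumptions made for illustration; they are not part of this repository or of Presidio.

```python
import json

# Operators taken from the examples in this README.
DEFAULT_OPERATORS = {
    "PERSON": {"type": "replace", "new_value": "{{PERSON}}"},
    "EMAIL_ADDRESS": {"type": "replace", "new_value": "{{EMAIL}}"},
}


def build_anonymize_payload(text, entities, min_score=0.5, operators=None):
    """Build the JSON body for /anonymize, keeping entities scored >= min_score.

    min_score is a hypothetical client-side threshold; tune it per use case.
    """
    kept = [e for e in entities if e.get("score", 0.0) >= min_score]
    return {
        "text": text,
        "analyzer_results": kept,
        "operators": operators if operators is not None else DEFAULT_OPERATORS,
    }


example_entities = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 30, "score": 0.3},
]
payload = build_anonymize_payload("John Smith email john@acme.com", example_entities)
print(json.dumps(payload, indent=2))
```

Filtering on the client trades recall for precision: a higher threshold leaves more borderline PII in the text, so a conservative default is usually the safer choice for redaction.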

Helper Scripts

The scripts/ directory contains convenience scripts:

  • scripts/up.sh: Start services in detached mode
  • scripts/down.sh: Stop and remove services
  • scripts/status.sh: Show service status and health
  • scripts/test.sh: Run test requests against both services

Make scripts executable:

chmod +x scripts/*.sh

Configuration

Environment Variables

Edit .env to customize:

  • PRESIDIO_IMAGE_TAG: Docker image tag (default: latest)
  • PRESIDIO_BIND_ADDRESS: Network binding address (default: 127.0.0.1)
  • PRESIDIO_ANALYZER_HOST_PORT: Analyzer port (default: 5002)
  • PRESIDIO_ANONYMIZER_HOST_PORT: Anonymizer port (default: 5001)
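
A client can read the same variables to work out where the services listen, so the .env file stays the single source of truth. This is a sketch under the assumption that the client process sees these variables in its environment; the function name service_urls is hypothetical.

```python
import os


def service_urls(env=None):
    """Build endpoint URLs from the variables above, using the documented defaults."""
    env = os.environ if env is None else env
    host = env.get("PRESIDIO_BIND_ADDRESS", "127.0.0.1")
    analyzer_port = env.get("PRESIDIO_ANALYZER_HOST_PORT", "5002")
    anonymizer_port = env.get("PRESIDIO_ANONYMIZER_HOST_PORT", "5001")
    return {
        "analyze": f"http://{host}:{analyzer_port}/analyze",
        "anonymize": f"http://{host}:{anonymizer_port}/anonymize",
    }


# With no overrides, the documented defaults are used.
print(service_urls({}))
```

If the bind address is set to a wildcard such as 0.0.0.0, a client should connect to a concrete reachable address instead, so this helper would need an extra override in that case.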

Security Best Practices

⚠️ Important Security Notes:

  • The default binding is 127.0.0.1 (localhost only), so the services are not reachable from other hosts
  • Do NOT expose these ports externally without proper authentication
  • If external access is required, put a reverse proxy (nginx, Traefik) with authentication in front of the services
  • Never bind to 0.0.0.0 in production without additional security measures such as an authenticating reverse proxy or firewall rules
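
A deployment script can guard against accidental exposure by checking the configured bind address before starting the stack. The following sketch uses the standard ipaddress module; the function name is_loopback_bind is hypothetical. Note that it treats host names such as localhost as unsafe, because it does not resolve them.

```python
import ipaddress


def is_loopback_bind(address):
    """Return True only when address is a literal loopback IP (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        # Not an IP literal (for example a host name): treat as unsafe.
        return False


for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
    status = "local only" if is_loopback_bind(candidate) else "EXPOSED"
    print(f"{candidate}: {status}")
```

Failing closed on anything that is not a loopback literal keeps the check simple and avoids depending on name resolution.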

Resource Requirements

Presidio services are CPU- and memory-intensive:

  • Minimum: 2 CPU cores, 4GB RAM
  • Recommended: 4 CPU cores, 8GB RAM
  • Container limits: Docker's defaults apply, which means no per-container CPU or memory limit is enforced (set limits in docker-compose.yml if needed)

Logging

View logs:

# All services
docker compose logs -f

# Specific service
docker compose logs -f presidio-analyzer
docker compose logs -f presidio-anonymizer

Health Checks

Both services include health checks that verify service availability:

  • Health check endpoint: /health
  • Check interval: 30 seconds
  • Retries: 3
  • Start period: 40 seconds (allows initial startup time)

Check health status:

docker compose ps
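
An application that depends on these services can apply the same retry pattern before sending traffic. The sketch below polls a health URL with the interval and retry count listed above. It assumes the /health endpoint is reachable on the published host port (for example http://127.0.0.1:5002/health for the analyzer); the function names and the injectable probe and sleep parameters are hypothetical conveniences for testing.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(url, retries=3, interval=30.0,
                       probe=http_probe, sleep=time.sleep):
    """Probe url up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)  # no need to wait after the final attempt
    return False
```

A caller would run wait_until_healthy("http://127.0.0.1:5002/health") once at startup. The sketch does not model the 40-second start period, so a caller that starts immediately after docker compose up may want a shorter interval and more retries.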

Troubleshooting

Services won't start

  • Verify Docker and Docker Compose are installed and running
  • Check port availability: lsof -i :5001 -i :5002
  • Review logs: docker compose logs
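
Where lsof is not available, the port check can be approximated from Python. This sketch attempts a TCP connection to each port and reports whether something is already listening; the function name port_in_use is hypothetical, and the check only detects listeners reachable at the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True when a TCP listener accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0
```

Calling port_in_use(5001) and port_in_use(5002) before docker compose up shows whether the default ports are free. A True result after the services are running is expected.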

Health checks failing

  • Wait 40-60 seconds after startup for services to initialize
  • Check logs for errors: docker compose logs presidio-analyzer
  • Verify network connectivity: docker network inspect pii-service_presidio-net

Port already in use

  • Change ports in .env file
  • Or stop conflicting services

Stopping the Service

docker compose down

Or use the helper script:

./scripts/down.sh

Platform Support

This setup works on:

  • Linux (amd64)
  • macOS (Apple Silicon / arm64 and Intel)
  • Windows (with WSL2 or Docker Desktop)

The official Presidio images support both amd64 and arm64 architectures automatically.

License

See LICENSE file for details.
