# Hybrid Legal Document Classifier

FastAPI · Python 3.10+ · Code style: black · License: MIT

A production-ready zero-shot legal document classification system powered by Mistral-7B and FAISS vector similarity validation. This hybrid approach combines the reasoning capabilities of Large Language Models with the precision of embedding-based validation to achieve high-accuracy document classification.

## 🚀 Features

- **Zero-Shot Classification**: Leverages Mistral-7B for flexible category inference without training data
- **Hybrid Validation**: FAISS vector store validation ensures classification accuracy
- **Production-Ready Architecture**:
  - FastAPI async endpoints with comprehensive middleware
  - JWT authentication and rate limiting
  - Performance monitoring and logging
- **Current Performance** (as of Feb 11, 2025):
  - Response time: ~33.18s per request
  - Classification accuracy: 100% on latest tests
  - GPU utilization: suboptimal, optimization in progress
  - Throughput: ~1.8 requests per minute
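
The zero-shot flow can be sketched end to end: build a category-constrained prompt and send it to a locally running Ollama server. This is a minimal illustration rather than the project's actual code; the category list and the `build_prompt`/`classify` helpers are assumptions, though `POST /api/generate` is Ollama's standard REST endpoint.

```python
import json
import urllib.request

# Illustrative labels; the real system infers categories zero-shot, so this
# list can change without retraining anything.
CATEGORIES = ["contract", "court filing", "regulatory notice", "opinion letter"]

def build_prompt(document: str, categories=CATEGORIES) -> str:
    """Zero-shot prompt: ask the model to pick exactly one category."""
    return (
        "Classify the following legal document into exactly one of these categories: "
        + ", ".join(categories)
        + ". Reply with the category name only.\n\n"
        + document
    )

def classify(document: str, host: str = "http://localhost:11434") -> str:
    """Send the prompt to a local Ollama server and return the raw label text."""
    payload = json.dumps({
        "model": "mistral",
        "prompt": build_prompt(document),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

The returned label is what the FAISS validation layer then double-checks before the API responds.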

πŸ—οΈ Technical Architecture

src/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ auth/          # JWT authentication and token handling
β”‚   β”œβ”€β”€ models/        # Core classification models
β”‚   β”œβ”€β”€ middleware/    # Auth and rate limiting
β”‚   └── routers/      # API endpoints and routing
tests/                # Test suite

### Performance Characteristics

#### Development Environment (Local)

- **Hardware Requirements**:
  - NVIDIA GPU with 4GB+ VRAM
  - 4+ CPU cores
  - 16GB+ system RAM
- **Expected Performance**:
  - Response time: ~33s average
  - Throughput: 1-2 RPM
  - Classification accuracy: 100%

#### Production Environment (AWS)

1. **Minimum Configuration (g5.xlarge)**:
   - NVIDIA A10G GPU (24GB VRAM)
   - Response time: 3-4s
   - Throughput: 30-40 RPM per instance
   - Classification accuracy: 85-90%
2. **Target Configuration (g5.2xlarge or higher)**:
   - Response time: ~2s
   - Throughput: 150+ RPM (with load balancing)
   - Classification accuracy: 90-95%
   - High availability: 99.9%

### Key Components

1. **Classification Engine**
   - Mistral-7B integration via Ollama
   - GPU-accelerated inference
   - FAISS similarity validation (0.85 threshold)
   - Response caching (1-hour TTL)
2. **API Layer**
   - Async endpoint structure
   - JWT authentication
   - Rate limiting (1000 req/min)
   - Detailed error handling
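
The validation step in the classification engine boils down to a similarity check between the document's embedding and a reference vector for the predicted label. A dependency-free sketch of that logic follows; the real system uses FAISS for the nearest-neighbor search, the 0.85 threshold comes from the description above, and `validate_label` plus the per-label centroid layout are illustrative assumptions.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # threshold stated in the project description

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def validate_label(doc_embedding, label_centroids, predicted_label,
                   threshold=SIMILARITY_THRESHOLD):
    """Accept the LLM's label only if the document embedding sits close enough
    to that label's reference centroid; returns (accepted, score)."""
    score = cosine_similarity(doc_embedding, label_centroids[predicted_label])
    return score >= threshold, score
```

If the check fails, the service can fall back to the nearest centroid's label or flag the document for review; the source does not specify which, so that choice is left open here.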

## 🚦 Project Status

Current implementation status:

### ✅ Core Features

- Classification engine with Mistral-7B
- FAISS validation layer
- Performance monitoring and logging

### ✅ API & Security

- JWT authentication
- Rate limiting middleware
- FastAPI async endpoints
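
At its core, the JWT check behind these endpoints is an HMAC-signed token verification. Below is a minimal HS256 sketch using only the standard library; the service itself presumably uses a dedicated JWT library, and the secret key and claim names here are placeholders.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # hypothetical demo key; a real deployment loads this from config

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def encode_jwt(payload: dict, secret: bytes = SECRET) -> str:
    """Build an HS256 JWT: header.payload.signature, each part base64url-encoded."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes = SECRET):
    """Return the payload if the signature matches and the token is unexpired, else None."""
    try:
        header, body, sig = token.split(".")
    except ValueError:
        return None
    expected = _b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or wrongly signed token
    payload = json.loads(_b64url_decode(body))
    if payload.get("exp", 0) < time.time():
        return None  # expired token
    return payload
```

In the FastAPI app this verification would live in a dependency or middleware that rejects requests whose `Authorization` header fails the check.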

### ✅ Testing & Quality

- Basic test coverage
- Error handling
- Input validation

### 🚧 Optimization Goals

- Response time improvement (current: ~33s → target: <2s)
- GPU utilization optimization
- Throughput enhancement (current: ~1.8 RPM → target: 150 RPM)
- Production deployment setup

Optimization Strategy:

1. **Performance Enhancement**
   - Response caching implementation
   - Batch processing optimization
   - GPU utilization improvements
2. **Production Deployment**
   - AWS g5.xlarge/g5.2xlarge setup
   - Load balancing configuration
   - Auto-scaling implementation
3. **Documentation & Monitoring**
   - Detailed benchmark reports
   - Performance monitoring dashboards
   - Production deployment guides

See BENCHMARKS.md for detailed performance analysis and optimization plans.

πŸ› οΈ Development Setup

### Prerequisites

- NVIDIA GPU with 4GB+ VRAM
- 4+ CPU cores
- 16GB+ system RAM
- Python 3.10+
- Conda (recommended for environment management)

### Installation

1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/hybrid-llm-classifier.git
   cd hybrid-llm-classifier
   ```

2. Set up the environment

   ```bash
   # Create and activate environment
   make setup

   # Install development dependencies
   make install-dev
   ```

3. Install and start Ollama

   - Follow the instructions at Ollama.ai
   - Pull the Mistral model: `ollama pull mistral`
   - Verify GPU support: `nvidia-smi`

### Development Commands

We use make to standardize development commands. Here are the available targets:

#### Testing

```bash
# Run basic tests
make test

# Run tests with coverage report
make test-coverage

# Run tests in watch mode (auto-rerun on changes)
make test-watch

# Run tests with verbose output
make test-verbose
```

#### Performance Testing

```bash
# Run full benchmark suite
make benchmark

# Run continuous benchmark monitoring
make benchmark-watch

# Run memory and line profiling
make benchmark-profile
```

#### Code Quality

```bash
# Format code (black + isort)
make format

# Run all linters
make lint
```

#### Development Server

```bash
# Start development server with hot reload
make run
```

#### Cleanup

```bash
# Remove all build artifacts and cache files
make clean
```

For a complete list of available commands:

```bash
make help
```

### Test Coverage

Current test suite includes:

- Unit tests for core classification
- Integration tests for API endpoints
- Authentication and rate limiting tests
- Performance metrics validation
- Error handling scenarios
- Benchmark tests

Test coverage metrics:

- Line coverage: 90%+
- Branch coverage: 85%+
- All critical paths covered

All tests are async-compatible and use pytest-asyncio for proper async testing.
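
The async test pattern looks roughly like the sketch below. The suite itself uses pytest-asyncio; this shows the same shape with only the standard library, and the `classify_stub` coroutine is an illustrative stand-in for a real endpoint call.

```python
import asyncio
import unittest

async def classify_stub(text: str) -> dict:
    """Stand-in for the real async classification call; name and shape are illustrative."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {"label": "contract", "confidence": 0.92}

class ClassifierTests(unittest.IsolatedAsyncioTestCase):
    async def test_returns_a_label(self):
        # await the coroutine directly inside the async test method
        result = await classify_stub("This Agreement is entered into ...")
        self.assertIn("label", result)
        self.assertGreaterEqual(result["confidence"], 0.0)
```

With pytest-asyncio the equivalent test is an `async def` function marked with `@pytest.mark.asyncio`.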

### Performance Guidelines

Development Environment:

- Keep documents under 2,048 tokens
- Expect ~33s average response time (optimization in progress)
- 1-2 requests per minute
- Memory usage: ~3.5GB VRAM
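
The 2,048-token guideline can be enforced with a cheap pre-flight check before a document ever reaches the model. This is a sketch using a rough 4-characters-per-token heuristic rather than the model's actual tokenizer, so the names and the heuristic are assumptions.

```python
MAX_TOKENS = 2048  # guideline from above

def approx_token_count(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def check_document(text: str) -> None:
    """Raise ValueError if the document likely exceeds the token budget."""
    tokens = approx_token_count(text)
    if tokens > MAX_TOKENS:
        raise ValueError(
            f"document too long: ~{tokens} tokens (limit {MAX_TOKENS})"
        )
```

Rejecting oversized documents at the API layer keeps a single slow request from dominating the already-limited throughput.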

Production Environment:

- AWS g5.xlarge or higher recommended
- Load balancing for high throughput
- Auto-scaling configuration
- Regional deployment for latency optimization

## 📈 Performance

See BENCHMARKS.md for detailed performance analysis and optimization experiments.

Development Environment (current):

- Average response time: ~33.18s
- Classification accuracy: 100%
- GPU utilization: suboptimal
- Throughput: ~1.8 requests/minute

Production Targets (AWS g5.2xlarge):

- Response time: <2s
- Throughput: 150+ RPM
- Accuracy: 90-95%
- High availability: 99.9%

Optimization Roadmap:

1. **Response Caching**
   - In-memory caching for repeated queries
   - Configurable TTL
   - Cache hit monitoring
2. **Performance Optimization**
   - Response streaming
   - Batch processing
   - Memory usage optimization
3. **Infrastructure**
   - Docker containerization
   - AWS deployment
   - Load balancing setup
   - Monitoring integration

πŸ›£οΈ Roadmap

  1. Core Functionality (Day 1)

    • Optimize classification engine βœ…
    • Implement caching layer
    • Document performance baselines
  2. API & Performance (Day 2)

    • Security hardening
    • Response optimization
    • Load testing
  3. Production Ready (Day 3)

    • AWS deployment
    • Documentation
    • Final testing

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🤝 Contributing

While this project is primarily for demonstration purposes, we welcome feedback and suggestions. Please open an issue to discuss potential improvements.


Note: This project is under active development. Core functionality is implemented and tested, with performance optimizations in progress.
