A production-ready zero-shot legal document classification system powered by Mistral-7B and FAISS vector similarity validation. This hybrid approach combines the reasoning capabilities of Large Language Models with the precision of embedding-based validation to achieve high-accuracy document classification.
- Zero-Shot Classification: Leverages Mistral-7B for flexible category inference without training data
- Hybrid Validation: FAISS vector store validation ensures classification accuracy
- Production-Ready Architecture:
- FastAPI async endpoints with comprehensive middleware
- JWT authentication and rate limiting
- Performance monitoring and logging
- Current Performance (as of Feb 11, 2025):
- Response time: ~33.18s per request
- Classification accuracy: 100% on latest tests
- GPU utilization: Not optimal
- Throughput: ~1.8 requests per minute
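The hybrid flow described above can be sketched as follows. This is an illustrative stand-in, not the project's actual API: the LLM call is mocked, plain cosine similarity replaces FAISS for brevity, and the function and category names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hybrid_classify(doc_vec, llm_label, exemplars, threshold=0.85):
    """Accept the LLM's proposed label only if the document embedding
    is close enough to a stored exemplar for that label (the FAISS
    validation step, approximated here without FAISS itself)."""
    vectors = exemplars.get(llm_label)
    if not vectors:
        return None
    best = max(cosine(doc_vec, v) for v in vectors)
    return llm_label if best >= threshold else None

# Toy exemplar embeddings per category (illustrative 2-D vectors).
exemplars = {"contract": [[1.0, 0.0]], "brief": [[0.0, 1.0]]}
print(hybrid_classify([0.9, 0.1], "contract", exemplars))  # contract
print(hybrid_classify([0.9, 0.1], "brief", exemplars))     # None
```

A label the LLM proposes is rejected (returns `None`) when no exemplar of that category is sufficiently similar, which is how the validation layer catches LLM misclassifications.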
src/
└── app/
    ├── auth/        # JWT authentication and token handling
    ├── models/      # Core classification models
    ├── middleware/  # Auth and rate limiting
    └── routers/     # API endpoints and routing
tests/               # Test suite
- Hardware Requirements:
- NVIDIA GPU with 4GB+ VRAM
- 4+ CPU cores
- 16GB+ system RAM
- Expected Performance:
- Response Time: ~33s average
- Throughput: 1-2 RPM
- Classification Accuracy: 100%
Minimum Configuration (g5.xlarge):
- NVIDIA A10G GPU (24GB VRAM)
- Response Time: 3-4s
- Throughput: 30-40 RPM per instance
- Classification Accuracy: 85-90%
Target Configuration (g5.2xlarge or higher):
- Response Time: ~2s
- Throughput: 150+ RPM (with load balancing)
- Classification Accuracy: 90-95%
- High Availability: 99.9%
Classification Engine
- Mistral-7B integration via Ollama
- GPU-accelerated inference
- FAISS similarity validation (0.85 threshold)
- Response caching (1-hour TTL)
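The 1-hour TTL response cache mentioned above can be sketched as a small in-memory store. This is illustrative only; the project's actual cache implementation may differ.

```python
import time

class TTLCache:
    """Minimal in-memory response cache with a configurable TTL
    (defaulting to the 1-hour TTL described above)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Entry has aged past its TTL; evict and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=3600)
cache.set("doc-hash", "contract")
print(cache.get("doc-hash"))  # contract
```

Keying the cache on a hash of the document text means repeated classification requests for the same document skip the ~33s LLM round-trip entirely.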
API Layer
- Async endpoint structure
- JWT authentication
- Rate limiting (1000 req/min)
- Detailed error handling
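The 1000 req/min rate limit above is typically enforced with a sliding window per client. The sketch below shows the idea in plain Python, decoupled from FastAPI middleware; names and structure are illustrative, not the project's actual middleware.

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window rate limiter (e.g. 1000 requests per 60s)."""

    def __init__(self, max_requests=1000, window_seconds=60):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = {}  # client id -> deque of request timestamps

    def allow(self, client: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(client, deque())
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: middleware would return 429
        hits.append(now)
        return True

limiter = RateLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("alice", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
```

In the FastAPI layer this check would run in middleware before the endpoint handler, returning HTTP 429 when `allow` is false.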
Current implementation status:
✅ Core Features
- Classification engine with Mistral-7B
- FAISS validation layer
- Performance monitoring and logging
✅ API & Security
- JWT authentication
- Rate limiting middleware
- FastAPI async endpoints
✅ Testing & Quality
- Basic test coverage
- Error handling
- Input validation
🚧 Optimization Goals
- Response time improvement (Current: ~33s → Target: <2s)
- GPU utilization optimization
- Throughput enhancement (Current: ~1.8 RPM → Target: 150 RPM)
- Production deployment setup
Optimization Strategy:
Performance Enhancement
- Response caching implementation
- Batch processing optimization
- GPU utilization improvements
Production Deployment
- AWS g5.xlarge/g5.2xlarge setup
- Load balancing configuration
- Auto-scaling implementation
Documentation & Monitoring
- Detailed benchmark reports
- Performance monitoring dashboards
- Production deployment guides
See BENCHMARKS.md for detailed performance analysis and optimization plans.
- NVIDIA GPU with 4GB+ VRAM
- 4+ CPU cores
- 16GB+ system RAM
- Python 3.10+
- Conda (recommended for environment management)
- Clone the repository
git clone https://github.com/yourusername/hybrid-llm-classifier.git
cd hybrid-llm-classifier
- Set up the environment
# Create and activate environment
make setup
# Install development dependencies
make install-dev
- Install and start Ollama
- Follow instructions at Ollama.ai
- Pull Mistral model:
ollama pull mistral
- Verify GPU support:
nvidia-smi
We use make to standardize development commands. Here are the available targets:
# Run basic tests
make test
# Run tests with coverage report
make test-coverage
# Run tests in watch mode (auto-rerun on changes)
make test-watch
# Run tests with verbose output
make test-verbose
# Run full benchmark suite
make benchmark
# Run continuous benchmark monitoring
make benchmark-watch
# Run memory and line profiling
make benchmark-profile
# Format code (black + isort)
make format
# Run all linters
make lint
# Start development server with hot reload
make run
# Remove all build artifacts and cache files
make clean
For a complete list of available commands:
make help
Current test suite includes:
- Unit tests for core classification
- Integration tests for API endpoints
- Authentication and rate limiting tests
- Performance metrics validation
- Error handling scenarios
- Benchmark tests
Test coverage metrics:
- Line coverage: 90%+
- Branch coverage: 85%+
- All critical paths covered
All tests are async-compatible and use pytest-asyncio for proper async testing.
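An async test in this style looks roughly like the following. The endpoint logic is mocked and the names are illustrative; with pytest-asyncio the test function would carry `@pytest.mark.asyncio` instead of being driven by `asyncio.run` directly.

```python
import asyncio

async def fake_classify(document: str) -> dict:
    """Stand-in for the awaited classification call in the real suite."""
    await asyncio.sleep(0)  # simulates an awaited model/API round-trip
    return {"label": "contract", "confidence": 0.92}

async def test_classify_returns_label():
    result = await fake_classify("This agreement is made between...")
    assert result["label"] == "contract"
    assert 0.0 <= result["confidence"] <= 1.0

# pytest-asyncio would collect and await the coroutine for us; here we
# run it directly so the sketch is self-contained.
asyncio.run(test_classify_returns_label())
print("ok")
```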
Development Environment:
- Keep documents under 2,048 tokens
- Expect ~10s response time
- 5-10 requests per minute
- Memory usage: ~3.5GB VRAM
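A cheap pre-flight check for the 2,048-token guideline above can look like this. Note the hedge: whitespace word count only approximates the model's tokenizer (Mistral tokens are often sub-word), so production code should count with the actual tokenizer.

```python
def within_token_budget(text: str, limit: int = 2048) -> bool:
    """Rough pre-check against the 2,048-token document guideline.
    Word count approximates (usually undercounts) true token count;
    use the model tokenizer for an exact figure."""
    return len(text.split()) <= limit

print(within_token_budget("short legal clause"))  # True
```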
Production Environment:
- AWS g5.xlarge or higher recommended
- Load balancing for high throughput
- Auto-scaling configuration
- Regional deployment for latency optimization
See BENCHMARKS.md for detailed performance analysis and optimization experiments.
Development Environment (Current):
- Average response time: ~33.18s
- Classification accuracy: 100%
- GPU utilization: Not optimal
- Throughput: ~1.8 requests/minute
Production Targets (AWS g5.2xlarge):
- Response time: <2s
- Throughput: 150+ RPM
- Accuracy: 90-95%
- High availability: 99.9%
Optimization Roadmap:
Response Caching
- In-memory caching for repeated queries
- Configurable TTL
- Cache hit monitoring
Performance Optimization
- Response streaming
- Batch processing
- Memory usage optimization
Infrastructure
- Docker containerization
- AWS deployment
- Load balancing setup
- Monitoring integration
Core Functionality (Day 1)
- Optimize classification engine ✅
- Implement caching layer
- Document performance baselines
API & Performance (Day 2)
- Security hardening
- Response optimization
- Load testing
Production Ready (Day 3)
- AWS deployment
- Documentation
- Final testing
This project is licensed under the MIT License - see the LICENSE file for details.
While this project is primarily for demonstration purposes, we welcome feedback and suggestions. Please open an issue to discuss potential improvements.
Note: This project is under active development. Core functionality is implemented and tested, with performance optimizations in progress.
