A sophisticated RAG (Retrieval-Augmented Generation) Q&A Assistant built with Streamlit, LangChain, and LangGraph for answering questions about AI regulations in France and Europe.
It processes PDF documents, creates embeddings, and uses Gemma 2 2B and Gemma 3 270M via Ollama to generate contextual answers with ultra-fast CPU-optimized inference, structured output, and intelligent question analysis.
- PDF Document Processing: Automatically loads and processes PDF documents from the legal_docs/ folder
- Advanced RAG Architecture: Uses LangChain and LangGraph for sophisticated document retrieval and generation
- Intelligent Question Analysis: Reformulates questions and determines legal relevance before processing
- Multi-Model Architecture: Uses Gemma 2 2B for analysis and final answers, Gemma 3 270M for tool calls
- Structured Output: Pydantic models ensure robust JSON parsing and validation
- Vectorstore Caching: Intelligent caching system to speed up document loading and embedding creation
- LangSmith Integration: Built-in tracing and monitoring for performance optimization and debugging
- Ollama Integration: Uses Ollama with specialized models for ultra-fast CPU inference and tool calling
- Local LLM: Runs entirely locally with Gemma models - no external API dependencies
- Agent-Based Processing: Uses LangGraph agents with conditional routing for intelligent question answering
- Multilingual Support: French-English cross-lingual queries with automatic language detection
- Enhanced Legal Responses: Structured legal answers with direct responses, legal basis, conditions, and consequences
- User-friendly Interface: Clean Streamlit interface with cache management and settings
- CI/CD Pipeline: Includes linting, testing, and automated workflows
Gemma 2 2B via Ollama (Primary Model)
- Model Size: ~1.6GB (optimized by Ollama)
- RAM Required: ≈3GB
- Features: Ultra-fast inference, excellent JSON generation, RAG optimization, Google's latest architecture
- Performance: Superior quality and speed for legal document analysis and Q&A on CPU
- Usage: Question analysis, reformulation, and final answer generation
- Local Processing: No external API calls required
Gemma 3 270M via Ollama (Tool Model)
- Model Size: ~291MB (ultra-lightweight)
- RAM Required: ≈1GB
- Features: Ultra-fast inference, optimized for tool calls and document retrieval
- Performance: Lightning-fast processing for structured output and JSON generation
- Usage: Tool calls and document retrieval queries
- Local Processing: No external API calls required
Multilingual Embeddings
- Model: distiluse-base-multilingual-cased
- Size: ~135MB
- Languages: French, English, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Arabic, Chinese, Japanese, Korean, Hindi
- Cross-lingual Performance: Excellent FR-EN semantic matching
- Features: Optimized for multilingual document retrieval and cross-language Q&A
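Cross-lingual retrieval works because the multilingual model maps semantically similar French and English text to nearby vectors; ranking is then plain cosine similarity. A minimal stdlib sketch of the ranking step (the vectors here are toy values, not real model outputs):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]):
    """Return (index, score) pairs sorted by similarity to the query, best first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy vectors standing in for real distiluse embeddings:
query = [0.9, 0.1, 0.0]       # e.g. a French query about GDPR requirements
docs = [
    [0.1, 0.9, 0.2],          # unrelated document
    [0.85, 0.15, 0.05],       # English document about GDPR requirements
]
print(rank_documents(query, docs)[0][0])  # index of the best-matching document
```

The same similarity measure applies whether the query and document share a language or not, which is what makes the shared embedding space sufficient for FR-EN matching.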
⚡ Performance Optimized: This quantized model prioritizes speed and efficiency for optimal Q&A performance.
- Speed: Ultra-fast inference with optimized response times
- Memory: Efficient RAM usage with quantized model
- Quality: Excellent for legal document analysis and Q&A
- Multilingual: Support for French-English cross-lingual queries
- Use Case: Perfect for quick legal document queries and fast responses
This is a stateless Q&A system with intelligent question analysis and routing:
Complete LangGraph workflow showing the multi-model architecture and intelligent routing
1. Question Analysis (Gemma 2 2B)
- Reformulates questions for optimal document retrieval
- Determines legal relevance and scope
- Identifies specific legal domains
2. Conditional Routing
- Legal questions → Document retrieval and analysis
- Non-legal questions → General response
3. Document Retrieval (Gemma 3 270M)
- Structured tool calls with Pydantic validation
- Multi-question support for comprehensive search
- Optimized query generation
4. Final Answer Generation (Gemma 2 2B)
- Structured legal responses with:
- Direct answer (Légal / Illégal / Partiellement légal)
- Legal basis with specific references
- Conditions and requirements
- Practical consequences
- Recommendations
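The project validates this structure with Pydantic models; the stdlib-dataclass sketch below shows the equivalent shape of such a structured legal answer (field names are illustrative, not the project's actual schema):

```python
from dataclasses import dataclass, field

# Illustrative shape of the structured legal answer. The project itself uses
# Pydantic for validation; these field names are assumptions for illustration.
@dataclass
class LegalAnswer:
    direct_answer: str                                      # "Légal" / "Illégal" / "Partiellement légal"
    legal_basis: list[str] = field(default_factory=list)    # specific article references
    conditions: list[str] = field(default_factory=list)     # requirements that apply
    consequences: list[str] = field(default_factory=list)   # practical consequences
    recommendations: list[str] = field(default_factory=list)

answer = LegalAnswer(
    direct_answer="Partiellement légal",
    legal_basis=["RGPD, art. 22", "AI Act, art. 5"],
)
print(answer.direct_answer)
```

Forcing the LLM to emit this fixed schema is what makes downstream JSON parsing reliable: a response that omits a required field fails validation instead of silently producing a malformed answer.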
- Optimal Performance: Intelligent routing optimizes model usage
- Consistent Results: Each answer is based solely on the documents, not conversation history
- Resource Efficiency: Specialized models for different tasks
- Reliability: Structured output prevents parsing errors
- Enhanced Accuracy: Multi-step analysis improves response quality
Note: This is not a conversational chatbot but rather an intelligent document-based legal analysis assistant.
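In LangGraph, step 2 above is implemented as a conditional edge after the analysis node. A plain-Python sketch of that routing decision (the keys and node names here are hypothetical, not the project's actual graph):

```python
from typing import Callable

def route_question(analysis: dict) -> str:
    """Pick the next workflow node from the analysis step's output.

    `analysis` mimics the structured output of the question-analysis node;
    the keys used here are illustrative assumptions.
    """
    if analysis.get("is_legal_question"):
        return "retrieve_documents"   # legal question -> RAG path (tool model)
    return "general_response"         # non-legal question -> direct answer

# Hypothetical node implementations keyed by name, as in a LangGraph graph:
nodes: dict[str, Callable[[dict], str]] = {
    "retrieve_documents": lambda a: f"searching docs for: {a['reformulated']}",
    "general_response": lambda a: "This assistant answers questions about AI law.",
}

analysis = {"is_legal_question": True, "reformulated": "GDPR requirements for AI systems"}
print(nodes[route_question(analysis)](analysis))
```

Routing before retrieval is what keeps the expensive RAG path (embedding search plus the tool model) off the critical path for questions that do not need it.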
Option 1: Local Development
- Install Ollama: Download from ollama.ai
- Pull the models:
```bash
ollama pull gemma2:2b
ollama pull gemma3:270m
```
Option 2: Docker Deployment (Recommended)
- Install Docker: Download from docker.com
- No additional setup needed - Ollama and models are included in the container
```bash
git clone https://github.com/<USER>/french-ai-law-guru.git
cd french-ai-law-guru
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
.venv\Scripts\activate      # Windows
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```
Create a `.env` file in the project root:
```
# Hugging Face Configuration
HF_TOKEN=your_huggingface_token

# Ollama configuration (no API key needed for local models)
OLLAMA_MODEL_MAIN=gemma2:2b
OLLAMA_MODEL_TOOL=gemma3:270m

# LangSmith Configuration (Optional - for tracing and monitoring)
LANGCHAIN_API_KEY=your_langsmith_api_key_here
LANGCHAIN_PROJECT=faq-chatbot
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
```

This application now supports French-English cross-lingual queries:
- Questions in French → documents in English ✓
- Questions in English → documents in French ✓
- Mixed-language documents ✓
- Automatic language matching: responses in the same language as the question ✓
If upgrading from the previous version, you need to regenerate the vectorstore:
```bash
# Delete old vectorstore to force re-embedding with multilingual model
rm -rf chroma_db/
```
The new embedding model will be downloaded automatically on first run (~135MB).
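One common way a vectorstore cache like this is keyed (a hypothetical, stdlib-only sketch, not the project's actual implementation): fingerprint the PDF set together with the embedding model name, so changing either forces a rebuild:

```python
import hashlib
from pathlib import Path

def corpus_fingerprint(docs_dir: str, embedding_model: str) -> str:
    """Hash the PDF set and embedding model name into a cache key.

    Hypothetical sketch: if any PDF is added/modified, or the embedding model
    changes (as in this upgrade), the key changes and the vectorstore is rebuilt.
    """
    h = hashlib.sha256()
    h.update(embedding_model.encode())
    for pdf in sorted(Path(docs_dir).glob("*.pdf")):
        h.update(pdf.name.encode())
        h.update(str(pdf.stat().st_mtime_ns).encode())
    return h.hexdigest()
```

Under such a scheme, switching to `distiluse-base-multilingual-cased` would invalidate the old cache automatically; deleting `chroma_db/` by hand, as above, achieves the same effect.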
French Question with English Documents:
- Question: "Quelles sont les exigences GDPR pour les applications d'IA ?"
- Answer: "Les exigences GDPR pour les applications d'IA incluent..." (in French)
English Question with French Documents:
- Question: "What are the AI Act requirements for transparency?"
- Answer: "The AI Act requirements for transparency include..." (in English)
Automatic Language Detection:
- The system automatically detects the language of your question and responds in the same language
- Works with French, English, and other supported languages
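As a rough illustration of how language detection can work, here is a naive stopword heuristic in stdlib Python (the app may well use a different mechanism, such as instructing the LLM directly; the marker sets below are illustrative):

```python
# Small, illustrative stopword sets -- a real detector would use a library
# or rely on the LLM itself.
FRENCH_MARKERS = {"les", "des", "pour", "quelles", "sont", "une", "est", "que"}
ENGLISH_MARKERS = {"the", "what", "are", "for", "is", "of", "which"}

def detect_language(question: str) -> str:
    """Crude French/English detection via stopword counts (illustrative only)."""
    words = {w.strip("?!.,'").lower() for w in question.split()}
    fr = len(words & FRENCH_MARKERS)
    en = len(words & ENGLISH_MARKERS)
    return "fr" if fr >= en else "en"

print(detect_language("Quelles sont les exigences GDPR ?"))   # fr
print(detect_language("What are the AI Act requirements?"))   # en
```

The detected language is then used to keep the answer in the same language as the question, regardless of which language the retrieved documents are in.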
Hugging Face Token:
- Go to Hugging Face
- Create a new token with "Read" permissions
- Add it to your `.env` file
LangSmith API Key (Optional):
- Go to LangSmith
- Sign up or log in
- Go to Settings > API Keys
- Create a new API key
- Add it to your `.env` file
Note: LangSmith integration is optional but recommended for monitoring performance and debugging issues.
Run the app using the provided launcher:
```bash
python run_streamlit.py
```
Or run directly with Streamlit:
```bash
streamlit run legal_ai_assistant/app.py
```
```bash
# Clone and run
git clone https://github.com/<USER>/french-ai-law-guru.git
cd french-ai-law-guru
docker-compose up --build
```
The application will be available at http://localhost:8501
```bash
# Start the application
docker-compose up --build

# Run in background
docker-compose up -d --build

# View logs
docker-compose logs -f

# Stop the application
docker-compose down
```
- ✓ Ollama pre-installed with Gemma models
- ✓ Model persistence between container restarts
- ✓ Automatic model download on first run
- ✓ Health checks for monitoring
- ✓ Resource limits configured (8GB RAM)
- ✓ Full project mount to /work for live development
- ✓ Data persistence with named volumes
Open your browser and go to: http://localhost:8501
- The app will automatically process PDF documents in the legal_docs/ folder
- Embeddings will be created and cached for faster subsequent runs
- If LangSmith is configured, you'll see tracing information in the sidebar
- Place your PDF documents in the legal_docs/ folder
- The app supports multiple PDF files
- Documents are automatically chunked and processed
- Use the "Clear Cache" button in the sidebar to refresh embeddings when documents change
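The chunking step above can be sketched as a fixed-size character splitter with overlap; the project likely uses LangChain's text splitter, so the function and parameters below are a simplified, illustrative stand-in:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap.

    The overlap means a legal clause that straddles a chunk boundary still
    appears whole in at least one chunk -- a simplified version of what a
    LangChain text splitter does. Sizes here are illustrative.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("a" * 2500, chunk_size=1000, overlap=200)
print(len(chunks))
```

Each chunk is embedded separately, so chunk size trades off retrieval granularity (small chunks) against preserved context (large chunks).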
Run tests:
```bash
pytest
```
Run linter:
```bash
ruff check .
```

```
french-ai-law-guru/
├── legal_ai_assistant/              # Main application package
│   ├── __init__.py                  # Package initialization
│   ├── app.py                       # Main Streamlit application
│   ├── agents.py                    # LangGraph agent definitions and workflow
│   ├── chat_handler.py              # Question processing and answer generation
│   ├── local_models.py              # Ollama model client configuration
│   ├── config.py                    # Application configuration
│   └── utils.py                     # Document processing, embeddings, and caching utilities
│
├── legal_docs/                      # PDF documents for processing
│   ├── CELEX_32001L0029_EN_TXT.pdf  # InfoSoc (Copyright) Directive
│   ├── CELEX_32016R0679_EN_TXT.pdf  # GDPR Regulation
│   ├── CELEX_32019L0790_EN_TXT.pdf  # Copyright in the Digital Single Market Directive
│   ├── CELEX_32022R1925_EN_TXT.pdf  # Digital Markets Act
│   ├── CELEX_32022R2065_EN_TXT.pdf  # Digital Services Act
│   ├── CELEX_32024L2853_EN_TXT.pdf  # Product Liability Directive
│   ├── CELEX_52022PC0165_EN_TXT.pdf # EU Proposals
│   ├── CELEX_52022PC0496_EN_TXT.pdf # EU Proposals
│   ├── joe_*.pdf                    # French Official Journal documents
│   └── OJ_L_*.pdf                   # Official Journal L series documents
│
├── chroma_db/                       # ChromaDB persistent storage (auto-created)
│
├── tests/                           # Test suite
│   ├── __init__.py                  # Test package initialization
│   └── test_utils.py                # Unit tests for utility functions
│
├── notebooks/                       # Jupyter notebooks
│   └── PDFembedding.ipynb           # PDF embedding exploration notebook
│
├── .env                             # Environment variables (create this)
├── .gitignore                       # Git ignore rules
├── Dockerfile                       # Docker configuration
├── docker-compose.yml               # Docker Compose configuration
├── requirements.txt                 # Python dependencies
├── pyproject.toml                   # Project configuration
├── run_streamlit.py                 # Streamlit app launcher
└── README.md                        # This file
```
- app.py: Main Streamlit interface with document processing and chat functionality
- agents.py: LangGraph agent implementation with question analysis, routing, and structured output
- chat_handler.py: Handles question processing and answer generation
- utils.py: Core utilities for PDF processing, embeddings, caching, and token calculation
- local_models.py: Ollama model client configuration for Gemma models
- config.py: Centralized configuration including multi-model LLM and embedding settings
- chroma_db/: Stores processed vectorstores for faster loading
- legal_docs/: Contains EU and French legal PDF documents for processing
This chatbot includes built-in LangSmith integration for monitoring, tracing, and debugging. LangSmith provides valuable insights into the RAG pipeline performance.
- Performance Monitoring: Track execution times for each step in the RAG pipeline
- Token Usage Tracking: Monitor API costs and usage patterns
- Error Debugging: Identify where issues occur in the processing chain
- Optimization Insights: Find bottlenecks and optimization opportunities
- Get LangSmith API Key: Visit LangSmith and create an API key
- Configure Environment: Add LangSmith variables to your `.env` file (see Environment Variables section)
- Monitor Performance: Check the LangSmith dashboard for real-time traces and metrics
- URL: https://smith.langchain.com/projects
- Project: faq-chatbot (configurable via LANGCHAIN_PROJECT)
- Features: View traces, metrics, token usage, and error logs
For detailed setup instructions, see the configuration files in the project.
💡 Tip: The app automatically uses the multi-model configuration in legal_ai_assistant/config.py. You can modify the model settings there if needed.
This application is optimized for high-quality inference using a multi-model architecture:
| Aspect | Gemma 2 2B | Gemma 3 270M |
|---|---|---|
| Inference Time | 15-25 seconds | 2-5 seconds |
| Model Size | 1.6GB | 291MB |
| RAM Required | ≈3GB | ≈1GB |
| Context Window | 2048 tokens | 512 tokens |
| Response Quality | Superior for Q&A with citations | Optimized for tool calls |
| Use Case | Analysis & final answers | Tool calls & retrieval |
- Intelligent routing: Questions are analyzed and routed to appropriate models
- Structured output: Pydantic models ensure robust JSON parsing and validation
- Comprehensive responses: Optimized for detailed Q&A with full legal citations
- Memory efficient: Multi-model architecture with optimized RAM usage
- CPU optimized: Configured for maximum performance on CPU-only systems
- Context awareness: Dynamic token calculation for optimal response length
- Enhanced legal analysis: Structured legal responses with direct answers, legal basis, and consequences
- Multi-question support: Tool calls can handle multiple reformulated questions simultaneously
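The dynamic token calculation mentioned above can be sketched as follows: subtract an estimate of the prompt's token count from the model's context window to budget the answer length. The 4-characters-per-token heuristic and `reserve` parameter are assumptions for illustration; the real app may use a proper tokenizer:

```python
def available_answer_tokens(prompt: str, context_window: int = 2048,
                            reserve: int = 64) -> int:
    """Estimate how many tokens remain for the answer after the prompt.

    Uses the rough 4-characters-per-token heuristic (an assumption, not the
    project's actual method). `reserve` keeps headroom for special tokens.
    """
    prompt_tokens = len(prompt) // 4 + 1
    return max(0, context_window - prompt_tokens - reserve)

budget = available_answer_tokens("question + retrieved context " * 50)
print(budget)
```

This matters most for the 2048-token Gemma 2 2B context: a long retrieved context shrinks the answer budget, so the retrieval step must keep chunks compact.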
The project includes Docker configuration for easy deployment:
```bash
# Build the Docker image
docker build -t french-ai-law-guru .

# Run the container
docker run -p 8501:8501 --env-file .env french-ai-law-guru
```
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Developed by drikseyy