# Real Estate AI Content Generator

A production-grade, containerized AI system that automates the creation of multilingual, SEO-optimized real estate property listings. This solution leverages Google Gemini 2.5 Flash-Lite for high-speed content generation and LanguageTool for grammar verification, wrapped in a FastAPI backend with an interactive Streamlit frontend.
It includes a robust LLM-as-a-Judge evaluation suite to automatically verify content quality against strict linguistic and structural criteria.
## 📑 Table of Contents

- 🎯 Problem Statement
- ✨ Key Features
- 🛠️ Tech Stack
- 📁 Project Structure
- ⚙️ Setup & Installation
- 🚀 Usage
- 🧪 Automated Evaluation Suite
- 🏗️ Architecture & Design Decisions
- ⚠️ Assumptions & Limitations
- 📄 License
## 🎯 Problem Statement

Real estate companies manage hundreds of property listings across different cities and need to:
- Generate consistent, high-quality content at scale
- Support multiple languages and market tones
- Ensure strict SEO compliance
- Maintain a specific HTML structure for website integration
This system solves these challenges by transforming structured property data (JSON) into complete, ready-to-publish listing descriptions with guaranteed structural compliance.
## ✨ Key Features

- 📋 Structured Output: Generates 7 distinct content sections (Title, Meta Description, H1, Description, Key Features, Neighborhood, CTA) with proper HTML tagging.
- 🌍 Multilingual & Regional:
  - Languages: English, Portuguese, Spanish, French.
  - Regional Localization: US vs. UK English (e.g., "Elevator" vs. "Lift"), PT vs. BR Portuguese.
- 🎨 Tone Customization: Professional, Friendly, Luxury, or Investor-focused writing styles.
- 🔍 SEO Optimization: Built-in keyword verification, dynamic city translation, and meta tag length validation.
- ✅ Quality Assurance: Grammar checking via LanguageTool (Java-based) and logic validation (e.g., ensuring "Studio" is used for 1-bedroom units under 40 sqm).
- LLM-as-a-Judge: A built-in testing suite that uses a separate LLM instance to grade generated content against a "Golden Dataset".
- Criteria Verification: Automatically checks for correct vocabulary (e.g., "Flat" vs "Apartment"), currency symbols, and tone compliance.
- Deterministic HTML Generation: LLM outputs pure JSON; Python constructs HTML (100% structural compliance).
- Async Processing: FastAPI with async/await for high-throughput batch operations.
- Docker-Optimized: LanguageTool models pre-downloaded during image build (no runtime delays).
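As a rough illustration of the meta tag length validation mentioned above, the sketch below shows the kind of check `app/services/quality.py` might perform. The function name, thresholds, and return format are assumptions for illustration, not the project's actual API.

```python
# Hypothetical sketch of SEO length validation; the real rules in
# app/services/quality.py may differ.

def validate_seo_lengths(title: str, meta_description: str) -> list[str]:
    """Return a list of human-readable SEO warnings (empty means all checks pass)."""
    warnings = []
    # Common guidance: titles should stay under ~60 characters so search
    # engines do not truncate them.
    if len(title) > 60:
        warnings.append(f"Title is {len(title)} chars (recommended <= 60)")
    # Meta descriptions are usually kept between ~50 and ~160 characters.
    if not 50 <= len(meta_description) <= 160:
        warnings.append(
            f"Meta description is {len(meta_description)} chars "
            "(recommended 50-160)"
        )
    return warnings
```

Returning a list of warnings rather than raising lets the API surface all SEO issues in a single quality report.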
## 🛠️ Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI 0.121.3 | Async REST API with OpenAPI docs |
| LLM | Google Gemini 2.5 Flash-Lite | Cost-effective, low-latency content generation |
| Grammar Check | language-tool-python 3.0.0 | Multi-language grammar verification |
| Frontend | Streamlit 1.51.0 | Interactive testing UI |
| Validation | Pydantic 2.12.4 | Type-safe request/response models |
| Translation | deep-translator 1.11.4 | Dynamic city name localization |
| Testing | Google GenAI + Pytest | LLM-as-a-Judge evaluation framework |
| Containerization | Docker + Docker Compose | Reproducible deployment |
## 📁 Project Structure

```
real-estate-ai/
├── app/
│   ├── api/
│   │   └── routes.py           # FastAPI endpoints (/generate, /batch)
│   ├── core/
│   │   └── config.py           # Settings & environment variables
│   ├── models/
│   │   └── schemas.py          # Pydantic models (input/output)
│   └── services/
│       ├── generator.py        # LLM orchestration & HTML construction
│       ├── prompt.py           # Dynamic prompt builder with localization
│       └── quality.py          # Grammar checking & SEO validation
├── tests/
│   ├── evaluation_suite.py     # LLM-as-a-Judge runner
│   ├── golden_dataset.json     # Test cases with strict criteria
│   └── evaluation_results.json # Output logs of the last test run
├── frontend.py                 # Streamlit UI for interactive testing
├── main.py                     # FastAPI app with lifespan management
├── preload.py                  # Downloads LanguageTool models (Docker build step)
├── docker-compose.yml          # Multi-container orchestration
├── Dockerfile                  # Optimized image with Java + Python
├── requirements.txt            # Python dependencies
└── .env                        # API keys (DO NOT COMMIT)
```
## ⚙️ Setup & Installation

### Prerequisites

- Docker (≥ 20.10) and Docker Compose (≥ 2.0)
- Google Gemini API Key (get one from Google AI Studio)
- At least 2 GB RAM allocated to Docker (for LanguageTool)
### 1. Clone the Repository

```bash
git clone https://github.com/jgurakuqi/real-estate-ai-content-generator.git
cd real-estate-ai-content-generator
```

### 2. Configure Environment Variables

Create a `.env` file in the root directory:

```bash
# .env
GEMINI_API_KEY=your_actual_api_key_here
```

### 3. Build and Run

This command builds the image, pre-downloads grammar models, and starts the services:

```bash
docker-compose up --build
```

Service URLs:

- Frontend (Streamlit): http://localhost:8501
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
## 🚀 Usage

### Web Interface (Streamlit)

1. Navigate to http://localhost:8501.
2. Select Language (e.g., English) and Region (e.g., 🇬🇧 UK vs. 🇺🇸 US).
3. Fill in the property details.
4. Click "✨ Generate Content".
5. View results in tabs: Preview, SEO & Quality (Grammar/Keywords), and Raw HTML.
### API Usage

Endpoint: `POST /api/v1/generate`

```bash
curl -X POST "http://localhost:8000/api/v1/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Modern Flat in London",
    "location": { "city": "London", "neighborhood": "Shoreditch" },
    "features": { "bedrooms": 2, "bathrooms": 1, "area_sqm": 60, "elevator": true },
    "price": 500000,
    "listing_type": "sale",
    "language": "en",
    "region": "GB",
    "tone": "professional"
  }'
```

## 🧪 Automated Evaluation Suite

This project includes a sophisticated LLM-as-a-Judge script to verify that the AI adheres to complex instructions (e.g., "Use British English spelling" or "Don't sound like an investment pitch").
How it works:

1. Loads test cases from `tests/golden_dataset.json`.
2. Generates content using the current system.
3. Sends the output plus grading criteria to a separate LLM instance (the Judge).
4. The Judge evaluates Pass/Fail and provides reasoning.
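The judging steps above can be sketched in miniature. The function names, prompt wording, and verdict format here are assumptions; the real logic lives in `tests/evaluation_suite.py` and sends the prompt to a Gemini judge instance.

```python
# Illustrative sketch of judge-prompt assembly and verdict parsing
# (hypothetical names; not the project's actual implementation).

def build_judge_prompt(generated_html: str, criteria: list[str]) -> str:
    """Combine generated content with grading criteria for the judge LLM."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, 1))
    return (
        "You are a strict content reviewer. Evaluate the listing below "
        "against every criterion and answer PASS or FAIL with reasoning.\n\n"
        f"CRITERIA:\n{numbered}\n\nCONTENT:\n{generated_html}"
    )

def parse_verdict(judge_response: str) -> bool:
    """Count the case as passing only if the judge says PASS and never FAIL."""
    upper = judge_response.upper()
    return "PASS" in upper and "FAIL" not in upper
```

Keeping the verdict format binary makes the suite easy to aggregate into the final pass/fail summary.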
You can run the evaluation suite locally (requires a local Python installation):

```bash
# Install dependencies locally
pip install -r requirements.txt

# Run the suite
python -m tests.evaluation_suite
```

Sample output:

```
▶️ Running Case: TEST_001_UK_REGION...
✅ PASS
▶️ Running Case: TEST_002_LUXURY_TONE...
✅ PASS

📊 SUMMARY: 4/4 Tests Passed
🚀 Ready for Production!
```

Results are saved to `tests/evaluation_results.json`.
## 🏗️ Architecture & Design Decisions

### Deterministic HTML Generation

Problem: LLMs often break HTML tags.

Solution: The LLM outputs pure JSON, and Python handles the HTML wrapping. This guarantees 100% valid HTML structure every time.
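A minimal sketch of this JSON-to-HTML step is shown below. The field names and tag choices are assumptions based on the seven sections listed earlier; the real mapping lives in `app/services/generator.py`.

```python
# Hypothetical sketch of deterministic HTML construction: the LLM only
# produces JSON fields, and Python wraps them in fixed tags.

def build_listing_html(sections: dict) -> str:
    """Wrap LLM-produced JSON fields in a fixed HTML structure."""
    features = "".join(f"<li>{f}</li>" for f in sections["key_features"])
    return (
        f"<h1>{sections['h1']}</h1>"
        f"<p>{sections['description']}</p>"
        f"<ul>{features}</ul>"
        f"<p>{sections['neighborhood']}</p>"
        f"<p><strong>{sections['cta']}</strong></p>"
    )

llm_json = {
    "h1": "Modern Flat in Shoreditch",
    "description": "A bright two-bedroom flat.",
    "key_features": ["2 bedrooms", "Lift access"],
    "neighborhood": "Shoreditch is known for its nightlife.",
    "cta": "Book a viewing today.",
}
html = build_listing_html(llm_json)
```

Because the tags never come from the model, the structure cannot be malformed regardless of what the LLM generates.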
### LanguageTool Loading

Problem: LanguageTool is heavy (Java-based) and slow to load.

Solution:

- Models are downloaded during `docker build` via `preload.py`.
- The `QualityChecker` class is loaded as a singleton on app startup (`lifespan` event) and injected into routes.
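The singleton-on-startup pattern can be sketched with the standard library alone (FastAPI's `lifespan` hook follows the same shape: initialize once before serving, release on shutdown). The `QualityChecker` below is a stub standing in for the real LanguageTool-backed class.

```python
# Stdlib-only sketch of the lifespan/singleton pattern; in the real app
# the context manager is passed to FastAPI(lifespan=...) in main.py.
import asyncio
from contextlib import asynccontextmanager

class QualityChecker:
    """Stand-in for the heavy LanguageTool-backed checker."""
    def __init__(self) -> None:
        self.ready = True  # the real version starts the Java server here

    def check(self, text: str) -> list:
        return []  # stub: pretend there are no grammar issues

state: dict = {}

@asynccontextmanager
async def lifespan(app=None):
    # Startup: pay the load cost exactly once, before serving requests.
    state["checker"] = QualityChecker()
    yield
    # Shutdown: release the checker.
    state.clear()

async def demo():
    async with lifespan():
        return state["checker"].check("Hello world")

issues = asyncio.run(demo())
```

Sharing one instance this way avoids re-spawning the Java process per request, which would dominate response latency.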
### Regional Localization

Problem: "Apartment" (US) vs. "Flat" (UK); "Elevator" vs. "Lift".

Solution: A `PromptBuilder` injects region-specific vocabulary rules into the system prompt based on the `region` input, ensuring the LLM adopts the correct persona (e.g., "British Estate Agent").
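A sketch of such vocabulary injection is shown below. The table contents, function name, and prompt wording are illustrative assumptions; the real `PromptBuilder` in `app/services/prompt.py` may structure this differently.

```python
# Hypothetical sketch of region-specific vocabulary injection into the
# system prompt (not the project's actual PromptBuilder).

REGION_VOCAB = {
    ("en", "GB"): {"apartment": "flat", "elevator": "lift"},
    ("en", "US"): {},  # US terms are the defaults in this sketch
}

def build_system_prompt(language: str, region: str, tone: str) -> str:
    rules = REGION_VOCAB.get((language, region), {})
    vocab_lines = "\n".join(f'- Say "{v}", never "{k}"' for k, v in rules.items())
    persona = "British Estate Agent" if region == "GB" else "Real Estate Agent"
    return (
        f"You are a {persona}. Write in a {tone} tone.\n"
        + (f"Vocabulary rules:\n{vocab_lines}" if vocab_lines else "")
    )
```

Encoding the rules as data rather than prose makes adding a new region a one-line change.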
## ⚠️ Assumptions & Limitations

- Studio Logic: The system automatically detects studios based on bedroom count (1) and keywords/size (< 40 sqm), adjusting the title to "Studio" or "T0".
- Translation: City names are translated via the Google Translate API (`deep-translator`). In a high-load production environment, a static dictionary or caching layer (e.g., Redis) would be preferable.
- Memory: Requires ~2 GB RAM because the Java-based LanguageTool server runs alongside the Python app.
## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.