jgurakuqi/real-estate-ai-content-generator

🏡 AI Real Estate Content Generator

Python 3.12 FastAPI Docker License: MIT

A production-grade, containerized AI system that automates the creation of multilingual, SEO-optimized real estate property listings. This solution leverages Google Gemini 2.5 Flash-Lite for high-speed content generation and LanguageTool for grammar verification, wrapped in a FastAPI backend with an interactive Streamlit frontend.

It includes a robust LLM-as-a-Judge evaluation suite to automatically verify content quality against strict linguistic and structural criteria.


🎯 Problem Statement

Real estate companies manage hundreds of property listings across different cities and need to:

  • Generate consistent, high-quality content at scale
  • Support multiple languages and market tones
  • Ensure strict SEO compliance
  • Maintain a specific HTML structure for website integration

This system solves these challenges by transforming structured property data (JSON) into complete, ready-to-publish listing descriptions with guaranteed structural compliance.


✨ Key Features

Core Capabilities

  • 📝 Structured Output: Generates 7 distinct content sections (Title, Meta Description, H1, Description, Key Features, Neighborhood, CTA) with proper HTML tagging.
  • 🌍 Multilingual & Regional:
    • Languages: English, Portuguese, Spanish, French.
    • Regional Localization: US vs. UK English (e.g., "Elevator" vs. "Lift"), PT vs. BR Portuguese.
  • 🎭 Tone Customization: Professional, Friendly, Luxury, or Investor-focused writing styles.
  • 🔍 SEO Optimization: Built-in keyword verification, dynamic city translation, and meta tag length validation.
  • ✅ Quality Assurance: Grammar checking via LanguageTool (Java-based) and logic validation (e.g., ensuring "Studio" is used for 1-bedroom units < 40 sqm).
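The SEO checks listed above can be pictured with a small sketch. The function name, the 160-character meta description limit, and the report shape are assumptions for illustration; the repository's quality.py may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class SeoReport:
    """Result of a hypothetical SEO check (not the project's actual schema)."""
    meta_length_ok: bool
    missing_keywords: list[str] = field(default_factory=list)


def check_seo(meta_description: str, body: str, keywords: list[str],
              max_meta_len: int = 160) -> SeoReport:
    """Validate meta description length and keyword presence (case-insensitive)."""
    haystack = f"{meta_description} {body}".lower()
    missing = [kw for kw in keywords if kw.lower() not in haystack]
    return SeoReport(
        meta_length_ok=0 < len(meta_description) <= max_meta_len,
        missing_keywords=missing,
    )
```

A caller could surface `missing_keywords` in the UI's SEO tab or use it to trigger a regeneration.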

🧪 Automated Evaluation (New)

  • LLM-as-a-Judge: A built-in testing suite that uses a separate LLM instance to grade generated content against a "Golden Dataset".
  • Criteria Verification: Automatically checks for correct vocabulary (e.g., "Flat" vs "Apartment"), currency symbols, and tone compliance.

Technical Highlights

  • Deterministic HTML Generation: LLM outputs pure JSON; Python constructs HTML (100% structural compliance).
  • Async Processing: FastAPI with async/await for high-throughput batch operations.
  • Docker-Optimized: LanguageTool models pre-downloaded during image build (no runtime delays).
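As a rough illustration of the async batch processing mentioned above, the sketch below runs several generations concurrently with a concurrency cap. `generate_listing` is a stand-in for the real LLM call, and the semaphore limit is an assumed parameter, not something the project documents.

```python
import asyncio


async def generate_listing(prop: dict) -> dict:
    """Stand-in for the real LLM call; here it only echoes the title."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"title": prop["title"], "status": "ok"}


async def generate_batch(props: list[dict], max_concurrency: int = 5) -> list[dict]:
    """Run generations concurrently, capped by a semaphore; input order is preserved."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(prop: dict) -> dict:
        async with sem:
            return await generate_listing(prop)

    return await asyncio.gather(*(bounded(p) for p in props))
```

Capping concurrency is one way to stay within an upstream API's rate limits while still overlapping network waits.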

🛠️ Tech Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| API Framework | FastAPI 0.121.3 | Async REST API with OpenAPI docs |
| LLM | Google Gemini 2.5 Flash-Lite | Cost-effective, low-latency content generation |
| Grammar Check | language-tool-python 3.0.0 | Multi-language grammar verification |
| Frontend | Streamlit 1.51.0 | Interactive testing UI |
| Validation | Pydantic 2.12.4 | Type-safe request/response models |
| Translation | deep-translator 1.11.4 | Dynamic city name localization |
| Testing | Google GenAI + Pytest | LLM-as-a-Judge evaluation framework |
| Containerization | Docker + Docker Compose | Reproducible deployment |

📁 Project Structure

real-estate-ai/
├── app/
│   ├── api/
│   │   └── routes.py          # FastAPI endpoints (/generate, /batch)
│   ├── core/
│   │   └── config.py          # Settings & environment variables
│   ├── models/
│   │   └── schemas.py         # Pydantic models (input/output)
│   └── services/
│       ├── generator.py       # LLM orchestration & HTML construction
│       ├── prompt.py          # Dynamic prompt builder with localization
│       └── quality.py         # Grammar checking & SEO validation
├── tests/
│   ├── evaluation_suite.py    # LLM-as-a-Judge runner
│   ├── golden_dataset.json    # Test cases with strict criteria
│   └── evaluation_results.json # Output logs of the last test run
├── frontend.py                # Streamlit UI for interactive testing
├── main.py                    # FastAPI app with lifespan management
├── preload.py                 # Downloads LanguageTool models (Docker build step)
├── docker-compose.yml         # Multi-container orchestration
├── Dockerfile                 # Optimized image with Java + Python
├── requirements.txt           # Python dependencies
└── .env                       # API keys (DO NOT COMMIT)

⚙️ Setup & Installation

Prerequisites

  • Docker (≥ 20.10) and Docker Compose (≥ 2.0)
  • Google Gemini API Key (obtainable from Google AI Studio)
  • At least 2GB RAM allocated to Docker (for LanguageTool)

1️⃣ Clone the Repository

git clone https://github.com/jgurakuqi/real-estate-ai-content-generator.git
cd real-estate-ai-content-generator

2️⃣ Configure Environment Variables

Create a .env file in the root directory:

# .env
GEMINI_API_KEY=your_actual_api_key_here

3️⃣ Build & Run with Docker (Recommended)

This command builds the image, pre-downloads grammar models, and starts the services.

docker-compose up --build

Service URLs:

  • Streamlit UI: http://localhost:8501
  • REST API: http://localhost:8000


📖 Usage

Option A: Interactive Web UI (Streamlit)

  1. Navigate to http://localhost:8501.
  2. Select Language (e.g., English) and Region (e.g., 🇬🇧 UK vs 🇺🇸 US).
  3. Fill in property details.
  4. Click "✨ Generate Content".
  5. View results in tabs: Preview, SEO & Quality (Grammar/Keywords), and Raw HTML.

Option B: REST API

Endpoint: POST /api/v1/generate

curl -X POST "http://localhost:8000/api/v1/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Modern Flat in London",
    "location": { "city": "London", "neighborhood": "Shoreditch" },
    "features": { "bedrooms": 2, "bathrooms": 1, "area_sqm": 60, "elevator": true },
    "price": 500000,
    "listing_type": "sale",
    "language": "en",
    "region": "GB",
    "tone": "professional"
  }'
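The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; `build_request` is a hypothetical helper, and the response schema is not shown because it is not documented here.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/generate"  # taken from the curl example


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request; pass it to urllib.request.urlopen() to send it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "title": "Modern Flat in London",
    "location": {"city": "London", "neighborhood": "Shoreditch"},
    "features": {"bedrooms": 2, "bathrooms": 1, "area_sqm": 60, "elevator": True},
    "price": 500000,
    "listing_type": "sale",
    "language": "en",
    "region": "GB",
    "tone": "professional",
}
req = build_request(payload)
```

With the services running, `urllib.request.urlopen(req)` sends the request and the body can be parsed with `json.load`.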

🧪 Automated Evaluation Suite

This project includes a sophisticated LLM-as-a-Judge script to verify that the AI adheres to complex instructions (e.g., "Use British English spelling" or "Don't sound like an investment pitch").

How it Works

  1. Loads test cases from tests/golden_dataset.json.
  2. Generates content using the current system.
  3. Sends the output + grading criteria to a separate LLM instance (Judge).
  4. The Judge evaluates Pass/Fail and provides reasoning.
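The four steps above can be sketched as a small loop. The case fields (`id`, `input`, `criteria`) and the `generate`/`judge` callables are assumptions for illustration; the real suite calls the generator and a Gemini judge, which are replaced here by stubs.

```python
from typing import Callable


def run_suite(cases: list[dict],
              generate: Callable[[dict], str],
              judge: Callable[[str, list[str]], tuple[bool, str]]) -> dict:
    """Generate content for each case, ask the judge for a verdict, and tally results."""
    results = []
    for case in cases:
        content = generate(case["input"])
        passed, reasoning = judge(content, case["criteria"])
        results.append({"id": case["id"], "passed": passed, "reasoning": reasoning})
    return {
        "passed": sum(r["passed"] for r in results),
        "total": len(results),
        "results": results,
    }


# Stubs standing in for the generator and the judge model.
cases = [{"id": "TEST_001_UK_REGION", "input": {"region": "GB"}, "criteria": ["uses 'flat'"]}]
summary = run_suite(
    cases,
    generate=lambda _inp: "A bright flat with a lift.",
    judge=lambda text, _criteria: ("flat" in text, "keyword check only"),
)
```

Passing the generator and judge in as callables keeps the loop testable without network access.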

Running the Tests

Run the evaluation suite locally (requires a local Python installation):

# Install dependencies locally
pip install -r requirements.txt

# Run the suite
python -m tests.evaluation_suite

Sample Output:

▶️  Running Case: TEST_001_UK_REGION...
   ✅ PASS
▶️  Running Case: TEST_002_LUXURY_TONE...
   ✅ PASS
📊 SUMMARY: 4/4 Tests Passed
🚀 Ready for Production!

Results are saved to tests/evaluation_results.json.


πŸ—οΈ Architecture & Design Decisions

1. Deterministic HTML Structure

Problem: LLMs often break HTML tags.
Solution: The LLM outputs pure JSON and Python handles the HTML wrapping. Because the tags come from code rather than from the model, the HTML structure is the same on every run.
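A minimal sketch of this idea follows. The JSON field names (`h1`, `description`, `key_features`) and the chosen tags are assumptions; generator.py may use a different schema and covers more sections.

```python
import json
from html import escape


def build_html(llm_json: str) -> str:
    """Parse the model's JSON and wrap each field in fixed tags, escaping the text."""
    data = json.loads(llm_json)
    features = "".join(f"<li>{escape(item)}</li>" for item in data["key_features"])
    return (
        f"<h1>{escape(data['h1'])}</h1>"
        f"<p>{escape(data['description'])}</p>"
        f"<ul>{features}</ul>"
    )
```

Escaping the model's text means stray `<` or `&` characters in the generated copy cannot alter the markup.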

2. Dependency Injection & Preloading

Problem: LanguageTool is heavy (Java-based) and slow to load.
Solution:

  • Models are downloaded during docker build via preload.py.
  • The QualityChecker class is loaded as a singleton on app startup (lifespan event) and injected into routes.
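The load-once pattern can be sketched without FastAPI itself. The `QualityChecker` below is a placeholder, `get_quality_checker` is a hypothetical accessor, and a `SimpleNamespace` stands in for the application object.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class QualityChecker:
    """Placeholder for the real checker; counts instances to show it loads once."""
    instances = 0

    def __init__(self) -> None:
        QualityChecker.instances += 1

    def check(self, text: str) -> bool:
        return bool(text.strip())


@asynccontextmanager
async def lifespan(app):
    """Create the checker once at startup and release it at shutdown."""
    app.state.quality_checker = QualityChecker()
    try:
        yield
    finally:
        app.state.quality_checker = None


def get_quality_checker(app) -> QualityChecker:
    """Accessor that routes would call instead of constructing a checker."""
    return app.state.quality_checker


async def demo() -> tuple[bool, bool, int]:
    app = SimpleNamespace(state=SimpleNamespace())  # stand-in for a FastAPI app
    async with lifespan(app):
        first = get_quality_checker(app).check("A bright flat.")
        second = get_quality_checker(app).check("   ")
    return first, second, QualityChecker.instances
```

In FastAPI, a function of this shape is passed as `FastAPI(lifespan=lifespan)`, so the expensive object is built once per process rather than once per request.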

3. Dynamic Localization Strategy

Problem: Regional vocabulary differs, e.g. "Apartment" (US) vs "Flat" (UK) and "Elevator" vs "Lift".
Solution: A PromptBuilder injects region-specific vocabulary rules into the system prompt based on the region input, so the LLM adopts the correct persona (e.g., "British Estate Agent").
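One possible shape for such a builder is shown below. The rule table, the function name, and the "American Realtor" persona are assumptions made for illustration; only the "British Estate Agent" persona and the vocabulary pairs come from this README.

```python
# Hypothetical rule table keyed by (language, region).
REGION_RULES = {
    ("en", "GB"): {"persona": "British Estate Agent",
                   "vocabulary": {"apartment": "flat", "elevator": "lift"}},
    ("en", "US"): {"persona": "American Realtor",
                   "vocabulary": {"flat": "apartment", "lift": "elevator"}},
}


def build_system_prompt(language: str, region: str, tone: str) -> str:
    """Compose a system prompt with region-specific vocabulary rules, if any."""
    lines = [f"Write a real estate listing in language '{language}' with a {tone} tone."]
    rules = REGION_RULES.get((language, region))
    if rules:
        lines.append(f"Adopt the persona of a {rules['persona']}.")
        for avoid, use in rules["vocabulary"].items():
            lines.append(f'Use "{use}" instead of "{avoid}".')
    return "\n".join(lines)
```

Keeping the rules in a data table means a new region can be added without touching the prompt logic.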


⚠️ Assumptions & Limitations

  • Studio Logic: The system automatically detects studios based on bedroom count (1) and keywords/size (<40sqm), adjusting the title to "Studio" or "T0".
  • Translation: City names are translated via Google Translate API (deep-translator). In a high-load production environment, a static dictionary or caching layer (Redis) would be preferred.
  • Memory: Requires ~2GB RAM due to the Java-based LanguageTool server running alongside the Python app.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
