khang3004/Comprehensive-ML-DL-Approaches-for-Hotel-Room-Review-Score-Prediction

🏨 Booking.com Hotel Analytics: A Comprehensive ML Analysis

📝 Overview

A comprehensive machine learning analysis of hotel data scraped from Booking.com for Ho Chi Minh City, Vietnam. The project applies regression, classification, and clustering models, including a CNN-based image model, to hotel reviews, images, and metadata in order to produce actionable insights for the hospitality industry.

🎯 Key Objectives

  1. Predictive Analytics: Develop robust models for review score prediction
  2. Market Segmentation: Identify distinct hotel segments using unsupervised learning
  3. Quality Classification: Create a reliable hotel quality classification system
  4. Image Analysis: Incorporate visual data in prediction models

🛠️ Technical Implementation

1. Review Score Regression

  • Combined Model Architecture:
    • ResNet18 backbone for image feature extraction
    • Fusion with numerical/categorical features
    • Custom head for regression
  • Traditional ML Approach:
    • Ridge Regression with VIF-based feature selection
    • Hyperparameter optimization via cross-validation
    • RMSE-focused model evaluation
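The VIF-based selection step can be sketched as follows. This is a minimal NumPy sketch, not the repository's code. It assumes the common recipe of repeatedly dropping the feature with the highest variance inflation factor, VIF_j = 1 / (1 - R²_j), until every remaining VIF is at or below a threshold. The default of 5.0 mirrors the `--vif_threshold 5.0` flag in the training command. The function names `variance_inflation_factors` and `select_by_vif` are hypothetical.

```python
from typing import List

import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R^2_j) for every column j of X.

    R^2_j comes from an ordinary least-squares regression of column j
    on all other columns plus an intercept.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot  # assumes column j is not constant
        # Perfect collinearity (R^2 == 1) means an unbounded VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def select_by_vif(X: np.ndarray, names: List[str], threshold: float = 5.0) -> List[str]:
    """Hypothetical helper: drop the highest-VIF feature one at a time until
    every remaining VIF is <= threshold, then return the kept feature names."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        vifs = variance_inflation_factors(X[:, keep])
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=200), rng.normal(size=200)
    c = a + b + rng.normal(scale=0.01, size=200)  # nearly collinear with a and b
    X = np.column_stack([a, b, c])
    print(select_by_vif(X, ["a", "b", "c"]))  # keeps two of the three columns
```

The surviving columns would then go to a Ridge model whose regularization strength is tuned by cross-validation on RMSE. This sketch covers only the selection step.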

2. Hotel Segmentation (Unsupervised Learning)

  • Clustering Algorithms:
    • K-means for basic segmentation
    • DBSCAN for density-based clustering
  • Evaluation Metrics:
    • Silhouette Score
    • Elbow Method for optimal cluster selection
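The segmentation workflow above can be sketched with scikit-learn, assuming standardized numeric hotel features. K-means is fit for a range of k. Its inertia is recorded for the elbow plot, and its silhouette score is recorded for model selection. DBSCAN is then run as the density-based alternative. The k range and the `eps` and `min_samples` values are illustrative assumptions, not the repository's settings. The helper names are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def scan_kmeans(X, k_values=range(2, 9), seed=42):
    """Fit K-means for each k and return {k: (inertia, silhouette)}.

    Inertia (within-cluster sum of squares) generally shrinks as k grows, so the
    elbow method looks for the k where the decrease levels off. The silhouette
    score lies in [-1, 1], and higher values mean better-separated clusters.
    """
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


def best_k_by_silhouette(results):
    """Pick the k with the highest silhouette score."""
    return max(results, key=lambda k: results[k][1])


def dbscan_labels(X, eps=0.5, min_samples=5):
    """Density-based clustering; points labelled -1 are treated as noise."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    raw = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
    X = StandardScaler().fit_transform(raw)  # K-means and DBSCAN are scale-sensitive
    results = scan_kmeans(X)
    for k, (inertia, sil) in results.items():
        print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")
    print("best k by silhouette:", best_k_by_silhouette(results))
```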

3. Quality Classification

  • Multi-class Classification:
    • Softmax Regression baseline
    • Stacking Ensemble:
      • Base models: SVM, KNN, Decision Tree, Random Forest
      • Meta-model: Logistic Regression
  • Class Definition:
    def quality_mapping(score):
        if score < 7.0:
            return "Standard"      # Basic amenities, lower prices
        elif score < 9.0:
            return "Superior"      # Good quality, competitive pricing
        else:
            return "Exceptional"   # Premium experience, luxury segment
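The stacking ensemble described above maps naturally onto scikit-learn's `StackingClassifier`. This sketch is an assumption about how the pieces fit together, not the repository's exact configuration:

- Labels come from `quality_mapping`.
- The four base models pass out-of-fold predictions to a Logistic Regression meta-model.
- The distance-based SVM and KNN are wrapped with `StandardScaler`.

Hyperparameters such as `max_depth=5` and `n_estimators=100` are placeholders. `build_stacking_model` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def quality_mapping(score):
    """Map a 0-10 review score to a quality class (same thresholds as above)."""
    if score < 7.0:
        return "Standard"
    elif score < 9.0:
        return "Superior"
    else:
        return "Exceptional"


def build_stacking_model(seed=42):
    """Hypothetical helper: SVM, KNN, Decision Tree and Random Forest base models
    with a Logistic Regression meta-model. Hyperparameters are placeholders."""
    base_models = [
        # SVM and KNN are distance-based, so each gets its own scaler.
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ]
    # cv=5: the meta-model is trained on out-of-fold base-model predictions.
    return StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))  # synthetic stand-in for hotel features
    scores = np.clip(8.0 + 1.5 * X[:, 0] + rng.normal(scale=0.3, size=300), 0, 10)
    y = np.array([quality_mapping(s) for s in scores])
    model = build_stacking_model().fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Training accuracy is shown only as a smoke test. A held-out split is needed for the accuracy, F1 and ROC-AUC figures reported for the classification task.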

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • PyTorch 1.9+
  • scikit-learn 0.24+
  • pandas 1.3+
  • NumPy 1.19+

Installation

git clone https://github.com/khang3004/Comprehensive-ML-DL-Approaches-for-Hotel-Room-Review-Score-Prediction.git
cd Comprehensive-ML-DL-Approaches-for-Hotel-Room-Review-Score-Prediction
pip install -r requirements.txt

Model Training & Evaluation

  1. Regression Models
# Deep Learning Approach
python evaluate.py \
    --task_type regression \
    --model_type dl \
    --dataset 'booking_images' \
    --n_epoch 5 \
    --batch_size 32 \
    --lr 0.01 \
    --save_model

# Traditional ML Approach
python evaluate.py \
    --task_type regression \
    --model_type ml \
    --model Vanilla_LinearRegression \
    --vif_threshold 5.0
  2. Classification Models
# Stacking Ensemble
python evaluate.py \
    --task_type classification \
    --model_type ml \
    --model Ensemble \
    --save_model
  3. Clustering Analysis
python evaluate.py \
    --task_type clustering \
    --model_type ml \
    --model KMeans \
    --save_model

🐳 Docker Deployment

  1. Build Docker Image
# Build image with tag
docker build -t hotel-analysis:latest .
  2. Run Container
# Run container with mounted volumes
docker run -it --name hotel-analysis \
    -v "$(pwd)/data:/app/data" \
    -v "$(pwd)/models:/app/models" \
    -v "$(pwd)/results:/app/results" \
    hotel-analysis:latest
  3. Run Specific Tasks
# Regression task
docker run -it --name hotel-regression \
    -v "$(pwd)/data:/app/data" \
    -v "$(pwd)/models:/app/models" \
    -v "$(pwd)/results:/app/results" \
    hotel-analysis:latest \
    python -u task_regression/evaluate.py \
    --task_type regression \
    --model_type ml \
    --model Ridge_Regression

# Classification task
docker run -it --name hotel-classification \
    -v "$(pwd)/data:/app/data" \
    -v "$(pwd)/models:/app/models" \
    -v "$(pwd)/results:/app/results" \
    hotel-analysis:latest \
    python -u task_classification/evaluate.py \
    --task_type classification \
    --model_type ml

# Clustering task
docker run -it --name hotel-clustering \
    -v "$(pwd)/data:/app/data" \
    -v "$(pwd)/models:/app/models" \
    -v "$(pwd)/results:/app/results" \
    hotel-analysis:latest \
    python -u task_clustering/evaluate.py \
    --task_type clustering \
    --model_type ml
  4. Useful Docker Commands
# List containers
docker ps -a

# Stop container
docker stop hotel-analysis

# Remove container
docker rm hotel-analysis

# View logs
docker logs -f hotel-analysis

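# Clean up: removes stopped containers, unused networks, build cache, and all images not used by a container (use with care)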
docker system prune -a
  5. Docker Compose (Optional)
# docker-compose.yml
version: '3.8'
services:
  hotel-analysis:
    build: .
    volumes:
      - ./data:/app/data
      - ./models:/app/models
      - ./results:/app/results
    environment:
      - PYTHONPATH=/app
      - TASK_DIR=/app/data
      - MODEL_DIR=/app/models
      - RESULTS_DIR=/app/results

Run with docker-compose:

docker-compose up --build

🐳 Docker Requirements

  • Docker Engine 19.03+
  • Docker Compose 1.27+ (optional)
  • At least 8GB RAM
  • 20GB free disk space

📊 Data Architecture

project/
├── data/
│   ├── raw/                  # Raw scraped data
│   ├── processed/            # Cleaned & preprocessed data
│   └── hotel_images/         # Hotel image repository
├── models/
│   ├── regression/
│   ├── classification/
│   └── clustering/
├── notebooks/                # Analysis & experimentation
└── src/                      # Source code

Key Features

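| Feature      | Type   | Description                 |
|--------------|--------|-----------------------------|
| review_score | float  | Review rating (0-10 scale)  |
| price        | float  | Room price (VND)            |
| facilities   | list   | Available amenities         |
| location     | str    | Hotel location              |
| images       | tensor | Processed hotel images      |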
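Models that fuse numerical and categorical inputs need list-valued columns such as `facilities` converted into fixed-length vectors. One common option is multi-hot encoding, where each hotel becomes a 0/1 vector over a vocabulary of facility names. This is offered as an assumption, not as the repository's documented preprocessing. The function name `multi_hot` and the sample facility strings are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np


def multi_hot(
    facility_lists: Sequence[Sequence[str]],
    vocabulary: Optional[List[str]] = None,
) -> Tuple[np.ndarray, List[str]]:
    """Encode each hotel's facility list as a 0/1 row vector.

    If no vocabulary is given, one is built (sorted) from the data. Pass the
    training vocabulary when encoding new hotels so the columns stay aligned.
    """
    if vocabulary is None:
        vocabulary = sorted({f for facilities in facility_lists for f in facilities})
    index: Dict[str, int] = {name: i for i, name in enumerate(vocabulary)}
    encoded = np.zeros((len(facility_lists), len(vocabulary)), dtype=np.float32)
    for row, facilities in enumerate(facility_lists):
        for facility in facilities:
            if facility in index:  # facilities unseen in the vocabulary are ignored
                encoded[row, index[facility]] = 1.0
    return encoded, vocabulary


if __name__ == "__main__":
    hotels = [["Free WiFi", "Pool"], ["Free WiFi"], ["Airport shuttle", "Pool"]]
    matrix, vocab = multi_hot(hotels)
    print(vocab)   # ['Airport shuttle', 'Free WiFi', 'Pool']
    print(matrix)  # 3 x 3 matrix with rows [0, 1, 1], [0, 1, 0], [1, 0, 1]
```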

📈 Performance Metrics

Regression Task

  • RMSE: 0.85
  • RΒ²: 0.78
  • MAE: 0.67
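For reference, the three regression metrics follow their standard definitions, computed on the 0-10 review-score scale. The small NumPy sketch below shows them. `regression_metrics` is a hypothetical helper, not a function from the repository. RMSE penalizes large errors more heavily than MAE does. R² compares the squared error against a baseline that always predicts the mean score.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for predicted review scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    ss_res = float(np.sum(errors ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        "MAE": float(np.mean(np.abs(errors))),
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    print(regression_metrics([8.0, 9.0, 7.0, 6.0], [8.5, 8.5, 7.0, 6.0]))
    # RMSE = sqrt(0.125) ≈ 0.354, MAE = 0.25, R2 = 1 - 0.5 / 5.0 = 0.9
```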

Classification Task

  • Accuracy: 0.84
  • F1-Score: 0.82
  • ROC-AUC: 0.89

Clustering Analysis

  • Silhouette Score: 0.76
  • Optimal Clusters: 3

🔍 Future Improvements

  1. Model Enhancements:

    • Implement attention mechanisms for image analysis
    • Explore transformer architectures
    • Incorporate temporal features
  2. Feature Engineering:

    • Develop more sophisticated text features
    • Create location-based features
    • Extract deeper image features

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use this work in your research, please cite:

@misc{booking_analysis_2024,
  author = {Your Name},
  title = {Booking.com Hotel Analytics},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/khang3004/Comprehensive-ML-DL-Approaches-for-Hotel-Room-Review-Score-Prediction.git}
}

📧 Contact

For any queries, please reach out to gausseuler159357@gmail.com
