Web application to predict medical insurance costs using Machine Learning, deployed on Google Cloud Run.
🔗 Live Demo: insurance-predictor-562289298058.us-central1.run.app
- ML-based medical insurance cost prediction
- Interactive web interface with Streamlit
- Gradient Boosting model with an R² score of 0.90 (about 90% of the variance in cost explained)
- Deployed on Google Cloud Run
| Category | Technologies |
|---|---|
| ML | scikit-learn, XGBoost, pandas, numpy |
| Web | Streamlit |
| Cloud | Google Cloud Run, Cloud Build |
| Containers | Docker |
├── app.py # Streamlit application
├── train.py # Training script
├── requirements.txt # Python dependencies
├── Dockerfile # Container for Cloud Run
├── Dockerfile.training # Container for training
├── .env # Environment variables (do not commit to git)
├── data/
│   └── insurance.csv # Dataset
└── model/
    ├── model.joblib # Trained model
    └── feature_names.joblib
# 1. Clone repository
git clone https://github.com/your-username/ai-insurance-cost-predictor.git
cd ai-insurance-cost-predictor
# 2. Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# 3. Install dependencies
pip install -r requirements.txt
# 4. Download dataset from Kaggle
# https://www.kaggle.com/datasets/mirichoi0218/insurance
# Save as data/insurance.csv
# 5. Train the model (optional; a trained model is already included)
python train.py --data-path=data/insurance.csv --model-dir=model
# 6. Run application
streamlit run app.py

The app will be available at: http://localhost:8501
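
The training command in step 5 passes two options, `--data-path` and `--model-dir`. As a minimal sketch of how a script could accept them, assuming `train.py` uses the standard-library `argparse` module (the function name `parse_args`, the defaults, and the help texts are hypothetical and not taken from the repository):

```python
import argparse


def parse_args(argv=None):
    """Parse the two options that appear in the training command."""
    parser = argparse.ArgumentParser(description="Train the insurance cost model")
    # Option names mirror the documented command; the defaults are assumptions.
    parser.add_argument("--data-path", default="data/insurance.csv",
                        help="CSV file with the training data")
    parser.add_argument("--model-dir", default="model",
                        help="Directory where the trained model is written")
    return parser.parse_args(argv)


# argparse maps --data-path to the attribute data_path (dashes become underscores).
args = parse_args(["--data-path=data/insurance.csv", "--model-dir=model"])
print(f"data: {args.data_path}  output: {args.model_dir}")
```

Passing an explicit list to `parse_args` keeps the sketch easy to try in a notebook; calling it with no argument reads the real command line instead.
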
- Google Cloud account with billing enabled
- gcloud CLI installed and configured
# 1. Set project
gcloud config set project YOUR-PROJECT-ID
# 2. Enable APIs
gcloud services enable cloudbuild.googleapis.com run.googleapis.com storage.googleapis.com containerregistry.googleapis.com
# 3. Build image in the cloud
gcloud builds submit --tag gcr.io/YOUR-PROJECT-ID/insurance-app .
# 4. Deploy to Cloud Run
gcloud run deploy insurance-predictor --image gcr.io/YOUR-PROJECT-ID/insurance-app --platform managed --region us-central1 --allow-unauthenticated --memory 1Gi --port 8080

| Metric | Value |
|---|---|
| R² Score | 0.90 |
| MAE | $2,530 |
| RMSE | $4,269 |
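
For readers unfamiliar with the three metrics in the table, the sketch below spells out their definitions in plain Python. It is illustrative only: the project most likely computes them with scikit-learn's metric functions, and the toy numbers are made up for the example rather than taken from the dataset. The helper name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, mae, rmse) for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot                     # share of variance explained
    return r2, mae, rmse


# Toy values, chosen only to make the arithmetic easy to follow.
r2, mae, rmse = regression_metrics([1000, 2000, 3000, 4000],
                                   [1100, 1900, 3200, 3800])
print(f"R2={r2:.2f}  MAE={mae:.0f}  RMSE={rmse:.0f}")
```

MAE and RMSE are in the same unit as the target (dollars here), and RMSE weights large errors more heavily, which is why it is the larger of the two in the table.
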
- 🚬 Smoker (~70%)
- ⚖️ BMI (~15%)
- 📅 Age (~10%)
- 📍 Other (~5%)
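
Assuming the percentages above are feature-importance shares from the trained model (tree ensembles in scikit-learn expose raw scores through the `feature_importances_` attribute), the sketch below shows one way to turn raw scores into a "top features plus Other" breakdown. The function name `importance_shares` and the raw scores are hypothetical; the scores were picked so the result mirrors the rounded shares listed above.

```python
def importance_shares(importances, top_k=3):
    """Convert raw importance scores to percentage shares.

    The top_k largest features keep their own entry; the rest are
    summed into a single "other" entry.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")

    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    shares = {name: 100.0 * value / total for name, value in ranked[:top_k]}

    rest = sum(value for _, value in ranked[top_k:])
    if rest > 0:
        shares["other"] = 100.0 * rest / total
    return shares


# Hypothetical raw scores (any positive scale works; they are normalised above).
raw = {"smoker": 140, "bmi": 30, "age": 20, "children": 6, "region": 3, "sex": 1}
for name, pct in importance_shares(raw).items():
    print(f"{name}: {pct:.0f}%")
```
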
| Variable | Type | Description |
|---|---|---|
| age | int | Age (18-100) |
| sex | str | Sex (Male/Female) |
| bmi | float | Body Mass Index |
| children | int | Number of children (0-5) |
| smoker | str | Smoker (Yes/No) |
| region | str | Region (Northeast/Northwest/Southeast/Southwest) |
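
The table above can be read as an input contract. The sketch below validates one record against the documented ranges and converts it to numeric features. The encoding scheme (binary flags plus one-hot regions) and the feature names are assumptions made for illustration; the real column names and order are whatever `feature_names.joblib` stores.

```python
from dataclasses import dataclass

REGIONS = ("northeast", "northwest", "southeast", "southwest")


@dataclass
class InsuranceInput:
    age: int
    sex: str
    bmi: float
    children: int
    smoker: str
    region: str

    def validate(self):
        """Raise ValueError if a field falls outside the documented ranges."""
        if not 18 <= self.age <= 100:
            raise ValueError("age must be between 18 and 100")
        if self.sex.lower() not in ("male", "female"):
            raise ValueError("sex must be Male or Female")
        if self.bmi <= 0:
            raise ValueError("bmi must be positive")
        if not 0 <= self.children <= 5:
            raise ValueError("children must be between 0 and 5")
        if self.smoker.lower() not in ("yes", "no"):
            raise ValueError("smoker must be Yes or No")
        if self.region.lower() not in REGIONS:
            raise ValueError("region must be one of: " + ", ".join(REGIONS))


def encode(record):
    """Return a dict of numeric features (hypothetical names, one-hot regions)."""
    record.validate()
    features = {
        "age": float(record.age),
        "bmi": float(record.bmi),
        "children": float(record.children),
        "sex_male": 1.0 if record.sex.lower() == "male" else 0.0,
        "smoker_yes": 1.0 if record.smoker.lower() == "yes" else 0.0,
    }
    for region in REGIONS:
        features[f"region_{region}"] = 1.0 if record.region.lower() == region else 0.0
    return features


example = InsuranceInput(age=40, sex="Male", bmi=27.5, children=2,
                         smoker="Yes", region="Southeast")
print(encode(example))
```

Validating before encoding means an out-of-range value fails with a readable message instead of producing a silent, meaningless prediction.
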
| Service | Approximate Cost |
|---|---|
| Cloud Run | ~$0-5/month |
| Cloud Build | ~$0.003/build-minute |
| Container Registry | ~$0.10/GB |
Medical Cost Personal Dataset from Kaggle: https://www.kaggle.com/datasets/mirichoi0218/insurance
Adrian Zambrana
MIT License