This project was developed as part of a Data Mining course in the Computer Science Department.
It demonstrates a complete data mining and machine learning workflow, from data preprocessing and model training to deployment using a simple web application.
The goal of the project is to predict Alzheimer’s disease diagnosis based on patient clinical data using supervised machine learning techniques.
- Apply data preprocessing and feature engineering techniques
- Train and evaluate multiple machine learning models
- Compare model performance using appropriate metrics
- Deploy a trained model using a Streamlit web application
- Present results clearly and reproducibly
The Notebook-Code folder contains the core data mining and machine learning work.
Preprocessing (1_Preprocessing.ipynb):
- Data loading and inspection
- Handling missing values
- Encoding categorical variables
- Feature preparation
- Saving the processed dataset for modeling
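The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Age`, `Gender`, `MMSE`, `Diagnosis`), the median/mode imputation strategy, the one-hot encoding, and the output filename are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def preprocess(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Impute missing values and one-hot encode categorical feature columns."""
    df = df.copy()
    features = df.drop(columns=[target])

    numeric_cols = features.select_dtypes(include="number").columns
    categorical_cols = features.columns.difference(numeric_cols)

    # Fill numeric gaps with the column median, categorical gaps with the mode.
    for col in numeric_cols:
        df[col] = df[col].fillna(df[col].median())
    for col in categorical_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # One-hot encode categorical features; the target column is left untouched.
    return pd.get_dummies(df, columns=list(categorical_cols), dtype=int)


# Hypothetical toy data standing in for the real clinical dataset.
raw = pd.DataFrame({
    "Age": [70, None, 82, 65],
    "Gender": ["F", "M", None, "F"],
    "MMSE": [28.0, 21.0, None, 30.0],
    "Diagnosis": [0, 1, 1, 0],
})
processed = preprocess(raw, target="Diagnosis")

# Save the processed dataset for the modeling step (assumed filename).
out_path = Path(tempfile.mkdtemp()) / "processed_data.csv"
processed.to_csv(out_path, index=False)
```

Median and mode imputation are only one reasonable default; the right strategy depends on how and why values are missing in the real data.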
Modeling (2_modeling.ipynb):
- Training multiple machine learning models
- Model evaluation and comparison
- Selection of the best-performing model
- Saving trained models and evaluation results
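Run the notebooks in order: 1_Preprocessing.ipynb first, then 2_modeling.ipynb, because the modeling notebook depends on the processed dataset saved by the preprocessing notebook.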
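As a rough sketch of the evaluate-and-save steps in the modeling notebook, the example below trains one model, computes common classification metrics, and persists both the model and the results. The synthetic data, the choice of metrics, the 80/20 split, and the file names `model.joblib` and `results.json` are assumptions for illustration, not details taken from the project.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed dataset (assumption: binary diagnosis label).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out test split only.
results = {
    "accuracy": float(accuracy_score(y_test, y_pred)),
    "precision": float(precision_score(y_test, y_pred)),
    "recall": float(recall_score(y_test, y_pred)),
    "f1": float(f1_score(y_test, y_pred)),
}

# Persist the trained model and its evaluation results (assumed file names).
out_dir = Path(tempfile.mkdtemp())
joblib.dump(model, out_dir / "model.joblib")
(out_dir / "results.json").write_text(json.dumps(results, indent=2))

# Reload the model the way a downstream application might.
restored = joblib.load(out_dir / "model.joblib")
```

Saving the metrics next to the model lets an application display model performance without retraining.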
The following models were implemented and evaluated:
- Logistic Regression
- Decision Tree
- Random Forest
- Gradient Boosting
- Naive Bayes
- Voting Ensemble Classifier
The Random Forest model achieved the best performance and was selected for deployment.
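One way to compare the six models listed above is cross-validation on a shared dataset, sketched below with scikit-learn. The synthetic data, hyperparameters, F1 scoring, the Gaussian variant of Naive Bayes, and soft voting are assumptions for the example; the project's actual settings may differ, and the ranking on synthetic data says nothing about the project's reported result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed clinical dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

base_models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}

# The ensemble averages predicted probabilities of the base models (soft voting).
models = dict(base_models)
models["voting_ensemble"] = VotingClassifier(
    estimators=list(base_models.items()), voting="soft"
)

# Mean 5-fold cross-validated F1 score per model.
scores = {
    name: float(cross_val_score(est, X, y, cv=5, scoring="f1").mean())
    for name, est in models.items()
}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} F1 = {score:.3f}")
```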
The Streamlit-Application folder contains a web application built using Streamlit.
Application features:
- Dataset overview (raw and processed data)
- Visualization of model performance
- Alzheimer’s disease prediction using trained models
- Simple and interactive user interface
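The prediction feature could be backed by a small helper like the one below, which turns one patient's inputs into a single-row DataFrame aligned with the training columns. The feature names, the toy training data, and the function `predict_diagnosis` are hypothetical; in a Streamlit app the input values would typically come from widgets such as `st.number_input`, and the model would be loaded from disk rather than trained inline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature columns; the real app would use the processed dataset's columns.
FEATURES = ["Age", "MMSE", "Gender_M"]


def predict_diagnosis(model, inputs: dict) -> tuple[int, float]:
    """Return the predicted class and the positive-class probability for one patient."""
    # Align the inputs with the training columns; missing features default to 0.
    row = pd.DataFrame([inputs]).reindex(columns=FEATURES, fill_value=0)
    label = int(model.predict(row)[0])
    probability = float(model.predict_proba(row)[0, 1])
    return label, probability


# Toy training data so the example runs on its own (not real patient data).
train = pd.DataFrame({
    "Age": [60, 65, 70, 75, 80, 85],
    "MMSE": [30, 29, 27, 22, 18, 15],
    "Gender_M": [0, 1, 0, 1, 0, 1],
})
labels = [0, 0, 0, 1, 1, 1]
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(train[FEATURES], labels)

label, probability = predict_diagnosis(model, {"Age": 82, "MMSE": 17})
print(f"Predicted class: {label} (probability of positive class: {probability:.2f})")
```

Reindexing against a fixed column list keeps the input layout identical to what the model saw during training, which matters when one-hot encoded columns are absent from a single user's input.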
Run the application from the project root directory:
python -m streamlit run Streamlit-Application/app.py
Streamlit normally opens the app in the default browser; if it does not, open the local URL printed in the terminal.
Before running the notebooks or the application, install the required dependencies:
pip install -r requirements.txt
Main language and libraries used:
- Python
- Pandas
- NumPy
- Scikit-learn
- Streamlit
- Matplotlib
- Seaborn
This project was developed for educational purposes only, as part of a university course.
It is not intended for medical diagnosis or clinical use.
Course: Data Mining
Department: Computer Science
Project Type: Academic / Educational
Focus: Machine Learning, Data Mining, and Model Deployment
- KhaledHima
- ihateskil
This project demonstrates the practical application of data mining concepts, machine learning techniques, and basic deployment using Streamlit.
It provides a complete and reproducible workflow suitable for academic evaluation and learning purposes.