Supervised Machine Learning

A predictive analytics study demonstrating the application of Ordinary Least Squares (OLS) Regression to estimate academic performance based on temporal study patterns.

Google Colab · Kaggle Notebook · Video Demo · Live Demo

Authors · Overview · Features · Structure · Results · Quick Start · Usage Guidelines · License · About · Acknowledgments

Authors

Amey Thakur	Mega Satish

Important

🤝🏻 Special Acknowledgement

Special thanks to Mega Satish for her meaningful contributions, guidance, and support that helped shape this work.

Overview

Supervised Machine Learning - Task 1 is a foundational Data Science exploration conducted under the Graduate Rotational Internship Program (GRIP) at The Sparks Foundation. The project establishes a univariate linear model to quantify the correlation between study duration (Hours) and academic outcome (Scores).

Using Scikit-Learn's `LinearRegression`, the project minimizes the sum of squared residuals to derive a best-fit line, which yields a point prediction for a given numerical input (e.g., the predicted score for 9.25 hours of study).
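As a rough illustration of that workflow, the sketch below fits a one-feature line with Scikit-Learn and queries it at 9.25 hours. The five data points are hypothetical placeholders chosen to lie exactly on a line; they are not the project dataset, and the variable names are assumptions rather than the notebook's own.

```python
# Minimal sketch: fit a one-feature OLS line and query it.
# The data below is hypothetical, not the project dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
scores = np.array([12.0, 22.0, 32.0, 42.0, 52.0])  # exactly 10 * hours + 2

model = LinearRegression()
model.fit(hours, scores)

slope = float(model.coef_[0])
intercept = float(model.intercept_)

# A single query still needs the 2-D shape (1, 1).
predicted = float(model.predict(np.array([[9.25]]))[0])

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.2f}")
```

Because the toy points lie on $y = 10x + 2$, the fit recovers that line; on real data the coefficients depend on the dataset and on how it is split.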

Computational Objectives

The analysis is organized around three statistical objectives:

Linear Approximation: establishing a linear relationship $y = mx + c$ where $y$ is the predicted score and $x$ is study hours.
Residual Minimization: utilizing the OLS algorithm to optimize the coefficient (slope) and intercept.
Predictive Inference: generating a specific scalar output for the internship query: What will be the predicted score if a student studies for 9.25 hrs/day?
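For reference, the standard closed-form OLS solution for the line $y = mx + c$ can be written as follows. The notation ($n$ observations $(x_i, y_i)$ with sample means $\bar{x}$ and $\bar{y}$) is introduced here for illustration and is not taken from the notebook.

$$
\min_{m,\,c} \; \sum_{i=1}^{n} \left( y_i - m x_i - c \right)^2
$$

$$
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
$$

The second expression shows that the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.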

Tip

Model Applicability: This linear model is a reasonable estimate only within the range of study hours present in the data. Extrapolating beyond that range (e.g., studying > 15 hours/day) may yield unrealistic results: the fitted line already predicts 93.69% at 9.25 hours, so much larger inputs would push predictions past the 100% ceiling, and a 24-hour day bounds the input itself.
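One way to guard against this kind of extrapolation is to reject inputs outside the range seen during training. The sketch below is illustrative only: the function name, slope, intercept, and range bounds are hypothetical and are not values from the notebook.

```python
def predict_within_range(hours, slope, intercept, observed_min, observed_max):
    """Return slope * hours + intercept, refusing inputs outside the observed range."""
    if not observed_min <= hours <= observed_max:
        raise ValueError(
            f"{hours} hours is outside the observed range "
            f"[{observed_min}, {observed_max}]; the linear fit is not validated there."
        )
    return slope * hours + intercept


# Hypothetical coefficients and range, for illustration only.
in_range = predict_within_range(9.25, slope=10.0, intercept=2.0,
                                observed_min=1.0, observed_max=9.5)
print(in_range)

try:
    predict_within_range(15.0, slope=10.0, intercept=2.0,
                         observed_min=1.0, observed_max=9.5)
except ValueError as err:
    print(f"Rejected: {err}")
```

Raising an error is a deliberate choice here; clamping the output to 100 would hide the fact that the model is being used outside the data it was fitted on.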

Features

| Component | Technical Description |
| --- | --- |
| Ingestion Pipeline | Automated data retrieval and parsing using Pandas from remote HTTP endpoints. |
| Exploratory Analysis | Visualizing distribution and correlation via Matplotlib and Seaborn scatter plots. |
| Model Architecture | Implementation of `LinearRegression` from Scikit-Learn for OLS optimization. |
| Evaluation Metrics | Quantitative assessment using Mean Absolute Error (MAE) and the R² score to validate model accuracy. |
| Inference Engine | Direct prediction of the outcome for a specific user-defined input value. |
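The ingestion step in the table can be sketched roughly as below. Only the column names `Hours` and `Scores` come from this README; the CSV rows are hypothetical, and an in-memory buffer stands in for the remote file because the actual URL is not given here.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the remote file.
csv_text = "Hours,Scores\n1.0,12\n2.0,22\n3.0,32\n"

# pd.read_csv accepts a file-like object, a local path, or a URL string.
data = pd.read_csv(io.StringIO(csv_text))

# Double brackets keep the feature 2-D, the shape scikit-learn estimators expect.
X = data[["Hours"]].to_numpy()
y = data["Scores"].to_numpy()

print(data.shape, X.shape, y.shape)
```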

Note

Empirical Context

The dataset contains two variables, Hours and Scores. The high correlation coefficient observed during EDA supports choosing a simple Linear Regression model over more complex polynomial or ensemble approaches, in keeping with the principle of parsimony (Occam's Razor) in machine learning design.
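The correlation check mentioned above can be reproduced with a few lines of plain Python. This is a sketch of the Pearson correlation coefficient on hypothetical sample values; the helper name `pearson_r` and the data are assumptions, and the notebook may well use a library routine instead.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample, not the project dataset.
sample_hours = [1, 2, 3, 4, 5]
sample_scores = [15, 20, 35, 40, 55]
r = pearson_r(sample_hours, sample_scores)
print(f"r = {r:.3f}")
```

A value of `r` close to 1 indicates a strong positive linear relationship, which is the situation in which a straight-line model is a sensible first choice.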

Tech Stack

Runtime: Python 3.x
Data Manipulation: Pandas, NumPy
Visualization: Matplotlib, Seaborn
Machine Learning: Scikit-Learn (sklearn)
Environment: Jupyter Notebook / Google Colab

Project Structure

TSF-SUPERVISED-MACHINE-LEARNING/
│
├── docs/                                            # Technical Documentation
│   └── SPECIFICATION.md                             # Architecture & Design Specification
│
├── Mega/                                            # Archival Attribution Assets
│   ├── Filly.jpg                                    # Companion (Filly)
│   ├── Mega.png                                     # Author Profile Image (Mega Satish)
│   └── ...                                          # Additional Attribution Files
│
├── Source Code/                                     # Core Implementation
│   └── TSF_INTERNSHIP_TASK_1_SUPERVISED_LEARNING.ipynb  # Jupyter Notebook (Analysis Kernel)
│
├── The Sparks Foundation/                           # Internship Artifacts
│   └── Task_1_Dataset.csv                           # Empirical Data Source
│
├── .gitattributes                                   # Git configuration
├── .gitignore                                       # Repository Filters
├── CITATION.cff                                     # Scholarly Citation Metadata
├── codemeta.json                                    # Machine-Readable Project Metadata
├── LICENSE                                          # MIT License Terms
├── README.md                                        # Project Documentation
└── SECURITY.md                                      # Security Policy

Results

1. Exploratory Data Analysis: Hours vs Percentage
Initial scatter plot revealing the strong positive correlation.

2. Feature Distribution: Scores Analysis
Statistical distribution of the target variable (Percentage Scored).

3. Regression Fit: Hours vs Scores
OLS Regression line fitted to the training data.

4. Model Training: Fitting the Line
Visualizing the linear approximation on Training Data.

5. Model Validation: Testing the Fit
Validation of the regression line against unseen Test Data.

Evaluation Metrics
Mean Absolute Error: 4.18 | R² Score: 0.945
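Metrics of this kind are typically computed on a held-out test split. The sketch below shows one way to do that with Scikit-Learn; the data is hypothetical, and the `test_size=0.2` and `random_state=0` settings are assumptions, so it will not reproduce the figures above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: a linear trend plus a fixed, alternating perturbation.
hours = np.arange(1.0, 11.0).reshape(-1, 1)  # 1.0, 2.0, ..., 10.0
noise = np.array([1, -1, 2, -2, 1, -1, 2, -2, 1, -1], dtype=float)
scores = 9.0 * hours.ravel() + 3.0 + noise

# Hold out 20% of the rows; the split settings are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    hours, scores, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MAE: average absolute gap between actual and predicted scores.
mae = mean_absolute_error(y_test, y_pred)
# R²: share of test-set variance explained by the model (at most 1).
r2 = r2_score(y_test, y_pred)

print(f"MAE={mae:.2f}, R2={r2:.3f}")
```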

Final Inference
Input: 9.25 Hours → Predicted Score: 93.69%

Quick Start

1. Prerequisites

Python 3.7+: Required for runtime execution. Download Python
Jupyter Environment: For interactive code execution (JupyterLab or Notebook).

Warning

Data Path Integrity

The analysis kernel relies on precise relative file paths. Ensure Task_1_Dataset.csv remains within The Sparks Foundation/ directory. Modifying the directory structure without updating the ingestion logic will result in FileNotFoundError during runtime.
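A small path check can surface this problem before any analysis runs. The sketch below assumes the directory and file names from the project structure; the helper `resolve_dataset` is hypothetical and is not part of the notebook, and a temporary directory stands in for a real clone.

```python
import tempfile
from pathlib import Path


def resolve_dataset(repo_root):
    """Return the dataset path under repo_root, or raise with a descriptive message."""
    path = Path(repo_root) / "The Sparks Foundation" / "Task_1_Dataset.csv"
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {path}; check that the directory layout is unchanged."
        )
    return path


# Demonstration against a throwaway directory that mimics the repository layout.
with tempfile.TemporaryDirectory() as root:
    dataset_dir = Path(root) / "The Sparks Foundation"
    dataset_dir.mkdir()
    (dataset_dir / "Task_1_Dataset.csv").write_text("Hours,Scores\n")
    found_name = resolve_dataset(root).name

print(found_name)
```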

2. Installation

Establish the local environment by cloning the repository and installing the computational stack:

```bash
# Clone the repository
git clone https://github.com/Amey-Thakur/TSF-SUPERVISED-MACHINE-LEARNING.git
cd TSF-SUPERVISED-MACHINE-LEARNING

# Install predictive modeling dependencies
pip install pandas numpy matplotlib seaborn scikit-learn
```

3. Execution

Launch the analysis kernel to reproduce the findings:

```bash
jupyter notebook "Source Code/TSF_INTERNSHIP_TASK_1_SUPERVISED_LEARNING.ipynb"
```

Tip

Interactive Predictive Analytics | Student Score Estimation

Explore the high-fidelity Live Demo to visualize the Ordinary Least Squares (OLS) regression analysis in real-time. The interactive dashboard showcases Exploratory Data Analysis (EDA), Feature Distribution, and Model Validation results, quantifying the strong positive correlation between study hours and academic outcomes with an R² score of 0.945.

Launch Live Demo

Usage Guidelines

This repository is openly shared to support learning and knowledge exchange across the academic community.

For Students
Use this project as reference material for understanding supervised learning pipelines, univariate regression, and statistical predictive modeling. The source code is available for study to facilitate self-paced learning and exploration of OLS optimization and residual analysis.

For Educators
This project may serve as a practical lab example or supplementary teaching resource for Data Science and Applied Statistics courses. Attribution is appreciated when utilizing content.

For Researchers
The documentation and architectural approach may provide insights into academic project structuring, predictive inference, and industrial internship artifacts.

License

This academic submission, developed for the Graduate Rotational Internship Program (GRIP) at The Sparks Foundation, is made available under the MIT License. See the LICENSE file for complete terms.

Note

Summary: You are free to use, copy, modify, and distribute this content for any purpose, including commercially, provided the original copyright and license notice are retained in copies.

About This Repository

Created & Maintained by: Amey Thakur & Mega Satish
Role: Data Science & Business Analytics Interns
Program: Graduate Rotational Internship Program (GRIP)
Organization: The Sparks Foundation

This project features Supervised Machine Learning - Task 1, a predictive analytics study conducted as part of the GRIP Internship. It explores the application of linear regression to solve real-world estimation problems.

Connect: GitHub · LinkedIn · ORCID

Acknowledgments

Grateful acknowledgment to Mega Satish for her exceptional collaboration and scholarly partnership during the execution of this data science internship task. Her analytical precision, deep understanding of statistical modeling, and constant support were instrumental in refining the predictive algorithms used in this study. Working alongside her was a transformative experience; her thoughtful approach to problem-solving and steady encouragement turned complex regression challenges into meaningful learning moments. This work reflects the growth and insights gained from our side-by-side academic journey. Thank you, Mega, for everything you shared and taught along the way.

Special thanks to the mentors at The Sparks Foundation for providing this platform for rapid skill development and industrial exposure.

📈 TSF-SUPERVISED-MACHINE-LEARNING

Presented as part of the Internship @ The Sparks Foundation

🎓 Computer Engineering Repository

Computer Engineering (B.E.) - University of Mumbai

Semester-wise curriculum, laboratories, projects, and academic notes.
