Wind Turbine Anomaly Detection

Overview

This repository contains a comprehensive data science notebook focused on detecting anomalous behavior in wind turbines using sensor data. The primary objective is to build a machine learning classification model capable of distinguishing between normal and anomalous operational states, and subsequently predicting the status of unknown observations.

Problem Statement

The dataset consists of measurements from two wind turbines (Turbine 38 and Turbine 44) equipped with ~238 sensors recording various physical quantities (such as temperatures, pressures, wind speeds, and power outputs) every 10 minutes. Each observation is labeled with a health status:

Normal: The turbine is operating correctly.
Anomalous: The turbine behavior deviates from normal operation (potential fault or degradation).
Unknown: The operational status has not been determined.

The goal is to accurately classify the records despite the severe class imbalance (where normal operation dominates, and anomalies are rare).

Project Workflow

Data Exploration & Preprocessing:
- Analyzed the temporal distribution of anomalies to see whether they are clustered in time or scattered.
- Validated Active Power values and plotted Power Curves (Wind Speed vs. Active Power) to visualize turbine operating regimes.
- Scaled numerical features using StandardScaler to prepare for model ingestion.
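
The scaling step above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the five sensor columns, the 10% anomaly share, and the 75/25 split are assumptions made for the example. The point it shows is that the scaler is fitted on the training portion only and then reused on held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor readings: 200 rows, 5 hypothetical sensors
# with very different scales (e.g. a temperature next to a power output).
X = rng.normal(loc=[10, 50, 3, 1000, 0.5],
               scale=[2, 10, 1, 300, 0.1],
               size=(200, 5))

# Hypothetical labels: 0 = normal, 1 = anomalous (10% anomalies).
y = np.zeros(200, dtype=int)
y[:20] = 1

# Stratify so the rare class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on training data only, then apply it to the held-out data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3))
print(X_train_scaled.std(axis=0).round(3))
```

Fitting on the training split alone keeps statistics from the held-out rows out of the model's inputs, which is why the test split is only transformed.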
Model Selection & Training:
- Logistic Regression (Baseline): A linear baseline model configured with an L1 penalty, which drives uninformative coefficients to zero and so acts as feature selection, and a custom class_weight={0: 1, 1: 3} that weights errors on the anomalous class three times as heavily to counteract the class imbalance.
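
A baseline of this kind might look like the sketch below. Only the L1 penalty and class_weight={0: 1, 1: 3} come from the description above; the liblinear solver (one of the scikit-learn solvers that supports L1), the value C=0.1, and the synthetic dataset are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the sensor features:
# about 90% normal (0) and 10% anomalous (1), 20 features, 5 informative.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_scaled = StandardScaler().fit_transform(X)

# L1-penalized logistic regression with the class weights from the README.
# solver and C are assumed values, not taken from the notebook.
model = LogisticRegression(
    penalty="l1",
    solver="liblinear",
    C=0.1,
    class_weight={0: 1, 1: 3},
    random_state=0,
)
model.fit(X_scaled, y)

# Features whose coefficient was not shrunk to exactly zero.
selected = np.flatnonzero(model.coef_[0])
print(f"{len(selected)} of {X.shape[1]} features kept:", selected)
```

A smaller C strengthens the penalty and zeroes out more coefficients, so it is the main knob for how aggressive the implicit feature selection is.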
Appropriate Evaluation Metrics:
- Macro F1-Score: Chosen over Accuracy because of the "Accuracy Paradox": on a heavily imbalanced dataset, a model that labels every record as normal scores high Accuracy while detecting no anomalies.
- Precision-Recall AUC (PR-AUC): Chosen over ROC-AUC because precision and recall ignore true negatives, so the score reflects performance on the anomalous (minority) class rather than rewarding the model for identifying easily recognizable normal states.
- Log Loss: Measures the quality of the predicted probabilities, penalizing confident but wrong predictions most heavily.
- Confusion Matrix: Translated the errors into business impact (False Alarms vs. Missed Faults).
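
The metric choices above can be illustrated with a small hand-built example. The labels and probabilities are made up for the illustration, and average_precision_score is used here as one common summary of the precision-recall curve; the notebook may compute PR-AUC differently.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, average_precision_score,
    confusion_matrix, f1_score, log_loss,
)

# 95 normal records (0) followed by 5 anomalous records (1).
y_true = np.array([0] * 95 + [1] * 5)

# Accuracy Paradox: a "model" that always predicts normal.
y_all_normal = np.zeros(100, dtype=int)
acc_naive = accuracy_score(y_true, y_all_normal)
f1_naive = f1_score(y_true, y_all_normal, average="macro", zero_division=0)
print(f"always-normal: accuracy={acc_naive:.2f}, macro F1={f1_naive:.3f}")

# Hypothetical predicted anomaly probabilities from a real model:
# three false alarms and one missed fault at a 0.5 threshold.
y_prob = np.full(100, 0.1)
y_prob[:3] = 0.6     # normal records scored as likely anomalous
y_prob[95] = 0.3     # anomalous record scored as likely normal
y_prob[96:] = 0.8    # anomalous records scored correctly
y_pred = (y_prob >= 0.5).astype(int)

macro_f1 = f1_score(y_true, y_pred, average="macro")
pr_auc = average_precision_score(y_true, y_prob)
ll = log_loss(y_true, y_prob)

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"macro F1={macro_f1:.3f}, PR-AUC={pr_auc:.3f}, log loss={ll:.3f}")
print(f"false alarms={fp}, missed faults={fn}")
```

The always-normal predictor reaches 95% accuracy while its macro F1 stays below 0.5, which is the gap that motivates reporting macro F1 instead of Accuracy.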
Predicting the Unknowns:
- Scaled the unknown data using the scaler fitted on historical training data.
- Retrained the best model configuration on all of the labeled data so that the final model learns from every available example.
- Performed out-of-sample predictions on the unknown dataset (winter data) to estimate the overall anomaly rate.
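
The final prediction step can be sketched as below, under stated assumptions: the data is synthetic, the last 200 rows play the role of the unknown records, and the scaler is fitted on all labeled rows. The solver is likewise an assumed choice, so this illustrates the procedure rather than reproducing the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: the first 1000 rows act as labeled history,
# the last 200 rows act as records with unknown status.
X, y = make_classification(
    n_samples=1200, n_features=10, n_informative=4,
    weights=[0.9, 0.1], random_state=1,
)
X_labeled, y_labeled = X[:1000], y[:1000]
X_unknown = X[1000:]  # labels are treated as unavailable

# Fit the scaler and the model on all labeled data.
scaler = StandardScaler().fit(X_labeled)
model = LogisticRegression(
    penalty="l1", solver="liblinear",
    class_weight={0: 1, 1: 3}, random_state=0,
)
model.fit(scaler.transform(X_labeled), y_labeled)

# Reuse the fitted scaler on the unknown records; never refit it on them.
X_unknown_scaled = scaler.transform(X_unknown)
unknown_pred = model.predict(X_unknown_scaled)
unknown_proba = model.predict_proba(X_unknown_scaled)[:, 1]

anomaly_rate = unknown_pred.mean()
print(f"Estimated anomaly rate among unknown records: {anomaly_rate:.1%}")
```

Because the unknown records have no labels, the resulting rate is an estimate that cannot be validated directly; keeping the predicted probabilities alongside the hard labels makes it possible to revisit the decision threshold later.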

Key Files

FINAL.ipynb: The main Jupyter Notebook containing the full Data Science workflow, data exploration, visualizations, and modeling results.
wind_turbine_snippet_A.csv & wind_turbine_snippet_B.csv: The datasets utilized for model training and prediction.

Requirements

To run this project, you will need Jupyter Notebook and the following Python libraries installed:

pandas
numpy
matplotlib
scikit-learn
