This competition is part of the September Data Science Training Camp of Sapienza University, organized by Target Reply Roma.
Our challenge was to predict the Remaining Useful Life (RUL) of aircraft turbofan engines using time-series sensor data.
The dataset comes from the well-known NASA C-MAPSS simulator, which models how aircraft engines degrade under varying operational conditions. Each engine is run until failure and multivariate time-series data from onboard sensors are collected at each cycle.
In this challenge, our task is to implement and train models that can learn from historical engine runs and generalize to unseen engines, accurately predicting their RUL from sensor data.
For this challenge we work with two specific datasets:
- `train_challenge.txt`
- `test_challenge.txt`
These datasets contain 26 columns in total, representing multivariate time series. Each row corresponds to a single time step for a specific engine unit.
| Column | Description |
|---|---|
| 1 | Unit number (engine identifier) |
| 2 | Time cycles (operational time) |
| 3–5 | Operational settings (3 variables) |
| 6–26 | Sensor measurements (21 sensors) |
Submissions are scored using Mean Squared Error (MSE) between predicted and true RUL values:

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{RUL}_i - RUL_i \right)^2$$

where:

- $N$ = number of engines in the test set
- $\hat{RUL}_i$ = predicted Remaining Useful Life for engine $i$
- $RUL_i$ = true Remaining Useful Life for engine $i$
A lower MSE means better predictions.
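As a quick illustration, the metric can be computed in a few lines of NumPy (array values below are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error between true and predicted RUL values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Example with three engines: squared errors are 100, 25, 0
print(mse([100, 50, 20], [90, 55, 20]))  # (100 + 25 + 0) / 3 ≈ 41.67
```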
```
├── Dataset/                     # Dataset folder
│   ├── train_challenge.txt
│   └── test_challenge.txt
├── Models & Functions/
│   ├── models                   # Hybrid attention-LSTM model for predictions
│   ├── preprocessing
│   └── training
├── Notebook/
│   └── Reply_Challenge.ipynb    # Notebook for the submission
├── README.md
└── LICENSE
```
We propose a hybrid architecture for Remaining Useful Life (RUL) prediction:
- Feature-level Self-Attention (multi-head, residual, LayerNorm) to capture dependencies across sensors.
- Stacked LSTM layers to model temporal dynamics.
- MLP head for final regression output.
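The feature-level attention step can be sketched in plain NumPy (single-head for brevity, with random arrays standing in for learned projection weights; the actual model uses multiple heads):

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_self_attention(x, d_k=8):
    """Minimal single-head self-attention across the feature (sensor) axis.

    x: array of shape (n_features, d_model) — one time step, sensors as tokens.
    Random projections stand in for learned parameters.
    """
    d_model = x.shape[-1]
    Wq = rng.normal(size=(d_model, d_k)) / np.sqrt(d_model)
    Wk = rng.normal(size=(d_model, d_k)) / np.sqrt(d_model)
    Wv = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d_k)                 # (n_features, n_features)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over features
    attended = weights @ v

    # Residual connection followed by layer normalization
    out = x + attended
    mu, sigma = out.mean(-1, keepdims=True), out.std(-1, keepdims=True)
    return (out - mu) / (sigma + 1e-6)

x = rng.normal(size=(21, 16))            # 21 sensors embedded in a 16-dim space
print(feature_self_attention(x).shape)   # (21, 16)
```

In the full model, the attended features then feed the stacked LSTM layers, whose final hidden state goes to the MLP regression head.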
- The target variable `RUL` was clipped at 170, since the average RUL in the training set was about 105; other clipping thresholds were tested but yielded worse results.
- Normalization was applied per cluster after K-Means clustering on the three operational settings. This normalizes the data within homogeneous operating conditions, reducing the impact of varying environmental and operational factors across different engines.
- Sliding windows of 70 time steps (stride = 1) were used to segment the multivariate time series. Each window captures the temporal dynamics of sensor signals over 70 consecutive cycles, enabling the model to learn degradation patterns.
The stride of 1 ensures that the windows overlap, maximizing the amount of training data available.
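A minimal sketch of the preprocessing steps above, assuming cluster labels have already been obtained from K-Means on the three settings (function and variable names are illustrative):

```python
import numpy as np

def clip_rul(rul, cap=170):
    """Clip the RUL target at a fixed cap (170 here, chosen empirically)."""
    return np.minimum(rul, cap)

def normalize_by_cluster(X, labels):
    """Z-score each feature within its operating-condition cluster.

    X: (n_samples, n_features) sensor matrix.
    labels: cluster index per row, e.g. from K-Means on the 3 settings.
    """
    X = X.astype(float).copy()
    for c in np.unique(labels):
        m = labels == c
        mu, sigma = X[m].mean(axis=0), X[m].std(axis=0)
        X[m] = (X[m] - mu) / (sigma + 1e-8)
    return X

def sliding_windows(X, window=70, stride=1):
    """Segment one engine's time series into overlapping windows."""
    n = X.shape[0]
    return np.stack([X[i:i + window] for i in range(0, n - window + 1, stride)])

X = np.random.rand(100, 21)   # one engine: 100 cycles, 21 sensors
w = sliding_windows(X)
print(w.shape)                # (31, 70, 21): 100 - 70 + 1 overlapping windows
```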
- Adam optimizer with a cosine learning-rate schedule and warmup.
- MSE loss; gradient clipping (max norm 1.0) to prevent exploding gradients; dropout of 0.5 to reduce overfitting.
- Exponential Moving Average (EMA) of weights.
- Early stopping based on validation loss.
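The EMA of weights keeps a smoothed copy of the parameters that is used at evaluation time. A framework-agnostic sketch (the decay value is illustrative):

```python
import numpy as np

class EMAWeights:
    """Exponential moving average of model parameters.

    After each optimizer step, call update() with the current parameters;
    the smoothed `shadow` copy is used for evaluation/inference.
    """
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = [p.copy() for p in params]

    def update(self, params):
        for s, p in zip(self.shadow, params):
            s *= self.decay
            s += (1.0 - self.decay) * p

# Toy usage with a single "parameter" array
p = np.zeros(3)
ema = EMAWeights([p], decay=0.9)
for step in range(5):
    p = p + 1.0          # pretend this is an optimizer step
    ema.update([p])
print(ema.shadow[0])     # lags behind the raw parameters (all 5.0)
```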
- Exponential smoothing over the last 8 windows per unit (higher weight to recent data).
- Negative RUL values clipped to 0.
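A sketch of this post-processing step; the exact weighting scheme is an assumption, the key point being geometrically decaying weights toward older windows and clipping negatives to zero:

```python
import numpy as np

def smooth_last_windows(preds, k=8, alpha=0.6):
    """Exponentially weighted average of the last k window predictions
    for one unit, with higher weight on more recent windows.

    alpha is illustrative: weights decay geometrically going back in time.
    """
    last = np.asarray(preds[-k:], dtype=float)
    weights = alpha ** np.arange(len(last) - 1, -1, -1)  # oldest -> newest
    rul = float(np.average(last, weights=weights))
    return max(rul, 0.0)                                  # clip negatives to 0

# Per-window predictions for one unit (most recent last)
preds = [120, 118, 115, 113, 110, 108, 106, 104, 101]
print(round(smooth_last_windows(preds), 2))
```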
Output: submission.csv with predicted RUL for 259 units.
