Cross-Corpus Speech-Emotion Recognition

MFCC ➜ PCA/NMF Compression ➜ CNN & SVM Benchmarks

Reproducible EE8104 Adaptive Signal Processing project (Winter 2025) Supervisor Prof Dr. Sridhar Krishnan

Speech-emotion recognition (SER) models often perform brilliantly on a single dataset yet stumble the moment recording conditions—or speakers—change.
This repo tackles that mismatch by training and evaluating on two publicly available corpora at once:

TESS — Toronto Emotional Speech Set (2 800 clips)
RAVDESS — Ryerson Audio-Visual Database of Emotional Speech and Song (1 260 clips)

Our pipeline walks through every stage, end-to-end:

Feature extraction – 13 × 85 MFCC maps for each utterance.
Low-rank analysis – PCA & NMF reveal shared structure and let us compress features to 26 / 52 dimensions.
Classical baseline – RBF-SVMs on the compressed vectors (48 – 65 % accuracy).
Deep baselines –
- A 5-block CNN on full MFCC maps (≈ 91 % test accuracy)
- A dual-input CNN that fuses MFCC maps with 26-D PCA+NMF vectors (≈ 92 % test accuracy).
Learning-curve study – How data volume affects SVM vs. CNN performance.

The full report is available here →: Document

#	Notebook	1-line tagline	Details & key outputs
01	`MFCC_TESS_NOTEBOOK.ipynb`	Generate 13 × 85 MFCC maps for the 2 800 TESS recordings.	• Loads TESS wav files → 16 kHz → 0.5 s pads/trims → MFCC • Saves `X_tess.npy`, `y_tess.npy`, and 80/20 split indices. • Quick sanity MLP reaches ~100 % (shows dataset is easy to over-fit).
02	`MFCC_RAVDESS_NOTEBOOK.ipynb`	Same MFCC pipeline for the 1 260 RAVDESS speech clips.	• Produces `X_ravdess.npy`, `y_ravdess.npy`. • Sanity MLP tops out at ~80 %.
03	`PCA and NMF COM.ipynb`	Run PCA + NMF on the pooled corpus and plot the bases.	• Concatenates both MFCC sets. • Computes first 6 PCA & NMF components. • Saves scatter plot (PCA) and bar chart (NMF) used in Figs 2–3. • Dumps `pca_com.npy`, `nmf_com.npy` for later feature compression.
04	`NMF PCA TESS.ipynb`	Stand-alone PCA/NMF exploration for TESS only.	• Generates TESS-specific panels for Fig 2/3.
05	`NMF PCA RAVESS.ipynb`	Stand-alone PCA/NMF exploration for RAVDESS only.	• Generates RAVDESS-specific panels for Fig 2/3.
06	`SVM_NFM_PCa.ipynb`	Train SVM baselines on compressed (PCA + NMF) features.	• Builds 26-dim (2 × 2) and 52-dim (4 × 4) vectors. • LOOCV + hold-out test: • 2 × 2 → ~0.48 test acc • 4 × 4 → ~0.65 test acc • Saves 4 × 4 confusion matrix for the appendix.
07	`CNN_COMBINEdata_Gaussian.ipynb`	Baseline CNN on full MFCC maps with Gaussian-noise augmentation.	• Doubles training data with σ = 0.01 noise. • 5-block CNN, 50 epochs → ~0.91 test acc. • 7 × 7 confusion matrix = Fig 1; per-class PR table.
08	`CNN_PCA_NMF.ipynb`	Lightweight two-input CNN that consumes 26-dim compressed features.	• Dual branch: MFCC map & 26-dim side vector. • ~0.92 test acc (numbers in Table VI).
09	`ML vs DL.ipynb`	Compare learning curves of SVM vs. CNN as data size grows.	• Trains both models on 10 – 100 % of data. • Plots Figure 4 and prints the CSV behind it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cross-Corpus Speech-Emotion Recognition

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
CNN_COMBINEdata_Gaussian.ipynb		CNN_COMBINEdata_Gaussian.ipynb
CNN_PCA_NMF.ipynb		CNN_PCA_NMF.ipynb
MFCC_RAVDESS_NOTEBOOK.ipynb		MFCC_RAVDESS_NOTEBOOK.ipynb
MFCC_TESS_NOTEBOOK.ipynb		MFCC_TESS_NOTEBOOK.ipynb
ML vs DL.ipynb		ML vs DL.ipynb
NMF PCA RAVESS.ipynb		NMF PCA RAVESS.ipynb
NMF PCA TESS.ipynb		NMF PCA TESS.ipynb
PCA and NMF COM.ipynb		PCA and NMF COM.ipynb
README.md		README.md
SVM_NFM_PCa.ipynb		SVM_NFM_PCa.ipynb

Folders and files

Latest commit

History

Repository files navigation

Cross-Corpus Speech-Emotion Recognition

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages