Data Selection for Efficient Speech Processing

👋 Welcome to the Data Selection in Speech Processing repo! Here you’ll find a carefully curated list of essential papers and resources on data selection methods used in various speech processing tasks.

Our work is based on a comprehensive survey of data selection methodologies in speech processing, covering automatic speech recognition, speaker recognition, speech emotion recognition, speech synthesis, and audio anti-spoofing. The survey paper is published in IEEE Access (https://ieeexplore.ieee.org/document/11048490).

⭐ Star and fork this repository to stay up to date with the latest advancements and support the community.

📌 Contents

Section	Description
🎤 Automatic Speech Recognition	Data selection for ASR systems
🛡️ Audio Anti-Spoofing	Data selection for spoofing detection systems
😊 Speech Emotion Recognition	Data selection for emotion recognition from speech
🔊 Speaker Recognition	Data selection for speaker identification and verification
🗣️ Text-to-Speech Synthesis	Data selection for speech synthesis systems

🎤 Automatic Speech Recognition

Data selection techniques for improving ASR performance through strategic training data selection, active learning, and domain adaptation.

Title	URL
Active learning: theory and applications to automatic speech recognition	IEEE
Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion	Elsevier
Active Learning for Automatic Speech Recognition	IEEE
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition	IEEE
Submodular Subset Selection for Large-Scale Speech Training Data	IEEE
Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification	IEEE
Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition	Springer
An active approach to spoken language processing	ACM
Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR	IEEE
Towards Data Selection on TTS Data for Children's Speech Recognition	IEEE
XenC: An Open-Source Tool for Data Selection in Natural Language Processing	UFAL
Active learning for accent adaptation in automatic speech recognition	IEEE
Gradient-based Active Learning Query Strategy for End-to-end Speech Recognition	IEEE
In Search of Optimal Data Selection for Training of Automatic Speech Recognition Systems	IEEE
Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses	IEEE
Submodular Data Selection with Acoustic and Phonetic Features for Automatic Speech Recognition	IEEE
Maximizing global entropy reduction for active learning in speech recognition	IEEE
Loss Prediction: End-to-End Active Learning Approach For Speech Recognition	IEEE
Supervised and Unsupervised Active Learning for Automatic Speech Recognition of Low-Resource Languages	IEEE
Efficient data selection for ASR	Springer
Tibetan Language Continuous Speech Recognition Based on Active WS-DBN	IEEE
Data Selection from Multiple ASR Systems' Hypotheses for Unsupervised Acoustic Model Training	IEEE
Optimal Subset Selection from Text Databases	IEEE
Grammar-Based Semi-Supervised Incremental Learning in Automatic Speech Recognition and Labeling	Elsevier
Data pruning for template-based automatic speech recognition	ISCA
Active Learning for LF-MMI Trained Neural Networks in ASR	ISCA
Improved Data Selection for Domain Adaptation in ASR	IEEE
LMC-SMCA: A New Active Learning Method in ASR	IEEE
Textual Data Selection for Language Modelling in the Scope of Automatic Speech Recognition	Elsevier
Efficient Use of Training Data for Sinhala Speech Recognition using Active Learning	IEEE
Unsupervised Language Model Adaptation by Data Selection for Speech Recognition	Springer
Kullback-leibler divergence-based ASR training data selection	ISCA
Automatic data selection for MLP-based feature extraction for ASR	ISCA
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition	arXiv
Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition	arXiv
Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition	ISCA
Distributed Submodular Maximization for Large Vocabulary Continuous Speech Recognition	IEEE
Analysing Acoustic Model Changes for Active Learning in Automatic Speech Recognition	IEEE
Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition	IEEE
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages	arXiv
Unsupervised Multi-Domain Data Selection for Asr Fine-Tuning	IEEE
Semi-Supervised Learning For Code-Switching ASR With Large Language Model Filter	arXiv
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR	arXiv
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation	arXiv
Dynamic Data Pruning for Automatic Speech Recognition	arXiv
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition	arXiv
Speech Corpora Divergence Based Unsupervised Data Selection for ASR	arXiv
Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models	arXiv
Data Selection Based on Phoneme Affinity Matrix for Electrolarynx Speech Recognition	IEEE
Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models	arXiv
Unsupervised Data Selection via Discrete Speech Representation for ASR	arXiv
Ask2Mask: Guided Data Selection for Masked Speech Modeling	arXiv
Unsupervised data selection for Speech Recognition with contrastive loss ratios	arXiv
DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation	arXiv
Sequence-level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models	arXiv
Scaling ASR Improves Zero and Few Shot Learning	arXiv
Seed Words Based Data Selection for Language Model Adaptation	arXiv
Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples	arXiv
Knowledge Distillation and Data Selection for Semi-Supervised Learning in CTC Acoustic Models	arXiv
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models	arXiv
A Dropout-Based Single Model Committee Approach for Active Learning in ASR	IEEE
Investigate Automatic Speech Recognition and Keyword Search for Very Low-Resource Language	IEEE
Data-selective Transfer Learning for Multi-Domain Speech Recognition	arXiv
Active Learning and Semi-Supervised Learning in Tibetan Language Speech Recognition	IEEE
Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training	JST
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training	ACL
Data Selection for Noise Robust Exemplar Matching	IEEE
Efficient data selection employing Semantic Similarity-based Graph Structures for model training	arXiv
A General Procedure for Improving Language Models in Low-Resource Speech Recognition	IEEE
Data selection for speech recognition	IEEE
Submodularity in data subset selection and active learning	PMLR
ASR Data Selection from Multiple Sources: A Practical Approach on Performance Scaling	NSF

🛡️ Audio Anti-Spoofing

Data selection techniques for improving spoofing detection and anti-spoofing systems.

Title	URL
Data selection for i-vector based automatic speaker verification anti-spoofing	Elsevier
Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure	IEEE
Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models	IEEE
Dataset pruning for resource-constrained spoofed audio detection	ISCA
Self supervised dataset pruning for efficient training in audio anti-spoofing	ISCA
Fake audio detection in resource-constrained settings using microfeatures	ISCA
Implementation of active data selection algorithms for data choosing in asv systems	Repository

😊 Speech Emotion Recognition

Data selection approaches for improving emotion recognition from speech signals.

Title	URL
Cooperative Learning and its Application to Emotion Recognition from Speech	IEEE
An Active Learning Paradigm for Online Audio-Visual Emotion Recognition	IEEE
Active Learning for Speech Emotion Recognition Using Deep Neural Network	IEEE
An optimal two stage feature selection for speech emotion recognition using acoustic features	Springer
Active learning by sparse instance tracking and classifier confidence in acoustic emotion recognition	ISCA
Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition	Elsevier
Dynamic Active Learning Based on Agreement and Applied to Emotion Recognition in Spoken Interactions	ACM
ENsemble Feature Selection for Domain Adaptation in Speech Emotion Recognition	IEEE
Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech	MDPI
Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction	PLOS ONE
Active learning by label uncertainty for acoustic emotion recognition	ISCA
Incremental Adaptation Using Active Learning for Acoustic Emotion Recognition	ACM
Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Spoken Spanish and Standard Basque Language	Springer
Personalized music emotion classification via active learning	ACM
Studying Self- and Active-Training Methods for Multi-feature Set Emotion Recognition	Springer
Data Selection for Acoustic Emotion Recognition: Analyzing and Comparing Utterance and Sub-Utterance Selection Strategies	IEEE
Active Learning for Speech Emotion Recognition Using Conditional Random Fields	IEEE
Active learning for dimensional speech emotion recognition	ISCA
On Instance Selection in Audio Based Emotion Recognition	Springer
Trustability-Based Dynamic Active Learning for Crowdsourced Labelling of Emotional Audio Data	IEEE
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition	ISCA
Multi-Task Active Learning for Simultaneous Emotion Classification and Regression	IEEE
Extracting Audio-Visual Features for Emotion Recognition Through Active Feature Selection	IEEE
RANSAC-Based Training Data Selection on Spectral Features for Emotion Recognition from Spontaneous Speech	Springer
After: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition	IEEE
Adaptability of Simple Classifier and Active Learning in Music Emotion Recognition	ACM
An Efficient Framework for Constructing Speech Emotion Corpus Based on Integrated Active Learning Strategies	IEEE
Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning	Springer
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection	arXiv
A Comparison Using Different Speech Parameters in the Automatic Emotion Recognition Using Feature Subset Selection Based on Evolutionary Algorithms	Springer
Application of Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Speech	Springer
Hybrid Intelligent Model for Speech Emotion Recognition Using Active Learning and Residual Network	Springer
Cross-Task Inconsistency Based Active Learning (CTIAL) for Emotion Recognition	IEEE
Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition	arXiv
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition	arXiv
Maximal Information Coefficient and Predominant Correlation-Based Feature Selection Toward A Three-Layer Model for Speech Emotion Recognition	IEEE

🔊 Speaker Recognition

Data selection methods for speaker identification and verification systems.

Title	URL
GA-based Feature Subset Selection Application to Arabic Speaker Recognition System	IEEE
Limited labels for unlimited data: active learning for speaker recognition	ISCA
Nature-inspired feature subset selection application to arabic speaker recognition system	Springer
Autonomous selection of i-vectors for PLDA modelling in speaker verification	Elsevier
Ensemble based speaker recognition using unsupervised data selection	Now Publishers
Maximum entropy based data selection for speaker recognition	ISCA
Importance of nasality measures for speaker recognition data selection and performance prediction	ISCA
Optimized Active Learning Strategy for Audiovisual Speaker Recognition	Springer
Spectral entropy and spectral shape based pre-quantization for real time speaker identification system	Springer
Ensemble classifiers using unsupervised data selection for speaker recognition	ISCA
Data selection with kurtosis and nasality features for speaker recognition	ISCA
UBM Data Selection for Effective Speaker Modeling	IEEE
Effective background data selection for SVM-based speaker recognition with unseen test environments: more is not always better	Springer
Towards Structured Approaches to Arbitrary Data Selection and Performance Prediction for Speaker Recognition	Springer
How to Reduce Dimension while Improving Performance	Springer
Wavelet-based Parametric Feature Subset Selection for Speaker and Accent Recognition using Genetic Algorithm	JTEC
Robust speaker identification using combined feature selection and missing data recognition	IEEE
An efficient feature selection method for speaker recognition	ISCA
Feature Selection Method for Speaker Recognition using Neural Network	IJCA
Normalizations and selection of speech segments for speaker recognition scoring	IEEE

🗣️ Text-to-Speech Synthesis

Data selection strategies for improving TTS systems and voice synthesis quality.

Title	URL
Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses	IEEE
A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis	ISCA
Modeling Irregular Voice in End-to-End Speech Synthesis via Speaker Adaptation	IEEE
Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis	ISCA
Developing a unit selection voice given audio without corresponding text	SpringerOpen
Active Learning for Prediction of Prosodic Word Boundaries in Chinese TTS Using Maximum Entropy Markov Model	JSW
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech	IEEE
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features	arXiv
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection	arXiv
Text-To-Speech Synthesis In The Wild	arXiv
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study	arXiv
Enhancing Voice Cloning Quality through Data Selection and Alignment-Based Metrics	Preprints
Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems	ISCA
Automatic sentence selection from speech corpora including diverse speech for improved hmm-tts synthesis quality	ISCA
Data selection and adaptation for naturalness in hmm-based speech synthesis	ISCA
Optimal utterance selection for unit selection speech synthesis databases	ISCA
Design of an Efficient Corpus for High-Quality Unit Selection TTS for Bulgarian	ILSP

📌 Contributing

Contributions are welcome! If you have relevant papers, implementations, or insights related to data selection in speech processing, feel free to submit a pull request.

How to Contribute

1. Fork this repository
2. Add new papers to the appropriate category
3. Follow the existing format: | Title | [Publisher](URL) |
4. Submit a pull request with a clear description of your additions
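
As a convenience, the row format above can be generated with a small helper. This is only a minimal sketch: `format_entry` is a hypothetical function that is not part of this repository, and the title and URL in the example are placeholders.

```python
def format_entry(title: str, publisher: str, url: str) -> str:
    """Return one table row in the form | Title | [Publisher](URL) |."""
    # Escape pipes so a title containing '|' does not split the table cell.
    safe_title = title.strip().replace("|", "\\|")
    return f"| {safe_title} | [{publisher.strip()}]({url.strip()}) |"


if __name__ == "__main__":
    # Placeholder entry for illustration; not a real paper or link.
    print(format_entry("An Example Paper on Data Selection", "arXiv", "https://example.org/paper"))
```

Paste the printed row at the end of the table for the matching category.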

📚 Citation

If you find this repository useful in your research, please consider citing:

@article{speech-data-selection-survey,
  title={A Survey on Data Selection for Efficient Speech Processing},
  author={Azeemi, Abdul Hameed and Qazi, Ihsan Ayyub and Raza, Agha Ali},
  year={2025},
  journal={IEEE Access},
  doi={10.1109/ACCESS.2025.3582395}
}

📄 License

This work is licensed under the MIT License.
