Welcome to the Data Selection in Speech Processing repo! Here you'll find a carefully curated list of essential papers and resources on data selection methods used in various speech processing tasks.
Our work is based on a comprehensive survey of data selection methodologies in speech processing, covering automatic speech recognition, speaker recognition, speech emotion recognition, speech synthesis, and audio anti-spoofing. The survey paper is published in IEEE Access (https://ieeexplore.ieee.org/document/11048490).
Star and fork this repository to stay up to date with the latest advancements and support the community.
| Section | Description |
|---|---|
| Automatic Speech Recognition | Data selection for ASR systems |
| Audio Anti-Spoofing | Data selection for spoofing detection systems |
| Speech Emotion Recognition | Data selection for emotion recognition from speech |
| Speaker Recognition | Data selection for speaker identification and verification |
| Text-to-Speech Synthesis | Data selection for speech synthesis systems |
Data selection techniques for improving ASR performance, including training subset selection, active learning, semi-supervised training, and domain adaptation.
| Title | URL |
|---|---|
| Active learning: theory and applications to automatic speech recognition | IEEE |
| Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion | Elsevier |
| Active Learning for Automatic Speech Recognition | IEEE |
| Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition | IEEE |
| Submodular Subset Selection for Large-Scale Speech Training Data | IEEE |
| Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification | IEEE |
| Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition | Springer |
| An active approach to spoken language processing | ACM |
| Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR | IEEE |
| Towards Data Selection on TTS Data for Children's Speech Recognition | IEEE |
| XenC: An Open-Source Tool for Data Selection in Natural Language Processing | UFAL |
| Active learning for accent adaptation in automatic speech recognition | IEEE |
| Gradient-based Active Learning Query Strategy for End-to-end Speech Recognition | IEEE |
| In Search of Optimal Data Selection for Training of Automatic Speech Recognition Systems | IEEE |
| Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses | IEEE |
| Submodular Data Selection with Acoustic and Phonetic Features for Automatic Speech Recognition | IEEE |
| Maximizing global entropy reduction for active learning in speech recognition | IEEE |
| Loss Prediction: End-to-End Active Learning Approach For Speech Recognition | IEEE |
| Supervised and Unsupervised Active Learning for Automatic Speech Recognition of Low-Resource Languages | IEEE |
| Efficient data selection for ASR | Springer |
| Tibetan Language Continuous Speech Recognition Based on Active WS-DBN | IEEE |
| Data Selection from Multiple ASR Systems' Hypotheses for Unsupervised Acoustic Model Training | IEEE |
| Optimal Subset Selection from Text Databases | IEEE |
| Grammar-Based Semi-Supervised Incremental Learning in Automatic Speech Recognition and Labeling | Elsevier |
| Data pruning for template-based automatic speech recognition | ISCA |
| Active Learning for LF-MMI Trained Neural Networks in ASR | ISCA |
| Improved Data Selection for Domain Adaptation in ASR | IEEE |
| LMC-SMCA: A New Active Learning Method in ASR | IEEE |
| Textual Data Selection for Language Modelling in the Scope of Automatic Speech Recognition | Elsevier |
| Efficient Use of Training Data for Sinhala Speech Recognition using Active Learning | IEEE |
| Unsupervised Language Model Adaptation by Data Selection for Speech Recognition | Springer |
| Kullback-Leibler divergence-based ASR training data selection | ISCA |
| Automatic data selection for MLP-based feature extraction for ASR | ISCA |
| Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition | arXiv |
| Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition | arXiv |
| Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition | ISCA |
| Distributed Submodular Maximization for Large Vocabulary Continuous Speech Recognition | IEEE |
| Analysing Acoustic Model Changes for Active Learning in Automatic Speech Recognition | IEEE |
| Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition | IEEE |
| Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages | arXiv |
| Unsupervised Multi-Domain Data Selection for ASR Fine-Tuning | IEEE |
| Semi-Supervised Learning For Code-Switching ASR With Large Language Model Filter | arXiv |
| TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR | arXiv |
| Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation | arXiv |
| Dynamic Data Pruning for Automatic Speech Recognition | arXiv |
| Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition | arXiv |
| Speech Corpora Divergence Based Unsupervised Data Selection for ASR | arXiv |
| Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models | arXiv |
| Data Selection Based on Phoneme Affinity Matrix for Electrolarynx Speech Recognition | IEEE |
| Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models | arXiv |
| Unsupervised Data Selection via Discrete Speech Representation for ASR | arXiv |
| Ask2Mask: Guided Data Selection for Masked Speech Modeling | arXiv |
| Unsupervised data selection for Speech Recognition with contrastive loss ratios | arXiv |
| DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation | arXiv |
| Sequence-level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models | arXiv |
| Scaling ASR Improves Zero and Few Shot Learning | arXiv |
| Seed Words Based Data Selection for Language Model Adaptation | arXiv |
| Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples | arXiv |
| Knowledge Distillation and Data Selection for Semi-Supervised Learning in CTC Acoustic Models | arXiv |
| Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models | arXiv |
| A Dropout-Based Single Model Committee Approach for Active Learning in ASR | IEEE |
| Investigate Automatic Speech Recognition and Keyword Search for Very Low-Resource Language | IEEE |
| Data-selective Transfer Learning for Multi-Domain Speech Recognition | arXiv |
| Active Learning and Semi-Supervised Learning in Tibetan Language Speech Recognition | IEEE |
| Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training | JST |
| Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training | ACL |
| Data Selection for Noise Robust Exemplar Matching | IEEE |
| Efficient data selection employing Semantic Similarity-based Graph Structures for model training | arXiv |
| A General Procedure for Improving Language Models in Low-Resource Speech Recognition | IEEE |
| Data selection for speech recognition | IEEE |
| Submodularity in data subset selection and active learning | PMLR |
| ASR Data Selection from Multiple Sources: A Practical Approach on Performance Scaling | NSF |
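Several ASR entries above (e.g. the submodular subset selection papers) frame data selection as maximizing a submodular set function under a size budget. The following is a minimal sketch of that idea, not the exact method of any listed paper: it assumes each utterance is already summarized by a fixed-size embedding (the random features are placeholders, not real data), uses the facility-location objective `f(S) = sum_i max_{j in S} sim(i, j)` with cosine similarity shifted to be non-negative, and applies the standard greedy algorithm. The helper name `greedy_facility_location` is hypothetical.

```python
import numpy as np


def greedy_facility_location(features: np.ndarray, budget: int) -> list[int]:
    """Greedily pick `budget` rows maximizing facility-location coverage.

    f(S) = sum_i max_{j in S} sim(i, j); sim is cosine similarity mapped
    to [0, 1] so that f is monotone submodular.
    """
    # Pairwise cosine similarity between utterance embeddings.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = (normed @ normed.T + 1.0) / 2.0  # map [-1, 1] -> [0, 1]

    n = sim.shape[0]
    budget = min(budget, n)
    selected: list[int] = []
    # coverage[i] = best similarity of item i to any item selected so far.
    coverage = np.zeros(n)
    for _ in range(budget):
        # Marginal gain of candidate j: sum_i max(0, sim[i, j] - coverage[i]).
        gains = np.maximum(sim - coverage[:, None], 0.0).sum(axis=0)
        gains[selected] = -np.inf  # never pick the same utterance twice
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[:, best])
    return selected


rng = np.random.default_rng(0)
utterance_embeddings = rng.normal(size=(200, 16))  # placeholder features
subset = greedy_facility_location(utterance_embeddings, budget=20)
print(len(subset), len(set(subset)))  # → 20 20
```

Facility location rewards picking utterances that are similar to many others while penalizing redundancy, and the greedy loop carries the well-known (1 - 1/e) approximation guarantee for monotone submodular functions under a cardinality constraint.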
Data selection techniques for improving spoofing detection and anti-spoofing systems.
| Title | URL |
|---|---|
| Data selection for i-vector based automatic speaker verification anti-spoofing | Elsevier |
| Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure | IEEE |
| Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models | IEEE |
| Dataset pruning for resource-constrained spoofed audio detection | ISCA |
| Self supervised dataset pruning for efficient training in audio anti-spoofing | ISCA |
| Fake audio detection in resource-constrained settings using microfeatures | ISCA |
| Implementation of active data selection algorithms for data choosing in ASV systems | Repository |
Data selection approaches for improving emotion recognition from speech signals.
| Title | URL |
|---|---|
| Cooperative Learning and its Application to Emotion Recognition from Speech | IEEE |
| An Active Learning Paradigm for Online Audio-Visual Emotion Recognition | IEEE |
| Active Learning for Speech Emotion Recognition Using Deep Neural Network | IEEE |
| An optimal two stage feature selection for speech emotion recognition using acoustic features | Springer |
| Active learning by sparse instance tracking and classifier confidence in acoustic emotion recognition | ISCA |
| Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition | Elsevier |
| Dynamic Active Learning Based on Agreement and Applied to Emotion Recognition in Spoken Interactions | ACM |
| Ensemble Feature Selection for Domain Adaptation in Speech Emotion Recognition | IEEE |
| Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech | MDPI |
| Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction | PLOS ONE |
| Active learning by label uncertainty for acoustic emotion recognition | ISCA |
| Incremental Adaptation Using Active Learning for Acoustic Emotion Recognition | ACM |
| Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Spoken Spanish and Standard Basque Language | Springer |
| Personalized music emotion classification via active learning | ACM |
| Studying Self- and Active-Training Methods for Multi-feature Set Emotion Recognition | Springer |
| Data Selection for Acoustic Emotion Recognition: Analyzing and Comparing Utterance and Sub-Utterance Selection Strategies | IEEE |
| Active Learning for Speech Emotion Recognition Using Conditional Random Fields | IEEE |
| Active learning for dimensional speech emotion recognition | ISCA |
| On Instance Selection in Audio Based Emotion Recognition | Springer |
| Trustability-Based Dynamic Active Learning for Crowdsourced Labelling of Emotional Audio Data | IEEE |
| Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition | ISCA |
| Multi-Task Active Learning for Simultaneous Emotion Classification and Regression | IEEE |
| Extracting Audio-Visual Features for Emotion Recognition Through Active Feature Selection | IEEE |
| RANSAC-Based Training Data Selection on Spectral Features for Emotion Recognition from Spontaneous Speech | Springer |
| AFTER: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition | IEEE |
| Adaptability of Simple Classifier and Active Learning in Music Emotion Recognition | ACM |
| An Efficient Framework for Constructing Speech Emotion Corpus Based on Integrated Active Learning Strategies | IEEE |
| Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning | Springer |
| Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection | arXiv |
| A Comparison Using Different Speech Parameters in the Automatic Emotion Recognition Using Feature Subset Selection Based on Evolutionary Algorithms | Springer |
| Application of Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Speech | Springer |
| Hybrid Intelligent Model for Speech Emotion Recognition Using Active Learning and Residual Network | Springer |
| Cross-Task Inconsistency Based Active Learning (CTIAL) for Emotion Recognition | IEEE |
| Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition | arXiv |
| Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition | arXiv |
| Maximal Information Coefficient and Predominant Correlation-Based Feature Selection Toward A Three-Layer Model for Speech Emotion Recognition | IEEE |
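Many of the emotion-recognition entries rely on active learning: a model trained on a small labeled pool asks annotators to label the unlabeled utterances it is least certain about. A minimal sketch of one common query rule, predictive-entropy uncertainty sampling, assuming the classifier already outputs class probabilities; the posterior values are made up for illustration and `select_most_uncertain` is a hypothetical helper. Individual papers above use other criteria, such as classifier agreement or cross-task inconsistency.

```python
import numpy as np


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of each row of a (num_utterances, num_classes) array."""
    eps = 1e-12  # avoid log(0) for zero-probability classes
    return -(probs * np.log(probs + eps)).sum(axis=1)


def select_most_uncertain(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k utterances with the highest predictive entropy."""
    scores = predictive_entropy(probs)
    # argsort is ascending, so take the last k and reverse for descending order.
    return np.argsort(scores)[-k:][::-1]


# Hypothetical posteriors over 4 emotion classes for 5 unlabeled utterances.
posteriors = np.array([
    [0.97, 0.01, 0.01, 0.01],  # confident
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    [0.60, 0.30, 0.05, 0.05],
    [0.40, 0.40, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
])
query = select_most_uncertain(posteriors, k=2)
print(query)  # → [1 3]
```

In a full loop, the queried utterances would be sent for annotation, added to the labeled pool, and the model retrained before the next query round.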
Data selection methods for speaker identification and verification systems.
| Title | URL |
|---|---|
| GA-based Feature Subset Selection Application to Arabic Speaker Recognition System | IEEE |
| Limited labels for unlimited data: active learning for speaker recognition | ISCA |
| Nature-inspired feature subset selection application to Arabic speaker recognition system | Springer |
| Autonomous selection of i-vectors for PLDA modelling in speaker verification | Elsevier |
| Ensemble based speaker recognition using unsupervised data selection | Now Publishers |
| Maximum entropy based data selection for speaker recognition | ISCA |
| Importance of nasality measures for speaker recognition data selection and performance prediction | ISCA |
| Optimized Active Learning Strategy for Audiovisual Speaker Recognition | Springer |
| Spectral entropy and spectral shape based pre-quantization for real time speaker identification system | Springer |
| Ensemble classifiers using unsupervised data selection for speaker recognition | ISCA |
| Data selection with kurtosis and nasality features for speaker recognition | ISCA |
| UBM Data Selection for Effective Speaker Modeling | IEEE |
| Effective background data selection for SVM-based speaker recognition with unseen test environments: more is not always better | Springer |
| Towards Structured Approaches to Arbitrary Data Selection and Performance Prediction for Speaker Recognition | Springer |
| How to Reduce Dimension while Improving Performance | Springer |
| Wavelet-based Parametric Feature Subset Selection for Speaker and Accent Recognition using Genetic Algorithm | JTEC |
| Robust speaker identification using combined feature selection and missing data recognition | IEEE |
| An efficient feature selection method for speaker recognition | ISCA |
| Feature Selection Method for Speaker Recognition using Neural Network | IJCA |
| Normalizations and selection of speech segments for speaker recognition scoring | IEEE |
Data selection strategies for improving TTS systems and voice synthesis quality.
| Title | URL |
|---|---|
| Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses | IEEE |
| A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis | ISCA |
| Modeling Irregular Voice in End-to-End Speech Synthesis via Speaker Adaptation | IEEE |
| Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis | ISCA |
| Developing a unit selection voice given audio without corresponding text | SpringerOpen |
| Active Learning for Prediction of Prosodic Word Boundaries in Chinese TTS Using Maximum Entropy Markov Model | JSW |
| Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech | IEEE |
| Diversity-based core-set selection for text-to-speech with linguistic and acoustic features | arXiv |
| Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection | arXiv |
| Text-To-Speech Synthesis In The Wild | arXiv |
| Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study | arXiv |
| Enhancing Voice Cloning Quality through Data Selection and Alignment-Based Metrics | Preprints |
| Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems | ISCA |
| Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality | ISCA |
| Data selection and adaptation for naturalness in HMM-based speech synthesis | ISCA |
| Optimal utterance selection for unit selection speech synthesis databases | ISCA |
| Design of an Efficient Corpus for High-Quality Unit Selection TTS for Bulgarian | ILSP |
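Diversity-driven selection (e.g. the core-set entry above) aims to keep a small subset of utterances that still spans the feature space. A minimal sketch using the classic greedy k-center algorithm, assuming each utterance is summarized by a feature vector (the 2-D points below are placeholders); `k_center_greedy` is a hypothetical helper, and the listed papers may use different features, distances, or objectives.

```python
import numpy as np


def k_center_greedy(features: np.ndarray, k: int, first: int = 0) -> list[int]:
    """Greedy k-center: repeatedly add the point farthest from the chosen set."""
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    while len(selected) < min(k, len(features)):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


# Placeholder utterance features: three tight clusters in 2-D.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],  # cluster A
    [5.0, 5.0], [5.1, 5.0],              # cluster B
    [0.0, 9.0], [0.1, 9.1],              # cluster C
])
print(k_center_greedy(points, k=3))  # → [0, 6, 4]
```

With a budget of three, the sketch picks one utterance from each cluster, which is the intended behavior: redundant near-duplicates are skipped in favor of coverage.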
Contributions are welcome! If you have relevant papers, implementations, or insights related to data selection in speech processing, feel free to submit a pull request.
- Fork this repository
- Add new papers to the appropriate category
- Follow the existing format: `| Title | [Publisher](URL) |`
- Submit a pull request with a clear description of your additions
If you find this repository useful in your research, please consider citing:
```bibtex
@article{speech-data-selection-survey,
  title={A Survey on Data Selection for Efficient Speech Processing},
  author={Azeemi, Abdul Hameed and Qazi, Ihsan Ayyub and Raza, Agha Ali},
  year={2025},
  journal={IEEE Access},
  doi={10.1109/ACCESS.2025.3582395}
}
```
This work is licensed under the MIT License.
