CSALT-LUMS/Awesome-Speech-Data-Selection


Data Selection for Efficient Speech Processing

MIT License
Contribution Welcome

👋 Welcome to the Data Selection in Speech Processing repo! Here you'll find a carefully curated list of essential papers and resources on data selection methods used in various speech processing tasks.

Our work is based on a comprehensive survey of data selection methodologies in speech processing, covering automatic speech recognition, speaker recognition, speech emotion recognition, speech synthesis, and audio anti-spoofing. The survey paper is published in IEEE Access (https://ieeexplore.ieee.org/document/11048490).

⭐ Star and fork this repository to stay up-to-date with the latest advancements and support the community.


📌 Contents

Section Description
🎤 Automatic Speech Recognition Data selection for ASR systems
🔊 Speaker Recognition Data selection for speaker identification and verification
😊 Speech Emotion Recognition Data selection for emotion recognition from speech
🗣️ Text-to-Speech Synthesis Data selection for speech synthesis systems
🛡️ Audio Anti-Spoofing Data selection for spoofing detection systems

🎤 Automatic Speech Recognition

Data selection techniques that improve ASR performance, spanning training-data subset selection, active learning, and domain adaptation.

Title URL
Active learning: theory and applications to automatic speech recognition IEEE
Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion Elsevier
Active Learning for Automatic Speech Recognition IEEE
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition IEEE
Submodular Subset Selection for Large-Scale Speech Training Data IEEE
Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification IEEE
Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition Springer
An active approach to spoken language processing ACM
Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR IEEE
Towards Data Selection on TTS Data for Children's Speech Recognition IEEE
XenC: An Open-Source Tool for Data Selection in Natural Language Processing UFAL
Active learning for accent adaptation in automatic speech recognition IEEE
Gradient-based Active Learning Query Strategy for End-to-end Speech Recognition IEEE
In Search of Optimal Data Selection for Training of Automatic Speech Recognition Systems IEEE
Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses IEEE
Submodular Data Selection with Acoustic and Phonetic Features for Automatic Speech Recognition IEEE
Maximizing global entropy reduction for active learning in speech recognition IEEE
Loss Prediction: End-to-End Active Learning Approach For Speech Recognition IEEE
Supervised and Unsupervised Active Learning for Automatic Speech Recognition of Low-Resource Languages IEEE
Efficient data selection for ASR Springer
Tibetan Language Continuous Speech Recognition Based on Active WS-DBN IEEE
Data Selection from Multiple ASR Systems' Hypotheses for Unsupervised Acoustic Model Training IEEE
Optimal Subset Selection from Text Databases IEEE
Grammar-Based Semi-Supervised Incremental Learning in Automatic Speech Recognition and Labeling Elsevier
Data pruning for template-based automatic speech recognition ISCA
Active Learning for LF-MMI Trained Neural Networks in ASR ISCA
Improved Data Selection for Domain Adaptation in ASR IEEE
LMC-SMCA: A New Active Learning Method in ASR IEEE
Textual Data Selection for Language Modelling in the Scope of Automatic Speech Recognition Elsevier
Efficient Use of Training Data for Sinhala Speech Recognition using Active Learning IEEE
Unsupervised Language Model Adaptation by Data Selection for Speech Recognition Springer
Kullback-Leibler divergence-based ASR training data selection ISCA
Automatic data selection for MLP-based feature extraction for ASR ISCA
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition arXiv
Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition arXiv
Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition ISCA
Distributed Submodular Maximization for Large Vocabulary Continuous Speech Recognition IEEE
Analysing Acoustic Model Changes for Active Learning in Automatic Speech Recognition IEEE
Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition IEEE
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages arXiv
Unsupervised Multi-Domain Data Selection for ASR Fine-Tuning IEEE
Semi-Supervised Learning For Code-Switching ASR With Large Language Model Filter arXiv
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR arXiv
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation arXiv
Dynamic Data Pruning for Automatic Speech Recognition arXiv
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition arXiv
Speech Corpora Divergence Based Unsupervised Data Selection for ASR arXiv
Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models arXiv
Data Selection Based on Phoneme Affinity Matrix for Electrolarynx Speech Recognition IEEE
Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models arXiv
Unsupervised Data Selection via Discrete Speech Representation for ASR arXiv
Ask2Mask: Guided Data Selection for Masked Speech Modeling arXiv
Unsupervised data selection for Speech Recognition with contrastive loss ratios arXiv
DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation arXiv
Sequence-level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models arXiv
Scaling ASR Improves Zero and Few Shot Learning arXiv
Seed Words Based Data Selection for Language Model Adaptation arXiv
Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples arXiv
Knowledge Distillation and Data Selection for Semi-Supervised Learning in CTC Acoustic Models arXiv
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models arXiv
A Dropout-Based Single Model Committee Approach for Active Learning in ASR IEEE
Investigate Automatic Speech Recognition and Keyword Search for Very Low-Resource Language IEEE
Data-selective Transfer Learning for Multi-Domain Speech Recognition arXiv
Active Learning and Semi-Supervised Learning in Tibetan Language Speech Recognition IEEE
Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training JST
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training ACL
Data Selection for Noise Robust Exemplar Matching IEEE
Efficient data selection employing Semantic Similarity-based Graph Structures for model training arXiv
A General Procedure for Improving Language Models in Low-Resource Speech Recognition IEEE
Data selection for speech recognition IEEE
Submodularity in data subset selection and active learning PMLR
ASR Data Selection from Multiple Sources: A Practical Approach on Performance Scaling NSF
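Several ASR entries above (for example, the submodular subset selection and distributed submodular maximization papers) cast data selection as maximizing a submodular function over the training pool. The following is a minimal sketch of one common choice, greedy maximization of the facility-location function, assuming each utterance has already been mapped to a fixed-length feature vector (for instance, averaged acoustic embeddings). The function names and the cosine-similarity kernel are illustrative assumptions, not the procedure of any specific listed paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def facility_location_select(features, budget):
    """Greedily choose up to `budget` utterance indices that maximize
    F(S) = sum_i max_{j in S} sim(i, j), the facility-location function."""
    n = len(features)
    # Shift cosine similarity from [-1, 1] to [0, 1] so F is monotone and non-negative.
    sim = [[(cosine(features[i], features[j]) + 1.0) / 2.0 for j in range(n)]
           for i in range(n)]
    best = [0.0] * n  # best[i]: similarity of utterance i to its closest selected utterance
    selected = []
    for _ in range(min(budget, n)):
        gains = []
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves the coverage of every utterance.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            gains.append((gain, j))
        _, j_star = max(gains)
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected
```

Building the full pairwise similarity matrix costs O(n^2) time and memory, so this form only suits small pools, which is what motivates distributed variants such as the one listed above.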

🛡️ Audio Anti-Spoofing

Data selection techniques for improving spoofing detection (anti-spoofing) systems, including active learning and dataset pruning.

Title URL
Data selection for i-vector based automatic speaker verification anti-spoofing Elsevier
Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure IEEE
Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models IEEE
Dataset pruning for resource-constrained spoofed audio detection ISCA
Self supervised dataset pruning for efficient training in audio anti-spoofing ISCA
Fake audio detection in resource-constrained settings using microfeatures ISCA
Implementation of active data selection algorithms for data choosing in ASV systems Repository
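The dataset-pruning entries in this section reduce training cost by scoring each utterance and discarding low-scoring ones. A minimal sketch, assuming a per-utterance importance score has already been computed (for example, the loss of a briefly trained countermeasure model; how that score is obtained is exactly where the listed papers differ). The label names and the `prune_by_score` helper are hypothetical, and pruning each class separately is a design choice made here so that bona fide and spoofed audio keep their original proportions.

```python
from collections import defaultdict


def prune_by_score(labels, scores, keep_fraction):
    """Return sorted indices of retained utterances, keeping the highest-scoring
    `keep_fraction` of each class independently (at least one per class)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        # Rank this class's utterances from most to least important.
        ranked = sorted(indices, key=lambda i: scores[i], reverse=True)
        n_keep = max(1, round(len(ranked) * keep_fraction))
        kept.extend(ranked[:n_keep])
    return sorted(kept)
```

With `keep_fraction=0.5`, a pool of two bona fide and four spoofed utterances is reduced to one and two utterances respectively.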

😊 Speech Emotion Recognition

Data selection approaches for improving emotion recognition from speech signals.

Title URL
Cooperative Learning and its Application to Emotion Recognition from Speech IEEE
An Active Learning Paradigm for Online Audio-Visual Emotion Recognition IEEE
Active Learning for Speech Emotion Recognition Using Deep Neural Network IEEE
An optimal two stage feature selection for speech emotion recognition using acoustic features Springer
Active learning by sparse instance tracking and classifier confidence in acoustic emotion recognition ISCA
Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition Elsevier
Dynamic Active Learning Based on Agreement and Applied to Emotion Recognition in Spoken Interactions ACM
Ensemble Feature Selection for Domain Adaptation in Speech Emotion Recognition IEEE
Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech MDPI
Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction PLOS ONE
Active learning by label uncertainty for acoustic emotion recognition ISCA
Incremental Adaptation Using Active Learning for Acoustic Emotion Recognition ACM
Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Spoken Spanish and Standard Basque Language Springer
Personalized music emotion classification via active learning ACM
Studying Self- and Active-Training Methods for Multi-feature Set Emotion Recognition Springer
Data Selection for Acoustic Emotion Recognition: Analyzing and Comparing Utterance and Sub-Utterance Selection Strategies IEEE
Active Learning for Speech Emotion Recognition Using Conditional Random Fields IEEE
Active learning for dimensional speech emotion recognition ISCA
On Instance Selection in Audio Based Emotion Recognition Springer
Trustability-Based Dynamic Active Learning for Crowdsourced Labelling of Emotional Audio Data IEEE
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition ISCA
Multi-Task Active Learning for Simultaneous Emotion Classification and Regression IEEE
Extracting Audio-Visual Features for Emotion Recognition Through Active Feature Selection IEEE
RANSAC-Based Training Data Selection on Spectral Features for Emotion Recognition from Spontaneous Speech Springer
AFTER: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition IEEE
Adaptability of Simple Classifier and Active Learning in Music Emotion Recognition ACM
An Efficient Framework for Constructing Speech Emotion Corpus Based on Integrated Active Learning Strategies IEEE
Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning Springer
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection arXiv
A Comparison Using Different Speech Parameters in the Automatic Emotion Recognition Using Feature Subset Selection Based on Evolutionary Algorithms Springer
Application of Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Speech Springer
Hybrid Intelligent Model for Speech Emotion Recognition Using Active Learning and Residual Network Springer
Cross-Task Inconsistency Based Active Learning (CTIAL) for Emotion Recognition IEEE
Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition arXiv
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition arXiv
Maximal Information Coefficient and Predominant Correlation-Based Feature Selection Toward A Three-Layer Model for Speech Emotion Recognition IEEE
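Many of the emotion-recognition entries rely on active learning, where a classifier trained on a small labelled set chooses which unlabelled utterances to send to annotators. A common baseline query strategy is uncertainty sampling with predictive entropy, sketched below under the assumption that the classifier already outputs a posterior distribution over emotion classes for every unlabelled utterance. The helper names are hypothetical, and this is not the specific criterion of any listed paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(posteriors, k):
    """Return the indices of the k unlabelled utterances whose predicted
    emotion posteriors have the highest entropy (most uncertain first)."""
    ranked = sorted(range(len(posteriors)),
                    key=lambda i: entropy(posteriors[i]),
                    reverse=True)
    return ranked[:k]
```

In a typical loop, the selected utterances are labelled, added to the training set, and the model is retrained before the next query round.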

🔊 Speaker Recognition

Data selection methods for speaker identification and verification systems.

Title URL
GA-based Feature Subset Selection Application to Arabic Speaker Recognition System IEEE
Limited labels for unlimited data: active learning for speaker recognition ISCA
Nature-inspired feature subset selection application to arabic speaker recognition system Springer
Autonomous selection of i-vectors for PLDA modelling in speaker verification Elsevier
Ensemble based speaker recognition using unsupervised data selection Now Publishers
Maximum entropy based data selection for speaker recognition ISCA
Importance of nasality measures for speaker recognition data selection and performance prediction ISCA
Optimized Active Learning Strategy for Audiovisual Speaker Recognition Springer
Spectral entropy and spectral shape based pre-quantization for real time speaker identification system Springer
Ensemble classifiers using unsupervised data selection for speaker recognition ISCA
Data selection with kurtosis and nasality features for speaker recognition ISCA
UBM Data Selection for Effective Speaker Modeling IEEE
Effective background data selection for SVM-based speaker recognition with unseen test environments: more is not always better Springer
Towards Structured Approaches to Arbitrary Data Selection and Performance Prediction for Speaker Recognition Springer
How to Reduce Dimension while Improving Performance Springer
Wavelet-based Parametric Feature Subset Selection for Speaker and Accent Recognition using Genetic Algorithm JTEC
Robust speaker identification using combined feature selection and missing data recognition IEEE
An efficient feature selection method for speaker recognition ISCA
Feature Selection Method for Speaker Recognition using Neural Network IJCA
Normalizations and selection of speech segments for speaker recognition scoring IEEE
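Several speaker-recognition entries select background (UBM or PLDA training) data instead of using everything available; one title notes that "more is not always better" when test environments are unseen. As a minimal sketch of one simple heuristic, assuming fixed-dimensional speaker embeddings (such as i-vectors) for both a candidate background pool and a small in-domain development set, candidates can be ranked by cosine similarity to the in-domain centroid. The heuristic and function names are illustrative assumptions, not the criterion used by the cited papers.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def select_background(candidates, in_domain, k):
    """Return indices of the k candidate embeddings with the highest cosine
    similarity to the centroid of the in-domain embeddings."""
    dim = len(in_domain[0])
    centroid = [sum(e[d] for e in in_domain) / len(in_domain) for d in range(dim)]
    c = _normalize(centroid)
    scored = []
    for idx, emb in enumerate(candidates):
        e = _normalize(emb)
        scored.append((sum(a * b for a, b in zip(e, c)), idx))
    scored.sort(reverse=True)  # most similar first
    return [idx for _, idx in scored[:k]]
```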

🗣️ Text-to-Speech Synthesis

Data selection strategies for improving TTS systems and voice synthesis quality.

Title URL
Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses IEEE
A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis ISCA
Modeling Irregular Voice in End-to-End Speech Synthesis via Speaker Adaptation IEEE
Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis ISCA
Developing a unit selection voice given audio without corresponding text SpringerOpen
Active Learning for Prediction of Prosodic Word Boundaries in Chinese TTS Using Maximum Entropy Markov Model JSW
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech IEEE
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features arXiv
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection arXiv
Text-To-Speech Synthesis In The Wild arXiv
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study arXiv
Enhancing Voice Cloning Quality through Data Selection and Alignment-Based Metrics Preprints
Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems ISCA
Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality ISCA
Data selection and adaptation for naturalness in HMM-based speech synthesis ISCA
Optimal utterance selection for unit selection speech synthesis databases ISCA
Design of an Efficient Corpus for High-Quality Unit Selection TTS for Bulgarian ILSP
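The diversity-based core-set entry above selects TTS training utterances that cover the space of linguistic and acoustic features. A widely used core-set heuristic is greedy k-center (farthest-point) selection, sketched below assuming per-utterance feature vectors are precomputed. This is not claimed to be the listed paper's exact procedure; the function names and the Euclidean distance are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_center_greedy(features, k, first=0):
    """Greedy k-center core-set: start from `first`, then repeatedly add the
    unselected utterance farthest from its nearest already-selected one."""
    n = len(features)
    if n == 0 or k <= 0:
        return []
    selected = [first]
    # min_dist[i]: distance from utterance i to its nearest selected utterance.
    min_dist = [euclidean(features[i], features[first]) for i in range(n)]
    while len(selected) < min(k, n):
        nxt = max((i for i in range(n) if i not in selected),
                  key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(min_dist[i], euclidean(features[i], features[nxt]))
                    for i in range(n)]
    return selected
```

Each step picks the utterance that is currently worst covered, so near-duplicate utterances are skipped until every region of the feature space has a representative.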

📌 Contributing

Contributions are welcome! If you have relevant papers, implementations, or insights related to data selection in speech processing, feel free to submit a pull request.

How to Contribute

  1. Fork this repository
  2. Add new papers to the appropriate category
  3. Follow the existing format: | Title | [Publisher](URL) |
  4. Submit a pull request with a clear description of your additions
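Contributors may want to check a new row against the format in step 3 before opening a pull request. The helper below is a hypothetical sketch, not part of this repository; it assumes one row per line and a plain http(s) link.

```python
import re

# | Title | [Publisher](URL) |  with optional surrounding whitespace
ROW_PATTERN = re.compile(
    r"^\|\s*(?P<title>[^|]+?)\s*\|\s*"
    r"\[(?P<publisher>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)\s*\|\s*$"
)


def parse_row(line):
    """Return (title, publisher, url) if `line` follows the list format, else None."""
    m = ROW_PATTERN.match(line.strip())
    if m is None:
        return None
    return m.group("title"), m.group("publisher"), m.group("url")
```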

📚 Citation

If you find this repository useful in your research, please consider citing:

@article{speech-data-selection-survey,
  title={A Survey on Data Selection for Efficient Speech Processing},
  author={Azeemi, Abdul Hameed and Qazi, Ihsan Ayyub and Raza, Agha Ali},
  year={2025},
  journal={IEEE Access},
  doi={10.1109/ACCESS.2025.3582395}
}

📄 License

This work is licensed under the MIT License.
