Validating neural networks for spectroscopic classification on a universal synthetic dataset

To aid the development of machine learning models for automated spectroscopic data classification, we created a universal synthetic dataset for the validation of their performance. The dataset mimics the characteristic appearance of experimental measurements from techniques such as X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy among others. We applied eight neural network architectures to classify artificial spectra, evaluating their ability to handle common experimental artifacts. While all models achieved over 98% accuracy on the synthetic dataset, misclassifications occurred when spectra had overlapping peaks or intensities. We found that non-linear activation functions, specifically ReLU in the fully-connected layers, were crucial for distinguishing between these classes, while adding more sophisticated components, such as residual blocks or normalization layers, provided no performance benefit. Based on these findings, we summarize key design principles for neural networks in spectroscopic data classification and publicly share all scripts used in this study.

Manuscript

The manuscript is published in npj Computational Materials and can be found here.

Concept

When zoomed to matching segment lengths, spectroscopic and diffraction signals look similar: each shows characteristic intensity peaks.

Figure: Similarity of spectra and diffraction signals from different techniques

Accordingly, a synthetic dataset is constructed that incorporates the characteristics of the different signals. The dataset contains multiple unique classes, each with a distinct pattern (number, position, and height of peaks). To account for realistic artifacts, the ideal spectrum of each class is varied and multiple samples are generated per class.
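The idea can be illustrated with a minimal sketch that uses only the Python standard library. It is not the implementation in this repository: the function names, the Gaussian peak shape, and all parameter values (shift, height_var, width, noise) are assumptions chosen for illustration. Each class gets an ideal peak pattern, and every sample is a perturbed, noisy rendering of that pattern.

```python
import math
import random


def make_class_pattern(rng, n_points=500, max_peaks=5):
    """Draw an ideal peak pattern (positions and heights) for one class."""
    n_peaks = rng.randint(2, max_peaks)
    positions = [rng.uniform(0.1, 0.9) * n_points for _ in range(n_peaks)]
    heights = [rng.uniform(0.2, 1.0) for _ in range(n_peaks)]
    return list(zip(positions, heights))


def render_sample(pattern, rng, n_points=500, shift=2.0, height_var=0.1,
                  width=3.0, noise=0.01):
    """Render one varied sample of a class pattern as a list of intensities."""
    spectrum = [0.0] * n_points
    for position, height in pattern:
        # Perturb the ideal peak position and height (assumed variation model).
        pos = position + rng.uniform(-shift, shift)
        amp = height * (1.0 + rng.uniform(-height_var, height_var))
        # Gaussian peak shape is an assumption of this sketch.
        for i in range(n_points):
            spectrum[i] += amp * math.exp(-0.5 * ((i - pos) / width) ** 2)
    # Add Gaussian noise, then rescale so the maximum intensity is 1.
    spectrum = [v + rng.gauss(0.0, noise) for v in spectrum]
    top = max(spectrum)
    return [v / top for v in spectrum]


rng = random.Random(0)  # fixed seed for reproducibility
patterns = [make_class_pattern(rng) for _ in range(3)]  # 3 classes
dataset = [(label, render_sample(pattern, rng))
           for label, pattern in enumerate(patterns)
           for _ in range(4)]  # 4 varied samples per class
```

Here dataset is a list of (label, spectrum) pairs. Two classes whose patterns share peak positions or heights would produce samples that are hard to tell apart, which is the situation in which the misclassifications described above occur.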

Then, multiple established neural network architectures are trained on the synthetic spectra, and their performance and classification behavior are evaluated in detail.
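One simple way to look at classification behavior beyond overall accuracy is to count which pairs of classes are confused with each other. The following sketch assumes integer class labels; the helper names and the example labels are hypothetical and are not taken from the evaluation scripts or results of this study.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def most_confused(true_labels, predicted_labels, top=3):
    """Return the most frequent misclassified (true, predicted) label pairs."""
    counts = Counter(zip(true_labels, predicted_labels))
    errors = Counter({pair: n for pair, n in counts.items()
                      if pair[0] != pair[1]})
    return errors.most_common(top)


# Hypothetical labels for illustration only, not results from the study.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 0, 1, 2, 2, 2, 1, 3]

overall = accuracy(y_true, y_pred)
confused_pairs = most_confused(y_true, y_pred)
```

Pairs that appear in confused_pairs are candidates for closer inspection, for example by checking whether the corresponding classes have overlapping peak positions or intensities.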

Replicating the benchmarks

The models are trained in a Docker container to ensure matching package versions. The images are available in the GitHub container registry for this repository. A Weights & Biases account is required to track the training and metrics.

First, pull the relevant image(s):

docker pull ghcr.io/jschuetzke/synthetic-spectra-benchmark:benchmark
docker pull ghcr.io/jschuetzke/synthetic-spectra-benchmark:challenge-activation
docker pull ghcr.io/jschuetzke/synthetic-spectra-benchmark:challenge-modification

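To run the benchmark, create a wandb project named synthetic-benchmark and copy your W&B API key. In the following line, replace the YOURKEY123 placeholder with your personal key and VERSION with the tag of the image to run (benchmark, challenge-activation, or challenge-modification):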

docker run --rm -e "WANDBKEY=YOURKEY123" ghcr.io/jschuetzke/synthetic-spectra-benchmark:VERSION
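To run several images one after another, the command above can be wrapped in a small script. This is a convenience sketch rather than part of the repository: it assumes that VERSION stands for one of the three image tags pulled earlier, and the function names are made up for illustration.

```python
import subprocess

IMAGE = "ghcr.io/jschuetzke/synthetic-spectra-benchmark"
# Assumption: VERSION in the command above is one of the pulled image tags.
TAGS = ["benchmark", "challenge-activation", "challenge-modification"]


def docker_run_command(tag, wandb_key):
    """Build the docker run command shown above for one image tag."""
    return ["docker", "run", "--rm",
            "-e", f"WANDBKEY={wandb_key}",
            f"{IMAGE}:{tag}"]


def run_all(wandb_key):
    """Run each image in sequence; raises if a container exits with an error."""
    for tag in TAGS:
        subprocess.run(docker_run_command(tag, wandb_key), check=True)
```

Calling run_all("YOURKEY123") with your personal key in place of the placeholder starts the three containers in order. Nothing is executed on import, so the command list can be inspected first with docker_run_command.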

Usage

This repository can be used to analyze the benchmark data and for the generation of further synthetic spectra datasets.

Clone the repository and install the required Python packages listed in the requirements.txt file:

git clone https://github.com/jschuetzke/synthetic-spectra-benchmark
cd synthetic-spectra-benchmark
pip install -r requirements.txt

Generating new datasets

Two scripts are relevant for producing further synthetic spectra datasets that fit specific constraints:

Change the values in the dataset_config_generator.py script
Modify the number of samples or the degree of variation in the spectra_from_config.py script

External resources

The exact training, validation, and test samples for the general and challenge benchmark, as well as the weights of the trained models, are available here: https://figshare.com/articles/dataset/Synthetic_spectra_challenging_dataset/22188433

Documentation of the training runs can be found here: https://wandb.ai/jschuetzke/synthetic-benchmark/overview
