Standalone C port of the 16 kHz Silero VAD model with embedded weights and no ONNX dependency.
This repository packages the model as a small shared library for Windows, macOS, and Linux.
The code currently focuses on the 16 kHz model path and supports both chunked inference and full-audio probability extraction.

- 16 kHz model path implemented
- Embedded-weight builds supported
- Windows DLL, Linux and macOS shared-library builds supported through CMake
- Several performance optimizations are already implemented:
  - stft-conv replaced by fixed 256-point FFT
  - fused STFT to first conv path
  - shared SIMD abstraction for scalar, SSE, AVX2 and NEON backends
  - AVX2 conv and LSTM kernels for x86_64
  - SSE conv and LSTM kernels for x86 and x86_64
  - NEON conv and LSTM kernels for arm64
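
To illustrate why the stft-conv to FFT replacement listed above can preserve the output, the sketch below compares a naive per-bin filter bank with a radix-2 FFT on one 256-sample frame. It assumes the convolutional STFT amounts to a bank of fixed cosine and sine filters; it is not the repository's C implementation, and the function names `dft_filter_bank` and `fft` are hypothetical.

```python
import cmath
import math

N = 256  # FFT size named in the optimization list


def dft_filter_bank(frame):
    """Reference: one cosine and one sine dot product per frequency bin."""
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        bins.append(complex(re, im))
    return bins


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


# Deterministic test frame (arbitrary mix of two sinusoids).
frame = [math.sin(0.1 * i) + 0.5 * math.cos(0.37 * i) for i in range(N)]

reference = dft_filter_bank(frame)   # O(N^2) filter-bank evaluation
fast = fft(frame)[: N // 2 + 1]      # O(N log N), same non-negative-frequency bins
max_err = max(abs(a - b) for a, b in zip(reference, fast))
print(f"max difference over {len(reference)} bins: {max_err:.2e}")
```

Both paths produce the same spectrum up to floating-point rounding; the FFT simply reaches it with far fewer multiplications.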

Load a built shared library and run full-audio inference:

    from run_silero_vad_clib import SileroVadClib
    from src.silero_vad.utils_vad import read_audio

    audio = read_audio("src/silero_vad/test/tests_data_test.wav", 16000)
    with SileroVadClib("./downloads/silero-vad-linux-x86_64-avx2/silero_vad.so") as model:
        probs = model.forward_audio(audio)

Download prebuilt binaries from the releases site, or by running `download_releases.py`.
This command downloads the binaries for the local system and unzips them:

    python download_releases.py --unzip

Use `--os all` to download all binaries, independent of the system:

    python download_releases.py --os all --unzip

Full-audio benchmark on the 60-second test clip at `src/silero_vad/test/tests_data_test.wav`:

| CPU | Build | t (s) | t/d |
|---|---|---|---|
| AMD Ryzen 9 7900 | silero-vad-windows-x64-sse | 0.068892 | 0.001148 |
| AMD Ryzen 9 7900 | silero-vad-windows-x64-avx2 | 0.058982 | 0.000983 |
| AMD Ryzen 9 7900 | torch.hub(onnx=False) | 0.447120 | 0.007452 |
| AMD Ryzen 9 7900 | torch.hub(onnx=True) | 0.257800 | 0.004297 |

t is the inference time averaged over 10 iterations. d is the audio duration (here 60 seconds).
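
A measurement of this kind can be sketched as follows. This is a minimal timing helper, not the script used to produce the table: `average_runtime` and `fake_inference` are hypothetical names, and `fake_inference` is only a stand-in for a real model call.

```python
import time


def average_runtime(fn, iterations=10):
    """Call fn() `iterations` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / iterations


def fake_inference():
    """Hypothetical stand-in workload; replace with a real inference call."""
    return sum(i * i for i in range(10_000))


t = average_runtime(fake_inference, iterations=10)
print(f"t = {t:.6f} s averaged over 10 iterations")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.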
Both native C builds are between about 3.7 and 7.6 times faster than the torch.hub baselines on the tested machine. The AVX2 build is the fastest result here, with the SSE build close behind, and both run at about 0.1% of real time for this 60-second input.
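
The t/d column and the relative speedups follow directly from the measured times. The short sketch below recomputes them from the table values; the dictionary and variable names are illustrative only.

```python
DURATION_S = 60.0  # d: length of the test clip in seconds

# t: measured inference times in seconds, copied from the table above
times = {
    "silero-vad-windows-x64-sse": 0.068892,
    "silero-vad-windows-x64-avx2": 0.058982,
    "torch.hub(onnx=False)": 0.447120,
    "torch.hub(onnx=True)": 0.257800,
}

# t/d: fraction of real time spent on inference
ratios = {name: t / DURATION_S for name, t in times.items()}
for name, ratio in ratios.items():
    print(f"{name}: t/d = {ratio:.6f}")

# Speedup of each native build over each torch.hub baseline
speedups = {
    (native, baseline): times[baseline] / times[native]
    for native in ("silero-vad-windows-x64-sse", "silero-vad-windows-x64-avx2")
    for baseline in ("torch.hub(onnx=False)", "torch.hub(onnx=True)")
}
for (native, baseline), factor in speedups.items():
    print(f"{native} vs {baseline}: {factor:.2f}x")
```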

- `silero_vad.c` / `silero_vad.h` / `silero_vad_simd.h` - core C implementation, public API, and SIMD abstraction layer
- `export_silero_vad_weights.py` - exports weights from safetensors
- `export_silero_vad_jit_weights.py` - exports weights from the Torch hub JIT model
- `run_silero_vad_clib.py` - Python `ctypes` example / test runner for the shared library
- `package_release.py` - creates release folders and zip files from built binaries

Two weight sources are supported:

- `jit` - exported from `torch.hub.load(..., model='silero_vad')`
  - best choice if you want parity with the original Silero Torch hub model
- `safetensors` - exported from the local safetensors checkpoint

Build and release commands are collected in:
That file includes Windows, Linux, and macOS build commands, as well as release packaging commands.
The public API is declared in `silero_vad.h`.

Main entry points:

- `silero_vad_model_create`
- `silero_vad_model_destroy`
- `silero_vad_model_reset`
- `silero_vad_model_forward`
- `silero_vad_model_forward_audio`

The current full-model path is designed around 16 kHz audio and typically uses:

- `576` samples per chunk
- `64` samples of left context
- `512` new samples
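
The chunk layout above (64 samples of left context followed by 512 new samples, 576 in total) can be sketched in plain Python. This is an illustration only: `iter_chunks` is a hypothetical helper, and both the zero context before the first chunk and the zero padding of the final partial chunk are assumptions. The library may handle context and padding internally.

```python
CONTEXT = 64                    # samples of left context from the previous step
NEW_SAMPLES = 512               # new samples consumed per step
CHUNK = CONTEXT + NEW_SAMPLES   # 576 samples per chunk


def iter_chunks(audio):
    """Yield 576-sample chunks: 64 samples of left context + 512 new samples."""
    context = [0.0] * CONTEXT  # assumption: silence before the first chunk
    for start in range(0, len(audio), NEW_SAMPLES):
        new = list(audio[start:start + NEW_SAMPLES])
        # assumption: zero-pad the last chunk if the audio ends mid-step
        new += [0.0] * (NEW_SAMPLES - len(new))
        yield context + new
        context = new[-CONTEXT:]  # carry the tail forward as the next left context


# Synthetic "audio": 1200 samples whose values equal their index.
audio = [float(i) for i in range(1200)]
chunks = list(iter_chunks(audio))
print(len(chunks), "chunks of", len(chunks[0]), "samples")
```

At 16 kHz, each step of 512 new samples covers 32 ms of audio.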

- x86 and x86_64 SIMD backends are now build-selectable:
  - scalar baseline
  - SSE with `SILERO_VAD_ENABLE_SSE=ON`
  - AVX2 with `SILERO_VAD_ENABLE_AVX2=ON`
- Use the baseline or SSE build as the compatibility build for older x86 and x86_64 CPUs without AVX2.
- Use the AVX2 build when you want the faster x86_64 path on AVX2-capable CPUs.
- `SILERO_VAD_FAST_MATH` is available, but on current tests it did not improve performance.
- ARM builds support the NEON-optimized path when `SILERO_VAD_ENABLE_NEON=ON`.

This repository is based on the MIT-licensed Silero VAD project.