| name | local-embedding |
|---|---|
| description | Run embedding on-device with ONNX Runtime. Build from source, model selection, offline mode. Use when setting up local embedding without an API key. |
Pre-built binaries do NOT include local embedding.
```shell
cd Memoria
make build-local
sudo cp memoria/target/release/memoria /usr/local/bin/
```

Binary is ~50-80MB (bundles ONNX Runtime). Expected.
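As a quick sanity check after installing, a small script can confirm the binary landed at the expected path and is in the expected size range. This is just a sketch; the default path comes from the install step above:

```shell
#!/bin/sh
# Sanity-check the installed binary. A local-embedding build bundles
# ONNX Runtime, so expect roughly 50-80MB.
BIN="${1:-/usr/local/bin/memoria}"   # default path from the install step
if [ -x "$BIN" ]; then
  du -h "$BIN"
else
  echo "memoria not found at $BIN - did the cp step succeed?"
fi
```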
```shell
memoria init --tool kiro  # No --embedding-* flags needed
```

Leave EMBEDDING_* env vars empty in mcp.json → local embedding is the default.
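For reference, a minimal mcp.json entry might look like the following. The server name and binary path are illustrative, not prescribed by this doc; the key point is that the env block contains no EMBEDDING_* keys, so the server falls back to local embedding:

```json
{
  "mcpServers": {
    "memoria": {
      "command": "/usr/local/bin/memoria",
      "env": {}
    }
  }
}
```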
- First query → model downloads to ~/.cache/fastembed/ (~30MB default)
- Model loads via ONNX Runtime (~3-5s)
- Subsequent queries are fast (in-process)
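To confirm the model actually landed in the cache after the first query, you can inspect the directory (path from the list above; this is a standalone sketch, not part of the memoria CLI):

```shell
#!/bin/sh
# Inspect the fastembed model cache described above
CACHE="$HOME/.cache/fastembed"
if [ -d "$CACHE" ]; then
  du -sh "$CACHE"   # ~30MB for the default model
else
  echo "no models cached yet at $CACHE (run a first query to trigger the download)"
fi
```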
| Model | Dim | Size | Notes |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | ~30MB | Default. Fast, English |
| BAAI/bge-m3 | 1024 | ~1.2GB | Best quality, multilingual |
Change model in mcp.json env block:
```json
{ "EMBEDDING_MODEL": "BAAI/bge-m3", "EMBEDDING_DIM": "1024" }
```

| | Local | Remote (OpenAI/SiliconFlow) |
|---|---|---|
| Privacy | ✅ Data stays on-device | Text sent to provider |
| Cost | Free | API key |
| First query | ~3-5s | Fast |
| Build | From source | Pre-built works |
| Offline | ✅ | ❌ |
Recommendation: Use remote unless you need offline/strict privacy.
| Problem | Fix |
|---|---|
| "compiled without local-embedding" | Build from source: make build-local |
| Model download fails | Set HF_ENDPOINT for mirror, or manually download to ~/.cache/fastembed/ |
| High memory | Default ~100MB. bge-m3 ~1-2GB. Choose based on available RAM |
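For the download-failure row, pointing downloads at a mirror looks like this. HF_ENDPOINT is the standard Hugging Face Hub endpoint override; the mirror URL below is only an example, so substitute one you trust:

```shell
# Route model downloads through a Hugging Face mirror before the first query.
# hf-mirror.com is shown only as an example endpoint.
export HF_ENDPOINT=https://hf-mirror.com
echo "HF_ENDPOINT=$HF_ENDPOINT"
```

Set this in the same environment that launches the memoria server so the first-query download picks it up.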