fix(core): resolve FastEmbed cache under data dir instead of /tmp#743
fix(core): resolve FastEmbed cache under data dir instead of /tmp#743phernandez merged 3 commits intomainfrom
Conversation
FastEmbed's own default caches ONNX model artifacts under <system tmp>/fastembed_cache. In sandboxed MCP runtimes (e.g. Codex CLI) that directory is wiped between invocations, so every semantic search raised NO_SUCHFILE once the first run ended. Resolve the cache dir to <basic-memory data dir>/fastembed_cache (or FASTEMBED_CACHE_PATH when set) at provider construction time. Keep the config field nullable with default=None so nothing gets baked into config.json — that way later changes to BASIC_MEMORY_CONFIG_DIR or FASTEMBED_CACHE_PATH take effect on every run. The provider cache key uses the resolved path so env-driven changes correctly invalidate the cached provider. Also extract resolve_data_dir() as the single source of truth for the per-user data directory and point BasicMemoryConfig.data_dir_path and ConfigManager.__init__ at it (removes two existing duplications). Fixes #741 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>
|
Claude finished @groksrc's task in 2m 7s —— View job PR Review: fix(core): resolve FastEmbed cache under data dir instead of /tmp
This is a well-reasoned fix for a real problem. The root cause (FastEmbed's Team Checklist
Issues Found1. def _resolve_cache_dir(app_config: BasicMemoryConfig) -> str:
return app_config.semantic_embedding_cache_dir or default_fastembed_cache_dir()If a user sets def _resolve_cache_dir(app_config: BasicMemoryConfig) -> str:
if app_config.semantic_embedding_cache_dir is not None:
return app_config.semantic_embedding_cache_dir
return default_fastembed_cache_dir()2. Inconsistent env-var access style (
3. home = os.getenv("HOME", Path.home())
return Path(home) / DATA_DIR_NAME
def resolve_data_dir() -> Path:
if config_dir := os.getenv("BASIC_MEMORY_CONFIG_DIR"):
return Path(config_dir)
return Path.home() / DATA_DIR_NAMEThis also removes the need to convert 4. Existing test uses config = BasicMemoryConfig(
...
projects={"test-project": "/tmp/basic-memory-test"},
semantic_embedding_cache_dir="/tmp/fastembed-cache",
...
)
assert provider.cache_dir == "/tmp/fastembed-cache"This pre-existing test (not newly introduced, only What's Done Well
SummaryThe fix is correct and production-ready. The |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a8fbd5340
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| resolved = default_fastembed_cache_dir() | ||
|
|
||
| assert "/tmp/fastembed_cache" not in resolved | ||
| assert not resolved.startswith(tempfile.gettempdir()) |
There was a problem hiding this comment.
Remove brittle temp-root assertion
This assertion will fail in common test environments because config_home points HOME at tmp_path, and tmp_path is normally created under tempfile.gettempdir(). In that case a correct resolved cache path like <tmp_path>/.basic-memory/fastembed_cache still starts with the temp root, so this check reports a false failure even though the code no longer uses FastEmbed’s /tmp/fastembed_cache default.
Useful? React with 👍 / 👎.
- Use explicit ``is not None`` instead of ``or`` in _resolve_cache_dir so
an empty-string override doesn't silently fall through to the default.
Empty string is an invalid path; letting FastEmbed surface the error
is clearer than swapping it out for ~/.basic-memory/fastembed_cache.
- Drop the ``os.getenv("HOME", Path.home())`` dance in resolve_data_dir.
``Path.home()`` already reads $HOME on POSIX and %USERPROFILE% on
Windows, so the env check was redundant and made tests that patched
os.name to simulate Windows on non-Windows hosts fragile.
- Switch default_fastembed_cache_dir to ``os.getenv`` for consistency
with the rest of config.py.
- Fix test_default_fastembed_cache_dir_never_falls_back_to_tmp so it no
longer mis-fires on Linux CI. ``tempfile.gettempdir()`` is ``/tmp``
there and pytest's ``tmp_path`` lives under it, so the previous
``startswith(tempfile.gettempdir())`` guard flagged legitimate paths.
Compare exact paths against FastEmbed's concrete
``<tempdir>/fastembed_cache`` fallback instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
|
Claude finished @groksrc's task in 2m 4s —— View job PR Review: fix(core): resolve FastEmbed cache under data dir instead of /tmp
This PR is in excellent shape. The follow-up commit Team Checklist
Issues Addressed in
|
Hundreds of ``test-int/`` tests were failing on Windows CI with ``Project not found: 'main'`` after resolve_data_dir() was simplified to use ``Path.home()`` directly (#743). Root cause: ``Path.home()`` reads ``$HOME`` on POSIX and ``%USERPROFILE%`` on Windows. The ``test-int/conftest.py`` ``config_home`` fixture only monkeypatched ``HOME``, which worked before because ``ConfigManager.data_dir_path`` used ``os.getenv("HOME", Path.home())`` — that preferred ``HOME`` even on Windows. Now that ``Path.home()`` is called directly, ``USERPROFILE`` wins and the fixture redirects nothing, so the test data dir falls back to the real user home where no ``main`` project is configured. Mirror the pattern already used in ``tests/conftest.py:config_home``: when running on Windows, also set ``USERPROFILE`` to ``tmp_path`` so ``Path.home()`` sees the redirected location. No-op on POSIX. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>
|
Claude finished @groksrc's task in 1m 22s —— View job PR Review: fix(core): resolve FastEmbed cache under data dir instead of /tmp
This review covers the latest commit What changed in
|
| Category | Item | Status |
|---|---|---|
| Code Quality | Coding conventions (CLAUDE.md) | ✅ Explanatory comment follows literate programming style |
| Code Quality | Python 3.12+ type annotations | ✅ No new annotatable code |
| Code Quality | 100-character line length | ✅ |
| Code Quality | No commented-out code | ✅ |
| Testing | Fix matches existing unit test pattern | ✅ Mirrors tests/conftest.py:config_home exactly |
| Testing | No-op on POSIX | ✅ Guarded by if os.name == "nt" |
| Testing | os already imported |
✅ Present at line 53 |
| Architecture | No production code changes | ✅ Test fixture only |
| Compatibility | Windows CI regression fixed | ✅ This is the entire point of the commit |
| Compatibility | No emojis / unicode in output | ✅ |
Assessment
This is exactly the right fix. The two config_home fixtures (tests/conftest.py and test-int/conftest.py) now match:
# tests/conftest.py (existing pattern — unchanged)
monkeypatch.setenv("HOME", str(tmp_path))
if os.name == "nt":
monkeypatch.setenv("USERPROFILE", str(tmp_path))
# test-int/conftest.py (now matches)
monkeypatch.setenv("HOME", str(tmp_path))
if os.name == "nt":
monkeypatch.setenv("USERPROFILE", str(tmp_path))The unit test conftest already had this pattern (0d8c106 references it explicitly in the commit message). The integration test conftest had not been updated to match — this commit closes that gap. The fix is minimal, targeted, no-op on POSIX, and directly caused by the correct simplification in fe92c06.
No issues found in this commit.
Verdict: Ready to merge. All three commits together constitute a correct and well-tested fix. The Windows regression introduced by simplifying resolve_data_dir() is now properly addressed in both test suites.
Summary
<basic-memory data dir>/fastembed_cache(orFASTEMBED_CACHE_PATHwhen set) instead of letting FastEmbed fall back to<system tmp>/fastembed_cache, which is wiped between runs in sandboxed MCP runtimes like Codex CLI and causes every semantic search to raiseNO_SUCHFILE.semantic_embedding_cache_dirnullable withdefault=Noneso nothing is baked intoconfig.json— later changes toBASIC_MEMORY_CONFIG_DIRorFASTEMBED_CACHE_PATHtake effect on every run instead of being frozen by the first save.resolve_data_dir()as the single source of truth for the per-user data directory and pointBasicMemoryConfig.data_dir_pathandConfigManager.__init__at it, removing two existing duplications.Fixes #741. Follow-up tracking issue for related
Path.home() / ".basic-memory"call sites that ignoreBASIC_MEMORY_CONFIG_DIR: #742.What changed
src/basic_memory/config.pyresolve_data_dir()helper — honorsBASIC_MEMORY_CONFIG_DIR, cross-platform viaPath.home().default_fastembed_cache_dir()helper — returnsFASTEMBED_CACHE_PATHwhen set, else<data_dir>/fastembed_cache.semantic_embedding_cache_dirdescription updated to explain runtime resolution; field staysstr | Nonewithdefault=None.BasicMemoryConfig.data_dir_pathandConfigManager.__init__now delegate toresolve_data_dir().src/basic_memory/repository/embedding_provider_factory.py_resolve_cache_dir()helper centralizes theconfig or defaultfallback.create_embedding_provider()always passes a concretecache_dirto FastEmbed — never relies on FastEmbed's<tmp>/fastembed_cachedefault._provider_cache_key()hashes the resolved cache dir so differentFASTEMBED_CACHE_PATHvalues produce distinct providers instead of silently returning a stale cached one.Why
default=Noneinstead ofdefault_factoryAn earlier draft used
default_factory=default_fastembed_cache_dir. That persisted the resolved path intoconfig.jsonon first save, which caused two problems:BASIC_MEMORY_CONFIG_DIR=/opt/afroze/opt/a/fastembed_cacheinto the saved config — a later run with/opt/bsilently kept pointing at the old path.FASTEMBED_CACHE_PATHchanges were ignored for the same reason.Keeping the field
Noneand resolving at the factory call site fixes both.Test plan
just fast-checkequivalent: lint, format, typecheck (ruff + ty) all green.tests/test_config.py: 6 new tests coveringresolve_data_dir()anddefault_fastembed_cache_dir()helpers, plus regression guards that the config field staysNone, thatmodel_dumpnever serializes a concrete path, and that explicit user overrides still round-trip.tests/repository/test_openai_provider.py: 2 new factory tests covering the default-resolution path and the cache-key behavior (FASTEMBED_CACHE_PATHchanges invalidate the provider cache).tests/sync/+tests/services/test_search_service.py+tests/services/test_semantic_search.py: 199 passed, 3 skipped — broader sanity pass on the areas that sit downstream of the embedding provider factory.Path.home() / ".basic-memory" / "fastembed_cache"resolves correctly.🤖 Generated with Claude Code