In-Memory Tree Cache Optimization

Overview

This document describes the in-memory tree cache optimization implemented to reduce File I/O overhead in the @nxworker/workspace:move-file generator.

Problem

Based on profiling, File I/O operations accounted for approximately 30-40% of execution time. While the existing AST cache handles content caching for files being parsed, there were additional tree operations that could benefit from caching:

Direct tree.read() calls - Configuration files (tsconfig.json), index files, and other files read outside the AST parsing flow
tree.children() calls - Directory listings to find tsconfig files
Redundant reads - Files read multiple times during a single generator execution

Solution

Implemented a TreeReadCache class that provides in-memory caching for Tree operations:

class TreeReadCache {
  private contentCache = new Map<string, string | null>();
  private childrenCache = new Map<string, string[]>();

  read(tree: Tree, filePath: string, encoding: BufferEncoding): string | null {
    // Check cache first, then read from tree and cache result
  }

  children(tree: Tree, dirPath: string): string[] {
    // Check cache first, then get from tree and cache result
  }

  invalidateFile(filePath: string): void {
    // Invalidate cache when file is written
  }

  clear(): void {
    // Clear all caches at generator start
  }
}

Integration Points

The tree cache is integrated at key points:

Generator initialization: clearAllCaches() clears tree cache along with other caches
File reads: 7 tree.read() calls replaced with treeReadCache.read(tree, ...)
Directory listings: 1 tree.children() call replaced with treeReadCache.children(tree, ...)
File writes: Cache invalidated when files are modified
Statistics: Cache stats logged alongside other cache metrics

Cache Invalidation Strategy

The cache is invalidated:

On generator start: clearAllCaches() ensures fresh state
On file write: invalidateFile() called after tree.write()
Automatic: No invalidation needed for reads (files don't change during execution)

Performance Impact

Benchmark Results

Benchmark Tests (single file operations):

Small/medium/large files: ~0-2% variation (within margin of error)
No regressions in any test cases

Stress Tests (large-scale operations):

10 projects with cross-dependencies: -406ms (-0.96%)
100 large files: -310ms (-3.13%)
50 intra-project dependencies: -167ms (-3.68%)
Combined (15 projects, 450 files): -67ms (-2.46%)
- Per-file improvement: 5.89ms vs 6.04ms (-2.48%)

Key Insights

Complementary to AST Cache: The tree cache complements the existing AST cache by handling non-AST file reads (tsconfig, index files) and directory listings
Most beneficial in large workspaces: Performance improvements are most noticeable in stress tests with many projects and files
Low overhead: The cache adds minimal memory overhead and no performance regression
Cache hit rate: Typical operations show 3 file reads cached and 1 directory listing cached

Implementation Details

Files Modified

New file:

packages/workspace/src/generators/move-file/tree-cache.ts - Tree cache implementation

Modified files:

packages/workspace/src/generators/move-file/generator.ts - Integration of tree cache

Functions Updated

Direct tree.read() replacements:

Reading source file content (line ~650)
Reading moved file for relative import updates (line ~826)
Reading moved file for alias import updates (line ~883)
Checking if file is exported from index (line ~1263)
Reading tsconfig files (line ~1342)
Adding export to index file (line ~1795)
Removing export from index file (line ~1827)

Directory listing replacement:

Finding tsconfig files at root (line ~1329)

Cache management:

clearAllCaches() - Added tree cache clearing
createTargetFile() - Added cache invalidation after write
addFileExport() - Added cache invalidation after write
removeFileExport() - Added cache invalidation after write

Cache Characteristics

Scope: Per-generator-execution (cleared at start)
Granularity: Per-file for content, per-directory for listings
Invalidation: Explicit on writes, automatic on generator start
Strategy: Lazy loading (populated on first access)

Comparison to Other Optimizations

This optimization complements existing optimizations:

Glob Pattern Batching (GLOB_OPTIMIZATION.md): Reduces tree traversals for pattern matching
AST Optimizations (docs/performance-optimization.md): Reduces parsing overhead with AST and content caching
Pattern Analysis (PATTERN_ANALYSIS_OPTIMIZATION.md): Reduces file listing overhead with project source files caching
Smart File Cache (SMART_FILE_CACHE_OPTIMIZATION.md): Reduces existence checks and incremental cache updates
Tree Cache (this doc): Reduces I/O overhead for non-AST file reads and directory listings

Together, these create a comprehensive optimization strategy:

Glob batching → fewer tree traversals
File tree caching → fewer file listings
Tree cache → fewer non-AST file reads and directory listings
AST cache → fewer file reads and AST parses
File existence cache → fewer existence checks

Conclusion

The in-memory tree cache optimization successfully reduces File I/O overhead by 2-4% in large-scale operations without any regressions. The optimization is particularly effective in workspaces with many projects and files, where the cumulative benefit of cached reads becomes significant.

Impact Summary

✅ All 135 unit tests pass - No regressions ✅ All 7 performance benchmark tests pass - Performance maintained ✅ All 4 stress tests pass - 2-4% improvement in large workspaces ✅ Build succeeds - No compilation errors ✅ No breaking changes - Same API and behavior

The optimization is production-ready and provides measurable performance benefits.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

In-Memory Tree Cache Optimization

Overview

Problem

Solution

Integration Points

Cache Invalidation Strategy

Performance Impact

Benchmark Results

Key Insights

Implementation Details

Files Modified

Functions Updated

Cache Characteristics

Comparison to Other Optimizations

Conclusion

Impact Summary

Uh oh!

FilesExpand file tree

TREE_CACHE_OPTIMIZATION.md

Latest commit

History

TREE_CACHE_OPTIMIZATION.md

File metadata and controls

In-Memory Tree Cache Optimization

Overview

Problem

Solution

Integration Points

Cache Invalidation Strategy

Performance Impact

Benchmark Results

Key Insights

Implementation Details

Files Modified

Functions Updated

Cache Characteristics

Comparison to Other Optimizations

Conclusion

Impact Summary