This document describes the pattern analysis optimization implemented to reduce file tree traversal overhead in the @nxworker/workspace:move-file generator.
The move-file generator processes files across multiple projects and performs various operations that require visiting all source files in a project:
- Checking for imports: Scanning all files to find imports of a specific specifier
- Updating imports: Modifying import statements across all files
- Detecting empty projects: Checking whether a project has any source files left
Each of these operations called `visitNotIgnoredFiles(tree, projectRoot, callback)`, which traverses the entire file tree for that project. Because a move usually runs several of these operations against the same project, the same directory was traversed repeatedly.
For a typical move operation that:
- Checks if target project has imports to the file being moved
- Updates imports in source project files
- Updates imports in target project files
- Checks if dependent projects have imports
- Updates imports in dependent projects
This could result in 5+ tree traversals of the same project directory.
The fix is a file tree caching layer that exploits this pattern of repeated project access:
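```typescript
import { Tree, normalizePath, visitNotIgnoredFiles } from '@nx/devkit';

// sourceFileExtensions is defined elsewhere in the generator.

// Per-project cache of source file paths, keyed by project root.
const projectSourceFilesCache = new Map<string, string[]>();

function getProjectSourceFiles(tree: Tree, projectRoot: string): string[] {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached !== undefined) {
    return cached; // cache hit: no traversal
  }

  // Cache miss: walk the project once and keep only source files.
  const sourceFiles: string[] = [];
  visitNotIgnoredFiles(tree, projectRoot, (filePath) => {
    if (sourceFileExtensions.some((ext) => filePath.endsWith(ext))) {
      sourceFiles.push(normalizePath(filePath));
    }
  });

  projectSourceFilesCache.set(projectRoot, sourceFiles);
  return sourceFiles;
}
```
Key Benefits: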
- Each project's file tree is traversed only once per generator execution
- Subsequent operations on the same project use the cached file list
- Eliminates redundant I/O operations
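To make the effect concrete without an Nx workspace, the sketch below swaps the devkit `Tree` for a hypothetical in-memory `FakeTree` whose `visit` method mimics `visitNotIgnoredFiles` and counts walks. `FakeTree`, `listSourceFilesUncached`, the `SOURCE_EXTENSIONS` list, and the paths are illustrative assumptions, not the generator's code. Five operations against one project cost five walks without the cache and one with it.

```typescript
// Hypothetical stand-in for the Nx Tree: a flat list of paths plus a
// counter of how many times a project directory was walked.
class FakeTree {
  files: string[];
  traversals = 0;

  constructor(files: string[]) {
    this.files = files;
  }

  // Mimics visitNotIgnoredFiles: invokes the visitor for each file under root.
  visit(root: string, visitor: (path: string) => void): void {
    this.traversals++;
    for (const file of this.files) {
      if (file.startsWith(root + '/')) visitor(file);
    }
  }
}

// Assumed extension list; the generator defines its own sourceFileExtensions.
const SOURCE_EXTENSIONS = ['.ts', '.tsx', '.js', '.jsx'];

// Uncached (hypothetical): every call walks the project again.
function listSourceFilesUncached(tree: FakeTree, projectRoot: string): string[] {
  const result: string[] = [];
  tree.visit(projectRoot, (path) => {
    if (SOURCE_EXTENSIONS.some((ext) => path.endsWith(ext))) result.push(path);
  });
  return result;
}

// Cached: the first call walks, later calls return the stored list.
const projectSourceFilesCache = new Map<string, string[]>();
function getProjectSourceFiles(tree: FakeTree, projectRoot: string): string[] {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached !== undefined) return cached;
  const sourceFiles = listSourceFilesUncached(tree, projectRoot);
  projectSourceFilesCache.set(projectRoot, sourceFiles);
  return sourceFiles;
}

const files = ['libs/a/src/index.ts', 'libs/a/src/util.ts', 'libs/a/README.md'];
const uncachedTree = new FakeTree(files);
const cachedTree = new FakeTree(files);

// Five operations on the same project, as in the move scenario.
for (let i = 0; i < 5; i++) {
  listSourceFilesUncached(uncachedTree, 'libs/a');
  getProjectSourceFiles(cachedTree, 'libs/a');
}
console.log(uncachedTree.traversals, cachedTree.traversals); // 5 1
```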
The cache entries of the projects a move will modify are invalidated explicitly:
```typescript
function executeMove(/* ... */) {
  // Invalidate the cache for projects that will be modified
  const sourceProject = projects.get(sourceProjectName);
  const targetProject = projects.get(targetProjectName);

  if (sourceProject) {
    projectSourceFilesCache.delete(sourceProject.root);
  }
  // Skip the second delete when source and target are the same project
  if (targetProject && targetProject.root !== sourceProject?.root) {
    projectSourceFilesCache.delete(targetProject.root);
  }

  // ... perform move operations
}
```
All caches are cleared at the start of each generator execution:
```typescript
export async function moveFileGenerator(
  tree: Tree,
  options: MoveFileGeneratorSchema,
) {
  clearAllCaches(); // Ensure fresh state
  // ... rest of generator
}
```
Before Pattern Analysis Optimization:
- Small file move: ~1943ms
- Medium file move: ~2114ms
- Large file move: ~2677ms
- Multiple small files (10): ~2186ms (218.17ms per file)
- Files with many imports (20): ~2143ms
- Many intra-project dependencies (50): ~4490ms
After Pattern Analysis Optimization:
- Small file move: ~1985ms (~2% slower, within margin of error)
- Medium file move: ~2101ms (~0.6% faster, within margin of error)
- Large file move: ~2677ms (no change)
- Multiple small files (10): ~2157ms (215.70ms per file) (~1.3% faster)
- Files with many imports (20): ~2137ms (~0.3% faster)
- Many intra-project dependencies (50): ~2242ms (~50.1% faster)
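The percentages follow directly from the before/after timings. A small helper, hypothetical and not part of the benchmark suite, reproduces them from the numbers listed above; a negative value means less wall-clock time:

```typescript
// Hypothetical helper: percentage change in wall-clock time.
// Negative values mean the operation got faster; positive values mean slower.
function percentChange(beforeMs: number, afterMs: number): number {
  return ((afterMs - beforeMs) / beforeMs) * 100;
}

// [benchmark, before (ms), after (ms)], taken from the lists above.
const timings: Array<[string, number, number]> = [
  ['Small file move', 1943, 1985],
  ['Medium file move', 2114, 2101],
  ['Large file move', 2677, 2677],
  ['Multiple small files (10)', 2186, 2157],
  ['Files with many imports (20)', 2143, 2137],
  ['Many intra-project dependencies (50)', 4490, 2242],
];

for (const [name, before, after] of timings) {
  console.log(`${name}: ${percentChange(before, after).toFixed(1)}%`);
}
```

For example, the intra-project case gives (2242 - 4490) / 4490 ≈ -50.1%, while the small file move comes out at about +2.2%, slightly slower, which is why it is reported as within margin of error rather than as an improvement.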
Stress Test Results:
- Combined stress (15 projects, 450 files): ~2689ms (5.97ms per file) (~1.1% faster)
Key Findings:
- Biggest Impact: Operations with many intra-project file updates (~50.1% faster)
- Consistent Performance: Small overhead variations are within margin of error
- Scalability: The benefit grows with the number of files processed in the same project
- Cache Efficiency: A single traversal followed by cached lookups costs less than repeated traversals
All 6 uses of `visitNotIgnoredFiles` were reviewed; the first 5 functions below now call `getProjectSourceFiles`:
- `updateImportPathsToPackageAlias`: Converts relative imports to package aliases
- `updateImportPathsInProject`: Updates relative imports within a project
- `checkForImportsInProject`: Checks if a project has imports to a specifier
- `updateImportsToRelative`: Converts package imports to relative imports
- `updateImportsByAliasInProject`: Updates imports by alias
- `isProjectEmpty`: Checks if a project has only an index file (not cached, to avoid stale state)
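As an illustration of what the switch means for a caller, here is a simplified, hypothetical version of `checkForImportsInProject` that iterates the cached file list instead of walking the tree. It assumes a `Map` of path to contents as the tree and a naive quoted-substring match; the actual helper is not shown in this document and very likely inspects import statements more precisely.

```typescript
// Hypothetical minimal tree: path -> file contents.
type FakeTree = Map<string, string>;

const projectSourceFilesCache = new Map<string, string[]>();

// Same lazy, per-project caching as getProjectSourceFiles above.
function getProjectSourceFiles(tree: FakeTree, projectRoot: string): string[] {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached !== undefined) return cached;
  const files = Array.from(tree.keys()).filter(
    (p) => p.startsWith(projectRoot + '/') && p.endsWith('.ts'),
  );
  projectSourceFilesCache.set(projectRoot, files);
  return files;
}

// Hypothetical simplified helper: iterates the cached list, reads contents on demand.
// The quoted-substring test is a stand-in for real import parsing.
function checkForImportsInProject(tree: FakeTree, projectRoot: string, specifier: string): boolean {
  return getProjectSourceFiles(tree, projectRoot).some((file) => {
    const content = tree.get(file) ?? '';
    return content.includes(`'${specifier}'`) || content.includes(`"${specifier}"`);
  });
}

const tree: FakeTree = new Map([
  ['libs/app/src/main.ts', "import { helper } from '@acme/utils';"],
  ['libs/app/src/other.ts', 'export const x = 1;'],
]);
console.log(checkForImportsInProject(tree, 'libs/app', '@acme/utils')); // true
console.log(checkForImportsInProject(tree, 'libs/app', '@acme/missing')); // false
```

Only the file list is cached here; contents are still read on each call, which matches the scope of this optimization.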
Note: isProjectEmpty intentionally does NOT use the cache because it needs to check the current state of the tree after files have been deleted.
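The staleness risk can be shown with a small sketch, assuming a `Set` of paths as the tree and `index.ts` as the index file name (both assumptions). A list captured before a deletion still contains the deleted file, while a check that reads the live tree sees the project as empty:

```typescript
// Hypothetical tree: the set of paths currently present.
const files = new Set<string>(['libs/a/src/index.ts', 'libs/a/src/moved.ts']);
const cache = new Map<string, string[]>();

// Cached listing: a snapshot taken on first access.
function cachedSourceFiles(root: string): string[] {
  let list = cache.get(root);
  if (list === undefined) {
    list = Array.from(files).filter((p) => p.startsWith(root + '/'));
    cache.set(root, list);
  }
  return list;
}

// Hypothetical isProjectEmpty: reads the live tree on every call, so deletions
// made earlier in the run are visible. "Empty" means only an index file remains.
function isProjectEmpty(root: string): boolean {
  return Array.from(files)
    .filter((p) => p.startsWith(root + '/'))
    .every((p) => p.endsWith('/index.ts'));
}

cachedSourceFiles('libs/a'); // snapshot taken before the move
files.delete('libs/a/src/moved.ts'); // the move deletes the original file

console.log(cachedSourceFiles('libs/a').length); // 2 (stale snapshot)
console.log(isProjectEmpty('libs/a')); // true (live view)
```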
This optimization recognizes several patterns:
- Repeated Project Access: Same project is processed multiple times in one operation
- Stable File Tree: Between operations, the file tree doesn't change (except for explicit modifications)
- Locality of Reference: Once a project is accessed, it's likely to be accessed again soon
- Batch Operations: Multiple files from the same project are often processed together
The cache itself is designed as follows:
- Scope: Per-generator-execution (cleared at start)
- Granularity: Per-project (keyed by project root)
- Invalidation: Explicit (when files are added/removed)
- Strategy: Lazy loading (populated on first access)
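The four properties above fit in a few lines. This sketch assumes a `Map` of project roots to file lists as a stand-in for the tree; `invalidateProjects` is a hypothetical helper that generalizes the two `delete` calls in `executeMove`, and this `clearAllCaches` clears only the one cache the sketch defines:

```typescript
// Hypothetical in-memory workspace standing in for the Nx Tree:
// project root -> files currently present.
const workspace = new Map<string, string[]>([
  ['libs/source', ['libs/source/src/a.ts', 'libs/source/src/b.ts']],
  ['libs/target', ['libs/target/src/index.ts']],
]);
let traversals = 0;

// Granularity: one entry per project root.
const projectSourceFilesCache = new Map<string, string[]>();

// Strategy: lazy loading; the walk happens on first access only.
function getProjectSourceFiles(projectRoot: string): string[] {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached !== undefined) return cached;
  traversals++; // stands in for one visitNotIgnoredFiles walk
  const files = [...(workspace.get(projectRoot) ?? [])];
  projectSourceFilesCache.set(projectRoot, files);
  return files;
}

// Invalidation: explicit and per project (hypothetical helper). Map.delete on an
// absent key is a no-op, so passing the same root twice is harmless.
function invalidateProjects(...roots: string[]): void {
  for (const root of roots) projectSourceFilesCache.delete(root);
}

// Scope: per generator execution.
function clearAllCaches(): void {
  projectSourceFilesCache.clear();
}

clearAllCaches(); // start of a generator run
getProjectSourceFiles('libs/source');
getProjectSourceFiles('libs/target');

// Move b.ts from source to target, invalidating both projects.
invalidateProjects('libs/source', 'libs/target');
workspace.set('libs/source', ['libs/source/src/a.ts']);
workspace.set('libs/target', ['libs/target/src/index.ts', 'libs/target/src/b.ts']);

console.log(getProjectSourceFiles('libs/target').length); // 2, from a fresh walk
console.log(traversals); // 3
```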
This optimization complements existing optimizations:
- Glob Pattern Batching (GLOB_OPTIMIZATION.md): Reduces tree traversals for pattern matching
- AST Optimizations (docs/performance-optimization.md): Reduces parsing overhead
- Pattern Analysis (this doc): Reduces file listing overhead
Together, these create a layered optimization strategy:
- Pattern batching → fewer glob operations
- File tree caching → fewer traversals
- Early exit + parser reuse → fewer AST parses
- Single-pass traversal → fewer AST walks
Potential additional optimizations building on this pattern:
- Smart Pre-fetching: When loading one project, preload likely-to-be-accessed dependent projects
- Incremental Updates: Track file changes and update cache incrementally instead of invalidating
- Shared Cache: Maintain cache across multiple generator calls in the same session
- File Content Caching: Cache file contents (with care for memory usage)
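For the Incremental Updates idea, a minimal sketch could patch the cached list when the generator creates or deletes a file, rather than dropping the whole entry. `recordFileAdded` and `recordFileRemoved` are hypothetical hooks; nothing in the current generator calls them:

```typescript
const projectSourceFilesCache = new Map<string, string[]>();

// Hypothetical hook: add a newly created file to a project's cached list.
function recordFileAdded(projectRoot: string, filePath: string): void {
  const cached = projectSourceFilesCache.get(projectRoot);
  // Uncached projects need no patch; the next lazy read walks the tree anyway.
  if (cached !== undefined && cached.indexOf(filePath) === -1) {
    cached.push(filePath);
  }
}

// Hypothetical hook: remove a deleted file from a project's cached list.
function recordFileRemoved(projectRoot: string, filePath: string): void {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached === undefined) return;
  const index = cached.indexOf(filePath);
  if (index !== -1) cached.splice(index, 1);
}

projectSourceFilesCache.set('libs/source', ['libs/source/src/a.ts', 'libs/source/src/b.ts']);
projectSourceFilesCache.set('libs/target', ['libs/target/src/index.ts']);

// Moving b.ts: patch both entries instead of discarding them.
recordFileRemoved('libs/source', 'libs/source/src/b.ts');
recordFileAdded('libs/target', 'libs/target/src/b.ts');
recordFileAdded('libs/other', 'libs/other/src/c.ts'); // not cached: no-op

console.log(projectSourceFilesCache.get('libs/source')?.join(', '));
console.log(projectSourceFilesCache.get('libs/target')?.join(', '));
```

This keeps the cache warm across a move, but it is only correct if every file creation and deletion goes through such hooks (and respects the source-extension filter); a single missed call leaves a list silently stale, which is the risk the current delete-based invalidation avoids.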