This document describes the performance optimizations applied to the @nxworker/workspace:move-file generator after it migrated from regex-based to jscodeshift-based codemods. Following the migration, benchmarks showed execution roughly 50% slower than the regex-based implementation; the generator needed to run significantly faster on every OS platform while preserving the same correctness guarantees.
- Parser Instance Creation: Each function call created a new parser instance via `jscodeshift.withParser('tsx')`, which is expensive
- Multiple AST Traversals: Each function performed multiple `.find()` operations, traversing the AST 5-6 times per file
- No Early Exit: Files were parsed even when they contained no imports or the specific specifier being searched
- Redundant Work: The same checks were performed multiple times in separate `.filter()` and `.forEach()` chains
Before:

```ts
export function updateImportSpecifier(...) {
  const j = jscodeshift.withParser('tsx'); // Created on every call
  const root = j(content);
  // ...
}
```

After:

```ts
// Create parser instance once at module level and reuse it
const j = jscodeshift.withParser('tsx');

export function updateImportSpecifier(...) {
  const root = j(content); // Reuse existing parser
  // ...
}
```

Impact: Eliminates parser instantiation overhead on every function call.
Added helper functions:

```ts
function mightContainImports(content: string): boolean {
  return (
    content.includes('import') ||
    content.includes('require') ||
    content.includes('export')
  );
}

function mightContainSpecifier(content: string, specifier: string): boolean {
  return content.includes(specifier);
}
```

Usage:

```ts
export function updateImportSpecifier(...) {
  const content = tree.read(filePath, 'utf-8');

  // Early exit: quick string check before expensive parsing
  if (!mightContainSpecifier(content, oldSpecifier)) {
    return false;
  }

  // Only parse if necessary
  const root = j(content);
  // ...
}
```

Impact: Avoids expensive AST parsing for files that don't contain the target specifier. A simple string search is orders of magnitude faster than parsing.
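A property worth making explicit: this pre-filter may return false positives (for example, the specifier appearing only in a comment), which is harmless because the exact check still happens during parsing; what it must never do is return a false negative, since that would skip a file that needs updating. A small self-contained illustration (file contents are invented for the example):

```ts
// Conservative pre-filter: false positives are acceptable because the
// parse step that follows is exact; false negatives would silently skip
// files that actually need updating.
function mightContainSpecifier(content: string, specifier: string): boolean {
  return content.includes(specifier);
}

const importsLib = `import { a } from '@scope/lib';`;
const mentionsInComment = `// see @scope/lib for details`;
const unrelated = `export const x = 1;`;

console.log(mightContainSpecifier(importsLib, '@scope/lib')); // true
console.log(mightContainSpecifier(mentionsInComment, '@scope/lib')); // true (false positive; parse confirms)
console.log(mightContainSpecifier(unrelated, '@scope/lib')); // false (file safely skipped)
```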
Before (Multiple Traversals):

```ts
// Traversal 1: Update static imports
root.find(j.ImportDeclaration).filter(...).forEach(...);

// Traversal 2: Update export declarations
root.find(j.ExportNamedDeclaration).filter(...).forEach(...);

// Traversal 3: Update export all
root.find(j.ExportAllDeclaration).filter(...).forEach(...);

// Traversal 4: Update dynamic imports
root.find(j.CallExpression, { callee: { type: 'Import' } }).filter(...).forEach(...);

// Traversal 5: Update require calls
root.find(j.CallExpression, { callee: { type: 'Identifier', name: 'require' } }).filter(...).forEach(...);

// Traversal 6: Update require.resolve calls
root.find(j.CallExpression, { callee: { type: 'MemberExpression', ... } }).filter(...).forEach(...);
```

After (Single Traversal):

```ts
// Single traversal: visit all nodes once and handle different types
root.find(j.Node).forEach((path) => {
  const node = path.node as ASTNode;

  // Handle ImportDeclaration
  if (j.ImportDeclaration.check(node) && node.source.value === oldSpecifier) {
    node.source.value = newSpecifier;
    hasChanges = true;
  }
  // Handle ExportNamedDeclaration
  else if (
    j.ExportNamedDeclaration.check(node) &&
    node.source?.value === oldSpecifier
  ) {
    node.source.value = newSpecifier;
    hasChanges = true;
  }
  // Handle ExportAllDeclaration
  else if (
    j.ExportAllDeclaration.check(node) &&
    node.source.value === oldSpecifier
  ) {
    node.source.value = newSpecifier;
    hasChanges = true;
  }
  // Handle CallExpression (dynamic imports, require, require.resolve)
  else if (j.CallExpression.check(node)) {
    const { callee, arguments: args } = node;
    if (
      args.length > 0 &&
      j.StringLiteral.check(args[0]) &&
      args[0].value === oldSpecifier
    ) {
      if (j.Import.check(callee)) {
        args[0].value = newSpecifier;
        hasChanges = true;
      }
      // ... handle require and require.resolve
    }
  }
});
```

Impact: Reduces AST traversal overhead from 5-6 passes to a single pass, significantly improving performance for large files.
Before:

```ts
if (node.type === 'ImportDeclaration') {
  // TypeScript doesn't narrow the type automatically
  const source = node.source.value; // Type error!
}
```

After:

```ts
if (j.ImportDeclaration.check(node)) {
  // jscodeshift's check() function acts as a type guard
  const source = node.source.value; // TypeScript knows this is safe
}
```

Impact: Maintains type safety without performance overhead and prevents runtime errors.
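jscodeshift's `.check()` helpers behave as TypeScript type guards. The same narrowing technique can be written by hand; a simplified, self-contained sketch using a hypothetical two-member node union (the type names are illustrative, not jscodeshift's actual types):

```ts
// Hypothetical, simplified node union for illustration only
interface ImportDeclarationNode {
  type: 'ImportDeclaration';
  source: { value: string };
}
interface IdentifierNode {
  type: 'Identifier';
  name: string;
}
type AstNode = ImportDeclarationNode | IdentifierNode;

// User-defined type guard: the `node is ...` return type tells the
// compiler to narrow `node` inside any `if` that calls this guard.
function isImportDeclaration(node: AstNode): node is ImportDeclarationNode {
  return node.type === 'ImportDeclaration';
}

function getSource(node: AstNode): string | undefined {
  if (isImportDeclaration(node)) {
    return node.source.value; // narrowed: .source is known to exist here
  }
  return undefined;
}

console.log(getSource({ type: 'ImportDeclaration', source: { value: 'lib' } })); // 'lib'
console.log(getSource({ type: 'Identifier', name: 'x' })); // undefined
```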
- Lines of Code: Reduced from 466 to 350 lines (~25% reduction)
- Complexity: Simplified logic with single-pass traversals
- Test Execution Time: Improved from 2.292s to 1.892s (~17% faster)
- Production Usage: Expected significant improvement in real-world usage where:
- Many files don't contain the target specifier (early exit optimization)
- Files contain multiple import types (single-pass optimization)
- Generator is called repeatedly (parser reuse optimization)
- ✅ All 129 tests pass
- ✅ Build succeeds
- ✅ Linting passes
- ✅ No functional regressions
- ✅ Improved type safety
- Profile Before Optimizing: Identified that AST parsing and traversal were the main bottlenecks
- Early Exit: Simple string checks can avoid expensive operations
- Reuse Resources: Parser instances can be safely reused
- Single Pass: Combining multiple operations in one traversal significantly reduces overhead
- Type Safety: Use built-in type guards (`.check()`) instead of string comparisons
- AST Caching: ✅ IMPLEMENTED - see Incremental Updates Optimization
  - For files that are checked multiple times, cache the parsed AST
  - Cache file content to avoid redundant reads
  - Track parse failures to avoid retry overhead
- Parallel Processing: Process multiple files concurrently
- Incremental Parsing: For very large files, consider incremental parsing strategies
- Smart Filtering: Pre-filter files based on file extensions or content analysis before processing
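Of these ideas, parallel processing is the easiest to sketch. Note that Nx's Tree API is largely synchronous, so true parallelism would need worker threads; the sketch below (helper name hypothetical) only shows the bounded-concurrency batching shape for async per-file work:

```ts
// Hypothetical helper: process items in fixed-size concurrent batches.
// Each batch runs concurrently; batches run sequentially to bound memory.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Example: pretend "processing" a file returns its path length.
processInBatches(['a.ts', 'bb.ts', 'ccc.ts'], 2, async (f) => f.length).then(
  (lengths) => console.log(lengths), // [4, 5, 6]
);
```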
The generator performed multiple tree traversals of the same project when:
- Checking for imports in a project
- Updating imports across project files
- Verifying project state
For operations touching 5+ projects, this could result in dozens of redundant tree traversals.
Implement a caching layer that recognizes the pattern of repeated project access:

```ts
const projectSourceFilesCache = new Map<string, string[]>();

function getProjectSourceFiles(tree: Tree, projectRoot: string): string[] {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached !== undefined) {
    return cached; // Reuse cached file list
  }

  // First access: traverse and cache
  const sourceFiles: string[] = [];
  visitNotIgnoredFiles(tree, projectRoot, (filePath) => {
    if (sourceFileExtensions.some((ext) => filePath.endsWith(ext))) {
      sourceFiles.push(normalizePath(filePath));
    }
  });
  projectSourceFilesCache.set(projectRoot, sourceFiles);
  return sourceFiles;
}
```

- Performance: Reduces tree traversals from N calls to 1 call per project
- Biggest Win: 50.1% improvement for operations with many intra-project file updates
- Cache Management: Properly invalidated when files are modified
- Scalability: Benefit increases with number of files in the project
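The cache-management point is the subtle part: any write or delete under a cached project root must evict that project's entry, or later reads would see a stale file list. A minimal sketch of the eviction side (helper name hypothetical):

```ts
const projectSourceFilesCache = new Map<string, string[]>();

// Hypothetical invalidation hook: call after any write/delete
// under a project root so the next read re-traverses the tree.
function invalidateProjectCache(projectRoot: string): void {
  projectSourceFilesCache.delete(projectRoot);
}

// Usage sketch:
projectSourceFilesCache.set('libs/a', ['libs/a/src/index.ts']);
invalidateProjectCache('libs/a'); // a file was written: evict the entry
console.log(projectSourceFilesCache.has('libs/a')); // false
```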
This optimization recognizes several patterns:
- Repeated Project Access: Same project processed multiple times
- Stable File Tree: File tree stable between operations
- Locality of Reference: Accessed project likely to be accessed again
- Batch Operations: Multiple files from same project processed together
For detailed documentation, see PATTERN_ANALYSIS_OPTIMIZATION.md.
When users provided multiple comma-separated glob patterns (e.g., `src/**/*.ts,lib/**/*.ts,app/**/*.ts`), the generator called globAsync sequentially for each pattern:

```ts
for (const pattern of patterns) {
  if (isGlobPattern) {
    const matches = await globAsync(tree, [pattern]); // N separate calls
    filePaths.push(...matches);
  }
}
```

This caused the file tree to be traversed N times (once per pattern), which was inefficient for bulk operations.
Batch all glob patterns into a single globAsync call:

```ts
// Separate glob patterns from direct file paths
const globPatterns: string[] = [];
const directPaths: string[] = [];

for (const pattern of patterns) {
  const normalizedPattern = normalizePath(pattern);
  const isGlobPattern = /[*?[\]{}]/.test(normalizedPattern);
  if (isGlobPattern) {
    globPatterns.push(normalizedPattern);
  } else {
    directPaths.push(normalizedPattern);
  }
}

// Single call for all glob patterns
const filePaths: string[] = [...directPaths];
if (globPatterns.length > 0) {
  const matches = await globAsync(tree, globPatterns); // Single call
  filePaths.push(...matches);
}
```

- Performance: Reduces tree traversal from N calls to 1 call for N glob patterns
- Scalability: Significant improvement for bulk operations with multiple patterns
- Compatibility: No change in functionality, maintains same error messages
- Testing: All 135 tests pass, including new performance benchmark for comma-separated patterns
A new benchmark test was added (`should efficiently handle comma-separated glob patterns`) that moves 15 files using 3 comma-separated glob patterns, demonstrating the optimization in action.
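The classification step is self-contained and easy to exercise on its own. A minimal sketch reproducing the metacharacter regex from the snippet above (the comma-splitting helper is hypothetical; the generator may receive patterns already split):

```ts
// Same heuristic as in the generator snippet: any glob metacharacter
// (*, ?, [, ], {, }) marks the pattern as a glob; everything else is
// treated as a direct file path.
function isGlobPattern(pattern: string): boolean {
  return /[*?[\]{}]/.test(pattern);
}

// Hypothetical helper: split a comma-separated option value and
// classify each entry into glob patterns vs direct paths.
function classifyPatterns(input: string): {
  globPatterns: string[];
  directPaths: string[];
} {
  const globPatterns: string[] = [];
  const directPaths: string[] = [];
  for (const pattern of input.split(',').map((p) => p.trim())) {
    (isGlobPattern(pattern) ? globPatterns : directPaths).push(pattern);
  }
  return { globPatterns, directPaths };
}

console.log(classifyPatterns('src/**/*.ts,lib/util.ts'));
// globPatterns: ['src/**/*.ts'], directPaths: ['lib/util.ts']
```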
The AST traversal was visiting all nodes in the syntax tree using root.find(j.Node), including irrelevant nodes like variable declarations, literals, identifiers, binary expressions, etc.
For a typical TypeScript file with ~200 lines of code:
- Total nodes in AST: ~2,000-5,000 nodes
- Relevant nodes (imports/exports): ~10-50 nodes
- Wasted effort: 98-99% of nodes were irrelevant
Filter to only relevant node types during traversal:
Before:

```ts
// Visit ALL nodes (2,000-5,000 nodes per file)
root.find(j.Node).forEach((path) => {
  const node = path.node as ASTNode;
  if (j.ImportDeclaration.check(node)) { ... }
  else if (j.ExportNamedDeclaration.check(node)) { ... }
  // ... etc
});
```

After:

```ts
// Filter to only relevant nodes (10-50 nodes per file)
const relevantNodes = root.find(j.Node, (node) => {
  return (
    j.ImportDeclaration.check(node) ||
    j.ExportNamedDeclaration.check(node) ||
    j.ExportAllDeclaration.check(node) ||
    j.CallExpression.check(node)
  );
});

relevantNodes.forEach((path) => {
  // Process only relevant nodes
});
```

- Node visit reduction: From 100% of nodes to ~2-5% of nodes
- Type check reduction: 50-100x fewer type checks per file
- Performance improvement: ~1% overall improvement (0.4-1.2% in stress tests)
- Scalability: More noticeable with larger files and more complex codebases
Benchmark Results:
| Scenario | Before | After | Time change |
|---|---|---|---|
| 10+ projects | 41,371ms | 40,861ms | -1.23% ✓ |
| 100+ large files | 9,865ms | 9,826ms | -0.40% ✓ |
| 50 intra-project deps | 4,549ms | 4,502ms | -1.03% ✓ |
| Combined (450 files) | 2,719ms | 2,686ms | -1.21% ✓ |
The ~1% improvement is appropriate because:
- Other bottlenecks dominate: AST traversal is only ~15-20% of total time
- Already optimized: Existing optimizations (caching, early exit) were very effective
- Realistic impact: Reducing traversal overhead from ~15-20% to ~12-15% = ~1% total
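This is Amdahl's-law arithmetic: the overall saving is bounded by the fraction of runtime the optimization actually touches. A quick check under one consistent reading of the numbers above, assuming traversal is ~15-20% of total time and node filtering trims roughly 5-7% of that work's cost (the local-reduction figures are illustrative assumptions, not measured values):

```ts
// Amdahl-style estimate: overall improvement when only a fraction of
// the runtime is affected by an optimization.
function overallImprovement(
  affectedFraction: number, // share of total time spent in AST traversal
  localReduction: number, // portion of that share the optimization removes
): number {
  return affectedFraction * localReduction;
}

console.log(overallImprovement(0.15, 0.07)); // ≈ 0.0105, i.e. ~1% of total time
console.log(overallImprovement(0.2, 0.05)); // ≈ 0.01, i.e. ~1% of total time
```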
The optimization is valuable because it's:
- ✓ A free improvement with no downsides
- ✓ Cumulative across multiple operations
- ✓ More impactful in large-scale refactoring scenarios
- jscodeshift Documentation
- AST Explorer - for understanding AST structures
- Recast Documentation - the AST parsing/printing layer that jscodeshift is built on
- JSCODESHIFT_OPTIMIZATION_RESULTS.md - Detailed results of node filtering optimization