This document describes the performance optimizations applied to the @nxworker/workspace:move-file generator after it migrated from regex-based to jscodeshift-based codemods. Following the migration, benchmarks showed execution roughly 50% slower than the regex-based implementation; the generator needed to run significantly faster on every OS platform while preserving the same correctness guarantees.
- Parser Instance Creation: Each function call created a new parser instance via `jscodeshift.withParser('tsx')`, which is expensive
- Multiple AST Traversals: Each function performed multiple `.find()` operations, traversing the AST 5-6 times per file
- No Early Exit: Files were parsed even when they contained no imports or the specific specifier being searched
- Redundant Work: The same checks were performed multiple times in separate `.filter()` and `.forEach()` chains
Before:

```ts
export function updateImportSpecifier(...) {
  const j = jscodeshift.withParser('tsx'); // Created on every call
  const root = j(content);
  // ...
}
```

After:

```ts
// Create parser instance once at module level and reuse it
const j = jscodeshift.withParser('tsx');

export function updateImportSpecifier(...) {
  const root = j(content); // Reuse existing parser
  // ...
}
```

Impact: Eliminates parser instantiation overhead on every function call.
Added helper functions:

```ts
function mightContainImports(content: string): boolean {
  return (
    content.includes('import') ||
    content.includes('require') ||
    content.includes('export')
  );
}

function mightContainSpecifier(content: string, specifier: string): boolean {
  return content.includes(specifier);
}
```

Usage:

```ts
export function updateImportSpecifier(...) {
  const content = tree.read(filePath, 'utf-8');

  // Early exit: quick string check before expensive parsing
  if (!mightContainSpecifier(content, oldSpecifier)) {
    return false;
  }

  // Only parse if necessary
  const root = j(content);
  // ...
}
```

Impact: Avoids expensive AST parsing for files that don't contain the target specifier. A simple string search is orders of magnitude faster than parsing.
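A property worth making explicit: this pre-filter may return false positives (for example, the specifier appearing only in a comment), which is harmless because the exact check still happens during parsing; what it must never do is return a false negative, since that would skip a file that needs updating. A small self-contained illustration (file contents are invented for the example):

```ts
// Conservative pre-filter: false positives are acceptable because the
// parse step that follows is exact; false negatives would silently skip
// files that actually need updating.
function mightContainSpecifier(content: string, specifier: string): boolean {
  return content.includes(specifier);
}

const importsLib = `import { a } from '@scope/lib';`;
const mentionsInComment = `// see @scope/lib for details`;
const unrelated = `export const x = 1;`;

console.log(mightContainSpecifier(importsLib, '@scope/lib')); // true
console.log(mightContainSpecifier(mentionsInComment, '@scope/lib')); // true (false positive; parse confirms)
console.log(mightContainSpecifier(unrelated, '@scope/lib')); // false (file safely skipped)
```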
Before (Multiple Traversals):

```ts
// Traversal 1: Update static imports
root.find(j.ImportDeclaration).filter(...).forEach(...);

// Traversal 2: Update export declarations
root.find(j.ExportNamedDeclaration).filter(...).forEach(...);

// Traversal 3: Update export all
root.find(j.ExportAllDeclaration).filter(...).forEach(...);

// Traversal 4: Update dynamic imports
root.find(j.CallExpression, { callee: { type: 'Import' } }).filter(...).forEach(...);

// Traversal 5: Update require calls
root.find(j.CallExpression, { callee: { type: 'Identifier', name: 'require' } }).filter(...).forEach(...);

// Traversal 6: Update require.resolve calls
root.find(j.CallExpression, { callee: { type: 'MemberExpression', ... } }).filter(...).forEach(...);
```

After (Single Traversal):

```ts
// Single traversal: visit all nodes once and handle different types
root.find(j.Node).forEach((path) => {
  const node = path.node as ASTNode;

  // Handle ImportDeclaration
  if (j.ImportDeclaration.check(node) && node.source.value === oldSpecifier) {
    node.source.value = newSpecifier;
    hasChanges = true;
  }
  // Handle ExportNamedDeclaration
  else if (
    j.ExportNamedDeclaration.check(node) &&
    node.source?.value === oldSpecifier
  ) {
    node.source.value = newSpecifier;
    hasChanges = true;
  }
  // Handle ExportAllDeclaration
  else if (
    j.ExportAllDeclaration.check(node) &&
    node.source.value === oldSpecifier
  ) {
    node.source.value = newSpecifier;
    hasChanges = true;
  }
  // Handle CallExpression (dynamic imports, require, require.resolve)
  else if (j.CallExpression.check(node)) {
    const { callee, arguments: args } = node;
    if (
      args.length > 0 &&
      j.StringLiteral.check(args[0]) &&
      args[0].value === oldSpecifier
    ) {
      if (j.Import.check(callee)) {
        args[0].value = newSpecifier;
        hasChanges = true;
      }
      // ... handle require and require.resolve
    }
  }
});
```

Impact: Reduces AST traversal overhead from 5-6 passes to a single pass, significantly improving performance for large files.
Before:

```ts
if (node.type === 'ImportDeclaration') {
  // TypeScript doesn't narrow the type automatically
  const source = node.source.value; // Type error!
}
```

After:

```ts
if (j.ImportDeclaration.check(node)) {
  // jscodeshift's check() function acts as a type guard
  const source = node.source.value; // TypeScript knows this is safe
}
```

Impact: Maintains type safety without performance overhead and prevents runtime errors.
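jscodeshift's `.check()` helpers behave as TypeScript type guards. The same narrowing technique can be written by hand; a simplified, self-contained sketch using a hypothetical two-member node union (the type names are illustrative, not jscodeshift's actual types):

```ts
// Hypothetical, simplified node union for illustration only
interface ImportDeclarationNode {
  type: 'ImportDeclaration';
  source: { value: string };
}
interface IdentifierNode {
  type: 'Identifier';
  name: string;
}
type AstNode = ImportDeclarationNode | IdentifierNode;

// User-defined type guard: the `node is ...` return type tells the
// compiler to narrow `node` inside any `if` that calls this guard.
function isImportDeclaration(node: AstNode): node is ImportDeclarationNode {
  return node.type === 'ImportDeclaration';
}

function getSource(node: AstNode): string | undefined {
  if (isImportDeclaration(node)) {
    return node.source.value; // narrowed: .source is known to exist here
  }
  return undefined;
}

console.log(getSource({ type: 'ImportDeclaration', source: { value: 'lib' } })); // 'lib'
console.log(getSource({ type: 'Identifier', name: 'x' })); // undefined
```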
- Lines of Code: Reduced from 466 to 350 lines (~25% reduction)
- Complexity: Simplified logic with single-pass traversals
- Test Execution Time: Improved from 2.292s to 1.892s (~17% faster)
- Production Usage: Expected significant improvement in real-world usage where:
- Many files don't contain the target specifier (early exit optimization)
- Files contain multiple import types (single-pass optimization)
- Generator is called repeatedly (parser reuse optimization)
- ✅ All 129 tests pass
- ✅ Build succeeds
- ✅ Linting passes
- ✅ No functional regressions
- ✅ Improved type safety
- Profile Before Optimizing: Identified that AST parsing and traversal were the main bottlenecks
- Early Exit: Simple string checks can avoid expensive operations
- Reuse Resources: Parser instances can be safely reused
- Single Pass: Combining multiple operations in one traversal significantly reduces overhead
- Type Safety: Use built-in type guards (`.check()`) instead of string comparisons
- AST Caching: ✅ IMPLEMENTED - see Incremental Updates Optimization
  - For files that are checked multiple times, cache the parsed AST
  - Cache file content to avoid redundant reads
  - Track parse failures to avoid retry overhead
- Parallel Processing: Process multiple files concurrently
- Incremental Parsing: For very large files, consider incremental parsing strategies
- Smart Filtering: Pre-filter files based on file extensions or content analysis before processing
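Of these ideas, parallel processing is the easiest to sketch. Note that Nx's Tree API is largely synchronous, so true parallelism would need worker threads; the sketch below (helper name hypothetical) only shows the bounded-concurrency batching shape for async per-file work:

```ts
// Hypothetical helper: process items in fixed-size concurrent batches.
// Each batch runs concurrently; batches run sequentially to bound memory.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Example: pretend "processing" a file returns its path length.
processInBatches(['a.ts', 'bb.ts', 'ccc.ts'], 2, async (f) => f.length).then(
  (lengths) => console.log(lengths), // [4, 5, 6]
);
```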
The generator performed multiple tree traversals of the same project when:
- Checking for imports in a project
- Updating imports across project files
- Verifying project state
For operations touching 5+ projects, this could result in dozens of redundant tree traversals.
Implement a caching layer that recognizes the pattern of repeated project access:

```ts
const projectSourceFilesCache = new Map<string, string[]>();

function getProjectSourceFiles(tree: Tree, projectRoot: string): string[] {
  const cached = projectSourceFilesCache.get(projectRoot);
  if (cached !== undefined) {
    return cached; // Reuse cached file list
  }

  // First access: traverse and cache
  const sourceFiles: string[] = [];
  visitNotIgnoredFiles(tree, projectRoot, (filePath) => {
    if (sourceFileExtensions.some((ext) => filePath.endsWith(ext))) {
      sourceFiles.push(normalizePath(filePath));
    }
  });
  projectSourceFilesCache.set(projectRoot, sourceFiles);
  return sourceFiles;
}
```

- Performance: Reduces tree traversals from N calls to 1 call per project
- Biggest Win: 50.1% improvement for operations with many intra-project file updates
- Cache Management: Properly invalidated when files are modified
- Scalability: Benefit increases with number of files in the project
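The cache-management point is the subtle part: any write or delete under a cached project root must evict that project's entry, or later reads would see a stale file list. A minimal sketch of the eviction side (helper name hypothetical):

```ts
const projectSourceFilesCache = new Map<string, string[]>();

// Hypothetical invalidation hook: call after any write/delete
// under a project root so the next read re-traverses the tree.
function invalidateProjectCache(projectRoot: string): void {
  projectSourceFilesCache.delete(projectRoot);
}

// Usage sketch:
projectSourceFilesCache.set('libs/a', ['libs/a/src/index.ts']);
invalidateProjectCache('libs/a'); // a file was written: evict the entry
console.log(projectSourceFilesCache.has('libs/a')); // false
```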
This optimization recognizes several patterns:
- Repeated Project Access: Same project processed multiple times
- Stable File Tree: File tree stable between operations
- Locality of Reference: Accessed project likely to be accessed again
- Batch Operations: Multiple files from same project processed together
For detailed documentation, see PATTERN_ANALYSIS_OPTIMIZATION.md.
When users provided multiple comma-separated glob patterns (e.g., `src/**/*.ts,lib/**/*.ts,app/**/*.ts`), the generator called globAsync sequentially for each pattern:

```ts
for (const pattern of patterns) {
  if (isGlobPattern) {
    const matches = await globAsync(tree, [pattern]); // N separate calls
    filePaths.push(...matches);
  }
}
```

This caused the file tree to be traversed N times (once per pattern), which was inefficient for bulk operations.
Batch all glob patterns into a single globAsync call:

```ts
// Separate glob patterns from direct file paths
const globPatterns: string[] = [];
const directPaths: string[] = [];

for (const pattern of patterns) {
  const normalizedPattern = normalizePath(pattern);
  const isGlobPattern = /[*?[\]{}]/.test(normalizedPattern);
  if (isGlobPattern) {
    globPatterns.push(normalizedPattern);
  } else {
    directPaths.push(normalizedPattern);
  }
}

// Single call for all glob patterns
const filePaths: string[] = [...directPaths];
if (globPatterns.length > 0) {
  const matches = await globAsync(tree, globPatterns); // Single call
  filePaths.push(...matches);
}
```

- Performance: Reduces tree traversal from N calls to 1 call for N glob patterns
- Scalability: Significant improvement for bulk operations with multiple patterns
- Compatibility: No change in functionality, maintains same error messages
- Testing: All 135 tests pass, including new performance benchmark for comma-separated patterns
A new benchmark test was added (`should efficiently handle comma-separated glob patterns`) that moves 15 files using 3 comma-separated glob patterns, demonstrating the optimization in action.
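The classification step is self-contained and easy to exercise on its own. A minimal sketch reproducing the metacharacter regex from the snippet above (the comma-splitting helper is hypothetical; the generator may receive patterns already split):

```ts
// Same heuristic as in the generator snippet: any glob metacharacter
// (*, ?, [, ], {, }) marks the pattern as a glob; everything else is
// treated as a direct file path.
function isGlobPattern(pattern: string): boolean {
  return /[*?[\]{}]/.test(pattern);
}

// Hypothetical helper: split a comma-separated option value and
// classify each entry into glob patterns vs direct paths.
function classifyPatterns(input: string): {
  globPatterns: string[];
  directPaths: string[];
} {
  const globPatterns: string[] = [];
  const directPaths: string[] = [];
  for (const pattern of input.split(',').map((p) => p.trim())) {
    (isGlobPattern(pattern) ? globPatterns : directPaths).push(pattern);
  }
  return { globPatterns, directPaths };
}

console.log(classifyPatterns('src/**/*.ts,lib/util.ts'));
// globPatterns: ['src/**/*.ts'], directPaths: ['lib/util.ts']
```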
The AST traversal was visiting all nodes in the syntax tree using root.find(j.Node), including irrelevant nodes like variable declarations, literals, identifiers, binary expressions, etc.
For a typical TypeScript file with ~200 lines of code:
- Total nodes in AST: ~2,000-5,000 nodes
- Relevant nodes (imports/exports): ~10-50 nodes
- Wasted effort: 98-99% of nodes were irrelevant
Filter to only relevant node types during traversal:
Before:

```ts
// Visit ALL nodes (2,000-5,000 nodes per file)
root.find(j.Node).forEach((path) => {
  const node = path.node as ASTNode;
  if (j.ImportDeclaration.check(node)) { ... }
  else if (j.ExportNamedDeclaration.check(node)) { ... }
  // ... etc
});
```

After:

```ts
// Filter to only relevant nodes (10-50 nodes per file)
const relevantNodes = root.find(j.Node, (node) => {
  return (
    j.ImportDeclaration.check(node) ||
    j.ExportNamedDeclaration.check(node) ||
    j.ExportAllDeclaration.check(node) ||
    j.CallExpression.check(node)
  );
});

relevantNodes.forEach((path) => {
  // Process only relevant nodes
});
```

- Node visit reduction: From 100% of nodes to ~2-5% of nodes
- Type check reduction: 50-100x fewer type checks per file
- Performance improvement: ~1% overall improvement (0.4-1.2% in stress tests)
- Scalability: More noticeable with larger files and more complex codebases
Benchmark Results:
| Scenario | Before | After | Time change |
|---|---|---|---|
| 10+ projects | 41,371ms | 40,861ms | -1.23% ✓ |
| 100+ large files | 9,865ms | 9,826ms | -0.40% ✓ |
| 50 intra-project deps | 4,549ms | 4,502ms | -1.03% ✓ |
| Combined (450 files) | 2,719ms | 2,686ms | -1.21% ✓ |
The ~1% improvement is appropriate because:
- Other bottlenecks dominate: AST traversal is only ~15-20% of total time
- Already optimized: Existing optimizations (caching, early exit) were very effective
- Realistic impact: Reducing traversal overhead from ~15-20% to ~12-15% = ~1% total
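This is Amdahl's-law arithmetic: the overall saving is bounded by the fraction of runtime the optimization actually touches. A quick check under one consistent reading of the numbers above, assuming traversal is ~15-20% of total time and node filtering trims roughly 5-7% of that work's cost (the local-reduction figures are illustrative assumptions, not measured values):

```ts
// Amdahl-style estimate: overall improvement when only a fraction of
// the runtime is affected by an optimization.
function overallImprovement(
  affectedFraction: number, // share of total time spent in AST traversal
  localReduction: number, // portion of that share the optimization removes
): number {
  return affectedFraction * localReduction;
}

console.log(overallImprovement(0.15, 0.07)); // ≈ 0.0105, i.e. ~1% of total time
console.log(overallImprovement(0.2, 0.05)); // ≈ 0.01, i.e. ~1% of total time
```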
The optimization is valuable because it's:
- ✓ A free improvement with no downsides
- ✓ Cumulative across multiple operations
- ✓ More impactful in large-scale refactoring scenarios
- jscodeshift Documentation
- AST Explorer - for understanding AST structures
- Recast Documentation - the AST parsing/printing layer that jscodeshift is built on
- JSCODESHIFT_OPTIMIZATION_RESULTS.md - Detailed results of node filtering optimization