
Reduce torch eager dispatch overhead#22616

Closed
MarcosAsh wants to merge 5 commits into keras-team:master from MarcosAsh:torch-perf-overhead

Conversation

@MarcosAsh
Contributor

Summary

Reduces Keras[torch] eager dispatch overhead by avoiding unnecessary allocations and indirection in the hot path.

Addresses #22561.

Changes

  • any_symbolic_tensors: iterate args directly instead of tree.flatten
  • convert_to_tensor: fast path for already-correct torch tensors
  • Operation.call: inline error handling, skip wrapper creation per call
  • Layer.call: fast path for single-tensor eager inference (bypasses CallSpec, input validation, mask population, autocast scoping)
  • _set_mask_metadata / mask population: skip tree ops for single tensors
  • assert_input_compatibility: skip tree.flatten for single-spec case
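To illustrate the pattern behind the first bullet, here is a minimal sketch of a "scan directly, recurse only when needed" symbolic-tensor check. This is not the PR's actual code; `KerasTensor` is a stand-in class and the structure is simplified for illustration.

```python
class KerasTensor:
    """Stand-in for Keras's symbolic tensor class (illustrative only)."""

def any_symbolic_tensors(args=None, kwargs=None):
    # Fast path: scan top-level values directly and recurse only when a
    # container is actually present, instead of unconditionally calling
    # tree.flatten on the whole structure for every op dispatch.
    for value in tuple(args or ()) + tuple((kwargs or {}).values()):
        if isinstance(value, KerasTensor):
            return True
        if isinstance(value, (list, tuple)):
            if any_symbolic_tensors(tuple(value)):
                return True
        elif isinstance(value, dict):
            if any_symbolic_tensors(tuple(value.values())):
                return True
    return False
```

Since the common case is a handful of flat positional tensors, the loop usually finishes without allocating any intermediate flattened list.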

Benchmark

Colab notebook on a T4 GPU:

| Metric | PyTorch | master | This PR | Improvement |
| --- | --- | --- | --- | --- |
| add (256x256) | 9.0us | 51.5us (5.71x) | 16.3us (1.81x) | 3.2x faster |
| matmul (256x256) | 17.9us | 90.3us (5.04x) | 43.9us (2.45x) | 2.1x faster |
| softmax (64x1000) | 8.8us | 51.8us (5.91x) | 17.8us (1.95x) | 2.9x faster |
| CNN forward | 232.1us | 3099.7us (13.35x) | 1896.7us (8.50x) | 1.6x faster |
| Dense call overhead | -- | 263.2us | 46.8us | 5.6x less overhead |

The remaining CNN gap includes Conv2D's NHWC/NCHW data format conversion overhead, which is a separate problem (#18457).
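For reference, per-call overhead numbers like those above can be measured with a simple warmup-then-loop timer. This is a generic sketch, not the notebook's actual harness; for GPU ops you would also bracket the timed loop with device synchronization (e.g. `torch.cuda.synchronize()`), which is omitted here to keep the sketch backend-agnostic.

```python
import time

def bench_us(fn, warmup=10, iters=1000):
    # Warm up first so one-time costs (compilation, caching, allocator
    # growth) do not pollute the measurement.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    # Return mean wall-clock microseconds per call.
    return (time.perf_counter() - start) / iters * 1e6
```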


@gemini-code-assist (bot) left a comment


Code Review

This pull request implements performance optimizations across Keras by introducing fast paths for common execution scenarios, such as single-tensor inputs and eager inference, to reduce overhead from tree.flatten and traceback wrappers. Key changes include optimized symbolic tensor detection, faster tensor conversion in the Torch backend, and streamlined layer call logic. A bug was identified in the new error-handling mechanism in operation.py that would cause the loss of the original traceback during exception injection, and a fix was suggested to ensure proper debugging information is preserved.
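The traceback issue the review describes follows a general Python pattern: when you catch an exception in order to enrich its message (here, with the call arguments), re-raising a fresh exception discards the frames where the failure actually occurred unless you reattach them. A hedged sketch of the preserving pattern (names illustrative, not the PR's code; assumes the exception type accepts a single message argument):

```python
def call_with_arg_context(fn, *args, **kwargs):
    # Run fn and, on failure, append the received arguments to the error
    # message while keeping the original traceback via with_traceback().
    # Raising a bare new exception would point debuggers at this wrapper
    # instead of the failing frame inside fn.
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        msg = f"{e}\n\nArguments received by {fn.__name__}(): {args}, {kwargs}"
        raise type(e)(msg).with_traceback(e.__traceback__) from None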

Comment thread on keras/src/ops/operation.py (outdated)
MarcosAsh and others added 2 commits April 2, 2026 16:08
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

codecov-commenter commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 58.78378% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.19%. Comparing base (9708582) to head (8f00faa).
⚠️ Report is 13 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| keras/src/utils/traceback_utils.py | 3.03% | 32 Missing ⚠️ |
| keras/src/layers/input_spec.py | 54.54% | 13 Missing and 7 partials ⚠️ |
| keras/src/ops/operation.py | 28.57% | 5 Missing ⚠️ |
| keras/src/backend/common/keras_tensor.py | 81.25% | 1 Missing and 2 partials ⚠️ |
| keras/src/backend/torch/core.py | 90.90% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master   #22616      +/-   ##
==========================================
- Coverage   83.29%   83.19%   -0.10%
==========================================
  Files         596      596
  Lines       68138    68311     +173
  Branches    10613    10694      +81
==========================================
+ Hits        56754    56834      +80
- Misses       8638     8683      +45
- Partials     2746     2794      +48
```
| Flag | Coverage Δ |
| --- | --- |
| keras | 83.01% <58.78%> (-0.10%) ⬇️ |
| keras-jax | 59.60% <52.02%> (-0.07%) ⬇️ |
| keras-numpy | 55.24% <51.35%> (-0.10%) ⬇️ |
| keras-openvino | 53.29% <49.32%> (-0.10%) ⬇️ |
| keras-tensorflow | 60.95% <52.02%> (-0.09%) ⬇️ |
| keras-torch | 59.76% <58.78%> (-0.09%) ⬇️ |

@MarcosAsh force-pushed the torch-perf-overhead branch from 465e6b4 to 09e0791 on April 2, 2026 at 18:02
@MarcosAsh
Contributor Author

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces several performance optimizations across the Keras backend and layer infrastructure, primarily by implementing fast paths for common eager execution scenarios. These changes include optimized tensor checks in any_symbolic_tensors, a fast path for convert_to_tensor in the Torch backend, streamlined input compatibility checks, and a significant reduction in overhead for layer calls and mask metadata handling. Additionally, the pull request refactors traceback injection to avoid creating wrapper functions per call, instead injecting argument information only when an exception occurs. I have provided a high-severity suggestion to remove an unnecessary boolean cast in the layer call fast path to prevent potential device-to-host synchronization in PyTorch.
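The boolean-cast point generalizes: truth-testing a PyTorch GPU tensor forces a device-to-host copy to read its value (and raises for multi-element tensors), stalling the eager hot path. An illustrative before/after sketch (function names are hypothetical, not the PR's code), using plain Python objects so it runs without torch:

```python
def has_mask_unsafe(mask):
    # bool() on a torch GPU tensor syncs the device to read its value,
    # and is ambiguous for multi-element tensors; avoid it in hot paths.
    return bool(mask)

def has_mask_fast(mask):
    # Identity check against None never reads tensor values: no sync.
    return mask is not None
```

The fast variant is also semantically what the hot path usually means: "was a mask passed at all?", not "is the mask truthy?".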

Comment thread on keras/src/layers/layer.py (outdated)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@keerthanakadiri keerthanakadiri added the stat:awaiting keras-eng Awaiting response from Keras engineer label Apr 7, 2026
@MarcosAsh
Contributor Author

This PR grew too large, so I am splitting the changes into smaller, more digestible PRs.

@MarcosAsh MarcosAsh closed this Apr 13, 2026

Labels

size:L · stat:awaiting keras-eng (Awaiting response from Keras engineer)
