Reduce torch eager dispatch overhead #22616
MarcosAsh wants to merge 5 commits into keras-team:master from
Conversation
Code Review
This pull request implements performance optimizations across Keras by introducing fast paths for common execution scenarios, such as single-tensor inputs and eager inference, to reduce overhead from tree.flatten and traceback wrappers. Key changes include optimized symbolic tensor detection, faster tensor conversion in the Torch backend, and streamlined layer call logic. A bug was identified in the new error-handling mechanism in operation.py that would cause the loss of the original traceback during exception injection, and a fix was suggested to ensure proper debugging information is preserved.
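The single-tensor fast path described above can be sketched as follows. This is a minimal illustration, not the actual Keras code: the names `KerasTensor`, `flatten`, and `fast_any_symbolic` are stand-ins for the real internals, and the real `tree.flatten` handles many more structure types.

```python
# Hypothetical sketch: skip the recursive tree.flatten walk for the common
# cases (a single leaf, or a flat list/tuple of leaves), and only fall back
# to full flattening for genuinely nested structures.

class KerasTensor:
    """Stand-in for a symbolic (graph-mode) tensor placeholder."""
    pass

def flatten(structure):
    # Simplified stand-in for tree.flatten: walks nested lists/tuples/dicts.
    if isinstance(structure, (list, tuple)):
        out = []
        for item in structure:
            out.extend(flatten(item))
        return out
    if isinstance(structure, dict):
        out = []
        for value in structure.values():
            out.extend(flatten(value))
        return out
    return [structure]

def fast_any_symbolic(args):
    # Fast path 1: a single non-container argument needs no flattening.
    if not isinstance(args, (list, tuple, dict)):
        return isinstance(args, KerasTensor)
    # Fast path 2: a flat list/tuple of leaves avoids the recursive walk.
    if isinstance(args, (list, tuple)) and not any(
        isinstance(a, (list, tuple, dict)) for a in args
    ):
        return any(isinstance(a, KerasTensor) for a in args)
    # Slow path: fully flatten arbitrary nesting.
    return any(isinstance(x, KerasTensor) for x in flatten(args))
```

In eager inference the overwhelmingly common case is a single eager tensor, so the first branch returns after one `isinstance` check instead of allocating an intermediate list per call.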
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##           master   #22616     +/-   ##
==========================================
- Coverage   83.29%   83.19%    -0.10%
==========================================
  Files         596      596
  Lines       68138    68311     +173
  Branches    10613    10694      +81
==========================================
+ Hits        56754    56834      +80
- Misses       8638     8683      +45
- Partials    2746     2794       +48
465e6b4 to 09e0791
/gemini review
Code Review
This pull request introduces several performance optimizations across the Keras backend and layer infrastructure, primarily by implementing fast paths for common eager execution scenarios. These changes include optimized tensor checks in any_symbolic_tensors, a fast path for convert_to_tensor in the Torch backend, streamlined input compatibility checks, and a significant reduction in overhead for layer calls and mask metadata handling. Additionally, the pull request refactors traceback injection to avoid creating wrapper functions per call, instead injecting argument information only when an exception occurs. I have provided a high-severity suggestion to remove an unnecessary boolean cast in the layer call fast path to prevent potential device-to-host synchronization in PyTorch.
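The traceback refactor described above can be sketched like this. The helper name `call_with_context` is illustrative, not the actual `operation.py` code; the point is that the diagnostic work moves entirely onto the error path, and chaining with `from e` preserves the original traceback (the bug flagged in the earlier review).

```python
# Hypothetical sketch: instead of allocating a traceback-decorating wrapper
# closure on every call, invoke the function directly and enrich the
# exception only when one is actually raised.

def call_with_context(fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        # Build the (relatively expensive) diagnostic string only on the
        # error path, and chain from the original exception so its
        # traceback survives for debugging.
        signature = f"args={args!r}, kwargs={kwargs!r}"
        raise type(e)(f"{e} (received: {signature})") from e
```

On the success path this costs one `try` frame and nothing else, versus a fresh closure allocation per call in the wrapper-based approach.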
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
I think this PR was too large, so I am splitting it into smaller, more digestible PRs.
Summary
Reduces Keras[torch] eager dispatch overhead by avoiding unnecessary allocations and indirection in the hot path.
Addresses #22561.
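The kind of per-call overhead this PR targets can be illustrated with a toy micro-benchmark. This is a sketch only: the `dispatch`/`direct` helpers are invented stand-ins for a dispatch layer, and the real measurements are in the linked Colab notebook, not here.

```python
# Toy illustration of fixed per-call dispatch cost: time many tiny calls so
# the per-call wrapper allocation and structure copy dominate the work.
import timeit

def dispatch(fn, args):
    # Stand-in for a heavyweight dispatch path: allocates a wrapper closure
    # and copies the argument structure on every call.
    wrapped = lambda *a: fn(*a)   # per-call wrapper allocation
    flat = list(args)             # per-call structure copy
    return wrapped(*flat)

def direct(fn, args):
    # Fast path: no wrapper, no copy.
    return fn(*args)

add = lambda x, y: x + y
slow = timeit.timeit(lambda: dispatch(add, (1, 2)), number=50_000)
fast = timeit.timeit(lambda: direct(add, (1, 2)), number=50_000)
print(f"wrapped: {slow:.4f}s  direct: {fast:.4f}s")
```

For cheap eager ops, this fixed cost is paid on every layer call, which is why removing allocations and indirection from the hot path shows up directly in end-to-end inference time.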
Changes
Benchmark
Colab notebook on a T4 GPU:
The remaining CNN gap includes Conv2D's NHWC/NCHW data format conversion overhead, which is a separate problem (#18457).
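For context, the NHWC/NCHW conversion referenced above is a layout permute: Keras defaults to channels_last (NHWC) while torch convolutions expect channels_first (NCHW), so each Conv2D call pays a transpose (in torch, roughly `x.permute(0, 3, 1, 2).contiguous()`). A pure-Python sketch of the index shuffle, purely for illustration:

```python
# Illustrative NHWC -> NCHW conversion on nested lists; the real backend
# does this with a tensor permute, which costs a copy per Conv2D call.

def nhwc_to_nchw(x):
    # x is nested lists shaped [N][H][W][C]; result is [N][C][H][W].
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [
        [[[x[i][j][k][l] for k in range(w)] for j in range(h)]
         for l in range(c)]
        for i in range(n)
    ]
```

Because the permute materializes a copy of every activation, its cost scales with feature-map size and is independent of the dispatch overhead addressed in this PR, which is why it is tracked separately in #18457.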