[Fix] issues mentioned in comments of #22472#22595
[Fix] issues mentioned in comments of #22472#22595ChiragSW wants to merge 11 commits intokeras-team:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces input validation and axis canonicalization for several image, neural network, and numpy operations to improve API reliability. However, the logic used to determine tensor rank is fragile, as it fails for Python lists and causes crashes for KerasTensor instances with unknown rank. Additionally, the roll operation incorrectly validates the shift argument by not supporting broadcasting, and some manual checks are redundant with existing utility functions. The feedback highlights the need for more robust tensor conversion and consistent use of backend utilities to handle all valid input types and edge cases without crashing.
| ndim = len(getattr(x, "shape", [])) | ||
| if isinstance(axis, int): | ||
| if axis < -ndim or axis >= ndim: | ||
| raise ValueError( | ||
| f"axis {axis} is out of bounds for array of dimension {ndim}" | ||
| ) | ||
| axis = axis if axis >= 0 else axis + ndim | ||
| elif isinstance(axis, tuple): | ||
| canonical_axis = [] | ||
| for a in axis: | ||
| if not isinstance(a, int): | ||
| raise TypeError( | ||
| "Argument `axis` must be an integer or tuple of integers. " | ||
| f"Received: axis={axis}" | ||
| ) | ||
| if a < -ndim or a >= ndim: | ||
| raise ValueError( | ||
| f"axis {a} is out of bounds for array of dimension {ndim}" | ||
| ) | ||
| a = a if a >= 0 else a + ndim | ||
| canonical_axis.append(a) | ||
| axis = tuple(canonical_axis) |
There was a problem hiding this comment.
Using len(getattr(x, "shape", [])) to determine rank is problematic for two reasons:
- It returns
0for Python lists (which lack a.shapeattribute), causing valid calls likelog_softmax([1.0, 2.0])to fail with an out-of-bounds error. - It will crash with a
TypeErrorifx.shapeisNone, which is the case forKerasTensorinstances with unknown rank.
It is recommended to use backend.convert_to_tensor(x) first and then check if x.shape is not None before validating the axis.
References
- APIs should be consistent with established conventions (NumPy) and handle common input types like lists gracefully. (link)
| ndim = len(getattr(x, "shape", [])) | ||
| if not isinstance(axis, int): | ||
| raise TypeError( | ||
| f"Argument `axis` must be an integer. Received: axis={axis}" | ||
| ) | ||
| if axis < -ndim or axis >= ndim: | ||
| raise ValueError( | ||
| f"axis {axis} is out of bounds for array of dimension {ndim}" | ||
| ) | ||
| axis = axis if axis >= 0 else axis + ndim |
| ndim = len(getattr(x, "shape", [])) | ||
| if isinstance(axis, int): | ||
| canonicalize_axis(axis, ndim) | ||
| elif isinstance(axis, (tuple, list)): | ||
| for a in axis: | ||
| if not isinstance(a, int): | ||
| raise TypeError( | ||
| "Argument `axis` must be an integer or a sequence of " | ||
| f"integers. Received: axis={axis}" | ||
| ) | ||
| canonicalize_axis(a, ndim) | ||
| else: | ||
| raise TypeError( | ||
| "Argument `axis` must be an integer, a sequence of integers, " | ||
| f"or `None`. Received: axis={axis}" | ||
| ) |
There was a problem hiding this comment.
| ndim = len(getattr(x, "shape", [])) | ||
| if axis is None: | ||
| if isinstance(shift, (tuple, list)): | ||
| raise ValueError( | ||
| "When `axis` is `None`, `shift` must be an integer. " | ||
| f"Received: shift={shift}" | ||
| ) | ||
| elif isinstance(axis, int): | ||
| canonicalize_axis(axis, ndim) | ||
| if isinstance(shift, (tuple, list)): | ||
| raise ValueError( | ||
| "When `axis` is an integer, `shift` must be an integer. " | ||
| f"Received: shift={shift}" | ||
| ) | ||
| elif isinstance(axis, (tuple, list)): | ||
| for a in axis: | ||
| if not isinstance(a, int): | ||
| raise TypeError( | ||
| "Argument `axis` must be an integer or a sequence of " | ||
| f"integers. Received: axis={axis}" | ||
| ) | ||
| canonicalize_axis(a, ndim) | ||
| if not isinstance(shift, (tuple, list)) or len(shift) != len(axis): | ||
| raise ValueError( | ||
| "`shift` and `axis` must have the same size. " | ||
| f"Received: shift={shift}, axis={axis}" | ||
| ) | ||
| for s in shift: | ||
| if not isinstance(s, int): | ||
| raise TypeError( | ||
| "Argument `shift` must be an integer or a sequence of " | ||
| f"integers. Received: shift={shift}" | ||
| ) | ||
| else: | ||
| raise TypeError( | ||
| "Argument `axis` must be an integer, a sequence of integers, " | ||
| f"or `None`. Received: axis={axis}" | ||
| ) |
There was a problem hiding this comment.
There are two issues in this block:
- The
ndimdetection fails for lists and crashes on unknown rank tensors. - The logic for
shiftvalidation whenaxisis a sequence (lines 7054-7058) is incorrect. It currently raises aValueErrorifshiftis a single integer butaxisis a tuple/list. NumPy allows a single integershiftto be broadcast across all specified axes.
Suggested fix for the shift logic:
if isinstance(shift, (tuple, list)):
if len(shift) != len(axis):
raise ValueError(
"`shift` and `axis` must have the same size. "
f"Received: shift={shift}, axis={axis}"
)
for s in shift:
if not isinstance(s, int):
raise TypeError(
"Argument `shift` must be an integer or a sequence of "
f"integers. Received: shift={shift}"
)
elif not isinstance(shift, int):
raise TypeError(
"Argument `shift` must be an integer or a sequence of "
f"integers. Received: shift={shift}"
)| ndim = len(getattr(x, "shape", [])) | ||
| axis1 = canonicalize_axis(axis1, ndim) | ||
| axis2 = canonicalize_axis(axis2, ndim) | ||
| if axis1 == axis2: | ||
| raise ValueError( | ||
| f"axis1 and axis2 must be different. Received: axis1={axis1}, " | ||
| f"axis2={axis2}" | ||
| ) |
| ndim = len(getattr(x, "shape", [])) | ||
| axis1 = canonicalize_axis(self.axis1, ndim) | ||
| axis2 = canonicalize_axis(self.axis2, ndim) | ||
| if axis1 == axis2: | ||
| raise ValueError( | ||
| f"axis1 and axis2 must be different. Received: " | ||
| f"axis1={self.axis1}, axis2={self.axis2}" | ||
| ) |
There was a problem hiding this comment.
In compute_output_spec, x is a KerasTensor. If the rank is unknown (x.shape is None), len(getattr(x, "shape", [])) will crash. Since trace requires at least 2 dimensions, you should handle the None shape case by returning a KerasTensor with unknown shape or raising a more descriptive error if rank must be known.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #22595 +/- ##
==========================================
- Coverage 83.26% 83.01% -0.26%
==========================================
Files 596 596
Lines 67828 69050 +1222
Branches 10562 10855 +293
==========================================
+ Hits 56480 57322 +842
- Misses 8605 8861 +256
- Partials 2743 2867 +124
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
There are some crashes and bugs that need to be fixed. I will look into it soon |
Summary of new changes
|
|
The changes resolve the issues. Please review @keerthanakadiri @hertschuh |
hertschuh
left a comment
There was a problem hiding this comment.
I'm not following the context of this. It seems unrelated to the linked bug.
Are we missing some validation for the axis argument in some places?
If so, can you add tests with self.assertRaises to demonstrate the problem.
Also, there should never be any code in-between if any_symbolic_tensor(...) and the call to symbolic_call. Some validation can happen before the if. But some validation has to happen in the backend specific implementation after convert_to_tensor.
| if isinstance(axis, int): | ||
| if axis < -ndim or axis >= ndim: | ||
| raise ValueError( | ||
| f"axis {axis} is out of bounds for array of dimension {ndim}" | ||
| ) | ||
| axis = axis if axis >= 0 else axis + ndim |
There was a problem hiding this comment.
Use canonicalize_axis for this.
|
Problem is that Keras ops have two execution paths, symbolic and eager. Validation checks only lived in the eager path, so invalid inputs via keras.Input would silently pass through without any error. Fix:
Tests added |
|
The issue has been fixed. Please review @hertschuh |
hertschuh
left a comment
There was a problem hiding this comment.
First, this PR is combining a few unrelated things into one. Please split this PR in at least 3 separate PRs:
- the image padding / cropping validation
- the axis verification / canonicalization
- the RNG seed generator changes (and I don't understand the context of these changes)
Then, I see a lot of code duplication like this:
if axis is not None:
ndim = get_static_tensor_ndim(x)
if isinstance(axis, int):
if ndim is not None:
canonicalize_axis(axis, ndim)
elif isinstance(axis, (tuple, list)):
for a in axis:
if not isinstance(a, int):
raise TypeError(
"Argument `axis` must be an integer or a sequence "
f"of integers. Received: axis={axis}"
)
if ndim is not None:
canonicalize_axis(a, ndim)
else:
raise TypeError(
"Argument `axis` must be an integer, a sequence of "
f"integers, or `None`. Received: axis={axis}"
)Please create a helper function called canonicalize_axes after canonicalize_axis. It will take an int or a list of ints and always return a tuple of ints after doing the validation.
Next, see this pattern a lot:
# Some validation
if any_symbolic_tensors(...)
...
x = convert_to_tensor(x)
# The same validationThis is not a pattern that we should use:
- It creates a ton of code duplication
- We should let the backend implementation do the
x = convert_to_tensor(x)part, sometimes the backend implementation needs to look at the type or value of x before converting it to a tensor
If you can't do the validation because you don't fully know the shape of the input, it means it shouldn't be here. It's ok to move the validation in the backend specific implementations (as long as it's factored as one-liner) and in symbolic_call.
| and hasattr(images, "_keras_history") | ||
| and images._keras_history.operation.__class__.__name__ == "InputLayer" |
There was a problem hiding this comment.
I don't understand why you are looking at the keras history, you should never have to do that. Also, you shouldn't even care if it's a keras tensor or a normal tensor.
| import keras | ||
| from keras.src import backend |
There was a problem hiding this comment.
Do not import keras in unit test, please import the feature from keras.src
|
For now I will close this PR and put up 3 separate PRs. I will add the details in what each PR fixes. Should I go ahead with this process @hertschuh ?
|
Yes, thank you! |
Fixes issues mentioned in comments of #22472
Root cause
keras.ops functions branch like:
if any_symbolic_tensors(inputs): call Operation(...).symbolic_call() which relies on compute_output_spec() for validation
else: run backend/eager code paths where the real argument checks happen
So any checks that only existed in the eager helper/backend path were skipped for keras.Input.
Fix
keras.ops.image.pad_images / crop_images (
keras/src/ops/image.py)pad_images(): added the missing “must specify exactly two of …” argument validation in the public wrapper so it runs for both eager and symbolic inputs.crop_images(): added the same “exactly two of …” validation in the wrapper.CropImages.compute_output_spec(): added validation by inferring missing crop amounts when input_height/input_width are known, and raising if the inferred values would be negative.keras.ops.flip, keras.ops.roll, keras.ops.trace (
keras/src/ops/numpy.py)flip(): validate axis is None, int, or a sequence of ints.roll(): validate axis/shift compatibility.trace(): validate axis1 != axis2 during symbolic shape inference.keras.ops.log_softmax, keras.ops.sparsemax, keras.ops.sparse_categorical_crossentropy (
keras/src/ops/nn.py)log_softmax()andsparsemax(): validate axis bounds against rank when rank is known (so axis=-3 for a rank-2 input raises for symbolic too)sparse_categorical_crossentropy(): enforce the existing constraint axis == -1 in the wrapper so symbolic inputs don’t bypass it.