Description
While using keras.mixed_precision.set_global_policy("mixed_float16"), the GroupNormalization layer produces NaN outputs when the epsilon parameter is set to a value smaller than the precision limit of float16 (e.g., 1e-12). Since the smallest positive normalized value in FP16 is approximately 6.1e-5, such a small epsilon underflows to zero during variance normalization, leaving the division unprotected and leading to numerical collapse.
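The underflow can be reproduced outside Keras entirely; a minimal NumPy check (illustrative only, not part of the PoC below) shows that 1e-12 becomes exactly zero in float16:

```python
import numpy as np

# The smallest positive normalized float16 value is about 6.1e-5.
print(np.finfo(np.float16).tiny)   # ~6.104e-05

# An epsilon of 1e-12 underflows to exactly zero when cast to float16,
# so sqrt(var + epsilon) offers no protection against a zero variance.
eps_fp16 = np.float16(1e-12)
print(eps_fp16 == 0.0)             # True

# With epsilon gone, a zero-variance group divides by zero and yields NaN.
x = np.zeros(4, dtype=np.float16)
normalized = (x - x.mean()) / np.sqrt(x.var() + eps_fp16)
print(np.isnan(normalized).all())  # True
```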
Poc
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import keras
import numpy as np
keras.mixed_precision.set_global_policy("mixed_float16")
inputs = keras.Input(shape=(16, 16, 64))
# Epsilon 1e-12 causes NaN in FP16
x = keras.layers.GroupNormalization(groups=8, epsilon=1e-12)(inputs)
model = keras.Model(inputs, x)
# Values near or above the float16 max (65504) overflow to inf when cast to fp16
extreme_data = np.random.uniform(low=60000, high=70000, size=(1, 16, 16, 64)).astype("float32")
output = model(extreme_data)
print(f"NaN detected: {np.any(np.isnan(output))}")
Observed Result
Running the PoC prints NaN detected: True.
Expected Behavior
The layer should either:
- Automatically clip the epsilon to a safe minimum (e.g., 1e-7) when the policy is float16, or
- Perform the internal normalization math in float32 regardless of the policy (upcasting).
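As a sketch of the upcasting option, the normalization math can be done in float32 and only the result cast back to the compute dtype. The helper below (group_norm_upcast is a hypothetical name; channels-last layout and no learnable scale/offset are assumed) is a plain NumPy illustration, not the Keras implementation:

```python
import numpy as np

def group_norm_upcast(x, groups, epsilon=1e-12):
    # Upcast to float32 so both epsilon and the variance stay representable.
    orig_dtype = x.dtype
    x32 = x.astype("float32")
    n, h, w, c = x32.shape
    g = x32.reshape(n, h, w, groups, c // groups)
    mean = g.mean(axis=(1, 2, 4), keepdims=True)
    var = g.var(axis=(1, 2, 4), keepdims=True)
    out = (g - mean) / np.sqrt(var + epsilon)  # epsilon survives in float32
    # Cast back to the original (possibly float16) dtype only at the end.
    return out.reshape(n, h, w, c).astype(orig_dtype)

# Large values within the float16 range, same shape as the PoC input.
x = np.random.uniform(60000, 65000, size=(1, 16, 16, 64)).astype("float16")
print(np.any(np.isnan(group_norm_upcast(x, groups=8))))  # False
```

Even with epsilon=1e-12 and large inputs, the float32 intermediate math keeps the variance finite and the epsilon nonzero, so no NaN is produced.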
Actual Behavior
The layer produces NaN values immediately when exposed to large input values or small epsilons in mixed_float16 mode.
Notes
- This issue arises from float16 precision limits rather than user misuse.
- Other normalization layers (e.g., BatchNormalization) internally upcast to float32 to avoid similar issues.