SparseCategoricalCrossentropy and Mixed Precision Training #15012
@zuyezheng:
@rmothukuru ah thanks, looks like that one extended the findings from my original bug in the tf repo.
copybara-service bot pushed a commit that referenced this issue on Jul 30, 2021 (PiperOrigin-RevId: 387838298)
copybara-service bot pushed a commit that referenced this issue on Jul 30, 2021 (PiperOrigin-RevId: 387844394)
System information.
`tf.keras.losses.SparseCategoricalCrossentropy()`
Describe the problem.
`sparse_categorical_crossentropy` in `losses.py` performs an unnecessary cast of `y_true` to `y_pred.dtype`, since `y_true` is then cast to `int64` in `sparse_categorical_crossentropy` in `keras.backend.py`. The eventual call to `sparse_softmax_cross_entropy_with_logits` in `nn_ops.py` is documented to expect `int64` as well.

This appears to be the same code as in `categorical_crossentropy`, but it causes problems for the sparse variant, especially under mixed precision training: with float16, the loss of precision produces incorrect label encodings, or labels outside the valid domain, resulting in an incorrect or `nan` loss. With float16 the issues start at a couple thousand labels; with bfloat16, at a couple hundred.

Describe the current behavior.
Loss of precision for labels.
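The precision loss is easy to see with plain NumPy: float16 represents consecutive integers exactly only up to 2048, so round-tripping integer label ids through float16 (as the cast to `y_pred.dtype` does under mixed precision) silently corrupts larger ids. A minimal illustration:

```python
import numpy as np

labels = np.arange(5000, dtype=np.int64)
# Mimic the problematic path: cast labels to the prediction dtype
# (float16 under mixed precision), then back to int64 for the loss op.
round_tripped = labels.astype(np.float16).astype(np.int64)

corrupted = np.flatnonzero(round_tripped != labels)
print(corrupted[0])  # first corrupted label id: 2049
```

bfloat16 keeps only 8 significand bits, so the same round trip starts corrupting label ids after 256, matching the "couple hundred labels" figure above.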
Describe the expected behavior.
The cast of `y_true` to `y_pred.dtype` should be skipped.

Contributing.
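The reason the cast is skippable is that `sparse_softmax_cross_entropy_with_logits` only ever uses the labels as integer indices into the logits. A NumPy sketch of the sparse cross-entropy computation (a hypothetical helper for illustration, not the Keras code) shows the labels can stay integral end to end:

```python
import numpy as np

def sparse_softmax_xent(labels, logits):
    """Sparse softmax cross-entropy with labels kept as integers (sketch)."""
    logits = logits.astype(np.float32)  # compute the loss in float32
    # Numerically stable log-sum-exp over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_z = np.log(np.exp(shifted).sum(axis=-1))
    # The labels are used purely as indices -- no float cast needed.
    true_logit = np.take_along_axis(shifted, labels[:, None], axis=-1)[:, 0]
    return log_z - true_logit

# Uniform logits over 2 classes give a loss of ln(2) per example.
loss = sparse_softmax_xent(np.array([0, 1]), np.zeros((2, 2)))
print(loss)  # -> [0.6931... 0.6931...]
```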
Standalone code to reproduce the issue.
https://colab.research.google.com/drive/1oRbNOnCo1i2HcXD2V4_-D1Bz2EVxiT65