
SparseCategoricalCrossentropy and Mixed Precision Training #15012

Closed
zuyezheng opened this issue Jul 28, 2021 · 2 comments
Comments


zuyezheng commented Jul 28, 2021

System information.

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): v2.5.0-0
  • Python version: 3.8
  • Bazel version (if compiling from source):
  • GPU model and memory: A6000
  • Exact command to reproduce:

tf.keras.losses.SparseCategoricalCrossentropy()

Describe the problem.

sparse_categorical_crossentropy in losses.py performs an unnecessary cast of y_true to y_pred.dtype, since y_true is then cast to int64 inside sparse_categorical_crossentropy in keras.backend.py. The eventual call to sparse_softmax_cross_entropy_with_logits in nn_ops.py is documented to expect int64 labels as well.

This appears to be the same cast as in categorical_crossentropy, but it causes issues in the sparse case, especially with mixed precision training: with float16, the loss of precision produces incorrect label encodings, or labels outside the valid range, resulting in incorrect or NaN loss. With float16 the issues start at a couple thousand labels, and with bfloat16 at a couple hundred (see the sketch below).
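A minimal illustration of the precision loss (my own sketch, not the repro in the linked Colab): float16 represents integers exactly only up to 2048, and bfloat16 only up to 256, so round-tripping larger label indices through the cast changes their values.

```python
import tensorflow as tf

# Integer class labels are only exact up to 2048 in float16 (and 256 in bfloat16),
# so casting sparse labels to the prediction dtype corrupts larger indices.
labels = tf.constant([255, 2047, 2049, 4097], dtype=tf.int64)

fp16_roundtrip = tf.cast(tf.cast(labels, tf.float16), tf.int64)
bf16_roundtrip = tf.cast(tf.cast(labels, tf.bfloat16), tf.int64)

print(labels.numpy())          # [ 255 2047 2049 4097]
print(fp16_roundtrip.numpy())  # 2049 and 4097 round to 2048 and 4096
print(bf16_roundtrip.numpy())  # even 2047 rounds to 2048 at this range
```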

Describe the current behavior.

Loss of precision for labels.

Describe the expected behavior.

The cast of y_true to y_pred.dtype should be skipped for the sparse loss, leaving the labels as integer class indices (a possible user-side workaround is sketched below).
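One possible user-side workaround (my own sketch, assuming the problematic cast lives in the losses.py convenience function as described above; this is illustrative, not the patch that eventually landed): pass a plain loss function that forwards integer labels straight to the backend op, so the cast in losses.py is bypassed entirely.

```python
import tensorflow as tf

def sparse_ce_keep_int_labels(y_true, y_pred):
    # Keep y_true as integer class indices instead of letting
    # tf.keras.losses.sparse_categorical_crossentropy cast them to
    # y_pred.dtype (float16 under mixed precision).
    y_true = tf.cast(y_true, tf.int64)
    return tf.keras.backend.sparse_categorical_crossentropy(
        y_true, y_pred, from_logits=False)

# model.compile(optimizer="adam", loss=sparse_ce_keep_int_labels)
```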

Contributing.

  • Do you want to contribute a PR? (yes/no):
  • If yes, please read this page for instructions
  • Briefly describe your candidate solution(if contributing):

Standalone code to reproduce the issue.

https://colab.research.google.com/drive/1oRbNOnCo1i2HcXD2V4_-D1Bz2EVxiT65

@rmothukuru

@zuyezheng,
A similar issue covering other losses as well has been raised in #15014, and a PR has also been raised there. Can we close this issue so that we can track it in #15014? Thanks!


zuyezheng commented Jul 28, 2021

@rmothukuru ah thanks, looks like that one extends the findings from my original bug in the TF repo.

copybara-service bot pushed a commit that referenced this issue Jul 30, 2021
PiperOrigin-RevId: 387838298
copybara-service bot pushed a commit that referenced this issue Jul 30, 2021
PiperOrigin-RevId: 387844394