Trainer.repo.push_to_hub returns None, causing raised exception #23712

RobertBaruch · 2023-05-23T18:07:11Z

System Info

transformers version: 4.28.1
Platform: Windows-10-10.0.22621-SP0
Python version: 3.11.2
Huggingface_hub version: 0.14.1
Safetensors version: not installed
PyTorch version (GPU?): 2.0.1+cu117 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: NO
Using distributed or parallel set-up in script?: NO

Who can help?

@sgugger

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

For a reason I have not pinned down, Trainer.repo.push_to_hub can return None, which causes Trainer._push_from_checkpoint to raise an exception, since it unpacks the return value as a tuple.

Traceback (most recent call last):
  File "F:\eo-reco\run_speech_recognition_ctc.py", line 810, in <module>
    main()
  File "F:\eo-reco\run_speech_recognition_ctc.py", line 756, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 1664, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 2019, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 2308, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 2462, in _save_checkpoint
    self._push_from_checkpoint(output_dir)
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 3649, in _push_from_checkpoint
    _, self.push_in_progress = self.repo.push_to_hub(
    ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object

(Note: line numbers in run_speech_recognition_ctc.py will not be accurate, as I've copied it and modified it)

repo.push_to_hub can return None if the repo is clean, which would cause this issue. However, that might not be what happened in my case, since there was no corresponding log message (assuming log messages are emitted immediately rather than buffered).
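To make the failure mode concrete, here is a minimal sketch that reproduces the same TypeError without touching the Hub. `fake_push_to_hub` and `unpack_push_result` are hypothetical stand-ins written for this illustration; they only mimic the "returns None when there is nothing to push" behaviour described above and are not the real huggingface_hub code.

```python
from typing import Optional, Tuple


def fake_push_to_hub(repo_is_clean: bool) -> Optional[Tuple[str, object]]:
    """Hypothetical stand-in: returns None when the repo is clean,
    otherwise a (url, in_progress_handle) tuple."""
    if repo_is_clean:
        return None
    return ("https://example.invalid/commit/abc123", object())


def unpack_push_result(repo_is_clean: bool) -> str:
    """Unpack the result the same way the traceback shows, and report what happens."""
    try:
        _, in_progress = fake_push_to_hub(repo_is_clean)
    except TypeError as err:
        # Tuple-unpacking None raises TypeError, as in the traceback above.
        return f"TypeError: {err}"
    return "ok"


if __name__ == "__main__":
    print(unpack_push_result(repo_is_clean=False))
    print(unpack_push_result(repo_is_clean=True))
```

The second call fails because tuple unpacking is attempted on None, which is the same statement shape as the `_, self.push_in_progress = ...` line in the traceback.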

Expected behavior

No exception, maybe just a warning.


sgugger · 2023-05-23T18:46:09Z

cc @Wauplin can we have a consistent return type? That would solve this issue.

Wauplin · 2023-05-25T06:37:42Z

Hmm, what do you mean by a consistent return type? If nothing is pushed, we can't really return a CommandInProgress object. In general, I would prefer not to touch the return type of a method that seems to have been around for 2 years and might already be integrated into a lot of scripts.

(+ I expect the usage of Repository to slowly disappear once we switch to upload_folder)

sgugger · 2023-05-25T11:36:23Z

I mean always a tuple so we don't have to make weird workarounds. But I will do the weird workaround in Transformers to fix this then.
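A caller-side workaround of the kind described here could look roughly like the sketch below: check for None before unpacking. This is only an illustration under assumptions, not the change merged in Transformers; `CheckpointPusher` and `push_fn` are hypothetical names, and `push_fn` stands in for any callable that returns either None or a two-element tuple.

```python
from typing import Any, Callable, Optional, Tuple

PushResult = Optional[Tuple[Any, Any]]


class CheckpointPusher:
    """Hypothetical helper that tolerates a push call returning None."""

    def __init__(self, push_fn: Callable[[], PushResult]) -> None:
        self.push_fn = push_fn
        self.push_in_progress: Any = None

    def push(self) -> bool:
        """Return True if a push was started, False if there was nothing to push."""
        result = self.push_fn()
        if result is None:
            # Nothing was pushed; keep any previous handle and skip unpacking.
            return False
        _, self.push_in_progress = result
        return True


if __name__ == "__main__":
    clean = CheckpointPusher(lambda: None)
    print(clean.push(), clean.push_in_progress)

    dirty = CheckpointPusher(lambda: ("https://example.invalid/commit/abc123", "handle"))
    print(dirty.push(), dirty.push_in_progress)
```

The trade-off is the one raised in the thread: the library's return type stays untouched for existing scripts, and each caller that unpacks the result has to carry its own None check.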

sgugger mentioned this issue May 25, 2023

Fix psuh_to_hub in Trainer when nothing needs pushing #23751

Merged

sgugger closed this as completed in #23751 May 25, 2023
