Trainer.repo.push_to_hub returns None, causing raised exception #23712

Closed
3 of 4 tasks
RobertBaruch opened this issue May 23, 2023 · 3 comments · Fixed by #23751

Comments

@RobertBaruch
Contributor

System Info

  • transformers version: 4.28.1
  • Platform: Windows-10-10.0.22621-SP0
  • Python version: 3.11.2
  • Huggingface_hub version: 0.14.1
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: NO
  • Using distributed or parallel set-up in script?: NO

Who can help?

@sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

For a reason I haven't been able to pin down, Trainer.repo.push_to_hub can return None, which causes Trainer._push_from_checkpoint to raise an exception, because it unpacks the return value as a tuple.

Traceback (most recent call last):
  File "F:\eo-reco\run_speech_recognition_ctc.py", line 810, in <module>
    main()
  File "F:\eo-reco\run_speech_recognition_ctc.py", line 756, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 1664, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 2019, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 2308, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 2462, in _save_checkpoint
    self._push_from_checkpoint(output_dir)
  File "F:\eo-reco\.env\Lib\site-packages\transformers\trainer.py", line 3649, in _push_from_checkpoint
    _, self.push_in_progress = self.repo.push_to_hub(
    ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object

(Note: line numbers in run_speech_recognition_ctc.py will not be accurate, as I've copied it and modified it)

repo.push_to_hub can return None if the repo is clean, which would cause this exception. However, that might not be what happened in my case, since there was no corresponding log message (assuming log messages are emitted immediately rather than buffered).
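The failure mode can be reproduced in isolation. The following is a minimal sketch, not the real library code: fake_push_to_hub is a hypothetical stand-in that assumes push_to_hub returns None for a clean repo and a two-element tuple otherwise, and the URL and handle values are placeholders.

```python
def fake_push_to_hub(repo_is_clean: bool):
    """Hypothetical stand-in for Repository.push_to_hub (not the real implementation)."""
    if repo_is_clean:
        # Assumption: nothing to push, so the call returns None.
        return None
    # Assumption: a non-blocking push returns (url, in-progress handle).
    # Both values here are placeholders.
    return "https://example.invalid/commit/placeholder", object()


def unpack_push(repo_is_clean: bool):
    # Same shape as the failing line in the traceback: unconditional tuple unpacking.
    _, push_in_progress = fake_push_to_hub(repo_is_clean)
    return push_in_progress


try:
    unpack_push(repo_is_clean=True)
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

With a clean repo the unpacking raises the same kind of TypeError as in the traceback; with a dirty repo it succeeds.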

Expected behavior

No exception, maybe just a warning.

@sgugger
Collaborator

sgugger commented May 23, 2023

cc @Wauplin can we have a consistent return type? That would solve this issue.

@Wauplin
Contributor

Wauplin commented May 25, 2023

Hmm, what do you mean by a consistent return type? If nothing is pushed, we can't really return a CommandInProgress object. In general I would prefer not to touch the return type of a method that has been around for 2 years and that might already be integrated in a lot of scripts.

(+ I expect the usage of Repository to slowly disappear once we switch to upload_folder)

@sgugger
Collaborator

sgugger commented May 25, 2023

I mean always a tuple so we don't have to make weird workarounds. But I will do the weird workaround in Transformers to fix this then.
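One possible shape for such a caller-side workaround is sketched below. This is an illustration only, not the change merged in #23751; safe_unpack_push is a hypothetical helper name, and it assumes the result is either None or a two-element tuple whose second item is the in-progress handle.

```python
from typing import Any, Optional, Tuple


def safe_unpack_push(result: Optional[Tuple[str, Any]]) -> Optional[Any]:
    """Hypothetical helper: return the in-progress handle, or None if nothing was pushed."""
    if result is None:
        # Nothing was pushed, so there is no in-progress command to track.
        return None
    _, in_progress = result
    return in_progress


# Usage with placeholder values standing in for a real push result.
handle = object()
print(safe_unpack_push(None) is None)
print(safe_unpack_push(("placeholder-url", handle)) is handle)
```

Checking for None before unpacking keeps the library's return type untouched, at the cost of every caller having to repeat the guard.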


3 participants