
Fix uploading processors/tokenizers to WandB on train end #35701

Conversation

@jack89roberts (Contributor) commented on Jan 15, 2025

What does this PR do?

Since tokenizer was renamed to processing_class in #32385, models uploaded to WandB at the end of training have not included the tokenizer/processor config. This is due to a missed update to the argument names in WandbCallback.on_train_end, which is fixed in this PR. The bug only affects the final artifacts saved at the end of training; checkpoints saved during training still upload both the model and the processor config.
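
For illustration, a minimal sketch of the shape of the fix (not the exact transformers source): the Trainer now passes the processor to callback events under the processing_class keyword, so a handler that still reads the old tokenizer keyword silently gets None and never saves the processor config. The class and save calls below are simplified stand-ins.

import tempfile

class WandbCallbackSketch:
    def on_train_end(self, args, state, control, model=None, processing_class=None, **kwargs):
        # Before this PR the handler still looked for `tokenizer`, which the Trainer
        # no longer passes, so the processor config never made it into the final artifact.
        if model is None:
            return
        with tempfile.TemporaryDirectory() as temp_dir:
            # Save the model weights/config, plus the processor config when it is provided.
            model.save_pretrained(temp_dir)
            if processing_class is not None:
                processing_class.save_pretrained(temp_dir)
            # ... upload the contents of temp_dir as the run's final model artifact ...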

If you have wandb set up and are logged in, you can see the problem by running examples/pytorch/text-classification on main and on this branch, then comparing the artifacts logged to your wandb run (you'll need to delete the local output directory with rm -rf /tmp/mrpc before starting the script on each branch):

export TASK_NAME=mrpc                            
export WANDB_LOG_MODEL=checkpoint
python run_glue.py \
  --model_name_or_path google-bert/bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 2 \
  --output_dir /tmp/$TASK_NAME/ \
  --report_to wandb \
  --save_strategy epoch \
  --save_total_limit 2 \
  --load_best_model_at_end True \
  --eval_strategy epoch

The files in the logged :latest model artifact will include the tokenizer config for the run from this branch, but not for the run started on main.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Previous changes: @amyeroberts

Maybe Trainer: @muellerzr, @SunMarc, though this is more of an integration/WandB issue.

@jack89roberts (Contributor, Author) commented on Jan 15, 2025

tests_torch_and_tf has failed in CircleCI, but I don't see how that can be related to these changes. I'm happy to try to fix it if there's something I can do or someone can point me in the right direction, though:

   1 failed because `AssertionError: 0.00021042596 not less than or equal to 0.0002 : outputs.logits` -> Difference between PyTorch and TF is 0.00021042596199549735 (>= 0.0002) for Data2VecVisionForSemanticSegmentation
Number of failed tests: 1

Edit: it passed after the 2nd commit, so I suspect it was some unlucky randomness in the tests?

@jack89roberts (Contributor, Author) commented on Jan 15, 2025

Looking at the rest of transformers/integrations/integration_utils.py, I suspect all instances of tokenizer should be replaced with processing_class. I have pushed that here as well; there are 6 other instances across ClearMLCallback and DVCLiveCallback, but I'm unfamiliar with those and so less certain about them than about the WandB updates.
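
To illustrate the pattern of that rename (a hypothetical before/after; the actual bodies of ClearMLCallback and DVCLiveCallback differ, this only shows the keyword change, with the enclosing class omitted):

# Before: the Trainer no longer passes this keyword, so it is always None here.
def on_train_end(self, args, state, control, model=None, tokenizer=None, **kwargs):
    ...

# After: matches the keyword actually passed by CallbackHandler.call_event.
def on_train_end(self, args, state, control, model=None, processing_class=None, **kwargs):
    ...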

@ayulockin commented
Hey @jack89roberts, I am from W&B. Thanks for opening this PR. LGTM. :)

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) left a comment

Indeed, this is something we missed in the PR you mentioned. We need this change since we have the following in CallbackHandler:

def call_event(self, event, args, state, control, **kwargs):
    for callback in self.callbacks:
        result = getattr(callback, event)(
            args,
            state,
            control,
            model=self.model,
            processing_class=self.processing_class,
            optimizer=self.optimizer,
            lr_scheduler=self.lr_scheduler,
            train_dataloader=self.train_dataloader,
            eval_dataloader=self.eval_dataloader,
            **kwargs,
        )
        # A Callback can skip the return of `control` if it doesn't change it.
        if result is not None:
            control = result
    return control
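
To spell out the consequence (with a hypothetical callback, just to illustrate): call_event passes the processor only under the processing_class keyword, so a handler that still declares the old name never receives it there; it simply lands in **kwargs.

class OldStyleCallback:
    def on_train_end(self, args, state, control, model=None, tokenizer=None, **kwargs):
        # `tokenizer` stays None; the processor arrives as kwargs["processing_class"].
        print(tokenizer)                      # None
        print("processing_class" in kwargs)   # True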

@SunMarc requested a review from ArthurZucker on January 15, 2025, 14:10
@jack89roberts (Contributor, Author) commented
Hi, sorry to prod, but is there anything I can do to help get this merged or make sure it doesn't get forgotten? It's stopping us from using transformers >= 4.46.0.

@MekkCyber (Contributor) commented
Hi @jack89roberts, sorry for the delay; we are just waiting for a review from a core maintainer.

@ArthurZucker (Collaborator) left a comment

Wow, sorry and thanks all for the update! Indeed, we changed the API, I think 2-3 months ago!

@ArthurZucker merged commit 0a950e0 into huggingface:main on Jan 23, 2025
25 checks passed
@jack89roberts (Contributor, Author) commented
Thanks!

bursteratom pushed a commit to bursteratom/transformers that referenced this pull request Jan 31, 2025
…e#35701)

* rename tokenizer to processing_class in WandbCallback.on_train_end

* rename tokenizer to processing_class in ClearMLCallback and DVCLiveCallback
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
…e#35701)

* rename tokenizer to processing_class in WandbCallback.on_train_end

* rename tokenizer to processing_class in ClearMLCallback and DVCLiveCallback