add Cambricon MLUs support #29627

Merged (9 commits) on Mar 27, 2024

Conversation

@huismiling (Contributor) commented Mar 13, 2024

What does this PR do?

Accelerate already supports Cambricon MLUs (huggingface/accelerate#2552).
This PR enables users to leverage Cambricon MLUs for training and inference of 🤗 Transformers models.
For example, you can run the official GLUE text-classification task on Cambricon MLUs with the command below:

python -m torch.distributed.run --nproc_per_node 8 run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output
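
For context, the Cambricon plugin registers an "mlu" device type with PyTorch that mirrors the torch.cuda API, which is why the unmodified script above lands on devices mlu:0 through mlu:7. Here is a minimal availability check (a sketch; the torch_mlu package and the torch.mlu namespace come from Cambricon's plugin, not stock PyTorch):

import torch
import torch_mlu  # Cambricon plugin (assumed installed); registers the "mlu" device type

# The torch.mlu namespace mirrors the torch.cuda API.
print(torch.mlu.is_available())  # True on a machine with MLUs
print(torch.mlu.device_count())  # 8 on the node used for the logs below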

Below are the output logs (this run used TASK_NAME=sst2):

WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
03/13/2024 17:15:35 - WARNING - __main__ - Process rank: 7, device: mlu:7, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 1, device: mlu:1, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 4, device: mlu:4, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 6, device: mlu:6, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 5, device: mlu:5, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 2, device: mlu:2, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 3, device: mlu:3, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 0, device: mlu:0, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/runs/Mar13_17-15-36_375a816cfa18,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./output,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=./output,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
03/13/2024 17:15:56 - INFO - datasets.builder - Generating dataset glue (/root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
03/13/2024 17:15:56 - INFO - datasets.builder - Downloading and preparing dataset glue/sst2 to /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c...
03/13/2024 17:15:58 - INFO - datasets.builder - Dataset not on Hf google storage. Downloading and preparing it from source
03/13/2024 17:15:59 - INFO - datasets.utils.file_utils - hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/train-00000-of-00001.parquet not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/a531a91fec9104efb64466a1b57912233ee243aa1de5cbe6a1c45642333e4899.incomplete
Downloading data: 100%|██████████| 3.11M/3.11M [00:07<00:00, 440kB/s]
03/13/2024 17:16:07 - INFO - datasets.utils.file_utils - storing hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/train-00000-of-00001.parquet in cache at /root/.cache/huggingface/datasets/downloads/a531a91fec9104efb64466a1b57912233ee243aa1de5cbe6a1c45642333e4899
03/13/2024 17:16:07 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/a531a91fec9104efb64466a1b57912233ee243aa1de5cbe6a1c45642333e4899
03/13/2024 17:16:07 - INFO - datasets.utils.file_utils - hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/validation-00000-of-00001.parquet not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/20be13c6ed68ffad9b6e0535c77967449aa1c61808348eacfc6114b0178639a3.incomplete
Downloading data: 100%|██████████| 72.8k/72.8k [00:03<00:00, 22.7kB/s]
03/13/2024 17:16:11 - INFO - datasets.utils.file_utils - storing hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/validation-00000-of-00001.parquet in cache at /root/.cache/huggingface/datasets/downloads/20be13c6ed68ffad9b6e0535c77967449aa1c61808348eacfc6114b0178639a3
03/13/2024 17:16:11 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/20be13c6ed68ffad9b6e0535c77967449aa1c61808348eacfc6114b0178639a3
03/13/2024 17:16:12 - INFO - datasets.utils.file_utils - hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/test-00000-of-00001.parquet not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/215861dee86309fdb9492a68ae4640bf01708f31253b35a9c3449e19aa2f111e.incomplete
Downloading data: 100%|██████████| 148k/148k [00:02<00:00, 51.3kB/s]
03/13/2024 17:16:16 - INFO - datasets.utils.file_utils - storing hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/test-00000-of-00001.parquet in cache at /root/.cache/huggingface/datasets/downloads/215861dee86309fdb9492a68ae4640bf01708f31253b35a9c3449e19aa2f111e
03/13/2024 17:16:16 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/215861dee86309fdb9492a68ae4640bf01708f31253b35a9c3449e19aa2f111e
03/13/2024 17:16:16 - INFO - datasets.download.download_manager - Downloading took 0.0 min
03/13/2024 17:16:16 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min
03/13/2024 17:16:16 - INFO - datasets.builder - Generating train split
Generating train split: 100%|██████████| 67349/67349 [00:00<00:00, 803179.31 examples/s]
03/13/2024 17:16:16 - INFO - datasets.builder - Generating validation split
Generating validation split: 100%|██████████| 872/872 [00:00<00:00, 254802.36 examples/s]
03/13/2024 17:16:16 - INFO - datasets.builder - Generating test split
Generating test split: 100%|██████████| 1821/1821 [00:00<00:00, 433992.14 examples/s]
03/13/2024 17:16:16 - INFO - datasets.utils.info_utils - All the splits matched successfully.
03/13/2024 17:16:16 - INFO - datasets.builder - Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c. Subsequent calls will reuse this data.
[INFO|configuration_utils.py:726] 2024-03-13 17:16:16,916 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/config.json
[INFO|configuration_utils.py:789] 2024-03-13 17:16:16,921 >> Model config BertConfig {
  "_name_or_path": "bert-base-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "sst2",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.39.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file vocab.txt from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/vocab.txt
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/tokenizer.json
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/tokenizer_config.json
[INFO|modeling_utils.py:3262] 2024-03-13 17:16:17,366 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/model.safetensors
[INFO|modeling_utils.py:3991] 2024-03-13 17:16:17,605 >> Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:4003] 2024-03-13 17:16:17,605 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

03/13/2024 17:16:17 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/cache-640415216a56de40.arrow
Running tokenizer on dataset: 100%|██████████| 67349/67349 [00:04<00:00, 16134.96 examples/s]
03/13/2024 17:16:21 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/cache-c96b916c1e3d687e.arrow
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 14229.71 examples/s]
03/13/2024 17:16:21 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/cache-e4504bc916b92272.arrow
Running tokenizer on dataset: 100%|██████████| 1821/1821 [00:00<00:00, 17498.81 examples/s]
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:02,433 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:38,812 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:38,984 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:39,255 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:48,521 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
03/13/2024 17:17:52 - INFO - __main__ - Sample 14592 of the training set: {'sentence': 'a great movie ', 'label': 1, 'idx': 14592, 'input_ids': [101, 170, 1632, 2523, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.

Running tokenizer on dataset:   0%|          | 0/872 [00:00<?, ? examples/s]03/13/2024 17:17:52 - INFO - __main__ - Sample 3278 of the training set: {'sentence': 'entertaining , if somewhat standardized , action ', 'label': 1, 'idx': 3278, 'input_ids': [101, 15021, 117, 1191, 4742, 18013, 117, 2168, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
03/13/2024 17:17:52 - INFO - __main__ - Sample 36048 of the training set: {'sentence': 'even when there are lulls , the emotions seem authentic , ', 'label': 1, 'idx': 36048, 'input_ids': [101, 1256, 1165, 1175, 1132, 181, 11781, 1116, 117, 1103, 6288, 3166, 16047, 117, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.

Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 9416.42 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8446.26 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8542.17 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8249.32 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8223.53 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8410.62 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 7751.82 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 7052.99 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 6520.01 examples/s]
[INFO|trainer.py:1826] 2024-03-13 17:18:27,506 >> ***** Running training *****
[INFO|trainer.py:1827] 2024-03-13 17:18:27,506 >>   Num examples = 67,349
[INFO|trainer.py:1828] 2024-03-13 17:18:27,506 >>   Num Epochs = 3
[INFO|trainer.py:1829] 2024-03-13 17:18:27,506 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:1832] 2024-03-13 17:18:27,506 >>   Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:1833] 2024-03-13 17:18:27,506 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1834] 2024-03-13 17:18:27,506 >>   Total optimization steps = 792
[INFO|trainer.py:1835] 2024-03-13 17:18:27,507 >>   Number of trainable parameters = 108,311,810

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.1624
  train_samples            =      67349

***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9106
  eval_loss               =     0.2796
  eval_samples            =        872

@huismiling closed this Mar 14, 2024
@huismiling reopened this Mar 14, 2024
@huismiling (Contributor, Author) commented:

@muellerzr Hi, could you help review this PR? Thanks!

@muellerzr (Contributor) left a comment:

Overall this looks good to me; I can't see any real issues with what we have going on here.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment:

LGTM! Do you want to add a bit of documentation about this? 🤗

@huismiling (Contributor, Author) replied:

> LGTM! Do you want to add a bit of documentation about this? 🤗

MLU PyTorch is highly compatible with CUDA PyTorch and is simple and convenient to use: its usage is the same as CUDA PyTorch, so the existing CUDA documentation applies.
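
As a concrete illustration, a CUDA workflow ports to an MLU by swapping the device string. A minimal sketch (not part of this PR; it assumes torch_mlu is installed, and bert-base-cased is used purely as an example):

import torch
import torch_mlu  # registers the "mlu" device type with PyTorch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
model = model.to("mlu")  # where a CUDA setup would use .to("cuda")

inputs = tokenizer("a great movie", return_tensors="pt").to("mlu")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))

The Trainer in run_glue.py picks the device up automatically via Accelerate, which is why the example command above runs unchanged.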
