add Cambricon MLUs support #29627

Merged (9 commits) on Mar 27, 2024

Conversation

@huismiling (Contributor) commented Mar 13, 2024

What does this PR do?

Accelerate already supports Cambricon MLUs (huggingface/accelerate#2552).
This PR enables users to leverage Cambricon MLUs for training and inference of 🤗 Transformers models.
For example, you can run the official GLUE text-classification task on Cambricon MLUs with the command below:

python -m torch.distributed.run --nproc_per_node 8 run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output
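
For context, the Cambricon plugin registers an "mlu" device type with PyTorch that mirrors the torch.cuda API, which is why the unmodified script above lands on devices mlu:0 through mlu:7. Here is a minimal availability check (a sketch; the torch_mlu package and the torch.mlu namespace come from Cambricon's plugin, not stock PyTorch):

import torch
import torch_mlu  # Cambricon plugin (assumed installed); registers the "mlu" device type

# The torch.mlu namespace mirrors the torch.cuda API.
print(torch.mlu.is_available())  # True on a machine with MLUs
print(torch.mlu.device_count())  # 8 on the node used for the logs below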

Below are the output logs (this run used TASK_NAME=sst2):

WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
03/13/2024 17:15:35 - WARNING - __main__ - Process rank: 7, device: mlu:7, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 1, device: mlu:1, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 4, device: mlu:4, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 6, device: mlu:6, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 5, device: mlu:5, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 2, device: mlu:2, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 3, device: mlu:3, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - WARNING - __main__ - Process rank: 0, device: mlu:0, n_gpu: 1, distributed training: True, 16-bits training: False
03/13/2024 17:15:36 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/runs/Mar13_17-15-36_375a816cfa18,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./output,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=./output,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
03/13/2024 17:15:56 - INFO - datasets.builder - Generating dataset glue (/root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
03/13/2024 17:15:56 - INFO - datasets.builder - Downloading and preparing dataset glue/sst2 to /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c...
03/13/2024 17:15:58 - INFO - datasets.builder - Dataset not on Hf google storage. Downloading and preparing it from source
03/13/2024 17:15:59 - INFO - datasets.utils.file_utils - hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/train-00000-of-00001.parquet not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/a531a91fec9104efb64466a1b57912233ee243aa1de5cbe6a1c45642333e4899.incomplete
Downloading data: 100%|██████████| 3.11M/3.11M [00:07<00:00, 440kB/s]
03/13/2024 17:16:07 - INFO - datasets.utils.file_utils - storing hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/train-00000-of-00001.parquet in cache at /root/.cache/huggingface/datasets/downloads/a531a91fec9104efb64466a1b57912233ee243aa1de5cbe6a1c45642333e4899
03/13/2024 17:16:07 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/a531a91fec9104efb64466a1b57912233ee243aa1de5cbe6a1c45642333e4899
03/13/2024 17:16:07 - INFO - datasets.utils.file_utils - hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/validation-00000-of-00001.parquet not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/20be13c6ed68ffad9b6e0535c77967449aa1c61808348eacfc6114b0178639a3.incomplete
Downloading data: 100%|██████████| 72.8k/72.8k [00:03<00:00, 22.7kB/s]
03/13/2024 17:16:11 - INFO - datasets.utils.file_utils - storing hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/validation-00000-of-00001.parquet in cache at /root/.cache/huggingface/datasets/downloads/20be13c6ed68ffad9b6e0535c77967449aa1c61808348eacfc6114b0178639a3
03/13/2024 17:16:11 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/20be13c6ed68ffad9b6e0535c77967449aa1c61808348eacfc6114b0178639a3
03/13/2024 17:16:12 - INFO - datasets.utils.file_utils - hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/test-00000-of-00001.parquet not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/215861dee86309fdb9492a68ae4640bf01708f31253b35a9c3449e19aa2f111e.incomplete
Downloading data: 100%|██████████| 148k/148k [00:02<00:00, 51.3kB/s]
03/13/2024 17:16:16 - INFO - datasets.utils.file_utils - storing hf://datasets/glue@bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/sst2/test-00000-of-00001.parquet in cache at /root/.cache/huggingface/datasets/downloads/215861dee86309fdb9492a68ae4640bf01708f31253b35a9c3449e19aa2f111e
03/13/2024 17:16:16 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/215861dee86309fdb9492a68ae4640bf01708f31253b35a9c3449e19aa2f111e
03/13/2024 17:16:16 - INFO - datasets.download.download_manager - Downloading took 0.0 min
03/13/2024 17:16:16 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min
03/13/2024 17:16:16 - INFO - datasets.builder - Generating train split
Generating train split: 100%|██████████| 67349/67349 [00:00<00:00, 803179.31 examples/s]
03/13/2024 17:16:16 - INFO - datasets.builder - Generating validation split
Generating validation split: 100%|██████████| 872/872 [00:00<00:00, 254802.36 examples/s]
03/13/2024 17:16:16 - INFO - datasets.builder - Generating test split
Generating test split: 100%|██████████| 1821/1821 [00:00<00:00, 433992.14 examples/s]
03/13/2024 17:16:16 - INFO - datasets.utils.info_utils - All the splits matched successfully.
03/13/2024 17:16:16 - INFO - datasets.builder - Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c. Subsequent calls will reuse this data.
[INFO|configuration_utils.py:726] 2024-03-13 17:16:16,916 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/config.json
[INFO|configuration_utils.py:789] 2024-03-13 17:16:16,921 >> Model config BertConfig {
  "_name_or_path": "bert-base-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "sst2",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.39.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file vocab.txt from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/vocab.txt
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/tokenizer.json
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2057] 2024-03-13 17:16:17,269 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/tokenizer_config.json
[INFO|modeling_utils.py:3262] 2024-03-13 17:16:17,366 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--bert-base-cased/snapshots/cd5ef92a9fb2f889e972770a36d4ed042daf221e/model.safetensors
[INFO|modeling_utils.py:3991] 2024-03-13 17:16:17,605 >> Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:4003] 2024-03-13 17:16:17,605 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

03/13/2024 17:16:17 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/cache-640415216a56de40.arrow
Running tokenizer on dataset: 100%|██████████| 67349/67349 [00:04<00:00, 16134.96 examples/s]
03/13/2024 17:16:21 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/cache-c96b916c1e3d687e.arrow
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 14229.71 examples/s]
03/13/2024 17:16:21 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/glue/sst2/0.0.0/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c/cache-e4504bc916b92272.arrow
Running tokenizer on dataset: 100%|██████████| 1821/1821 [00:00<00:00, 17498.81 examples/s]
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:02,433 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:38,812 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:38,984 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:39,255 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:4003] 2024-03-13 17:17:48,521 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
03/13/2024 17:17:52 - INFO - __main__ - Sample 14592 of the training set: {'sentence': 'a great movie ', 'label': 1, 'idx': 14592, 'input_ids': [101, 170, 1632, 2523, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.

Running tokenizer on dataset:   0%|          | 0/872 [00:00<?, ? examples/s]03/13/2024 17:17:52 - INFO - __main__ - Sample 3278 of the training set: {'sentence': 'entertaining , if somewhat standardized , action ', 'label': 1, 'idx': 3278, 'input_ids': [101, 15021, 117, 1191, 4742, 18013, 117, 2168, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
03/13/2024 17:17:52 - INFO - __main__ - Sample 36048 of the training set: {'sentence': 'even when there are lulls , the emotions seem authentic , ', 'label': 1, 'idx': 36048, 'input_ids': [101, 1256, 1165, 1175, 1132, 181, 11781, 1116, 117, 1103, 6288, 3166, 16047, 117, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.

Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 9416.42 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8446.26 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8542.17 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8249.32 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8223.53 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 8410.62 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 7751.82 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 7052.99 examples/s]
Running tokenizer on dataset: 100%|██████████| 872/872 [00:00<00:00, 6520.01 examples/s]
[INFO|trainer.py:1826] 2024-03-13 17:18:27,506 >> ***** Running training *****
[INFO|trainer.py:1827] 2024-03-13 17:18:27,506 >>   Num examples = 67,349
[INFO|trainer.py:1828] 2024-03-13 17:18:27,506 >>   Num Epochs = 3
[INFO|trainer.py:1829] 2024-03-13 17:18:27,506 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:1832] 2024-03-13 17:18:27,506 >>   Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:1833] 2024-03-13 17:18:27,506 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1834] 2024-03-13 17:18:27,506 >>   Total optimization steps = 792
[INFO|trainer.py:1835] 2024-03-13 17:18:27,507 >>   Number of trainable parameters = 108,311,810

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.1624
  train_samples            =      67349

***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9106
  eval_loss               =     0.2796
  eval_samples            =        872

@huismiling closed this Mar 14, 2024
@huismiling reopened this Mar 14, 2024
@huismiling (Contributor, Author) commented:

@muellerzr Hi, could you help review this PR? Thanks!

@muellerzr (Contributor) left a comment:

Overall this looks good to me; I can't see any real issues with what we have going on here.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment:

LGTM! Do you want to add a bit of documentation about this? 🤗

@huismiling (Contributor, Author) replied:

> LGTM! Do you want to add a bit of documentation about this? 🤗

MLU PyTorch is highly compatible with CUDA PyTorch and is simple and convenient to use: its usage is the same as CUDA PyTorch, so the existing CUDA documentation applies.
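
As a concrete illustration, a CUDA workflow ports to an MLU by swapping the device string. A minimal sketch (not part of this PR; it assumes torch_mlu is installed, and bert-base-cased is used purely as an example):

import torch
import torch_mlu  # registers the "mlu" device type with PyTorch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
model = model.to("mlu")  # where a CUDA setup would use .to("cuda")

inputs = tokenizer("a great movie", return_tensors="pt").to("mlu")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))

The Trainer in run_glue.py picks the device up automatically via Accelerate, which is why the example command above runs unchanged.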
