
Trying to train on an existing model #47

Open
ghost opened this issue Jul 21, 2022 · 1 comment

Comments

ghost commented Jul 21, 2022

Hi!
This is a really great tool and it's been fun using it.
I am trying to train the model 'bert-base-multilingual-cased' on a tokenized dataset in the correct format, but every time I run the script, it loads the file and the weights and then promptly stops, because some weights of the pre-trained model aren't initialised.
This is the message I get:

```
07/21/2022 13:48:50 - WARNING - main - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /Users/devi/.cache/torch/awesome-align/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": null,
"directionality": "bidi",
"do_sample": false,
"eos_token_ids": null,
"finetuning_task": null,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-12,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_beams": 1,
"num_hidden_layers": 12,
"num_labels": 2,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"repetition_penalty": 1.0,
"temperature": 1.0,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"type_vocab_size": 2,
"use_bfloat16": false,
"vocab_size": 119547
}

07/21/2022 13:48:51 - INFO - awesome_align.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /Users/devi/.cache/torch/awesome-align/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
07/21/2022 13:48:52 - INFO - awesome_align.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-pytorch_model.bin from cache at /Users/devi/.cache/torch/awesome-align/5b5b80054cd2c95a946a8e0ce0b93f56326dff9fbda6a6c3e02de3c91c918342.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights of BertForMaskedLM not initialized from pretrained model: ['cls.predictions.decoder.bias', 'psi_cls.bias', 'psi_cls.transform.weight', 'psi_cls.transform.bias', 'psi_cls.decoder.weight', 'psi_cls.decoder.bias']
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights from pretrained model not used in BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/21/2022 13:48:56 - INFO - main - Training/evaluation parameters Namespace(train_data_file='de-en_tmx_align.txt', output_dir='align/train_model', train_mlm=False, train_tlm=False, train_tlm_full=False, train_so=False, train_psi=False, train_co=False, train_gold_file=None, eval_gold_file=None, ignore_possible_alignments=False, gold_one_index=False, cache_data=False, align_layer=8, extraction='softmax', softmax_threshold=0.001, eval_data_file='examples/deen_param_test', should_continue=False, model_name_or_path='bert-base-multilingual-cased', mlm_probability=0.15, config_name=None, tokenizer_name=None, cache_dir=None, block_size=512, do_train=False, do_eval=False, per_gpu_train_batch_size=2, per_gpu_eval_batch_size=2, gradient_accumulation_steps=4, learning_rate=2e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, overwrite_output_dir=False, overwrite_cache=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, n_gpu=0, device=device(type='cpu'))
```
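
For reference, my training data follows the `src ||| tgt` format from the README — tokenized sentence pairs, one per line, separated by ` ||| ` — along these lines (illustrative examples, not my actual data):

```
Das ist ein Test . ||| This is a test .
Wir trainieren ein Modell . ||| We are training a model .
```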

Please help with a solution, or let me know if I'm doing something wrong!
Thanks

zdou0830 (Collaborator) commented

Hi, what is your training command, and how large is your training data? It is possible that your data is too large, in which case you can just subsample a portion of it.
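
Also, in the Namespace you posted, `do_train` and all of the `train_*` objectives are `False`, which would make the script exit right after loading the weights. A typical training invocation (a sketch based on the README's training example; the file paths are placeholders to replace with your own) enables `--do_train` plus at least one objective, for example:

```bash
TRAIN_FILE=/path/to/train/file
EVAL_FILE=/path/to/eval/file
OUTPUT_DIR=/path/to/output/directory

# Fine-tune mBERT for alignment with the MLM, TLM, SO, and PSI objectives
CUDA_VISIBLE_DEVICES=0 awesome-train \
    --output_dir=$OUTPUT_DIR \
    --model_name_or_path=bert-base-multilingual-cased \
    --extraction 'softmax' \
    --do_train \
    --train_mlm \
    --train_tlm \
    --train_so \
    --train_psi \
    --train_data_file=$TRAIN_FILE \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --save_steps 4000 \
    --do_eval \
    --eval_data_file=$EVAL_FILE
```

As far as I can tell, the `Weights of BertForMaskedLM not initialized` lines about `psi_cls.*` are expected: those are the extra head parameters awesome-align adds on top of the pre-trained BERT checkpoint, and they are initialized fresh before training.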
