
Trying to train on an existing model #47

Open
ghost opened this issue Jul 21, 2022 · 1 comment

Comments

ghost commented Jul 21, 2022

Hi!
This is a really great tool and it's been fun using it.
I am trying to train the model 'bert-base-multilingual-cased' on a tokenized dataset in the correct format, but every time I run the script, it loads the file and the weights and then promptly stops, because some weights of the pre-trained model aren't initialised.
This is the message I get:

```
07/21/2022 13:48:50 - WARNING - main - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /Users/devi/.cache/torch/awesome-align/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": null,
"directionality": "bidi",
"do_sample": false,
"eos_token_ids": null,
"finetuning_task": null,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-12,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_beams": 1,
"num_hidden_layers": 12,
"num_labels": 2,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"repetition_penalty": 1.0,
"temperature": 1.0,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"type_vocab_size": 2,
"use_bfloat16": false,
"vocab_size": 119547
}

07/21/2022 13:48:51 - INFO - awesome_align.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /Users/devi/.cache/torch/awesome-align/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
07/21/2022 13:48:52 - INFO - awesome_align.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-pytorch_model.bin from cache at /Users/devi/.cache/torch/awesome-align/5b5b80054cd2c95a946a8e0ce0b93f56326dff9fbda6a6c3e02de3c91c918342.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights of BertForMaskedLM not initialized from pretrained model: ['cls.predictions.decoder.bias', 'psi_cls.bias', 'psi_cls.transform.weight', 'psi_cls.transform.bias', 'psi_cls.decoder.weight', 'psi_cls.decoder.bias']
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights from pretrained model not used in BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/21/2022 13:48:56 - INFO - main - Training/evaluation parameters Namespace(train_data_file='de-en_tmx_align.txt', output_dir='align/train_model', train_mlm=False, train_tlm=False, train_tlm_full=False, train_so=False, train_psi=False, train_co=False, train_gold_file=None, eval_gold_file=None, ignore_possible_alignments=False, gold_one_index=False, cache_data=False, align_layer=8, extraction='softmax', softmax_threshold=0.001, eval_data_file='examples/deen_param_test', should_continue=False, model_name_or_path='bert-base-multilingual-cased', mlm_probability=0.15, config_name=None, tokenizer_name=None, cache_dir=None, block_size=512, do_train=False, do_eval=False, per_gpu_train_batch_size=2, per_gpu_eval_batch_size=2, gradient_accumulation_steps=4, learning_rate=2e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, overwrite_output_dir=False, overwrite_cache=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, n_gpu=0, device=device(type='cpu'))
```
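
For reference, my training data follows the `src ||| tgt` format from the README — tokenized sentence pairs, one per line, separated by ` ||| ` — along these lines (illustrative examples, not my actual data):

```
Das ist ein Test . ||| This is a test .
Wir trainieren ein Modell . ||| We are training a model .
```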

Please help with a solution, or let me know if I'm doing something wrong!
Thanks

zdou0830 (Collaborator) commented

Hi, what is your training command, and how large is your training data? It is possible that your data is too large, in which case you can just subsample a portion of it.
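
Also, in the Namespace you posted, `do_train` and all of the `train_*` objectives are `False`, which would make the script exit right after loading the weights. A typical training invocation (a sketch based on the README's training example; the file paths are placeholders to replace with your own) enables `--do_train` plus at least one objective, for example:

```bash
TRAIN_FILE=/path/to/train/file
EVAL_FILE=/path/to/eval/file
OUTPUT_DIR=/path/to/output/directory

# Fine-tune mBERT for alignment with the MLM, TLM, SO, and PSI objectives
CUDA_VISIBLE_DEVICES=0 awesome-train \
    --output_dir=$OUTPUT_DIR \
    --model_name_or_path=bert-base-multilingual-cased \
    --extraction 'softmax' \
    --do_train \
    --train_mlm \
    --train_tlm \
    --train_so \
    --train_psi \
    --train_data_file=$TRAIN_FILE \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --save_steps 4000 \
    --do_eval \
    --eval_data_file=$EVAL_FILE
```

As far as I can tell, the `Weights of BertForMaskedLM not initialized` lines about `psi_cls.*` are expected: those are the extra head parameters awesome-align adds on top of the pre-trained BERT checkpoint, and they are initialized fresh before training.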
