
this training process did not consider decoder_attention_mask? #10

Open
zlh1992 opened this issue Mar 24, 2023 · 0 comments
zlh1992 commented Mar 24, 2023

I see that:

def model_forward(model, inputs):
    h = inputs
    h = h.to(model.base_model.model.model.embed_tokens.weight.device)
    h = model.base_model.model.model.embed_tokens(h)
    for layer in model.base_model.model.model.layers:
        h = h.to(layer.input_layernorm.weight.device)
        h = layer(h)[0]
    h = h.to(model.base_model.model.model.norm.weight.device)
    h = model.base_model.model.model.norm(h)
    h = model.base_model.model.lm_head(h)
    return h

Since no attention mask is passed to the layers, does the output of this model attend over the whole sequence (including future tokens)?

Maybe you need to add the causal mask (e.g. via _prepare_decoder_attention_mask) and pass it to each layer to avoid this...
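For reference, a minimal sketch of what I mean, assuming the decoder layers accept an additive attention_mask of shape (batch, 1, seq_len, seq_len) as in the transformers LLaMA implementation at the time; make_causal_mask and model_forward_masked are hypothetical names, and the mask construction mirrors what _prepare_decoder_attention_mask does for the no-padding case:

```python
import torch

def make_causal_mask(bsz, seq_len, dtype, device):
    # Additive mask: 0 on the diagonal and below (self and past positions),
    # a large negative value on future positions, shape (bsz, 1, seq, seq).
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min,
                      dtype=dtype, device=device)
    mask = torch.triu(mask, diagonal=1)
    return mask[None, None, :, :].expand(bsz, 1, seq_len, seq_len)

def model_forward_masked(model, inputs):
    h = inputs
    h = h.to(model.base_model.model.model.embed_tokens.weight.device)
    h = model.base_model.model.model.embed_tokens(h)
    bsz, seq_len = h.shape[0], h.shape[1]
    for layer in model.base_model.model.model.layers:
        h = h.to(layer.input_layernorm.weight.device)
        # Rebuild the mask on this layer's device/dtype and pass it through,
        # so each token can only attend to itself and earlier positions.
        mask = make_causal_mask(bsz, seq_len, h.dtype, h.device)
        h = layer(h, attention_mask=mask)[0]
    h = h.to(model.base_model.model.model.norm.weight.device)
    h = model.base_model.model.model.norm(h)
    h = model.base_model.model.lm_head(h)
    return h
```

Without such a mask, every position can attend to future tokens, so the next-token prediction loss would leak information from the targets.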
