T5 #149 (Draft)

AkshitaB wants to merge 3 commits into master
Conversation

@AkshitaB commented Dec 8, 2020

No description provided.

@AkshitaB marked this pull request as draft December 8, 2020 08:20
@AkshitaB requested a review from ibeltagy December 8, 2020 08:21
@ibeltagy (Collaborator) left a comment

Looks great. Thank you.
I left a few comments, but all of them are code-organization suggestions.

One question: how do you make sure your implementation is correct? The way I do this is to check that the output of LED-T5 perfectly matches that of T5 for a random short input, say of size 4x256.
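For reference, a minimal sketch of that check (not part of the PR; the import path and the saved-model directory are placeholders, and the converted-model class name is taken from the log further down in this thread):

    # Sketch: compare the converted model against plain T5 on a random short input (4 x 256).
    import torch
    from transformers import T5ForConditionalGeneration
    from longformer.longformer_encoder_decoder import LongformerEncoderDecoderForConditionalGenerationT5  # assumed path

    t5 = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
    led_t5 = LongformerEncoderDecoderForConditionalGenerationT5.from_pretrained("./converted-t5-small").eval()  # placeholder dir

    torch.manual_seed(0)
    input_ids = torch.randint(0, t5.config.vocab_size, (4, 256))
    decoder_input_ids = torch.full((4, 1), t5.config.decoder_start_token_id, dtype=torch.long)

    with torch.no_grad():
        ref_logits = t5(input_ids, decoder_input_ids=decoder_input_ids, use_cache=False)[0]
        led_logits = led_t5(input_ids, decoder_input_ids=decoder_input_ids, use_cache=False)[0]

    assert torch.allclose(ref_logits, led_logits, atol=1e-5), "LED-T5 output does not match T5"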

Comment on lines +70 to +80
        # this is for the T5 setting
        if "has_relative_attention_bias" in config.to_dict():
            self.is_decoder = config.is_decoder
            self.relative_attention_num_buckets = config.relative_attention_num_buckets
            self.has_relative_attention_bias = config.has_relative_attention_bias
            if self.has_relative_attention_bias:
                self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.num_heads)
            self.is_t5 = True
        else:
            self.is_t5 = False

@ibeltagy (Collaborator):

I would suggest moving all the T5-specific code from here to a longformer_encoder_decoder.LongformerSelfAttentionForT5 and having it inherit from LongformerSelfAttention.
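A rough sketch of that refactor, using the class and module names from the suggestion above (the attribute assignments are lifted from the diff hunk; the base-class import path and constructor signature are assumptions):

    # longformer_encoder_decoder.py (sketch, not the PR's final code)
    from torch import nn
    from longformer.longformer import LongformerSelfAttention  # assumed import path

    class LongformerSelfAttentionForT5(LongformerSelfAttention):
        def __init__(self, config, layer_id):
            super().__init__(config, layer_id)  # assumed base-class signature
            # T5 relative-position settings, moved out of the generic LongformerSelfAttention
            self.is_decoder = config.is_decoder
            self.relative_attention_num_buckets = config.relative_attention_num_buckets
            self.has_relative_attention_bias = config.has_relative_attention_bias
            if self.has_relative_attention_bias:
                self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.num_heads)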

Comment on lines +261 to +278

        if self.is_t5:
            if position_bias is None:
                if not self.has_relative_attention_bias:
                    raise ValueError("No position_bias provided and no weights to compute position_bias")

                position_bias = self.compute_bias(seq_len, seq_len)
                # if key and values are already calculated
                # we want only the last query position bias
                if past_key_value_state is not None:
                    position_bias = position_bias[:, :, -1:, :]
            # TODO: attention_mask should also be the same shape as position_bias.
            # Sliding attention window??
            # if attention_mask is not None:
            #     position_bias = position_bias + attention_mask # (1, num_heads, seq_len, 2*window+1)
            attn_weights += position_bias


@ibeltagy (Collaborator):

As above, move this to LongformerSelfAttentionForT5. Here you can keep only one line, something like:
attn_weights = self.process_relative_positions(attn_weights). This function is empty in LongformerSelfAttention but has more details in LongformerSelfAttentionForT5.
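A sketch of that hook pattern, reusing the diff hunk above (method and argument names are only illustrative; extra arguments would be needed because the bias depends on seq_len and past_key_value_state):

    from torch import nn

    # Base class: the hook is a no-op.
    class LongformerSelfAttention(nn.Module):
        # ... existing implementation ...
        def process_relative_positions(self, attn_weights, position_bias=None, past_key_value_state=None, seq_len=None):
            return attn_weights

    # T5 subclass: the hook adds the binned relative-position bias (body taken from the hunk above).
    class LongformerSelfAttentionForT5(LongformerSelfAttention):
        def process_relative_positions(self, attn_weights, position_bias=None, past_key_value_state=None, seq_len=None):
            if position_bias is None:
                if not self.has_relative_attention_bias:
                    raise ValueError("No position_bias provided and no weights to compute position_bias")
                position_bias = self.compute_bias(seq_len, seq_len)
                # if key and values are already calculated we want only the last query position bias
                if past_key_value_state is not None:
                    position_bias = position_bias[:, :, -1:, :]
            return attn_weights + position_bias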

        relative_buckets += torch.where(is_small, relative_position, relative_postion_if_large)
        return relative_buckets

    def compute_bias(self, qlen, klen):
@ibeltagy (Collaborator):

qlen, klen are not used


    def compute_bias(self, qlen, klen):
        """ Compute binned relative position bias """
        relative_position = torch.tensor([[i-self.attention_window for i in range(2*self.attention_window+1)]])
@ibeltagy (Collaborator):

Please add a comment to explain the change.
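For illustration, the comment (and the removal of the unused qlen/klen) might look something like the sketch below; the bucketing and embedding steps are assumed to follow T5's compute_bias, and the final broadcasting layout is left open because it depends on the sliding-window attention shapes:

    # Sketch of a documented compute_bias for the sliding-window setting (method of the
    # T5-specific attention class; torch is assumed imported at module level).
    def compute_bias(self):
        """Compute binned relative position bias for the local attention window.

        Unlike T5, which builds a full (qlen, klen) grid of relative positions, every query here
        attends to the same 2*attention_window+1 offsets, so only those offsets need a bias.
        """
        # relative offsets covered by the window: [-w, ..., 0, ..., +w]
        relative_position = torch.arange(-self.attention_window, self.attention_window + 1).unsqueeze(0)
        relative_position_bucket = self._relative_position_bucket(
            relative_position,
            bidirectional=not self.is_decoder,
            num_buckets=self.relative_attention_num_buckets,
        )
        values = self.relative_attention_bias(relative_position_bucket)  # (1, 2w+1, num_heads)
        # NOTE (assumption): reorder axes here so the bias broadcasts against attn_weights;
        # the exact permutation depends on the sliding-window layout used by the attention code.
        return values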

    @staticmethod
    def _relative_position_bucket(relative_position, bidirectional=True, num_buckets=32, max_distance=128):
        """
        Adapted from Mesh Tensorflow:
@ibeltagy (Collaborator):

This function is copied with no change, right? Please mention that.

@adamwawrzynski

Hello @AkshitaB,

I'm keeping my fingers crossed for you in porting T5 to use LongformerSelfAttention. I've tried to run the code you uploaded and it didn't work for me.
Steps to reproduce:

  1. Create a conda environment and activate it.
  2. Install dependencies: python3 -m pip install -r requirements.txt.
  3. Copy the script convert_t5_to_longformerencoderdecoder.py into the parent directory, as I had a problem importing the longformer module.
  4. Run with default settings.

I've received this error:

$ CUDA_VISIBLE_DEVICES=6 python3 convert_t5_to_longformerencoderdecoder.py --save_model_to ./
INFO:__main__:saving model to ./
Some weights of LongformerEncoderDecoderForConditionalGenerationT5 were not initialized from the model checkpoint at ./ and are newly initialized: ['encoder.block.0.layer.0.SelfAttention.longformer_self_attn.query.bias', (...)]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "convert_t5_to_longformerencoderdecoder.py", line 148, in <module>
    main()
  File "convert_t5_to_longformerencoderdecoder.py", line 140, in main
    logits = model(input_ids, attention_mask=attention_mask, decoder_input_ids=decoder_input_ids, use_cache=False)[0]
  File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/transformers/modeling_t5.py", line 1151, in forward
    encoder_outputs = self.encoder(
  File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/transformers/modeling_t5.py", line 775, in forward
    position_bias = layer_outputs[3 if output_attentions else 2]
IndexError: tuple index out of range

@ibeltagy (Collaborator)

@adamwawrzynski, thanks for your interest in this work. Can you debug this a bit and see why it is breaking? Looks like a misconfiguration for output_attentions or something similar.

@adamwawrzynski

I've checked the dimensions of layer_outputs and there is a mismatch. I added print statements in modeling_t5.py at line 767:

        ...
        for i, (layer_module, past_key_value_state) in enumerate(zip(self.block, past_key_value_states)):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)
            layer_outputs = layer_module(
                hidden_states,
                attention_mask=extended_attention_mask,
                position_bias=position_bias,
                encoder_hidden_states=encoder_hidden_states,
                encoder_attention_mask=encoder_extended_attention_mask,
                encoder_decoder_position_bias=encoder_decoder_position_bias,
                head_mask=head_mask[i],
                past_key_value_state=past_key_value_state,
                use_cache=use_cache,
                output_attentions=output_attentions,
            )
            print(type(layer_outputs))
            print(len(layer_outputs))
            print(type(layer_outputs[0]))
        ...

results in:

<class 'tuple'>
2
<class 'torch.Tensor'>

And later, at lines 776 and 779, the code attempts to get the 3rd or 5th element of this tuple.
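In other words (a sketch of the mismatch, based only on the prints and the traceback above): T5Stack expects position_bias to be part of each block's output tuple, but the converted blocks return only two elements, so the indexing at line 775 falls out of range:

    # Placeholder values; the prints above show layer_outputs really is a 2-tuple whose first
    # element is a Tensor.
    output_attentions = False
    layer_outputs = ("hidden_states", "present_key_value_state")   # len(layer_outputs) == 2
    position_bias = layer_outputs[3 if output_attentions else 2]   # IndexError: tuple index out of range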

@insomnia777

Hello.
I also get a conversion error.

Model - https://huggingface.co/cointegrated/rut5-small

python3 convert_t5_to_longformerencoderdecoder.py --base_model /mnt/1tb/ML/models/ruT5/small/ --save_model_to .
Traceback (most recent call last):
  File "convert_t5_to_longformerencoderdecoder.py", line 148, in <module>
    main()
  File "convert_t5_to_longformerencoderdecoder.py", line 119, in main
    relative_attention_num_buckets=args.num_pos_buckets,
  File "convert_t5_to_longformerencoderdecoder.py", line 19, in create_long_model
    tokenizer = T5Tokenizer.from_pretrained(base_model, model_max_length=max_pos)
  File "/mnt/1tb/ML/venv-longformer/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1425, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/mnt/1tb/ML/venv-longformer/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1572, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/mnt/1tb/ML/venv-longformer/lib/python3.7/site-packages/transformers/tokenization_t5.py", line 141, in __init__
    self.sp_model.Load(vocab_file)
  File "/mnt/1tb/ML/venv-longformer/lib/python3.7/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/mnt/1tb/ML/venv-longformer/lib/python3.7/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
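One thing that might be worth checking (an assumption on my part, not a confirmed diagnosis): the TypeError means sentencepiece was handed something other than a string, which commonly happens when T5Tokenizer cannot find spiece.model in the local --base_model directory and ends up with vocab_file=None:

    # Hedged debugging sketch: list the local model directory and confirm the sentencepiece
    # vocab file is present next to config.json / pytorch_model.bin.
    import os
    base_model = "/mnt/1tb/ML/models/ruT5/small/"
    print(sorted(os.listdir(base_model)))  # expect to see spiece.model here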
