T5 #149
base: master
Conversation
Looks great. Thank you.
I left a few comments, but all of them are code-organization suggestions.
One question: how do you make sure your implementation is correct? The way I do this is to check that the output of LED-T5 perfectly matches that of T5 for a random short input, say of size 4x256.
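For reference, a minimal parity check along those lines might look like the sketch below. The checkpoint path is a placeholder, and I am assuming the converted class `LongformerEncoderDecoderForConditionalGenerationT5` (the name used by this PR's conversion script) is importable from `longformer.longformer_encoder_decoder`:

```python
import torch
from transformers import T5ForConditionalGeneration

# Assumed import location; adjust to wherever this PR defines the class.
from longformer.longformer_encoder_decoder import LongformerEncoderDecoderForConditionalGenerationT5

torch.manual_seed(0)

t5 = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
led_t5 = LongformerEncoderDecoderForConditionalGenerationT5.from_pretrained(
    "/path/to/converted-t5-small"  # placeholder: output of convert_t5_to_longformerencoderdecoder.py
).eval()

# Random short input (4 x 256) so the whole sequence fits inside one attention
# window and LED-T5 should reduce to vanilla T5 attention.
input_ids = torch.randint(0, t5.config.vocab_size, (4, 256))
attention_mask = torch.ones_like(input_ids)
decoder_input_ids = torch.zeros((4, 1), dtype=torch.long)

with torch.no_grad():
    logits_t5 = t5(input_ids, attention_mask=attention_mask,
                   decoder_input_ids=decoder_input_ids, use_cache=False)[0]
    logits_led = led_t5(input_ids, attention_mask=attention_mask,
                        decoder_input_ids=decoder_input_ids, use_cache=False)[0]

# If the port is correct, the two outputs should match up to numerical noise.
assert torch.allclose(logits_t5, logits_led, atol=1e-5), "LED-T5 output diverges from T5"
```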
```python
# this is for the T5 setting
if "has_relative_attention_bias" in config.to_dict():
    self.is_decoder = config.is_decoder
    self.relative_attention_num_buckets = config.relative_attention_num_buckets
    self.has_relative_attention_bias = config.has_relative_attention_bias
    if self.has_relative_attention_bias:
        self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.num_heads)
    self.is_t5 = True
else:
    self.is_t5 = False
```
I would suggest moving all the T5-specific code from here to a longformer_encoder_decoder.LongformerSelfAttentionForT5 and having it inherit from LongformerSelfAttention.
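A rough sketch of that reorganization, just to illustrate the shape of it (the constructor signature is assumed to match LongformerSelfAttention's `(config, layer_id)`; the body is simply the T5-specific block from this diff moved into the subclass):

```python
from torch import nn
from longformer.longformer import LongformerSelfAttention  # existing base class in this repo


# Would live in longformer_encoder_decoder.py.
class LongformerSelfAttentionForT5(LongformerSelfAttention):
    """LongformerSelfAttention plus T5's relative attention bias."""

    def __init__(self, config, layer_id):
        super().__init__(config, layer_id)
        self.is_decoder = config.is_decoder
        self.relative_attention_num_buckets = config.relative_attention_num_buckets
        self.has_relative_attention_bias = config.has_relative_attention_bias
        if self.has_relative_attention_bias:
            self.relative_attention_bias = nn.Embedding(
                self.relative_attention_num_buckets, self.num_heads
            )
```

This would also remove the need for the is_t5 flag: the base class handles the non-T5 case and the subclass handles T5.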
```python
if self.is_t5:
    if position_bias is None:
        if not self.has_relative_attention_bias:
            raise ValueError("No position_bias provided and no weights to compute position_bias")
        position_bias = self.compute_bias(seq_len, seq_len)

        # if key and values are already calculated
        # we want only the last query position bias
        if past_key_value_state is not None:
            position_bias = position_bias[:, :, -1:, :]

        # TODO: attention_mask should also be the same shape as position_bias.
        # Sliding attention window??
        # if attention_mask is not None:
        #     position_bias = position_bias + attention_mask  # (1, num_heads, seq_len, 2*window+1)
    attn_weights += position_bias
```
As above, move this to LongformerSelfAttentionForT5. Here you can keep only one line, something like `attn_weights = self.process_relative_positions(attn_weights)`. This function is empty in LongformerSelfAttention but has more details in LongformerSelfAttentionForT5.
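Something along these lines, continuing the subclass sketched above (the extra keyword arguments to the hook are my assumption; the T5 version needs position_bias and the cached state from the caller one way or another):

```python
from longformer.longformer import LongformerSelfAttention


class LongformerSelfAttentionForT5(LongformerSelfAttention):
    # In LongformerSelfAttention the hook would simply return attn_weights unchanged:
    #     def process_relative_positions(self, attn_weights, **kwargs):
    #         return attn_weights
    def process_relative_positions(self, attn_weights, position_bias=None, past_key_value_state=None):
        if position_bias is None:
            if not self.has_relative_attention_bias:
                raise ValueError("No position_bias provided and no weights to compute position_bias")
            # qlen/klen arguments dropped here since, as noted below, they are unused.
            position_bias = self.compute_bias()
            # If keys and values are already cached, we only score the last
            # query position, so keep only that slice of the bias.
            if past_key_value_state is not None:
                position_bias = position_bias[:, :, -1:, :]
        return attn_weights + position_bias
```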
```python
    relative_buckets += torch.where(is_small, relative_position, relative_postion_if_large)
    return relative_buckets

def compute_bias(self, qlen, klen):
```
qlen, klen are not used
```python
def compute_bias(self, qlen, klen):
    """ Compute binned relative position bias """
    relative_position = torch.tensor([[i-self.attention_window for i in range(2*self.attention_window+1)]])
```
Please add a comment to explain the change.
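For example, something like the following. This is only my reading of the change, reconstructed from the first line of the hunk and the original T5 compute_bias (with the unused qlen/klen parameters dropped), so the actual body in the PR may differ:

```python
def compute_bias(self):
    """Compute binned relative position bias for sliding-window attention.

    T5's compute_bias(qlen, klen) builds a full (qlen, klen) grid of relative
    positions. With a sliding attention window, every query sees the same set
    of relative offsets [-attention_window, ..., +attention_window], so a
    single row of 2 * attention_window + 1 offsets is enough; the resulting
    bias of shape (1, num_heads, 1, 2 * window + 1) broadcasts over all query
    positions.
    """
    relative_position = torch.tensor([[i - self.attention_window for i in range(2 * self.attention_window + 1)]])
    relative_position_bucket = self._relative_position_bucket(
        relative_position,
        bidirectional=not self.is_decoder,
        num_buckets=self.relative_attention_num_buckets,
    )
    values = self.relative_attention_bias(relative_position_bucket)  # (1, 2*window+1, num_heads)
    values = values.permute([2, 0, 1]).unsqueeze(0)  # (1, num_heads, 1, 2*window+1)
    return values
```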
```python
@staticmethod
def _relative_position_bucket(relative_position, bidirectional=True, num_buckets=32, max_distance=128):
    """
    Adapted from Mesh Tensorflow:
```
This function is copied with no change, right? Please mention that.
Hello @AkshitaB, I am keeping my fingers crossed for you in porting T5 to use LongformerSelfAttention. I've tried to run the code you uploaded and it didn't work for me.
I received this error:

```
$ CUDA_VISIBLE_DEVICES=6 python3 convert_t5_to_longformerencoderdecoder.py --save_model_to ./
INFO:__main__:saving model to ./
Some weights of LongformerEncoderDecoderForConditionalGenerationT5 were not initialized from the model checkpoint at ./ and are newly initialized: ['encoder.block.0.layer.0.SelfAttention.longformer_self_attn.query.bias', (...)]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "convert_t5_to_longformerencoderdecoder.py", line 148, in <module>
main()
File "convert_t5_to_longformerencoderdecoder.py", line 140, in main
logits = model(input_ids, attention_mask=attention_mask, decoder_input_ids=decoder_input_ids, use_cache=False)[0]
File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/transformers/modeling_t5.py", line 1151, in forward
encoder_outputs = self.encoder(
File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/dih3/dih3_1/awawrzynski/miniconda3/envs/longformer_t5/lib/python3.8/site-packages/transformers/modeling_t5.py", line 775, in forward
position_bias = layer_outputs[3 if output_attentions else 2]
IndexError: tuple index out of range
```
@adamwawrzynski, thanks for your interest in this work. Can you debug this a bit and see why it is breaking? Looks like a misconfiguration for …
I've checked the dimensions of ...

```python
for i, (layer_module, past_key_value_state) in enumerate(zip(self.block, past_key_value_states)):
if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_states,)
layer_outputs = layer_module(
hidden_states,
attention_mask=extended_attention_mask,
position_bias=position_bias,
encoder_hidden_states=encoder_hidden_states,
encoder_attention_mask=encoder_extended_attention_mask,
encoder_decoder_position_bias=encoder_decoder_position_bias,
head_mask=head_mask[i],
past_key_value_state=past_key_value_state,
use_cache=use_cache,
output_attentions=output_attentions,
)
print(type(layer_outputs))
print(len(layer_outputs))
print(type(layer_outputs[0]))
```

... results in:

```
<class 'tuple'>
2
<class 'torch.Tensor'>
```

And later in the code at line …
Hello. Model: https://huggingface.co/cointegrated/rut5-small

```
python3 convert_t5_to_longformerencoderdecoder.py --base_model /mnt/1tb/ML/models/ruT5/small/ --save_model_to .
```