
Reformer #3351

Merged: 162 commits, May 7, 2020
Changes from 1 commit

Commits (162):
ee0ce08
first copy & past commit from Bert and morgans LSH code
patrickvonplaten Mar 19, 2020
3259115
add easy way to compare to trax original code
patrickvonplaten Mar 20, 2020
25d162e
translate most of function
patrickvonplaten Mar 23, 2020
dc07c08
make trax lsh self attention deterministic with numpy seed + copy pas…
patrickvonplaten Mar 23, 2020
09d4230
add same config
patrickvonplaten Mar 23, 2020
9386450
add same config
patrickvonplaten Mar 23, 2020
3fb1182
fix merge conflicts
patrickvonplaten Apr 6, 2020
ebb9d3f
make layer init work
patrickvonplaten Mar 31, 2020
c910e03
implemented hash_vectors function for lsh attention
patrickvonplaten Apr 1, 2020
b956933
continue reformer translation
patrickvonplaten Apr 1, 2020
a449c2e
hf LSHSelfAttentionLayer gives same output as trax layer
patrickvonplaten Apr 2, 2020
35491d8
refactor code
patrickvonplaten Apr 2, 2020
c9a0919
refactor code
patrickvonplaten Apr 2, 2020
4ef4739
refactor code
patrickvonplaten Apr 2, 2020
f580074
refactor
patrickvonplaten Apr 2, 2020
4176564
refactor + add reformer config
patrickvonplaten Apr 2, 2020
8e02fe7
delete bogus file
patrickvonplaten Apr 2, 2020
2af8377
split reformer attention layer into two layers
patrickvonplaten Apr 3, 2020
6fe9478
fix merge conflicts
patrickvonplaten Apr 6, 2020
9e6e1af
save intermediate step
patrickvonplaten Apr 3, 2020
1855074
save intermediate step
patrickvonplaten Apr 3, 2020
1a4e61a
make test work
patrickvonplaten Apr 3, 2020
da6bfe4
add complete reformer block layer
patrickvonplaten Apr 3, 2020
2825d24
finish reformer layer
patrickvonplaten Apr 4, 2020
45e6635
implement causal and self mask
patrickvonplaten Apr 5, 2020
b5ed5d4
clean reformer test and refactor code
patrickvonplaten Apr 5, 2020
ddb2f09
update init
patrickvonplaten Apr 6, 2020
cbb5ab9
fix device for GPU
patrickvonplaten Apr 6, 2020
f17fd5b
fix chunk length init for tests
patrickvonplaten Apr 6, 2020
eca8cce
include morgans optimization
patrickvonplaten Apr 6, 2020
db2ebb1
improve memory a bit
patrickvonplaten Apr 6, 2020
04aa067
improve comment
patrickvonplaten Apr 6, 2020
4aec75e
factorize num_buckets
patrickvonplaten Apr 7, 2020
d030e39
better testing parameters
patrickvonplaten Apr 7, 2020
d318089
make whole model work
patrickvonplaten Apr 9, 2020
4f0b114
make lm model work
patrickvonplaten Apr 9, 2020
6c8bad6
add t5 copy paste tokenizer
patrickvonplaten Apr 10, 2020
b71ef16
add chunking feed forward
patrickvonplaten Apr 10, 2020
99427c6
clean config
patrickvonplaten Apr 14, 2020
4ffa925
add improved assert statements
patrickvonplaten Apr 14, 2020
b116e3c
make tokenizer work
patrickvonplaten Apr 14, 2020
79a0bab
improve test
patrickvonplaten Apr 14, 2020
aceb586
correct typo
patrickvonplaten Apr 14, 2020
a4814bd
extend config
patrickvonplaten Apr 14, 2020
5eeeb25
add complexer test
patrickvonplaten Apr 14, 2020
0ee5db4
add new axial position embeddings
patrickvonplaten Apr 15, 2020
938aa8b
add local block attention layer
patrickvonplaten Apr 15, 2020
4d7c23b
clean tests
patrickvonplaten Apr 15, 2020
50276de
refactor
patrickvonplaten Apr 15, 2020
37a2b00
better testing
patrickvonplaten Apr 15, 2020
07c0c72
save intermediate progress
patrickvonplaten Apr 15, 2020
060a691
clean test file
patrickvonplaten Apr 16, 2020
ace301f
make shorter input length work for model
patrickvonplaten Apr 16, 2020
80d18db
allow variable input length
patrickvonplaten Apr 16, 2020
86f4ac4
refactor
patrickvonplaten Apr 16, 2020
e571849
make forward pass for pretrained model work
patrickvonplaten Apr 16, 2020
d5e1363
add generation possibility
patrickvonplaten Apr 17, 2020
562d530
finish dropout and init
patrickvonplaten Apr 17, 2020
c98eafe
make style
patrickvonplaten Apr 17, 2020
9c9fab9
refactor
patrickvonplaten Apr 17, 2020
a188a39
add first version of RevNet Layers
patrickvonplaten Apr 17, 2020
8047573
make forward pass work and add convert file
patrickvonplaten Apr 18, 2020
31a596b
make uploaded model forward pass work
patrickvonplaten Apr 18, 2020
bae0700
make uploaded model forward pass work
patrickvonplaten Apr 18, 2020
831dcec
refactor code
patrickvonplaten Apr 18, 2020
57ee09c
add namedtuples and cache buckets
patrickvonplaten Apr 19, 2020
2d23fad
correct head masks
patrickvonplaten Apr 19, 2020
0c35bbf
refactor
patrickvonplaten Apr 19, 2020
232463e
made reformer more flexible
patrickvonplaten Apr 19, 2020
2648a94
make style
patrickvonplaten Apr 19, 2020
902408b
remove set max length
patrickvonplaten Apr 21, 2020
8ed63ab
add attention masks
patrickvonplaten Apr 22, 2020
513bb43
fix up tests
patrickvonplaten Apr 22, 2020
db60c23
fix conflict
patrickvonplaten Apr 30, 2020
9f359af
fix lsh attention mask
patrickvonplaten Apr 23, 2020
48097a0
make random seed optional for the moment
patrickvonplaten Apr 23, 2020
650e00c
improve memory in reformer
patrickvonplaten Apr 23, 2020
ccba9ac
add tests
patrickvonplaten Apr 23, 2020
f83721e
make style
patrickvonplaten Apr 23, 2020
125c86d
make sure masks work correctly
patrickvonplaten Apr 24, 2020
2beda9c
detach gradients
patrickvonplaten Apr 24, 2020
12e35e1
save intermediate
patrickvonplaten Apr 24, 2020
8b058e2
correct backprob through gather
patrickvonplaten Apr 24, 2020
69258b8
make style
patrickvonplaten Apr 24, 2020
44c3a7c
change back num hashes
patrickvonplaten Apr 25, 2020
48fff07
rename to labels
patrickvonplaten Apr 25, 2020
55842be
fix rotation shape
patrickvonplaten Apr 25, 2020
71426c0
fix detach
patrickvonplaten Apr 25, 2020
dfbcf8f
update
patrickvonplaten Apr 25, 2020
0ea564c
fix trainer
patrickvonplaten Apr 25, 2020
af3456c
fix backward dropout
patrickvonplaten Apr 26, 2020
002f19c
make reformer more flexible
patrickvonplaten Apr 26, 2020
7de3f4f
fix
patrickvonplaten May 7, 2020
6111bd5
fix
patrickvonplaten May 7, 2020
0c75149
add tests for fixed seed in reformer layer
patrickvonplaten Apr 26, 2020
7a03bc7
fix trainer typo
patrickvonplaten Apr 26, 2020
37943f3
fix typo in activations
patrickvonplaten Apr 26, 2020
0f751f5
add fp16 tests
patrickvonplaten Apr 28, 2020
8df5dcd
add fp16 training
patrickvonplaten Apr 28, 2020
51426b5
support fp16
patrickvonplaten Apr 28, 2020
b37fd3b
correct gradient bug in reformer
patrickvonplaten Apr 29, 2020
e3e05ef
add fast gelu
patrickvonplaten Apr 29, 2020
c3e32b4
re-add dropout for embedding dropout
patrickvonplaten Apr 29, 2020
52ee5ed
better naming
patrickvonplaten Apr 29, 2020
ece19ee
better naming
patrickvonplaten Apr 29, 2020
e661832
renaming
patrickvonplaten Apr 29, 2020
f1a6355
finalize test branch
patrickvonplaten Apr 29, 2020
ea1126e
finalize tests
patrickvonplaten Apr 30, 2020
d4bc3c6
add more tests
patrickvonplaten Apr 30, 2020
94086ac
finish tests
patrickvonplaten Apr 30, 2020
01f4074
fix
patrickvonplaten May 7, 2020
9dafbc2
fix type trainer
patrickvonplaten Apr 30, 2020
de08a57
fix fp16 tests
patrickvonplaten Apr 30, 2020
aa570dc
fix tests
patrickvonplaten Apr 30, 2020
a681d19
fix tests
patrickvonplaten Apr 30, 2020
320c045
fix tests
patrickvonplaten Apr 30, 2020
482c6cd
fix issue with dropout
patrickvonplaten Apr 30, 2020
d7905dd
fix dropout seeds
patrickvonplaten Apr 30, 2020
764e06e
correct random seed on gpu
patrickvonplaten Apr 30, 2020
a3e0f59
finalize random seed for dropout
patrickvonplaten Apr 30, 2020
c48f88a
finalize random seed for dropout
patrickvonplaten Apr 30, 2020
ce87cb6
remove duplicate line
patrickvonplaten Apr 30, 2020
d418dd0
correct half precision bug
patrickvonplaten May 1, 2020
3248e67
make style
patrickvonplaten May 1, 2020
6fe0648
refactor
patrickvonplaten May 1, 2020
c3031b8
refactor
patrickvonplaten May 1, 2020
6c2be30
docstring
patrickvonplaten May 1, 2020
3d266fb
remove sinusoidal position encodings for reformer
patrickvonplaten May 1, 2020
1be343f
move chunking to modeling_utils
patrickvonplaten May 1, 2020
a10eb2e
make style
patrickvonplaten May 1, 2020
f31b570
clean config
patrickvonplaten May 1, 2020
b2a660f
make style
patrickvonplaten May 1, 2020
dfc1f64
fix tests
patrickvonplaten May 1, 2020
2e95c17
fix auto tests
patrickvonplaten May 1, 2020
b95f6ae
pretrained models
patrickvonplaten May 1, 2020
a6f69cb
fix docstring
patrickvonplaten May 1, 2020
59868f3
update conversion file
patrickvonplaten May 1, 2020
a81c3e0
Update pretrained_models.rst
patrickvonplaten May 1, 2020
c0ddf94
fix rst
patrickvonplaten May 1, 2020
62a8eb0
fix rst
patrickvonplaten May 1, 2020
47e5fc8
update copyright
patrickvonplaten May 1, 2020
b6576c8
fix test path
patrickvonplaten May 1, 2020
a111720
fix test path
patrickvonplaten May 1, 2020
ff5e783
fix small issue in test
patrickvonplaten May 1, 2020
f7f949b
include reformer in generation tests
patrickvonplaten May 2, 2020
91472b8
add docs for axial position encoding
patrickvonplaten May 2, 2020
6ed2fa8
finish docs
patrickvonplaten May 2, 2020
963bb5e
Update convert_reformer_trax_checkpoint_to_pytorch.py
patrickvonplaten May 2, 2020
425b185
remove isort
patrickvonplaten May 2, 2020
3336d8f
include sams comments
patrickvonplaten May 3, 2020
54eb629
remove wrong comment in utils
patrickvonplaten May 3, 2020
e4e1e59
correct typos
patrickvonplaten May 3, 2020
5f5c89b
fix typo
patrickvonplaten May 3, 2020
7fdf16b
Update reformer.rst
patrickvonplaten May 4, 2020
7ccec6a
applied morgans optimization
patrickvonplaten May 4, 2020
3978afa
make style
patrickvonplaten May 4, 2020
01b1006
make gpu compatible
patrickvonplaten May 4, 2020
e983a69
remove bogus file
patrickvonplaten May 4, 2020
9ce32f0
big test refactor
patrickvonplaten May 4, 2020
67f02c0
add example for chunking
patrickvonplaten May 4, 2020
4e7252a
fix typo
patrickvonplaten May 4, 2020
ca4dab3
add to README
patrickvonplaten May 7, 2020
Changes from commit: better naming
patrickvonplaten committed May 7, 2020
commit 52ee5ed7107957eceb19cd62a54fdc940405f8ee
6 changes: 3 additions & 3 deletions in src/transformers/configuration_reformer.py

```diff
@@ -100,8 +100,8 @@ def __init__(
         vocab_size=10,
         attention_head_size=32,
         hidden_size=64,
-        num_attention_heads=2,
-        num_buckets=2,
+        num_attention_heads=1,
+        num_buckets=[2, 4],
         num_hashes=2,
         lsh_attn_chunk_length=64,
         local_attn_chunk_length=64,
@@ -122,7 +122,7 @@ def __init__(
         layer_norm_eps=1e-12,
         sinusoidal_pos_embds=False,
         axial_pos_embds=False,
-        axial_pos_shape=[8, 8],
+        axial_pos_shape=[32, 16],
         axial_pos_embds_dim=[32, 32],
         attn_layers=["lsh", "lsh", "lsh", "lsh"],
         # attn_layers=["local", "local", "local", "local"],
```
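For orientation (this note is not part of the diff): a factorized `num_buckets` multiplies out to the total bucket count, and the axial defaults must tile the model. A minimal sketch of the arithmetic, assuming only standard Reformer semantics and the values shown above:

```python
# Sketch of how the new defaults compose (assumption: standard Reformer
# semantics for factorized buckets and axial position embeddings).
from math import prod

num_buckets = [2, 4]            # factorized: 2 * 4 = 8 buckets in total
axial_pos_shape = [32, 16]      # covers 32 * 16 = 512 positions
axial_pos_embds_dim = [32, 32]  # concatenates to 32 + 32 = 64 channels

assert all(factor % 2 == 0 for factor in num_buckets)  # mirrors the assert below
print(prod(num_buckets), prod(axial_pos_shape), sum(axial_pos_embds_dim))
# -> 8 512 64  (64 == hidden_size above)
```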
34 changes: 18 additions & 16 deletions in src/transformers/modeling_reformer.py

```diff
@@ -471,6 +471,7 @@ def _hash_vectors(self, vectors, num_hashes):
         # See https://arxiv.org/pdf/1509.02897.pdf
         # We sample a different random rotation for each round of hashing to
         # decrease the probability of hash misses.
+
         if isinstance(self.num_buckets, int):
             assert (
                 self.num_buckets % 2 == 0
@@ -480,12 +481,13 @@ def _hash_vectors(self, vectors, num_hashes):
         else:
             # Factorize the hash if self.num_buckets is a list or tuple
             rotation_size, num_buckets = 0, 1
-            for num_bucket in self.num_buckets:
-                assert num_bucket % 2 == 0, "The number of buckets should be even, but `num_bucket`: {}".format(
-                    num_bucket
-                )
-                rotation_size += num_bucket
-                num_buckets *= num_bucket
+            for bucket_factor in self.num_buckets:
+                assert bucket_factor % 2 == 0, "The number of buckets should be even, but `num_bucket`: {}".format(bucket_factor)
+                rotation_size = rotation_size + bucket_factor
+                num_buckets = num_buckets * bucket_factor
+
+        # remove gradient
+        vectors = vectors.detach()

         # TODO: delete later when integration tests are ok
         if self.hash_seed is not None:
@@ -497,7 +499,7 @@ def _hash_vectors(self, vectors, num_hashes):
             rotated_vectors = torch.einsum("bmtd,dhr->bmhtr", vectors, random_rotations)
         else:
             rotations_shape = (self.num_attention_heads, vectors.shape[-1], num_hashes, rotation_size // 2)
-            # create a random self.attention_head_size x num_hashes x self.num_buckets/2
+            # create a random self.attention_head_size x num_hashes x num_buckets/2
             random_rotations = torch.randn(rotations_shape, device=vectors.device).to(vectors.dtype)

             # rotated_vectors has dim:
@@ -513,17 +515,17 @@ def _hash_vectors(self, vectors, num_hashes):
         else:
             # Get the buckets for them and combine.
             buckets, cur_sum, cur_product = None, 0, 1
-            for num_bucket in self.num_buckets:
-                rotated_vectors = rotated_vectors[..., cur_sum : cur_sum + (num_bucket // 2)]
-                cur_sum += num_bucket // 2
-                rotated_vectors = torch.cat([rotated_vectors, -rotated_vectors], dim=-1)
+            for bucket_factor in self.num_buckets:
+                rotated_vectors_factor = rotated_vectors[..., cur_sum : cur_sum + (bucket_factor // 2)]
+                cur_sum = cur_sum + bucket_factor // 2
+                rotated_vectors_factor = torch.cat([rotated_vectors_factor, -rotated_vectors_factor], dim=-1)

                 if buckets is None:
-                    buckets = torch.argmax(rotated_vectors, dim=-1)
+                    buckets = torch.argmax(rotated_vectors_factor, dim=-1)
                 else:
-                    buckets += cur_product * torch.argmax(rotated_vectors, dim=-1)
+                    buckets = buckets + (cur_product * torch.argmax(rotated_vectors_factor, dim=-1))

-                cur_product *= num_bucket
+                cur_product = cur_product * bucket_factor

         # buckets is now (Batch_size x Num_Attn_Heads x Num_Hashes x Seq_Len).
         # Next we add offsets so that bucket numbers from different hashing rounds don't overlap.
```
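This hunk is more than a rename: the old loop reassigned `rotated_vectors` inside the body, so every factor after the first sliced the already-mirrored tensor rather than the original projections, and the in-place `buckets += ...` is replaced with an out-of-place add. Below is a minimal, self-contained sketch of the corrected computation; the function name and input shapes are illustrative assumptions, not code from the PR:

```python
import torch

def factorized_buckets(rotated_vectors, num_buckets=(2, 4)):
    # Sketch of the loop above. `rotated_vectors` is assumed to carry
    # sum(num_buckets) // 2 projection channels in its last dimension.
    buckets, cur_sum, cur_product = None, 0, 1
    for bucket_factor in num_buckets:
        # Slice this factor's channels without overwriting the input tensor.
        rv_factor = rotated_vectors[..., cur_sum : cur_sum + bucket_factor // 2]
        cur_sum += bucket_factor // 2
        # Mirroring [rv, -rv] turns bucket_factor // 2 projections into a
        # choice among bucket_factor buckets via argmax.
        digit = torch.argmax(torch.cat([rv_factor, -rv_factor], dim=-1), dim=-1)
        # Combine the per-factor digits as a mixed-radix number.
        buckets = digit if buckets is None else buckets + cur_product * digit
        cur_product *= bucket_factor
    return buckets  # integers in [0, prod(num_buckets))

# 5 vectors, (2 + 4) // 2 = 3 projection channels each -> buckets in [0, 8)
print(factorized_buckets(torch.randn(5, 3)))
```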
```diff
@@ -1511,8 +1513,8 @@ def forward(
         if labels is not None:
             # Shift so that tokens < n predict n
             # Uncomment this line for integration test with Trax
-            # shift_logits = logits.contiguous()
-            shift_logits = logits[..., :-1, :].contiguous()
+            shift_logits = logits.contiguous()
+            # shift_logits = logits[..., :-1, :].contiguous()

             shift_labels = labels[..., 1:].contiguous()
             # Flatten the tokens
```
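This hunk is a temporary toggle: per the comment, the active line feeds unshifted logits to the loss so outputs can be compared against Trax in the integration test, while the commented-out line is the usual causal-LM alignment. A sketch of that standard alignment, with illustrative tensor sizes:

```python
import torch
import torch.nn.functional as F

# Standard causal-LM shift (the commented-out line above): the logit at
# position t is scored against the token at position t + 1.
vocab_size = 10
logits = torch.randn(1, 8, vocab_size)         # Batch x SeqLen x VocabSize
labels = torch.randint(0, vocab_size, (1, 8))  # Batch x SeqLen

shift_logits = logits[..., :-1, :].contiguous()  # drop the final position
shift_labels = labels[..., 1:].contiguous()      # drop the first token
loss = F.cross_entropy(shift_logits.view(-1, vocab_size), shift_labels.view(-1))
print(loss)
```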
2 changes: 1 addition & 1 deletion in tests/test_modeling_reformer.py

```diff
@@ -631,7 +631,7 @@ def test_local_layer(self):
     def test_reformer_lm_model(self):
         config = ReformerConfig(axial_pos_embds=True, hash_seed=0, is_decoder=True)

-        shape = (1, 64)  # Batch x SeqLen x ModelDimPerHead
+        shape = (1, 512)  # Batch x SeqLen x ModelDimPerHead

        np_input = np.random.randint(0, config.vocab_size, size=shape)
        np_input_2 = np.asarray(np_input, np.float32)
```
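The jump from 64 to 512 follows from the config change earlier in this commit: with `axial_pos_embds=True` and the new default `axial_pos_shape=[32, 16]`, the axial position embedding grid covers exactly 32 * 16 = 512 positions, so the LM test input has to match. A small sketch of the constraint (variable names are illustrative):

```python
from math import prod
import numpy as np

axial_pos_shape = [32, 16]       # new default from this commit
seq_len = prod(axial_pos_shape)  # 512: the length the axial grid tiles exactly

shape = (1, seq_len)             # Batch x SeqLen, as in the updated test
np_input = np.random.randint(0, 10, size=shape)  # vocab_size=10 per the config
print(np_input.shape)            # (1, 512)
```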