This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Pytorch embedding #2

Closed
wants to merge 2 commits into from

Conversation

DeNeutoy (Contributor):

Switching the word embedding functionality over to use Pytorch. Mainly did this to get a quick idea of how tough it would be to switch layers. Answer: pretty straightforward.


Parameters
----------
num_embeddings:, int:
Contributor:

We should probably figure out Markdown vs. reST sooner rather than later, so we can switch the docstrings now if we're going to switch.

If we're sticking with reST, this should be formatted like:

num_embeddings : ``int``

Note the space after the parameter name - it's important for the code that generates the docs.
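
For reference, a sketch of a full Parameters block in that reST style (the parameter descriptions here are illustrative, not taken from this PR):

Parameters
----------
num_embeddings : ``int``
    The size of the vocabulary, i.e. the number of rows in the embedding matrix.
embedding_dim : ``int``
    The dimensionality of each embedding vector.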

        self.weight.data[self.padding_index].fill_(0)

    def forward(self, input):
        padding_idx = self.padding_idx
Contributor:

self.padding_index
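
For what it's worth, a minimal sketch of forward in current PyTorch using torch.nn.functional.embedding, with the attribute name made consistent (assuming the module stores self.weight, self.padding_index, and self.sparse as in the snippets above):

import torch.nn.functional as F

def forward(self, inputs):
    # Look up each index in self.weight; the entry at padding_index
    # does not receive gradient updates.
    return F.embedding(inputs, self.weight,
                       padding_idx=self.padding_index,
                       sparse=self.sparse)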

        self.sparse = sparse

        if embedding_dim == 1:
            raise ConfigurationError("There is no need to embed tokens if you "
Contributor:

Why not? I could be learning something like a tf-idf weight for each token. Seems odd to put in this restriction, as it's unnecessary.

Contributor (Author):

Sure, I was just porting the previous embedding. I'll remove it.
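
For concreteness, an embedding_dim of 1 is just a learned scalar per token, along the lines of the following usage sketch with the stock torch.nn.Embedding (the vocabulary size here is arbitrary):

import torch

# One learned scalar per vocabulary entry, e.g. a tf-idf-style token weight;
# for input of shape (batch, seq_len) the output has shape (batch, seq_len, 1).
token_weights = torch.nn.Embedding(num_embeddings=10000, embedding_dim=1)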

embedding_dim: int,
weight: torch.FloatTensor=None,
padding_index: int=None,
trainable: bool=True,
Contributor:

This field is missing from the docstring.

        if weight is None:
            weight = torch.FloatTensor(num_embeddings, embedding_dim)
            self.weight = torch.nn.Parameter(weight, requires_grad=trainable)
            self.reset_parameters()
Contributor:

How does this work? This is scary to me.

Contributor:

Oh, you defined this method below. Ok, that makes more sense. When I first saw this, I thought you were calling a super class method, and I was really nervous about using pytorch.

Contributor:

Do you even need this method? Looks like it's just a single line, and you can move the padding fill to below the if/else block, because you do it in both.

Contributor:

And if you do want it, I'd call it something like initialize_parameters.

Contributor (Author):

It looks like there is a technical reason to have this function if we ever need to create parameters inside the call to forward: https://discuss.pytorch.org/t/dynamic-parameter-declaration-in-forward-function/427/7

That isn't the reason it's used here, though; I was wondering why other PyTorch layers define it, and it looks like this might be why.
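
The pattern discussed in that thread is a module that cannot build its parameters until it sees an input. A hypothetical sketch (not code from this PR):

import torch

class LazyLinear(torch.nn.Module):
    """Hypothetical module that creates its weight on the first forward call."""

    def __init__(self, out_features: int) -> None:
        super().__init__()
        self.out_features = out_features
        self.weight = None  # created lazily, once the input size is known

    def forward(self, inputs):
        if self.weight is None:
            in_features = inputs.size(-1)
            weight = inputs.new_empty(self.out_features, in_features)
            torch.nn.init.xavier_uniform_(weight)
            # Assigning a Parameter to an attribute registers it with the Module.
            self.weight = torch.nn.Parameter(weight)
        return inputs @ self.weight.t()

One caveat with this pattern: an optimizer constructed before the first forward call will not see the lazily created parameter.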

Contributor:

Interesting, ok. But it's not a method defined on Module, so we can handle this however we want, right?

Contributor (Author):

Yes, I have removed it for now.
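
With reset_parameters removed and the padding fill moved below the if/else as suggested above, the constructor might look roughly like this (a sketch, not the final code from this PR):

import torch

class Embedding(torch.nn.Module):
    def __init__(self,
                 num_embeddings: int,
                 embedding_dim: int,
                 weight: torch.FloatTensor = None,
                 padding_index: int = None,
                 trainable: bool = True,
                 sparse: bool = False) -> None:
        super().__init__()
        self.num_embeddings = num_embeddings
        self.padding_index = padding_index
        self.sparse = sparse

        if weight is None:
            # No pretrained weight given; allocate and randomly initialize one.
            weight = torch.FloatTensor(num_embeddings, embedding_dim)
            self.weight = torch.nn.Parameter(weight, requires_grad=trainable)
            self.weight.data.normal_(0, 1)
        else:
            self.weight = torch.nn.Parameter(weight, requires_grad=trainable)

        # Zero the padding row once, after either branch.
        if self.padding_index is not None:
            self.weight.data[self.padding_index].fill_(0)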

trainable: bool=True,
log_misses: bool=False):
"""
Reads a pre-trained embedding file and generates a Keras Embedding layer that has weights
Contributor:

Keras?

it a zero vector.

The embeddings file is assumed to be gzipped, formatted as [word] [dim 1] [dim 2] ...
"""
Contributor:

Documenting parameters here would be good.
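
For context, a minimal sketch of reading a file in the format that docstring describes (gzipped text, one token per line followed by its vector components); the function and variable names here are illustrative:

import gzip

import numpy

def read_pretrained_embeddings(embeddings_filename: str) -> dict:
    # Map each token in the file to its vector; the first field on a line
    # is the token, the remaining fields are the vector components.
    embeddings = {}
    with gzip.open(embeddings_filename, 'rb') as embeddings_file:
        for line in embeddings_file:
            fields = line.decode('utf-8').rstrip().split(' ')
            token = fields[0]
            embeddings[token] = numpy.asarray(fields[1:], dtype='float32')
    return embeddings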

embedding_dim = None

# TODO(matt): make this a parameter
embedding_misses_filename = 'embedding_misses.txt'
Contributor:

We can probably just log misses using logger.debug, and remove the log_misses parameter entirely, and this variable.

            assert embedding_dim > 1, "Found embedding size of 1; do you have a header?"
        else:
            if len(fields) - 1 != embedding_dim:
                # Sometimes there are funny unicode parsing problems that lead to different
Contributor:

Can we do better here? Maybe we can just import a library that reads glove vectors already, so we don't have to duplicate that functionality?


# Depending on whether the namespace has PAD and UNK tokens, we start at different indices,
# because you shouldn't have pretrained embeddings for PAD or UNK.
start_index = 0 if namespace.startswith("*") else 2
Contributor:

You should be able to always start at 0. The check for word in embeddings will fail, and everything will be ok, because we just continue in that case. And the namespace check is wrong here, anyways.
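
A sketch of what the weight-filling loop looks like with both simplifications (always start at index 0, and log misses via logger.debug as suggested above instead of writing a misses file); the Vocabulary method names here are assumptions:

import logging

import torch

logger = logging.getLogger(__name__)

def initialize_weight(embeddings: dict, vocab, namespace: str,
                      embedding_dim: int) -> torch.FloatTensor:
    vocab_size = vocab.get_vocab_size(namespace)
    # Random init everywhere; rows with pretrained vectors get overwritten below.
    weight = torch.FloatTensor(vocab_size, embedding_dim).normal_(0, 1)
    for index in range(vocab_size):  # starting at 0 is fine: PAD/UNK just miss below
        token = vocab.get_token_from_index(index, namespace)
        if token not in embeddings:
            logger.debug("Token %s was not found in the pretrained embedding file", token)
            continue
        weight[index] = torch.FloatTensor(embeddings[token])
    return weight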

DeNeutoy closed this Jun 26, 2017
LiyuanLucasLiu referenced this pull request in LiyuanLucasLiu/allennlp Jul 16, 2018
otherwise it would raise an error in line 505 (expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other')
DeNeutoy pushed a commit that referenced this pull request Jul 16, 2018
otherwise it would raise an error in line 505 (expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other')
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
otherwise it would raise an error in line 505 (expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument allenai#2 'other')
brendan-ai2 pushed a commit to brendan-ai2/allennlp that referenced this pull request Feb 27, 2019
segmental conll 2000 dataset reader
joelgrus added a commit to joelgrus/allennlp that referenced this pull request Jul 11, 2019
dirkgr pushed a commit to dirkgr/allennlp that referenced this pull request Dec 11, 2019
schmmd added a commit that referenced this pull request Feb 26, 2020
Call make_app correctly.