RNN stacks #437

ringgaard · 2020-01-08T12:53:05Z

I have implemented a new RNN module for the parser that supports a number of different RNN architectures:

LSTM - Standard LSTM [Hochreiter & Schmidhuber 1997].
DRAGNN LSTM - LSTM with peephole connections [Gers & Schmidhuber 2000] and coupled forget and input gates [Greff et al. 2015].
DOZAT LSTM - Standard LSTM with one matrix multiplication [Dozat & Manning 2017].
PYTORCH LSTM - Standard LSTM with two matrix multiplications [Paszke et al. 2019].
GRU - Gated Recurrent Unit [Cho et al. 2014].
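
For reference, here is a minimal plain-Python sketch of one step of the standard LSTM listed above: input, forget and output gates plus a tanh candidate. It is not the Myelin implementation in sling/myelin/rnn.cc. The parameter layout, a dict of per-gate (W_x, W_h, b) triples, is a hypothetical choice for illustration. Per the list above, the DOZAT and PYTORCH variants compute the same gates but fold the per-gate products into one or two larger matrix multiplications.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def lstm_step(x, h_in, c_in, params):
    """One step of a standard LSTM without peephole connections.

    params maps each gate name ("i", "f", "o", "g") to a hypothetical
    (W_x, W_h, b) triple: input-to-gate matrix, hidden-to-gate matrix, bias.
    """
    def gate(name, act):
        w_x, w_h, b = params[name]
        pre = [u + v + bk for u, v, bk in zip(matvec(w_x, x), matvec(w_h, h_in), b)]
        return [act(z) for z in pre]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell update
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_in, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate is 0.5 and the candidate is 0. The cell state is therefore halved and h = 0.5 * tanh(c).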

The RNN module also supports:

Multiple layers.
Uni-directional and bi-directional RNNs.
Highway connections.
Dropout.
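
Here is a rough sketch of how layers, bi-directionality and dropout fit together. It assumes a simplified, hypothetical cell(x, h) -> h callable; an LSTM would also carry its cell state. Each layer runs one cell left-to-right and another right-to-left, concatenates the two states per token, and applies inverted dropout during training before feeding the next layer. Highway connections are not shown, and this is not how the SLING module is structured internally.

```python
import random


def run_direction(cell, xs, hidden_size, reverse=False):
    """Run cell(x, h) -> h over xs in one direction.

    Outputs are returned in the original token order either way.
    """
    h = [0.0] * hidden_size
    outputs = [None] * len(xs)
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    for t in order:
        h = cell(xs[t], h)
        outputs[t] = h
    return outputs


def dropout(v, rate, rng, training):
    """Inverted dropout: zero each element with probability rate, rescale the rest."""
    if not training or rate == 0.0:
        return v
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def birnn_stack(layers, xs, hidden_size, rate=0.0, rng=None, training=False):
    """Stack of bi-directional layers given as (forward_cell, backward_cell) pairs.

    Each layer outputs the per-token concatenation [forward; backward], so
    every layer after the first sees inputs of width 2 * hidden_size.
    """
    if rng is None:
        rng = random.Random(0)
    for fwd, bwd in layers:
        f = run_direction(fwd, xs, hidden_size)
        b = run_direction(bwd, xs, hidden_size, reverse=True)
        xs = [dropout(fv + bv, rate, rng, training) for fv, bv in zip(f, b)]
    return xs
```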

The best parser model with one bi-directional LSTM layer produces the following results:

SPAN:  P=92.78, R=93.76, F1=93.27
FRAME: P=93.85, R=94.84, F1=94.34
PAIR:  P=95.76, R=96.12, F1=95.94
EDGE:  P=81.41, R=82.30, F1=81.85
ROLE:  P=73.32, R=74.14, F1=73.73
TYPE:  P=90.22, R=91.18, F1=90.70
LABEL: P= 0.00, R= 0.00, F1= 0.00
SLOT:  P=83.55, R=84.45, F1=84.00
TOTAL: P=88.90, R=89.85, F1=89.37
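
Each F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following quick plain-Python check runs against a few rows of the table. Because P and R are reported rounded to two decimals, a recomputed value may occasionally differ in the last digit. The guard for P + R = 0 covers the all-zero LABEL row.

```python
def f1(p, r):
    """Harmonic mean of precision p and recall r (both in percent)."""
    return 2.0 * p * r / (p + r) if p + r > 0 else 0.0


# (P, R, reported F1) for a few rows of the table above.
rows = {
    "SPAN": (92.78, 93.76, 93.27),
    "EDGE": (81.41, 82.30, 81.85),
    "LABEL": (0.00, 0.00, 0.00),
    "TOTAL": (88.90, 89.85, 89.37),
}

for name, (p, r, reported) in rows.items():
    print(f"{name:6s} computed F1={f1(p, r):6.2f}  reported F1={reported:6.2f}")
```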
training time: 30 mins
parsing speed: 40000 tokens/sec

You can make a more accurate model with a 3-layer 192-dim bi-directional LSTM:

SPAN:  P=93.09, R=94.49, F1=93.79
FRAME: P=94.17, R=95.58, F1=94.87
PAIR:  P=96.05, R=96.78, F1=96.42
EDGE:  P=83.01, R=84.92, F1=83.96
ROLE:  P=75.30, R=77.07, F1=76.17
TYPE:  P=90.91, R=92.27, F1=91.58
LABEL: P= 0.00, R= 0.00, F1= 0.00
SLOT:  P=84.72, R=86.27, F1=85.49
TOTAL: P=89.59, R=91.07, F1=90.33
training time: 4 hours
parsing speed: 2500 tokens/sec

For comparison, the baseline for the old PyTorch-based trainer is:

SPAN F1:  92.6639
FRAME F1: 93.8144
TYPE F1:  89.7554
ROLE F1:  72.2666
SLOT F1:  82.8597
TOTAL F1: 88.5455
training time: 16 hours
parsing speed: 9000 tokens/sec
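
The sketch below puts the comparison in numbers. It computes the relative reduction in TOTAL F1 error (100 - F1), along with the training and parsing speed ratios against the old trainer. It assumes the old and new TOTAL scores are directly comparable. That may not hold exactly, since the new benchmark adds PAIR and EDGE scores.

```python
# Figures from the results above (TOTAL F1, training time, parsing speed).
baseline = {"f1": 88.5455, "train_hours": 16.0, "tokens_per_sec": 9000}
models = {
    "1-layer": {"f1": 89.37, "train_hours": 0.5, "tokens_per_sec": 40000},
    "3-layer": {"f1": 90.33, "train_hours": 4.0, "tokens_per_sec": 2500},
}


def relative_error_reduction(old_f1, new_f1):
    """Fraction of the old model's remaining error (100 - F1) removed by the new one."""
    return (new_f1 - old_f1) / (100.0 - old_f1)


for name, m in models.items():
    err = relative_error_reduction(baseline["f1"], m["f1"])
    train = baseline["train_hours"] / m["train_hours"]
    speed = m["tokens_per_sec"] / baseline["tokens_per_sec"]
    print(f"{name}: {100 * err:.1f}% less error, "
          f"{train:.1f}x faster training, {speed:.2f}x baseline parsing speed")
```

With these figures, the 1-layer model removes about 7% of the baseline's remaining TOTAL error, trains 32 times faster and parses about 4.4 times faster. The 3-layer model removes about 16% of that error and trains 4 times faster, but parses at roughly 0.28 times the baseline speed.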

Details:

Parser feature tracing has been removed
New PAIR and EDGE parser benchmarks
Deprecate EMBED, ELABORATE, and STOP parser actions
Remove mark distance feature and frame create feature
Learning rate cliff for fine-tuning the model in the final phase
Support for parser training restart
Model parameter initialization (uniform, normal, ortho)
Flow variable attributes (flow version 6)
Support for dynamically sized tensors (dynamic Concat and Split)
Generic ArgMax for all types.
Per-tensor L2 regularization parameter
On-demand loading for tensors in elementwise index generator
Myelin compiler --compile_only and --param_stats
ConcatV2 op renamed to Concat
Remove support for singleton kernel library
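
The "ortho" option in the parameter initialization item corresponds to the RandomOrtho calls in sling/myelin/rnn.cc. Here is a plain-Python sketch of one common way to build an orthogonal initializer: draw a Gaussian matrix and orthonormalize it with (modified) Gram-Schmidt. This thread does not say whether Myelin's RandomOrtho uses this method, a QR or SVD factorization, or an extra gain factor, so treat this as an illustration only.

```python
import math
import random


def random_ortho(rows, cols, rng=None):
    """Random matrix with orthonormal rows (rows <= cols) or columns (rows > cols).

    Draws Gaussian vectors and orthonormalizes them with modified Gram-Schmidt.
    """
    if rng is None:
        rng = random.Random()
    transpose = rows > cols
    # Build n orthonormal vectors of length m, with n <= m.
    n, m = (cols, rows) if transpose else (rows, cols)
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for b in basis:  # remove the component along each earlier vector
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:  # redraw in the (unlikely) degenerate case
            basis.append([vi / norm for vi in v])
    if transpose:
        return [list(col) for col in zip(*basis)]
    return basis
```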

anders-sandholm

Massive improvement. LGTM. Mostly comments about comments.

anders-sandholm · 2020-01-09T16:33:46Z

python/myelin/flow.py

+  def add_attr(self, name, value):
+    if type(value) is bool:
+      if value == True: value = 1
+      elif value == False: value = 0


Shorter but maybe not more readable:
value = int(value)

Yes. This is simpler. Done.
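
Here is a minimal sketch of the simplified method. It uses a hypothetical AttrHolder class that keeps attributes in a plain dict, since the real flow.py storage is not shown in this thread. In Python, bool is a subclass of int, so int(True) == 1 and int(False) == 0, and the two-branch conversion reduces to a single call.

```python
class AttrHolder:
    """Hypothetical stand-in for a flow object that stores attributes in a dict."""

    def __init__(self):
        self.attrs = {}

    def add_attr(self, name, value):
        # bool is a subclass of int, so int(True) == 1 and int(False) == 0.
        if type(value) is bool:
            value = int(value)
        self.attrs[name] = value
```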

anders-sandholm · 2020-01-09T22:24:26Z

sling/myelin/rnn.cc

+      tf.RandomOrtho(x2o);
+      tf.RandomOrtho(h2o);
+
+      // i = sigmoid(x * x2i + h_in * h2i + c_in * c2i + bi)


Might just be a copy paste bug in the comment inherited from the DRAGNN LSTM.
c_in and c2i do not appear in the expression below. Moreover, c2i has not been defined/assigned yet, I believe.

Oops. That is a copy/paste bug. The standard LSTM does not have peephole connections.
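
The corrected comment for the standard LSTM simply drops the c_in * c2i peephole term. Below is a plain-Python sketch of the input gate with and without that term; it is not Myelin code. Matrices are lists of rows, and x * W means row vector times matrix, following the notation in the comment. The sketch treats c2i as a full matrix to match that notation, whereas classic peephole formulations typically use a diagonal (elementwise) weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def vecmat(v, w):
    """Row vector v times matrix w (list of rows): (v * w)[j] = sum_k v[k] * w[k][j]."""
    return [sum(vk * row[j] for vk, row in zip(v, w)) for j in range(len(w[0]))]


def input_gate(x, h_in, x2i, h2i, bi, c_in=None, c2i=None):
    """i = sigmoid(x * x2i + h_in * h2i [+ c_in * c2i] + bi).

    The standard LSTM leaves c_in=None (no peephole term); a peephole
    variant such as the DRAGNN LSTM also passes c_in and c2i.
    """
    pre = [a + b + c for a, b, c in zip(vecmat(x, x2i), vecmat(h_in, h2i), bi)]
    if c_in is not None:
        pre = [p + q for p, q in zip(pre, vecmat(c_in, c2i))]
    return [sigmoid(z) for z in pre]
```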

anders-sandholm · 2020-01-09T22:58:57Z

sling/nlp/parser/parser-trainer.cc

@@ -315,20 +338,13 @@ void ParserTrainer::Parse(Document *document) const {
        d = action.delegate;
      }

-      // Shift or stop if predicted action is invalid.
+      // Shift if predicted action is not invalid.


invalid, not invalid or not valid? Life is just more exciting with double negations... :-)

anders-sandholm · 2020-01-09T23:02:09Z

sling/nlp/parser/parser.cc

+        d = action.delegate;
+      }
+
+      // Shift if predicted action is not invalid.


not invalid or not valid or invalid?

There is nothing not wrong with this :)
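
The removed line reads "Shift or stop if predicted action is invalid", and STOP is being deprecated. Judging from both, the intended behavior is to fall back to SHIFT when the predicted action is invalid. Here is a tiny Python sketch of that rule, where is_valid is a hypothetical stand-in for the parser's validity check.

```python
SHIFT = "SHIFT"


def choose_action(predicted, is_valid):
    """Return the predicted action, or SHIFT if it is invalid in the current state.

    is_valid(action) -> bool is a hypothetical validity check.
    """
    return predicted if is_valid(predicted) else SHIFT
```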

ringgaard

Thanks for the review and the comments.

ringgaard · 2020-01-10T12:26:27Z

python/myelin/flow.py

+  def add_attr(self, name, value):
+    if type(value) is bool:
+      if value == True: value = 1
+      elif value == False: value = 0


Yes. This is simpler. Done.

ringgaard · 2020-01-10T12:30:05Z

sling/myelin/rnn.cc

+      tf.RandomOrtho(x2o);
+      tf.RandomOrtho(h2o);
+
+      // i = sigmoid(x * x2i + h_in * h2i + c_in * c2i + bi)


Oops. That is a copy/paste bug. The standard LSTM does not have peephole connections.

ringgaard · 2020-01-10T12:31:57Z

sling/nlp/parser/parser.cc

+        d = action.delegate;
+      }
+
+      // Shift if predicted action is not invalid.


There is nothing not wrong with this :)

ringgaard · 2020-01-10T12:33:41Z

sling/nlp/parser/parser.cc

+        d = action.delegate;
+      }
+
+      // Shift if predicted action is not invalid.


ringgaard · 2020-01-11T15:42:23Z

This PR changes the parser flow file format, so I have trained and deployed new pre-trained models:

1-layer 128-dim bi-directional LSTM (caspar-accurate.flow):

SPAN:  P=92.59, R=93.71, F1=93.15
FRAME: P=93.70, R=94.83, F1=94.26
PAIR:  P=95.62, R=96.06, F1=95.84
EDGE:  P=80.84, R=82.30, F1=81.56
ROLE:  P=72.72, R=74.06, F1=73.38
TYPE:  P=90.14, R=91.23, F1=90.68
SLOT:  P=83.24, R=84.46, F1=83.84
TOTAL: P=88.66, R=89.83, F1=89.24
training time: 30 min
parsing speed: 40000 tokens/sec

3-layer 192-dim bi-directional LSTM (caspar-accurate.flow):

SPAN:  P=93.29, R=94.54, F1=93.91
FRAME: P=94.34, R=95.59, F1=94.96
PAIR:  P=96.13, R=96.75, F1=96.44
EDGE:  P=83.34, R=84.62, F1=83.98
ROLE:  P=75.67, R=76.86, F1=76.26
TYPE:  P=91.08, R=92.29, F1=91.68
SLOT:  P=84.99, R=86.20, F1=85.59
TOTAL: P=89.82, R=91.05, F1=90.43
training time: 4 hours
parsing speed: 2500 tokens/sec

RNN stacks

fffb7af

ringgaard requested a review from anders-sandholm January 8, 2020 12:53

ringgaard self-assigned this Jan 8, 2020

anders-sandholm approved these changes Jan 9, 2020

ringgaard commented Jan 10, 2020

Fixes for #437

0480ec4

ringgaard merged commit d95c7ef into google:master Jan 11, 2020

ringgaard deleted the rnnstack branch January 11, 2020 15:42
