This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

RNN stacks #437

Merged
merged 2 commits into from
Jan 11, 2020

@ringgaard ringgaard commented Jan 8, 2020

I have implemented a new RNN module for the parser that implements a number of different RNN architectures:

  • LSTM - Standard LSTM [Hochreiter & Schmidhuber 1997].
  • DRAGNN LSTM - LSTM with peephole connections [Gers & Schmidhuber 2000] and coupled forget and input gates [Greff et al. 2015].
  • DOZAT LSTM - Standard LSTM with one matrix multiplication [Dozat & Manning 2017].
  • PYTORCH LSTM - Standard LSTM with two matrix multiplications [Paszke et al. 2019].
  • GRU - Gated Recurrent Unit [Cho et al. 2014].
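
As a rough illustration of the "one matrix multiplication" versus "two matrix multiplications" distinction in the list above: the gate pre-activations can be computed from a single product over the concatenated input and hidden state, or from two separate products that are summed. This is only a sketch in plain Python, not the Myelin implementation; the sizes, weight names, and the `matvec` helper are assumptions made for the example.

```python
import random

def matvec(vec, mat):
    """Row vector times matrix, with the matrix given as a list of rows."""
    cols = len(mat[0])
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(cols)]

random.seed(0)
dim_x, dim_h = 4, 3            # hypothetical input and hidden sizes
gates = 4 * dim_h              # pre-activations for four gates per unit

x = [random.gauss(0, 1) for _ in range(dim_x)]
h = [random.gauss(0, 1) for _ in range(dim_h)]
w_x = [[random.gauss(0, 1) for _ in range(gates)] for _ in range(dim_x)]
w_h = [[random.gauss(0, 1) for _ in range(gates)] for _ in range(dim_h)]

# Two multiplications: x * w_x + h * w_h.
two_matmul = [a + b for a, b in zip(matvec(x, w_x), matvec(h, w_h))]

# One multiplication: [x, h] * [w_x; w_h], with the weights stacked row-wise.
one_matmul = matvec(x + h, w_x + w_h)

# The two forms agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(one_matmul, two_matmul))
print(max_diff < 1e-9)
```

Both forms compute the same values; which one is faster depends on how the compiler and kernels handle one large product versus two smaller ones.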

The RNN module also supports:

  • Multiple layers
  • Uni-directional and bi-directional RNNs.
  • Highway connections.
  • Dropout.

The best parser model with one bi-directional LSTM layer produces the following results:

SPAN:  P=92.78, R=93.76, F1=93.27
FRAME: P=93.85, R=94.84, F1=94.34
PAIR:  P=95.76, R=96.12, F1=95.94
EDGE:  P=81.41, R=82.30, F1=81.85
ROLE:  P=73.32, R=74.14, F1=73.73
TYPE:  P=90.22, R=91.18, F1=90.70
LABEL: P= 0.00, R= 0.00, F1= 0.00
SLOT:  P=83.55, R=84.45, F1=84.00
TOTAL: P=88.90, R=89.85, F1=89.37
training time: 30 mins
parsing speed: 40000 tokens/sec
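
For readers checking the tables, the F1 columns are consistent with the usual harmonic mean of precision and recall. The snippet below is a small sanity check under that assumption; the function name `f1` is just for illustration and is not taken from the evaluation code.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoids division by zero, e.g. for the LABEL row
    return 2 * precision * recall / (precision + recall)

span = f1(92.78, 93.76)    # SPAN row of the one-layer model
total = f1(88.90, 89.85)   # TOTAL row of the one-layer model
label = f1(0.0, 0.0)       # LABEL row

print(f"SPAN F1={span:.2f} TOTAL F1={total:.2f} LABEL F1={label:.2f}")
```

The rounded values match the SPAN, TOTAL, and LABEL rows reported above.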

You can make a more accurate model with a 3-layer 192-dim bi-directional LSTM:

SPAN:  P=93.09, R=94.49, F1=93.79
FRAME: P=94.17, R=95.58, F1=94.87
PAIR:  P=96.05, R=96.78, F1=96.42
EDGE:  P=83.01, R=84.92, F1=83.96
ROLE:  P=75.30, R=77.07, F1=76.17
TYPE:  P=90.91, R=92.27, F1=91.58
LABEL: P= 0.00, R= 0.00, F1= 0.00
SLOT:  P=84.72, R=86.27, F1=85.49
TOTAL: P=89.59, R=91.07, F1=90.33
training time: 4 hours
parsing speed: 2500 tokens/sec

For comparison, the baseline for the old PyTorch-based trainer is:

SPAN F1:  92.6639
FRAME F1: 93.8144
TYPE F1:  89.7554
ROLE F1:  72.2666
SLOT F1:  82.8597
TOTAL F1: 88.5455
training time: 16 hours
parsing speed: 9000 tokens/sec

Details:

  • Parser feature tracing has been removed
  • New PAIR and EDGE parser benchmarks
  • Deprecate EMBED, ELABORATE, and STOP parser actions
  • Remove mark distance feature and frame create feature
  • Learning rate cliff for fine-tuning model in final phase
  • Support for parser training restart
  • Model parameter initialization (uniform, normal, ortho)
  • Flow variable attributes (flow version 6)
  • Support for dynamically sized tensors with dynamic (Concat and Split)
  • Generic ArgMax for all types.
  • Per-tensor L2 regularization parameter
  • On-demand loading for tensors in elementwise index generator
  • Myelin compiler --compile_only and --param_stats
  • ConcatV2 op renamed to Concat
  • Remove support for singleton kernel library
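
To illustrate what the "ortho" initialization option in the list above refers to in general terms: an orthogonal initializer fills a weight matrix so that its rows are orthonormal. The sketch below uses Gram-Schmidt on random Gaussian rows; this is one common way to do it and is an assumption, not a description of how the trainer actually generates the matrix. The function name `random_ortho` is hypothetical.

```python
import math
import random

def random_ortho(n, seed=0):
    """Return an n x n matrix (list of rows) with orthonormal rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0, 1) for _ in range(n)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            d = sum(a * b for a, b in zip(v, r))
            v = [a - d * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (very unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows

w = random_ortho(4)

# Rows are orthonormal, so w * w^T should be close to the identity matrix.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in w] for r1 in w]
```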

@ringgaard ringgaard self-assigned this Jan 8, 2020

@anders-sandholm anders-sandholm left a comment

Massive improvement. LGTM. Mostly comments about comments.

def add_attr(self, name, value):
  if type(value) is bool:
    if value == True: value = 1
    elif value == False: value = 0
Contributor

Shorter but maybe not more readable:
value = int(value)

Contributor Author

Yes. This is simpler. Done.
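
A minimal sketch of the simplification discussed in this thread, assuming the method only needs to map booleans to 0/1 before storing the attribute. The `Variable` class, its `attrs` dictionary, and the attribute names are hypothetical stand-ins, not the actual flow builder code.

```python
class Variable:
    """Hypothetical stand-in for a flow object that stores attributes."""

    def __init__(self):
        self.attrs = {}

    def add_attr(self, name, value):
        # bool is a subclass of int in Python, so int(True) == 1 and
        # int(False) == 0; values of other types are left untouched.
        if isinstance(value, bool):
            value = int(value)
        self.attrs[name] = value

var = Variable()
var.add_attr("trainable", True)   # hypothetical attribute name
var.add_attr("rate", 0.5)         # non-bool values pass through unchanged
print(var.attrs)
```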

tf.RandomOrtho(x2o);
tf.RandomOrtho(h2o);

// i = sigmoid(x * x2i + h_in * h2i + c_in * c2i + bi)
Contributor

This might just be a copy-paste bug in the comment, inherited from the DRAGNN LSTM.
c_in and c2i do not appear in the expression below. Moreover, I believe c2i has not been defined or assigned yet.

Contributor Author

Oops. That is a copy/paste bug. The standard LSTM does not have peephole connections.
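
To make the distinction concrete, here is a small sketch of the input gate for a single hidden unit, with and without the peephole term. It follows the notation of the code comment under review (x2i, h2i, c2i, bi), but the function itself, its scalar-per-unit layout, and the example values are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def input_gate(x, h_in, x2i, h2i, bi, c_in=None, c2i=None):
    """Input gate activation for one hidden unit.

    Standard LSTM:  i = sigmoid(x * x2i + h_in * h2i + bi)
    Peephole LSTM:  i = sigmoid(x * x2i + h_in * h2i + c_in * c2i + bi)
    """
    z = dot(x, x2i) + dot(h_in, h2i) + bi
    if c_in is not None and c2i is not None:
        z += dot(c_in, c2i)  # peephole term: the gate also sees the cell state
    return sigmoid(z)

x, h_in, c_in = [1.0, 2.0], [1.0], [2.0]
x2i, h2i, c2i, bi = [0.5, -0.25], [0.0], [1.0], 0.0

standard = input_gate(x, h_in, x2i, h2i, bi)
peephole = input_gate(x, h_in, x2i, h2i, bi, c_in=c_in, c2i=c2i)
print(standard, peephole)
```

With these values the standard gate has a zero pre-activation, while the peephole variant is shifted by the cell-state term.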

@@ -315,20 +338,13 @@ void ParserTrainer::Parse(Document *document) const {
d = action.delegate;
}

- // Shift or stop if predicted action is invalid.
+ // Shift if predicted action is not invalid.
Contributor

invalid, not invalid or not valid? Life is just more exciting with double negations... :-)

d = action.delegate;
}

// Shift if predicted action is not invalid.
Contributor

not invalid or not valid or invalid?

Contributor Author

There is nothing not wrong with this :)

Contributor Author

Fixed


@ringgaard ringgaard left a comment

Thanks for the review and the comments.

@ringgaard
Contributor Author

This PR changes the parser flow file format, so I have trained and deployed new pre-trained models:

1-layer 128-dim bi-directional LSTM (caspar-accurate.flow):

SPAN:  P=92.59, R=93.71, F1=93.15
FRAME: P=93.70, R=94.83, F1=94.26
PAIR:  P=95.62, R=96.06, F1=95.84
EDGE:  P=80.84, R=82.30, F1=81.56
ROLE:  P=72.72, R=74.06, F1=73.38
TYPE:  P=90.14, R=91.23, F1=90.68
SLOT:  P=83.24, R=84.46, F1=83.84
TOTAL: P=88.66, R=89.83, F1=89.24
training time: 30 min
parsing speed: 40000 tokens/sec

3-layer 192-dim bi-directional LSTM (caspar-accurate.flow):

SPAN:  P=93.29, R=94.54, F1=93.91
FRAME: P=94.34, R=95.59, F1=94.96
PAIR:  P=96.13, R=96.75, F1=96.44
EDGE:  P=83.34, R=84.62, F1=83.98
ROLE:  P=75.67, R=76.86, F1=76.26
TYPE:  P=91.08, R=92.29, F1=91.68
SLOT:  P=84.99, R=86.20, F1=85.59
TOTAL: P=89.82, R=91.05, F1=90.43
training time: 4 hours
parsing speed: 2500 tokens/sec

@ringgaard ringgaard merged commit d95c7ef into google:master Jan 11, 2020
@ringgaard ringgaard deleted the rnnstack branch January 11, 2020 15:42