Transformer: add update_listeners_in_predict option #342
Draft: still needs docs, but I first wanted to discuss this proposal.
The output of a transformer is passed through in two different ways:

- During prediction, the data is passed through the `Doc._.trf_data` attribute.
- During training, the data is broadcast directly to the transformer's listeners.

However, the `Transformer.predict` method breaks the strict separation between training and prediction by also broadcasting transformer outputs to its listeners. This was added (I think) to support training with a frozen transformer.

However, this breaks down when we are training a model with an unfrozen transformer that is also in `annotating_components`. The transformer will first (as part of its update step) broadcast the tensors and backprop function to its listeners. Then, when acting as an annotating component, it immediately overrides its own output and clears the backprop function. As a result, gradients will not flow into the transformer.

This change fixes the issue by adding the `update_listeners_in_predict` option, which is enabled by default. When this option is disabled, the tensors will not be broadcast to listeners in `predict`.
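For illustration, here is a minimal sketch of how this could look in a training config, assuming the option is exposed as a setting on the `transformer` component factory; the section layout below is an assumption for discussion, not taken from the diff:

```ini
# Illustrative excerpt (assumed layout): the transformer is trained and also
# runs as an annotating component, so listener updates in predict are disabled
# to keep gradients flowing through the update step.
[components.transformer]
factory = "transformer"
update_listeners_in_predict = false

[training]
# Run the transformer over training batches so downstream components can use
# its predictions via Doc._.trf_data.
annotating_components = ["transformer"]
```

With the default value (`true`), the current behavior is unchanged, so training with a frozen transformer keeps working as before.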
Alternatives considered:

- Removing the broadcast to listeners from `predict`: this breaks our current semantics and would make it harder to train with a frozen transformer.