-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Conversation
Also worth mentioning that this should be fully backwards compatible. In other words, old-style |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viking-sudo-rm - unfortunately this isn't quite right, as it's leaked into the sampler which it should be completely independent from.
Right now, we have something that looks like:
sorting_keys: ["tokens", "{indexer_name}___{name_of_some_returned_key_from_indexer}"]
The idea of this PR was to actually replace entirely the underscore notation with the indexer_name
only, as then we can call the method that you have added get_default_sort_key
to retrieve the actual name. So the idea was that the new config could look like:
sorting_keys: ["tokens", "{indexer_name}"]
And then we could replace this line with something like padding_key = self.token_indexers[indexer_name].get_default_sort_key()
.
Does that make sense? We can chat today about this if you have some Qs
@@ -144,5 +153,22 @@ def _guess_sorting_keys(self, instances: Iterable[Instance], num_instances: int | |||
) | |||
self.sorting_keys = [longest_padding_key] | |||
|
|||
def _expand_sorting_keys(self, schema: Instance) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we need this to exist on the Sampler
class means this design needs to be revisited.
Closing since #3902 is potentially a better fix. |
Gave each
TokenIndexer
a default sorting key, so that if the user wants to manually specify sorting keys for aBucketBatchSampler
, they do not need to know the internal key names used by theTokenIndexer
.For example, say you have a
TextField
called "text" with aPretrainedTransformerIndexer
called "roberta". The sorting keys would now be(text, roberta)
instead of(text, roberta___token_ids)
.I gave each
Field
anexpand_sort_key
method, which can be used to rewrite default user-specified sort keys into the underlying internal format. Currently, the only field that utilizes this isTextField
, although it would be straightforward to extend the same pattern to other fields.Fixes #3664