unintuitive `sorting_keys` scoping in new token indexers #3664

DeNeutoy · 2020-01-21T21:53:08Z

@viking-sudo-rm and I ran into some poor config design resulting from #3597

`sorting_keys: ["tokens", "num_tokens"]` in most configs is now replaced by:

`sorting_keys: ["tokens", "{indexer_name}___{name_of_some_returned_key_from_indexer}"]`

In Will's case, this looked like:
`sorting_keys: ["tokens", "roberta___token_ids"]`

This is pretty obscure and hard to use.
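
To illustrate where a key of this shape comes from, here is a minimal sketch. It assumes the per-indexer padding lengths are flattened by joining the indexer name and the returned key with three underscores, as the pattern above suggests. `flatten_padding_lengths` is a hypothetical helper written for this example, not an AllenNLP function.

```python
from typing import Dict


def flatten_padding_lengths(per_indexer: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Flatten {indexer_name: {key: length}} into {"indexer_name___key": length}.

    Hypothetical helper: the triple-underscore join is an assumption based on
    the key pattern quoted in this issue.
    """
    flat: Dict[str, int] = {}
    for indexer_name, lengths in per_indexer.items():
        for key, length in lengths.items():
            flat[f"{indexer_name}___{key}"] = length
    return flat


# A user who named their indexer "roberta" in the config has to know that the
# indexer happens to return a key called "token_ids" to write the sort key.
flat = flatten_padding_lengths({"roberta": {"token_ids": 12, "mask": 12}})
```

The second half of the key is an implementation detail of the indexer, which is why it is hard to guess from the config alone.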

Possible solutions:

1. We don't care, as #3603 (Guess sorting keys when none are given to BucketIterator) means most config files can just delete this line (at the cost of iterating over the data to guess them before you start training).
2. We change the sorting keys to be a triple `(field_name, indexer_name, return_name_you_have_to_know)`.

I think the best solution to this problem is for each TokenIndexer to have a method `output_field_to_sort_by() -> str` or something like this. By default, this would just return e.g. `"tokens"`. This is much better, because it means that `sorting_keys` could now be `["tokens", "roberta"]`, both of which are available in a user's config. We should almost certainly do this.
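
A rough sketch of what this could look like, under stated assumptions: the classes below are simplified stand-ins written for this example, not the real AllenNLP `TokenIndexer`, and the `"token_ids"` return value for the RoBERTa-style indexer is taken from the example key above.

```python
from typing import Dict, List


class TokenIndexer:
    """Simplified stand-in for illustration only (not the AllenNLP class)."""

    def get_padding_lengths(self, indexed: Dict[str, List[int]]) -> Dict[str, int]:
        # One padding length per array the indexer returns.
        return {key: len(values) for key, values in indexed.items()}

    def output_field_to_sort_by(self) -> str:
        # Proposed default: sort by the "tokens" output.
        return "tokens"


class RobertaLikeIndexer(TokenIndexer):
    """Hypothetical indexer whose main output is called "token_ids"."""

    def output_field_to_sort_by(self) -> str:
        return "token_ids"


def sort_length(indexer: TokenIndexer, indexed: Dict[str, List[int]]) -> int:
    """Length used for bucketing, without the user naming the internal key."""
    return indexer.get_padding_lengths(indexed)[indexer.output_field_to_sort_by()]
```

With something like this, the config only needs the field name and the indexer name, and the indexer itself decides which of its outputs is the one to sort by.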

@matt-gardner it would be good to get your opinion here, what do you think of my suggestion?


matt-gardner · 2020-01-21T22:57:28Z

No objections from me, that seems reasonable. I do think that almost all use cases can just delete that line entirely, though, as the auto-detection should do the right thing in basically all cases.

DeNeutoy · 2020-01-21T23:26:40Z

Apart from the fact that it iterates over the data, which in some cases (e.g. if your data is infinite) is extremely bad. But good that you agree 👍

DeNeutoy · 2020-01-24T23:23:01Z

Possibly instead of, or as well as, this, we should augment #3603 to only iterate over a small amount of data, because in the general case this will work.
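
As a sketch of the idea (not the code in #3603), guessing from a bounded prefix could look like the following. It assumes each instance is represented by its nested padding lengths and that the guess is the field/key pair with the largest length seen. `guess_sorting_key` and `num_instances` are hypothetical names.

```python
from itertools import islice
from typing import Dict, Iterable, Optional, Tuple

# {field_name: {padding_key: length}} for a single instance (assumed shape).
PaddingLengths = Dict[str, Dict[str, int]]


def guess_sorting_key(
    instances: Iterable[PaddingLengths], num_instances: int = 10
) -> Tuple[str, str]:
    """Pick the (field, padding key) with the largest length in a small prefix."""
    best_key: Optional[Tuple[str, str]] = None
    best_length = -1
    # islice stops after num_instances items, so an infinite stream is safe.
    for lengths in islice(instances, num_instances):
        for field_name, field_lengths in lengths.items():
            for padding_key, length in field_lengths.items():
                if length > best_length:
                    best_key, best_length = (field_name, padding_key), length
    if best_key is None:
        raise ValueError("Found no padding lengths to guess a sorting key from.")
    return best_key
```

Because only the first `num_instances` items are consumed, the cost no longer depends on the size of the dataset.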

brendan-ai2 · 2020-01-25T00:50:25Z

Probably everybody else realized this already, but just in case: given how `_memory_sized_lists` works, #3603 will only consider the first `max_instances_in_memory` or, if the reader is lazy with no max, the first `batch_size` instances. So this seems like pretty good behavior as is.

viking-sudo-rm · 2020-02-21T23:11:34Z

If people still think it's worth making the fix suggested by @DeNeutoy, I'd be happy to submit a PR addressing it.

matt-gardner · 2020-02-22T00:05:53Z

#3812 (which will get integrated into #3700) handles the biggest issue here, but actually changing the key would definitely also be an improvement. I think a big problem will be how any change here interacts with how the bucket sampler / iterator actually does sorting. If there's a good way to fix the key and make sure that this still works, then yes, a PR would be great.

dirkgr · 2020-02-24T18:21:15Z

That sounds like @viking-sudo-rm should not pick this up right now, because too much other stuff will change around this soon.

matt-gardner · 2020-02-24T18:27:12Z

Yes, I think that waiting for #3700 to get merged is a good idea, but after that, I don't think anything else around this is changing.

viking-sudo-rm · 2020-02-24T19:12:20Z

Got it. I'll wait for #3700, and assuming nothing else blocking comes up, will start work on this.

DeNeutoy · 2020-02-27T21:31:49Z

@viking-sudo-rm - you can start this now, as #3700 has been merged 👍

DeNeutoy changed the title ~~sorting_keys scoping in new token indexers~~ unintuitive sorting_keys scoping in new token indexers Jan 21, 2020

DeNeutoy added this to the 1.0.0 milestone Jan 24, 2020

matt-gardner mentioned this issue Feb 19, 2020

Using only a few instances for guessing sorting keys #3812

Closed

matt-gardner added the Breaking change label Feb 20, 2020

DeNeutoy assigned viking-sudo-rm Feb 27, 2020

viking-sudo-rm mentioned this issue Feb 28, 2020

Default sort key for token indexers #3876

Closed

DeNeutoy mentioned this issue Mar 5, 2020

Sorting keys api #3902

Merged

DeNeutoy closed this as completed in #3902 Mar 5, 2020
