Improve ExplicitTags queries with FuzzyFilters #1896

Merged
merged 1 commit into from
May 12, 2020


@ronanh ronanh commented Dec 27, 2019

This patch reworks how FuzzyFilters are used in the case of ExplicitTags queries:

  • Use a list of FuzzyFilters on tag key and value instead of a single FuzzyFilter over tag keys
  • No more need for a Rowkey Regex filter (All filtering done by FuzzyFilters + scanner start/stop keys)
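As a rough illustration of how a fuzzy filter can pin both the tag key and the tag value, here is a minimal sketch. It assumes the default OpenTSDB row key layout without salting (3-byte metric UID, 4-byte base timestamp, then 3-byte tag key and tag value UIDs) and the HBase FuzzyRowFilter mask convention (0 = byte must match, 1 = byte may be anything). The class and method names are hypothetical and are not taken from the patch; the matcher is a simplification that only handles rows with the same length as the key.

```java
import java.util.Arrays;

class FuzzyMaskSketch {
    // Assumed widths (OpenTSDB defaults, no salt); not read from any config.
    static final int METRIC_WIDTH = 3;
    static final int TIMESTAMP_WIDTH = 4;
    static final int TAG_WIDTH = 3;

    /** Mask that fixes the metric and every tag key/value, and wildcards the timestamp. */
    public static byte[] buildMask(int numTags) {
        byte[] mask = new byte[METRIC_WIDTH + TIMESTAMP_WIDTH + numTags * 2 * TAG_WIDTH];
        // Arrays start zero-filled (all fixed); only the timestamp bytes become wildcards.
        Arrays.fill(mask, METRIC_WIDTH, METRIC_WIDTH + TIMESTAMP_WIDTH, (byte) 1);
        return mask;
    }

    /** Simplified matcher: assumes the row and the fuzzy key have the same length. */
    public static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length != key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With such a mask, a row that differs from the key only in its timestamp bytes matches, while a row with a different tag value UID does not.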

It brings a noticeable performance improvement when filtering on tag values for high-cardinality metrics (e.g. a query on a metric for a single host, where the host tag has a cardinality of 20000, went from 9s down to 250ms).

The list of fuzzy filters is constructed by a new method QueryUtil.buildFuzzyFilters(row_key_literals).
It is used for filtering all combinations of tags provided by the caller.

So for 3 tags cluster, service and vm, with filter values cluster=dev|qa|test, service=mysvc and vm=web|db, a list of 6 (= 3 x 1 x 2) FuzzyFilters covering all combinations will be returned.
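The enumeration of combinations can be sketched as a Cartesian product over the allowed values of each tag. This is only an illustration on plain strings: the actual QueryUtil.buildFuzzyFilters(row_key_literals) works on UIDs and returns filters, and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TagCombinations {
    /** Returns one tag-to-value map per combination of the allowed values. */
    public static List<Map<String, String>> combinations(Map<String, List<String>> allowed) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start with a single empty combination
        for (Map.Entry<String, List<String>> tag : allowed.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : tag.getValue()) {
                    // Extend every partial combination with each value of this tag.
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(tag.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

Called with cluster=[dev, qa, test], service=[mysvc] and vm=[web, db], this yields one map per combination (3 x 1 x 2 of them), each of which would correspond to one fuzzy filter.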

A few remarks:

  • New (private) version of QueryUtil.getRowKeyUIDRegex(row_key_literals, explicit_tags): simplified variant stripped of everything related to FuzzyFilters
  • QueryUtil.getRowKeyUIDRegex(group_bys, row_key_literals) and QueryUtil.getRowKeyUIDRegex(group_bys, row_key_literals, explicit_tags, fuzzy_key, fuzzy_mask) are left unchanged as these are public API (but not used anymore)

@ronanh
Author

ronanh commented Dec 27, 2019

I also have an experiment_explicit_tags_gui branch in my repo for testing explicit tags from the GUI.

@avermeer

avermeer commented Jan 2, 2020

Hi @ronanh, your contribution sounds like a great way to get the best performance out of OpenTSDB!
Hi @manolama, do you think this PR could be reviewed & integrated if there are no objections?

Kind regards,
Alexandre

@manolama
Member

Hi @ronanh, what versions of HBase did you test this with? An issue I ran into with older versions was that it failed to actually match on the values, only keys. Thanks!

@ronanh
Author

ronanh commented Feb 11, 2020

Yes, I saw a comment in the code referring to the issue you're talking about, but it was not really clear to me what the problem really was. I did not encounter anything like this.
I'm working with HBase from the Hortonworks distribution (2.6.5), which is HBase 1.1.2 plus a number of patches. See the details here.

@manolama
Member

Thanks! I think we were still on a 0.98 branch, where setting the filter on the key values caused nothing to be returned. I'll go through this and make sure it still handles the case where a user has a regex or wildcard on a value.

@ronanh
Author

ronanh commented Feb 12, 2020

There have been several issues with FuzzyFilters in earlier HBase versions. In my experience, it is also quite sensitive to how FuzzyFilters are serialized in the client request, e.g. if you use different jar versions on the client and the server (I don't know how this applies to the AsyncHBase client, though).

@manolama manolama merged commit 81d2112 into OpenTSDB:master May 12, 2020
@johann8384 johann8384 added this to the v2.5.0 milestone Oct 26, 2020