Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shard ingester queries. #3852

Merged
merged 16 commits into from
Jul 7, 2021
Merged

Conversation

cyriltovena
Copy link
Contributor

This is still experimental but already yield from 2x to 6x faster for short period queries.

I'm still playing with it but I want to share how I do it early.

Signed-off-by: Cyril Tovena cyril.tovena@gmail.com

This is still experimental but already yield from 2x to 6x faster for short period queries.

I'm still playing with it but I want to share how I do it early.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena
Copy link
Contributor Author

This is promising, I'm getting exceptional result in our biggest cluster.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
cyriltovena added a commit to cyriltovena/loki that referenced this pull request Jun 15, 2021
This PR introduces Series Queries Sharding. It does not check the boundaries of ingesters data since I'm assuming
grafana#3852 will be merge first.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena mentioned this pull request Jun 15, 2021
Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some suggestions which I tested locally but didn't want to push to your branch. They are based off my fork of this PR.

This does two things:

  1. Fixes the hashing calculations - notably maintaining that the internal hash space must be consistent. i.e. moving from 2 buckets to 4, the first bucket (0), covers the first 50% of the hash ring. Moving this same space into 4 buckets should actually map shards (0_of_2) -> (0_of_4, 1_of_4) instead of (0_of_2) -> (0_of_4, 2_of_4).
  2. Adds logic for mapping any schema sharding factor into the relevant superset [shard] of any inverted index sharding factor. Notably, this would allow us to decouple the two. However, because the shard factors wouldn't need to be mutually divisible any more, we'd need to amend the Lookup, LabelNames, and LabelValues to also filter any returned values, ensuring we filter out any not belonging to the desired schema shard. Because this code is currently protected by the validateShard function, we can safely defer this decision for now.

One thing which I didn't include but expect we'll need is to align the hash functions used in the Cortex Index with the one used here. Notably, this is not exposed from cortex and is currently hardcoded into the Entries implementations. Since the chunk store is deprecated at this point, I expect we can copy that as it's not subject to change. The reason why we'll need to agree on a specific hash function here is because the querier queries both the store and the ingester for a particular shard. Our queries will quickly prove incorrect when series belong to different shards on each of these query paths.

pkg/ingester/index/index_test.go Outdated Show resolved Hide resolved
pkg/ingester/index/index.go Outdated Show resolved Hide resolved
@cyriltovena
Copy link
Contributor Author

Our queries will quickly prove incorrect when series belong to different shards on each of these query paths.

Not sure about that.

@cyriltovena
Copy link
Contributor Author

So I have an idea :) I'm going to change sharding in the store to align it with ingester see you soon !

cyriltovena and others added 4 commits July 1, 2021 08:26
Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
Utimately we should have a storage that relies on fingerprint, but that's harder to change.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena marked this pull request as ready for review July 1, 2021 08:53
@cyriltovena
Copy link
Contributor Author

@owen-d I think this is ready I've aligned the shard computation in the ingester. This means they use the same type of hash. I would have prefer to keep it based of the fingerprint but it's hard to change in the storage.

Something to keep in mind when we redesign the storage.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits, then LGTM! We'll need to eventually add a way to either enforce that schema configs are multiples of indexShards or find a way to refactor the inverted index to work with any arbitrary sharding factor.

pkg/ingester/index/index.go Outdated Show resolved Hide resolved
pkg/ingester/index/index.go Show resolved Hide resolved
pkg/ingester/index/index.go Show resolved Hide resolved
pkg/ingester/index/index.go Outdated Show resolved Hide resolved
pkg/ingester/index/index.go Show resolved Hide resolved
pkg/ingester/index/index_test.go Outdated Show resolved Hide resolved
pkg/ingester/instance.go Show resolved Hide resolved
cyriltovena and others added 4 commits July 7, 2021 09:51
Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
@cyriltovena
Copy link
Contributor Author

Couple nits, then LGTM! We'll need to eventually add a way to either enforce that schema configs are multiples of indexShards or find a way to refactor the inverted index to work with any arbitrary sharding factor.

I don't want to invest too much in this yet. Not sure what is the future of that code.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena merged commit 97912b6 into grafana:main Jul 7, 2021
owen-d pushed a commit that referenced this pull request Jul 7, 2021
* Shards Series API.

This PR introduces Series Queries Sharding. It does not check the boundaries of ingesters data since I'm assuming
#3852 will be merge first.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Fixes tests sorting.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants