Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce number of list calls on shared object store when using boltdb-shipper #3283

Conversation

sandeepsukhani
Copy link
Contributor

@sandeepsukhani sandeepsukhani commented Feb 3, 2021

What this PR does / why we need it:
We as of now do a LIST calls per table when we need to find its objects.
If someone has a lot of tables cached locally or has query readiness set to a large number of days, it results in a lot of list calls because each querier tries to sync tables every 5 mins by default.
This PR reduces the number of LIST calls we make when using hosted object stores(S3, GCS, Azure Blob Storage and Swift) as a shared store for boltdb-shipper.

The idea is to do a flat listing of objects which is only supported by hosted object stores mentioned above.
In case of boltdb files stored by the shipper, the listed objects would have keys like /.
For each List call without a prefix(which is actually done to get a list of tables), we would build a map of TableName -> chunk.StorageObject which would be used as a cache for subsequent List calls for getting the list of objects for tables.
Cache items are evicted after the first read or a timeout. The cache is rebuilt during List call for table names or we encounter a cache miss.

Special notes for your reviewer:
I am going to send a subsequent PR to move all the code for ObjectClient and object key processing to a common package to contain all the complexity of processing the object keys, responses and delimiters with a goal to provide a simplified interface to the shipper which would have methods like ListTables, ListTableObjects etc.

Fixes #3258

Checklist

  • Tests updated

@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-shared-object-store-flat-listing branch from 5a78d00 to 1218ab4 Compare February 3, 2021 10:17

objects = c.tables[tableName]
// evict the element read from cache
delete(c.tables, tableName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we evict element read from cache ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid any possible bugs if the caller misses refreshing the cache explicitly.
The cache is integrated to make it look seamless for consumers since it only works for hosted object stores.
Removing it should not be big of a problem since we have a ttl but I just added it as an extra protection mechanism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if we ask twice the same table then we will call twice List on the objectstore ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but we don't have a use case where we will list objects from the same table twice. We usually circle through a list of tables for sync or compaction so we will not hit this case most likely. The only case it can happen is if someone has a really low sync interval and the non-deterministic iteration of maps in go can make the same table to be synced back to back.

Copy link
Contributor Author

@sandeepsukhani sandeepsukhani Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always remove this if we see it causing a problem. The problems due to stale reads would be hard to debug so I am in favour of keeping it.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM don't want to block you but I don't fully understand how cache eviction works.

@sandeepsukhani sandeepsukhani merged commit 76e713f into grafana:master Feb 12, 2021
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Feb 15, 2021
…shipper (grafana#3283)

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
slim-bean added a commit that referenced this pull request Feb 27, 2021
slim-bean added a commit that referenced this pull request Feb 27, 2021
* revert: boltdbshipper: fix crash of "timeout" panic when boltdbshipper do init (#3369)

This appears to have introduced a different problem, still investigating.

* Revert "reduce number of list calls on shared object store when using boltdb-shipper (#3283)"

This reverts commit 76e713f
@sandeepsukhani sandeepsukhani deleted the boltdb-shipper-shared-object-store-flat-listing branch April 2, 2021 06:33
@sandeepsukhani sandeepsukhani restored the boltdb-shipper-shared-object-store-flat-listing branch April 2, 2021 06:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Seeing unnatural high amount of calls to S3ListObject when running BoltDB Shipper
2 participants