Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix valueLog doRunGC to use consistent units #724

Closed
wants to merge 1 commit into from

Conversation

mschoch
Copy link
Contributor

@mschoch mschoch commented Feb 26, 2019

some comparisons were made between bytes and megabytes
this change tracks all values in bytes and makes comparisons
using consistent units


This change is Reviewable

some comparisons were made between bytes and megabytes
this change tracks all values in bytes and makes comparisons
using consistent units
@CLAassistant
Copy link

CLAassistant commented Feb 26, 2019

CLA assistant check
All committers have signed the CLA.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 26, 2019

This could have been addressed a couple ways, I chose to just do everything in bytes instead of MB. This removes a few divisions, and the only downside is that the trace messages will output less friendly numbers.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 26, 2019

Not sure if the appveyor failure is related or not, but fixing this bug may expose other issues in value log GC.

Specifically, because this line was comparing bytes and megabytes, we never would have stopped sampling because the number of bytes we read:

https://github.com/dgraph-io/badger/blob/master/value.go#L1188

That means we either ran out of time (10s), read 10000 entries, or got to the end of the log. This then relates to the checks we perform here:

https://github.com/dgraph-io/badger/blob/master/value.go#L1253

Notice how the row count check here is stricter than the size check. This ends up related, because now there will be more cases where we stopped sampling due to size, thus more cases where the row count might not be satisfied any more. Obviously all of this depends on the data and tuning in practice, but it's worth pointing out.

On that same topic, I have concerns about users being able to tune this effectively. I think it makes total sense that you want lots of ways to constrain how much data is sampled (entries, size, time), but at this point we're trying to decide if the sample was good enough. It seems like satisfying size or entries ought to be enough, but as coded today we require both. I anticipate this being hard for some users to tune, as satisfying both requires you have a good idea of the value sizes, which might vary considerably in practice. Any thoughts on this check?

@mschoch
Copy link
Contributor Author

mschoch commented Feb 26, 2019

Actually, perhaps we should disregard my last comment. At least when doing offline compaction, just applying this fix makes it work very well, without any other changes or tuning.

@mschoch mschoch mentioned this pull request Feb 26, 2019
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add some context, we used to do it purely based on size, but if key-values are really small, then that ended up causing a lot of LSM tree lookups to sample the log file. That's why we added the criteria for the number of keys, so we can quit early if we have enough key samples.

I'd prefer keeping the units to MB and the fields in reason to floats.

Reviewable status: 0 of 1 files reviewed, all discussions resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants