Skip to content
This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Implementing ElementCount and ElementBytes Triggers #734

Merged
merged 5 commits into from
Jun 24, 2019

Conversation

rahulKQL
Copy link
Contributor

This is in continuation of follow up PR for new Batching API(#716)


What This PR contains

Added two new thresholds in the BatcherImpl.java which will trigger automatic batch processing upon breach of any one of this threshold.

  • ElementCount: Number of elements queued up till now.
  • ElementBytes: Size in bytes of the accumulated elements.

A threshold would be added in BatcherImpl via v2.BatchingSettings.java and It will be calculated with the help of BatchingThreshold.java, v2.ElementCounter.java & v2.NumericThreshold.java.

Note: Once this PR is merged, I would raise another followup PR with DelayThreshold/maxDelay.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jun 14, 2019
@rahulKQL
Copy link
Contributor Author

@igorbernstein2 Please have a look.

Also Please let me know If we should split this in two part (like separate PR for copied classes eg. BatchingSettings.java, ElementCounter.java and NumericThreshold.java?)

@codecov
Copy link

codecov bot commented Jun 19, 2019

Codecov Report

Merging #734 into master will increase coverage by 0.19%.
The diff coverage is 96.15%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #734      +/-   ##
============================================
+ Coverage     77.66%   77.86%   +0.19%     
- Complexity     1091     1107      +16     
============================================
  Files           197      199       +2     
  Lines          4778     4829      +51     
  Branches        371      379       +8     
============================================
+ Hits           3711     3760      +49     
  Misses          898      898              
- Partials        169      171       +2
Impacted Files Coverage Δ Complexity Δ
...m/google/api/gax/batching/v2/NumericThreshold.java 100% <100%> (ø) 5 <5> (?)
...va/com/google/api/gax/batching/v2/BatcherImpl.java 94.89% <100%> (+2.25%) 19 <8> (+9) ⬆️
...m/google/api/gax/batching/v2/BatchingSettings.java 83.33% <83.33%> (ø) 2 <2> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4554ed...4f8a561. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 19, 2019

Codecov Report

Merging #734 into master will increase coverage by 0.05%.
The diff coverage is 91.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #734      +/-   ##
============================================
+ Coverage     77.66%   77.72%   +0.05%     
- Complexity     1091     1094       +3     
============================================
  Files           197      198       +1     
  Lines          4778     4799      +21     
  Branches        371      377       +6     
============================================
+ Hits           3711     3730      +19     
  Misses          898      898              
- Partials        169      171       +2
Impacted Files Coverage Δ Complexity Δ
...va/com/google/api/gax/batching/v2/BatcherImpl.java 93.67% <100%> (+1.02%) 11 <0> (+1) ⬆️
...m/google/api/gax/batching/v2/BatchingSettings.java 80% <80%> (ø) 2 <2> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06280df...ff99cc7. Read the comment docs.

* Represents the batching settings to use for an API method that is capable of batching.
*
* <p>Each batching client will have to define their own set of default values for these thresholds,
* which would be safest behavior for their jobs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be in indecisive here, but I think that the defaults should be Long.MAX_VALUE. I think will serve better to capture a user's intention when they write Batching.newBuilder().setMaxElements(100).build(). Specifically when user doesn't set a maximum value, I think they intuition is that there is no max. And now that manual flushing exists and is hooked into close(), I think it's safer to have these defaults.

Also in follow up PRs, I think we should:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please update the docs & tests to always close the batcher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the reference. Would sure look into it and raise follow up PR accordingly.

Also, Please have a look at this PR, I have addressed these feedback in the latest commit.

@igorbernstein2
Copy link
Contributor

LGTM, @vam-google, can you take a pass?

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Jun 21, 2019
/** Get a new builder. */
public static Builder newBuilder() {
return new AutoValue_BatchingSettings.Builder()
.setElementCountThreshold(Long.MAX_VALUE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long.MAX_VALUE is effectively Never. Is it what we want here? More importantly, why do we use long for element count threshold? Do we really intend to support more than 2 billion elements (we most probably will get out of memory long before reaching 2 billion anyways)? Long for bytes threshold is probably Ok, but why for elements?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that element bytesize flushing should be disabled by default. Totally fine to downgrade to an int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vam-google, @igorbernstein2 Thanks! for the review. Have updated element count to be int type with the default of Integer.MAX_VALUE.

public static Builder newBuilder() {
return new AutoValue_BatchingSettings.Builder()
.setElementCountThreshold(Long.MAX_VALUE)
.setRequestByteThreshold(Long.MAX_VALUE);
Copy link
Contributor

@vam-google vam-google Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use some reasonably big value here (and in element count) instead of MAX_VALUE? People most probably will go with default values first (yes, everything is documented, but people read documentaiton only after something broke..). The default values can result in OutOfMemoryException quite easily. And even if not, it may be confusing for people that their requests do not get sent for some reason (because it just keeps accumulating everything).

Another option is to always enforce setting those values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 audiences here:

  • client developers
  • end users

The client developers will configure defaults per service in their StubSettings. Each service will have different profiles. When looking at the code:

  BatchingSettings.newBuilder().setMaxElementCount(100).build(); 

I would be very surprised if it flushed a single large element. I would want it to be spelled out in the StubSetting defaults as

  BatchingSettings.newBuilder().setElementCountThreshold(100).setRequestByteThreshold(1024).build(); 

The other audience is the end users, the expectations with them is that they should use toBuilder() on the client developer provided instance to tweak stuff

Copy link
Contributor

@vam-google vam-google Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@igorbernstein2 Who are the client developers? Do you mean the manual layer on top of this? If yes, then most of the generated clients don't have that wrapper, so the there is simply no toBuilder() provided for them by the client developer.

This can also be done in gapic-generator level (generator will be the repo, where the "default" value can be found), ut it seems like an unnecessary complication (why to split in in two places and make more confusing, when it can be just here and documented so people can easily find the default value). Also, currently MAX_VALUE is the default value, and if it actually (in the manual wrapper or gapic-generator) is always set to something different by default, then it is even more confusing (we end up in a situation, when we have a default value, which is by default overridden by another default value).

In general it looks like this value should not have a default value at all, and build() on the builder should fail if there is not explicit value provided (which i think is perfectly fine, and probably the best solution here, enforcing setting values, which are hard/impossible to have default values for)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with no default

rahulKQL added 2 commits June 24, 2019 14:24
Added two threshold in the `BatcherImpl` which will trigger automatic batching upon breach of any of these.
 - ElementCount: Number of elements queued up till now.
 - ElementBytes: Size of the accumulated elements.
 - Threshold can be set using `BatchingSettings`.
 - Javadoc fixes to remove v1 reference
 - Removed unnecessary abstraction and made thresholds values as required

Updated BatchingSettings default value to Long.MAX_VALUE
also, added Javadoc at BatcherImpl XOR.
Preconditions.checkState(
settings.getElementCountThreshold() > 0, "elementCountThreshold must be positive");
Preconditions.checkState(
settings.getRequestByteThreshold() > 0, "requestByteThreshold must be positive");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think allowing 0 is ok here to signify that batching is disabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the BatchingSettings to accept 0 and added a line in Javadoc for the same.
Now BatcherImpl.Batch#hasAnyThresholdReached checks the disability.


boolean hasAnyThresholdReached() {
return (elementThreshold != 0 && elementThreshold <= elementCounter)
|| (bytesThreshold != 0 && bytesThreshold <= byteCounter);
Copy link
Contributor

@igorbernstein2 igorbernstein2 Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want it to immediately flush if threshold is zero, please remove the !=0 clauses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I assumed disable meaning disabling the threshold(though that can be done by setting a very high value).
Just a QQ: If the user sets any threshold to 0 then all other thresholds will be ignored. Is this what we want to add here... right?

Copy link
Contributor

@vam-google vam-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with few comments

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes This human has signed the Contributor License Agreement. 🚨 This issue needs some love.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants