Add Bulk stats track the bulk per shard #52208

zhichen · 2020-02-11T14:39:27Z

Add bulk stats that track the bulk sizes per shard and the time spent on each bulk shard request.

It might make sense to track the average bulk size per shard, since a large bulk request may be split into much smaller shard-level bulk operations on an index with a high number of shards. This makes more sense to me than just tracking at the shard level since most clients are not partitioning by shard already.

Regarding the shard bulk size statistic: because re-serialization is expensive, only the source field of IndexRequest and the doc field of UpdateRequest are counted here, while a DeleteRequest in a bulk is counted as 0.
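
The counting rule above can be illustrated with a minimal, self-contained sketch. It does not use the Elasticsearch classes: `BulkItem`, `IndexItem`, `UpdateItem`, `DeleteItem` and `shardBulkSizeInBytes` are hypothetical names invented for this illustration, and the code in the PR may be structured quite differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ShardBulkSizeSketch {

    /** Hypothetical stand-in for one item of a shard-level bulk request. */
    interface BulkItem {
        long sizeInBytes();
    }

    /** Index operation: only the bytes of its source are counted. */
    static final class IndexItem implements BulkItem {
        private final byte[] source;

        IndexItem(String source) {
            this.source = source.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public long sizeInBytes() {
            return source.length;
        }
    }

    /** Update operation: only the bytes of its partial doc are counted. */
    static final class UpdateItem implements BulkItem {
        private final byte[] doc;

        UpdateItem(String doc) {
            this.doc = doc.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public long sizeInBytes() {
            return doc.length;
        }
    }

    /** Delete operation: carries no document body, so it counts as 0. */
    static final class DeleteItem implements BulkItem {
        @Override
        public long sizeInBytes() {
            return 0L;
        }
    }

    /** Sums the already-available byte lengths without re-serializing anything. */
    static long shardBulkSizeInBytes(List<BulkItem> items) {
        long total = 0L;
        for (BulkItem item : items) {
            total += item.sizeInBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        List<BulkItem> items = Arrays.asList(
            new IndexItem("{\"field\":\"value\"}"), // 17 bytes
            new UpdateItem("{\"n\":1}"),            // 7 bytes
            new DeleteItem());                      // 0 bytes
        System.out.println(shardBulkSizeInBytes(items)); // prints 24
    }
}
```

The point of the design is that every size read here is a length that already exists in memory, so the accounting adds no serialization work to the indexing path.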

example output:

...
    "bulk": {
        "total": 1,
        "total_time_in_millis": 412,
        "total_size_in_bytes": 83
    }
...
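
Since the output exposes cumulative totals, a consumer can derive the overall averages itself. The sketch below shows that arithmetic using the three values from the example output; the class and method names are hypothetical and are not part of the stats API.

```java
public class BulkStatsAverages {

    /** Average shard bulk size in bytes, or 0 when no bulk has been recorded. */
    static double avgSizeInBytes(long total, long totalSizeInBytes) {
        return total == 0 ? 0.0 : (double) totalSizeInBytes / total;
    }

    /** Average time per shard bulk request in milliseconds, or 0 when no bulk has been recorded. */
    static double avgTimeInMillis(long total, long totalTimeInMillis) {
        return total == 0 ? 0.0 : (double) totalTimeInMillis / total;
    }

    public static void main(String[] args) {
        // Values taken from the example output: total=1, total_time_in_millis=412, total_size_in_bytes=83.
        long total = 1;
        long totalTimeInMillis = 412;
        long totalSizeInBytes = 83;

        System.out.println(avgSizeInBytes(total, totalSizeInBytes));   // prints 83.0
        System.out.println(avgTimeInMillis(total, totalTimeInMillis)); // prints 412.0
    }
}
```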

Relates (#50536)(#47345)


elasticmachine · 2020-02-11T16:41:04Z

Pinging @elastic/es-core-features (:Core/Features/Stats)

zhichen · 2020-02-13T06:41:39Z

@elasticmachine TBR

dakrone

Thanks for opening this @zhichen! I took a look and left a few comments that we should address, let me know what you think.

Another thing that occurs to me is whether we should take this opportunity to also track the exponentially weighted moving average for the time and size of shard bulk requests, so we can have an idea for a "recent average" for time and size. What do you think? If so, we already have an ExponentiallyWeightedMovingAverage class we could use and track it alongside the totals (that way we could track both the overall average and the "more recent" average)
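
To make the "overall average" versus "recent average" distinction concrete, here is a minimal standalone sketch of an exponentially weighted moving average tracked alongside the running totals. It is not the ExponentiallyWeightedMovingAverage class mentioned above, whose API is not shown in this thread; `EwmaSketch`, its methods, and the choice of `alpha` are assumptions made for illustration, and the sketch is deliberately not thread-safe.

```java
public class EwmaSketch {

    private final double alpha;   // weight of the newest sample, in (0, 1]
    private double recentAverage; // exponentially weighted moving average
    private boolean initialized;
    private double sum;           // running total, for the overall average
    private long count;

    EwmaSketch(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1], got " + alpha);
        }
        this.alpha = alpha;
    }

    /** Records one sample, e.g. the size or duration of one shard bulk request. */
    void addValue(double value) {
        if (!initialized) {
            // Seed the moving average with the first sample.
            recentAverage = value;
            initialized = true;
        } else {
            recentAverage = alpha * value + (1.0 - alpha) * recentAverage;
        }
        sum += value;
        count++;
    }

    /** Average weighted towards recent samples. */
    double getRecentAverage() {
        return recentAverage;
    }

    /** Plain average over every sample seen so far. */
    double getOverallAverage() {
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        EwmaSketch sizes = new EwmaSketch(0.5);
        sizes.addValue(100);
        sizes.addValue(200);
        sizes.addValue(400);
        System.out.println(sizes.getRecentAverage());  // prints 275.0
        System.out.println(sizes.getOverallAverage());
    }
}
```

With a rising series of samples, the recent average (275.0) sits above the overall average (about 233.3), which is the kind of signal a "more recent" average is meant to surface.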

server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

server/src/main/java/org/elasticsearch/index/bulk/stats/BulkStats.java

server/src/main/java/org/elasticsearch/index/bulk/stats/ShardBulkStats.java

server/src/test/java/org/elasticsearch/index/bulk/stats/BulkStatsTests.java

dakrone · 2020-02-14T22:15:55Z

@elasticmachine ok to test


cla-checker-service · 2020-02-15T12:25:16Z

Author of the following commits did not sign a Contributor Agreement:
270e8d5, 0d05e87

Please, read and sign the above mentioned agreement if you want to contribute to this project

cla-checker-service · 2020-02-15T12:51:28Z

Author of the following commits did not sign a Contributor Agreement:
270e8d5, 0d05e87, d988e98

Please, read and sign the above mentioned agreement if you want to contribute to this project

jasontedor · 2020-03-16T19:57:50Z

@probakowski Since indexing performance is incredibly important, would you mind running your methodology past someone on the @elastic/es-perf team (e.g., maybe @danielmitterdorfer?) to ensure there are not any flaws? Any regressions here would be concerning.

dliappis · 2020-03-17T09:51:44Z

@probakowski I'd like to understand better the fluctuations.

A few methodology questions:

* Did you use a load driver on a separate host?
* Did you ensure the environment (especially on the target hosts) was "reset" before each iteration? In our experience, the main sources of variance are untrimmed local SSDs and Intel Turbo Boost (on Intel processors). We typically run something like:
```
sudo /sbin/fstrim --all
sync
sleep 3
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
sudo sh -c "echo 1 > /proc/sys/vm/compact_memory"
sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
```

zhichen · 2020-03-19T02:03:40Z

@dakrone sorry, I mistakenly operated on a review request, please ignore it.

zhichen · 2020-03-25T02:14:18Z

@probakowski @jasontedor is any further indexing performance testing needed before merging this?

jasontedor · 2020-03-25T03:02:35Z

Yes, there is. I want to understand the methodology that was employed here. Most of the results have indicated performance regressions, I'm not convinced that they are noise. I want to understand the methodology with the questions that @dliappis asked, and also understand where these benchmarks were run (a laptop?). Indexing performance is entirely too important, we need to be cautious here.

dliappis · 2020-03-26T07:49:21Z

FYI, we synced up with @probakowski offline a few days ago to take a more critical look at the methodology, and he's currently working on a more thorough iteration, including a higher number of shards (since this PR adds stats per shard), an isolated load driver and nodes, a better choice of instances, etc.

zhichen · 2020-04-09T11:54:03Z

hi @probakowski is there any update?

probakowski · 2020-04-17T17:01:19Z

Hi @zhichen, very sorry for the late update. I was finally able to get a stable environment for testing and a better testing methodology (thanks @dliappis and @danielmitterdorfer!) and was able to confirm that there's no visible impact on performance here (the difference in median throughput was less than 0.5%, the same range as between different runs of master).

I'll resolve conflicts and merge/backport the change.

Thanks for your work!
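
The comparison described above (difference in median throughput between master and the change, judged against run-to-run variation) can be sketched as follows. This is only an illustration of the arithmetic: the class and method names are hypothetical, and the throughput numbers are made up for the example, not the results measured for this PR.

```java
import java.util.Arrays;

public class MedianThroughputComparison {

    /** Median of the given samples; the input array is not modified. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Absolute difference between the two medians, as a percentage of the baseline median. */
    static double relativeDifferencePercent(double[] baseline, double[] candidate) {
        double base = median(baseline);
        return Math.abs(median(candidate) - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // Illustrative docs/s figures only; NOT measurements from this PR.
        double[] masterRuns = {1000.0, 1010.0, 990.0};
        double[] changeRuns = {1004.0, 998.0, 1002.0};

        double diff = relativeDifferencePercent(masterRuns, changeRuns);
        System.out.println("median difference: " + diff + "%");
        System.out.println("within 0.5%: " + (diff < 0.5));
    }
}
```

Comparing the result against the spread between repeated runs of the same build is what lets a small difference be attributed to noise rather than to the change.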

zhichen · 2020-04-18T05:26:29Z

Thanks @probakowski. It's nice to see that this PR will be merged so that we can use this feature in 7.8 or 8.0.


russcam · 2020-06-09T03:03:08Z

@probakowski this change doesn't appear to have been backported to 7.8:

elasticsearch/server/src/main/java/org/elasticsearch/action/admin/indices/stats/CommonStats.java

Lines 109 to 165 in e142d69

    
    public CommonStats(CommonStatsFlags flags) {
        CommonStatsFlags.Flag[] setFlags = flags.getFlags();
        for (CommonStatsFlags.Flag flag : setFlags) {
            switch (flag) {
                case Docs:
                    docs = new DocsStats();
                    break;
                case Store:
                    store = new StoreStats();
                    break;
                case Indexing:
                    indexing = new IndexingStats();
                    break;
                case Get:
                    get = new GetStats();
                    break;
                case Search:
                    search = new SearchStats();
                    break;
                case Merge:
                    merge = new MergeStats();
                    break;
                case Refresh:
                    refresh = new RefreshStats();
                    break;
                case Flush:
                    flush = new FlushStats();
                    break;
                case Warmer:
                    warmer = new WarmerStats();
                    break;
                case QueryCache:
                    queryCache = new QueryCacheStats();
                    break;
                case FieldData:
                    fieldData = new FieldDataStats();
                    break;
                case Completion:
                    completion = new CompletionStats();
                    break;
                case Segments:
                    segments = new SegmentsStats();
                    break;
                case Translog:
                    translog = new TranslogStats();
                    break;
                case RequestCache:
                    requestCache = new RequestCacheStats();
                    break;
                case Recovery:
                    recoveryStats = new RecoveryStats();
                    break;
                default:
                    throw new IllegalStateException("Unknown Flag: " + flag);
            }
        }

Should it be backported, or should the v.7.8.0 label be removed?

ywelsch · 2020-07-03T06:29:17Z

This hasn't been backported to 7.9.0 either.

Please hold off on any backports now (I will remove the version label as well) as this is possibly interfering with other work that we are doing in this area (related to #58885). We will need a cohesive plan first.

Add Bulk stats track the bulk sizes per shard and the time spent on the bulk shard request (#50536)(#47345)

270e8d5

zhichen changed the title ~~Add Bulk stats track the bulk sizes per shard~~ Add Bulk stats track the bulk per shard Feb 11, 2020

zhichen requested review from henningandersen, ywelsch, DaveCTurner and jbaiera and removed request for henningandersen and ywelsch February 11, 2020 14:45

zhichen mentioned this pull request Feb 11, 2020

add stats for bulk_shard sizes/lantency/qps #50536

Closed

zhichen requested a review from dakrone February 11, 2020 14:53

dakrone added the :Data Management/Stats Statistics tracking and retrieval APIs label Feb 11, 2020

zhichen requested a review from martijnvg February 12, 2020 10:13

jakelandis removed request for DaveCTurner, henningandersen, ywelsch, jbaiera and martijnvg February 13, 2020 15:17

dakrone requested changes Feb 14, 2020


dakrone added >enhancement v7.7.0 v8.0.0 labels Feb 14, 2020

Refactoring bulk stats test and add some java docs as mentioned in the review.

0d05e87

Adjust the code style

d988e98

zhichen requested a review from dakrone March 19, 2020 02:00

bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020

Merge branch 'master' into master

f6e3fb5

probakowski merged commit 05066ae into elastic:master Apr 20, 2020

probakowski added the backport pending label Apr 20, 2020

probakowski pushed a commit to probakowski/elasticsearch that referenced this pull request Apr 20, 2020

Add Bulk stats track the bulk per shard (elastic#52208)

2e5611f

* Add Bulk stats track the bulk sizes per shard and the time spent on the bulk shard request (elastic#50536)(elastic#47345)

dakrone mentioned this pull request Apr 20, 2020

[CI] IndexStats.testBulkStats fails looking for a stat that is zero #55485

Closed

igoristic mentioned this pull request Apr 24, 2020

[Metricbeat][parity-tests] Missing some newly added ES fields elastic/beats#17977

Closed

ycombinator mentioned this pull request Apr 25, 2020

Collect bulk indexing stats for Elasticsearch metricsets elastic/beats#17992

Merged

6 tasks

ycombinator mentioned this pull request May 12, 2020

Update version when bulk param is available in ES stats API elastic/beats#18459

Merged

6 tasks

russcam mentioned this pull request May 29, 2020

7.8.0 Meta ticket elastic/elasticsearch-net#4718

Closed

17 tasks

jrodewig mentioned this pull request Jun 10, 2020

[DOCS] Add release notes for 7.8.0 #56340

Merged

pugnascotia added v7.9.0 and removed v7.8.0 labels Jun 12, 2020

ywelsch removed backport pending v7.9.0 labels Jul 3, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
