
A large number of duplicated update mapping tasks take down master node. #66768

Closed
howardhuanghua opened this issue Dec 23, 2020 · 6 comments
Labels: >bug, :Distributed Indexing/Distributed, needs:triage, Team:Distributed (Obsolete)

Comments

@howardhuanghua
Contributor

howardhuanghua commented Dec 23, 2020

Issue

Elasticsearch version (bin/elasticsearch --version): 7.5.1

Plugins installed: []

JVM version (java -version): 1.8.0.181

OS version (uname -a if on a Unix-like system): CentOS Linux 3.1.0

ES supports auto-creating indices and updating mappings while handling bulk requests. We created an index with 100 primary shards, and the cluster receives highly concurrent bulk requests (20k+/s) to that index at the same time, all adding the same new fields, which concurrently triggers mapping updates. The resulting flood of duplicated update mapping tasks makes the master hang for a long time (CPU almost 100% consumed), and then the other nodes disconnect from the current master.

In the related logs below, you can see that a lot of duplicated _doc mapping update tasks were computed, and that they also need to notify a lot of listeners.

[2020-12-18T09:52:42,632][WARN ][o.e.c.s.MasterService    ] [1608186356000606732] took [20.2s], which is over [10s], to compute cluster state update for [put-mapping[_doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc...............


[2020-12-18T10:19:08,583][WARN ][o.e.c.s.ClusterApplierService] [1608186356000606732] cluster state applier task [Publication{term=99, version=706636}] took [9.6m] which is above the warn threshold of [30s]: [running task [Publication{term=99, version=706636}]] took [0ms], [connecting to new nodes] took [0ms], [applying settings] took [0ms], [running applier [org.elasticsearch.indices.cluster.IndicesClusterStateService@6a7b290d]] took [1ms], [running applier [org.elasticsearch.script.ScriptService@5030671a]] took [0ms], [running applier [org.elasticsearch.xpack.ilm.IndexLifecycleService@3f48459b]] took [10ms], [running applier [org.elasticsearch.repositories.RepositoriesService@6d1eb292]] took [0ms], [running applier [org.elasticsearch.snapshots.RestoreService@5e9ecd7d]] took [0ms], [running applier [org.elasticsearch.ingest.IngestService@d456903]] took [0ms], [running applier [org.elasticsearch.action.ingest.IngestActionForwarder@70dc91ad]] took [0ms], [running applier [org.elasticsearch.action.admin.cluster.repositories.cleanup.TransportCleanupRepositoryAction$$Lambda$3326/1448648448@68ce2706]] took [0ms], [running applier [org.elasticsearch.tasks.TaskManager@4013fa47]] took [0ms], [running applier [org.elasticsearch.snapshots.SnapshotsService@66388e37]] took [0ms], [notifying listener [org.elasticsearch.cluster.InternalClusterInfoService@348ccb0d]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.support.SecurityIndexManager@1667a29d]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.support.SecurityIndexManager@dd482ff]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.authc.TokenService$$Lambda$2005/551229951@70943baa]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$$Lambda$2096/960510275@22924216]] took [0ms], [notifying listener [org.elasticsearch.xpack.watcher.support.WatcherIndexTemplateRegistry@9ca6f93]] took [2ms], [notifying listener [org.elasticsearch.xpack.watcher.WatcherLifeCycleService@2020c72f]] took [0ms], [notifying listener [org.elasticsearch.xpack.watcher.WatcherIndexingListener@71b8f1ea]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectProcessManager@410a8f39]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.datafeed.DatafeedManager$TaskRunner@5eece0c1]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.MlAssignmentNotifier@30d1516b]] took [3ms], [notifying listener [org.elasticsearch.xpack.ml.MlInitializationService@3a7e90a0]] took [2ms], [notifying listener [org.elasticsearch.xpack.ilm.IndexLifecycleService@3f48459b]] took [0ms], [notifying listener [org.elasticsearch.xpack.core.slm.history.SnapshotLifecycleTemplateRegistry@2a70f3a8]] took [0ms], [notifying listener [org.elasticsearch.xpack.slm.SnapshotLifecycleService@1323e4cf]] took [0ms], [notifying listener [org.elasticsearch.xpack.ccr.action.ShardFollowTaskCleaner@2943b5c9]] took [0ms], [notifying listener [org.elasticsearch.xpack.transform.TransformClusterStateListener@67f6dbd1]] took [0ms], [notifying listener [org.elasticsearch.plugin.opack.OPackActionFilter$$Lambda$2347/1715343684@4a368934]] took [2ms], [notifying listener [org.elasticsearch.cluster.metadata.TemplateUpgradeService@463f1921]] took [3ms], [notifying listener [org.elasticsearch.node.ResponseCollectorService@72e601d2]] took [0ms], [notifying listener 
[org.elasticsearch.xpack.ml.action.TransportOpenJobAction$OpenJobPersistentTasksExecutor$$Lambda$2844/814630491@231f740c]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$TaskExecutor$$Lambda$2848/598978484@5d9e3672]] took [0ms], [notifying listener [org.elasticsearch.persistent.PersistentTasksClusterService@63d6daf1]] took [0ms], [notifying listener [org.elasticsearch.cluster.routing.DelayedAllocationService@217cbfc8]] took [2ms], [notifying listener [org.elasticsearch.gateway.DanglingIndicesState@799b00ff]] took [0ms], [notifying listener [org.elasticsearch.persistent.PersistentTasksNodeService@551a9418]] took [0ms], [notifying listener [org.elasticsearch.license.LicenseService@4ecd18]] took [0ms], [notifying listener [org.elasticsearch.xpack.ccr.action.AutoFollowCoordinator@75da727a]] took [0ms], [notifying listener [org.elasticsearch.gateway.GatewayService@26ce242c]] took [0ms], [notifying listener [org.elasticsearch.cluster.service.ClusterApplierService$LocalNodeMasterListeners@5ae44db4]] took [1ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1e801e81]] took [35ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1fc01fc1]] took [31ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@2a002a03]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1a201a25]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@6fc06fc5]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@70b070bb]] took [30ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@72e072f4]] took [29ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@2ec02ed5]] took [28ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@77607776]] took [29ms], [notifying listener............


Steps to reproduce:
I have verified the duplicated update mapping tasks in my local cluster.

  1. Create a cluster with 2 nodes.
  2. Create an index with 2 primary shards and no replicas.
  3. Use three clients to send bulk requests concurrently; each bulk request contains a single document to write, and all of them add the same new field (requests shown below).
Client request1:  // field1 is the new field
curl -X POST "/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
'
Client request2: 
curl -X POST "/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "2" } }
{ "field1" : "value2" }
'
Client request3: 
curl -X POST "/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
'

I can see three update mapping tasks for the _doc type on the master node:

[2020-12-22T00:35:03,553][DEBUG][o.e.c.s.MasterService    ] [node-1] executing cluster state update for [put-mapping[_doc, _doc, _doc]]

Proposal

Each shard bulk operation on a primary shard submits a mapping update task to the master if the mapping has changed:

performOnPrimary(request, primary, updateHelper, threadPool::absoluteTimeInMillis,
    (update, shardId, mappingListener) -> {
        assert update != null;
        assert shardId != null;
        mappingUpdatedAction.updateMappingOnMaster(shardId.getIndex(), update, mappingListener);
    },

If one index on a data node has already submitted an update mapping task to the master, we could hold incoming mapping update tasks that have the same mapping update source in a pending queue instead of sending duplicate tasks to the master directly. Once the submitted update mapping task has been processed, we simply complete the queued tasks and let the corresponding bulk writes continue. With this optimization, update mapping tasks would be grouped at the node/index level instead of the shard bulk operation level. A rough sketch of the idea is shown below.
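The following is only an illustrative, simplified sketch of such a per-node pending queue; the class and method names are hypothetical, not existing Elasticsearch APIs:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, local to each data node: only the first bulk operation that
// needs a given mapping change sends a task to the master; identical concurrent
// changes just queue a listener and wait for that in-flight task to finish.
final class PendingMappingUpdates {

    // Waiting listeners, keyed by "index|mappingSource" of the in-flight update.
    private final Map<String, List<Consumer<Exception>>> inFlight = new HashMap<>();

    /**
     * Returns true if the caller should send the update to the master; false means an
     * identical update is already in flight and the listener has been queued instead.
     */
    synchronized boolean register(String index, String mappingSource, Consumer<Exception> listener) {
        String key = index + "|" + mappingSource;
        List<Consumer<Exception>> waiters = inFlight.get(key);
        if (waiters == null) {
            inFlight.put(key, new ArrayList<>());
            return true;            // first task for this mapping source: send it to the master
        }
        waiters.add(listener);      // duplicate: wait for the in-flight task instead
        return false;
    }

    /** Called when the master acknowledges (or fails) the in-flight update. */
    void complete(String index, String mappingSource, Exception failureOrNull) {
        List<Consumer<Exception>> waiters;
        synchronized (this) {
            waiters = inFlight.remove(index + "|" + mappingSource);
        }
        if (waiters != null) {
            for (Consumer<Exception> waiter : waiters) {
                waiter.accept(failureOrNull);   // release the queued bulk operations
            }
        }
    }
}

The key combines the index with the mapping update source, so only genuinely identical updates are collapsed; different dynamic updates to the same index would still be sent separately.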

If this idea is acceptable, I would like to provide a PR. Thanks.

@howardhuanghua added the >bug and needs:triage labels on Dec 23, 2020
@howardhuanghua changed the title from "Huge duplicated update mapping tasks take down master node." to "A large number of duplicated update mapping tasks take down master node." on Dec 23, 2020
@nik9000 added the :Distributed Indexing/Distributed label on Dec 28, 2020
@elasticmachine added the Team:Distributed (Obsolete) label on Dec 28, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

This looks to be a duplicate of #50670 which is addressed by #51038 in 7.7.0. The fix is similar in effect to your proposal. I suggest:

  • upgrade to a more recent version
  • avoid using dynamic mapping updates as much as possible if you are heavily indexing

I'm closing this as a duplicate, but please get back to us if the problem persists even after upgrading to the latest version.

@howardhuanghua
Contributor Author

Thanks @DaveCTurner, the proposal in #51038 mainly focuses on each data node's duplicated update mapping tasks. However, suppose we have 100 data nodes, each holding a primary shard, and 10k concurrent bulk operations that need to apply the same mapping update: even with indices.mapping.max_in_flight_updates set to 1, the master would still have to execute 100 duplicated mapping update tasks, because there are 100 data nodes.

So we probably need to consider a general mechanism to de-duplicate the master's tasks in the future; task types like shard-failed, update mapping, create index, and add template can all accumulate lots of duplicated tasks in high-concurrency scenarios.

@DaveCTurner
Contributor

So we probably need to consider a general mechanism to de-duplicate the master's tasks in the future

This already exists, and has been in place since long before 7.5.1:

if (updated) {
    return ClusterState.builder(currentState).metadata(builder).build();
} else {
    return currentState;
}

A few hundred put-mapping tasks, most of which are skipped by this code, shouldn't be presenting any problems.
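To illustrate the point (this is only a simplified toy model, not the actual Elasticsearch code): put-mapping tasks that share the same executor are processed together as one batch, and a task whose fields are already present leaves the state untouched, so at most one new cluster state is built and published for the whole batch. The updated flag in the snippet above plays the same role as the lazily built set here.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of a batched cluster-state task executor: many queued put-mapping tasks
// are applied in a single pass, and duplicates that add nothing new leave the state
// untouched, so only the first occurrence of a field produces a state change.
final class BatchedPutMappingExample {

    static Set<String> applyBatch(Set<String> currentFields, List<String> queuedNewFields) {
        Set<String> updatedFields = null;                        // built lazily, only if needed
        for (String field : queuedNewFields) {
            Set<String> base = updatedFields != null ? updatedFields : currentFields;
            if (base.contains(field) == false) {
                if (updatedFields == null) {
                    updatedFields = new LinkedHashSet<>(currentFields);
                }
                updatedFields.add(field);                        // first occurrence changes the state
            }
            // duplicates fall through without building anything
        }
        return updatedFields != null ? updatedFields : currentFields;  // unchanged => nothing to publish
    }

    public static void main(String[] args) {
        Set<String> current = new LinkedHashSet<>(List.of("field0"));
        // Three queued tasks all adding the same new field: only one state change results.
        System.out.println(applyBatch(current, List.of("field1", "field1", "field1")));  // [field0, field1]
    }
}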

@howardhuanghua
Contributor Author

Thank you. This seems to cover only mapping update tasks, so what about creating indices and adding templates? 10k concurrent requests would cause 10k cluster state calculations and 10k listener calls (network requests).

@DaveCTurner
Contributor

It's pretty much the same story for creating indices and adding templates. Are you actually seeing problems with these in practice? If so, I suggest you open a separate issue with some more detail of how this is causing an impact.
