
A large number of duplicated update mapping tasks take down master node. #66768

Closed
howardhuanghua opened this issue Dec 23, 2020 · 6 comments
Labels: >bug, :Distributed Indexing/Distributed, needs:triage, Team:Distributed (Obsolete)

Comments

@howardhuanghua
Contributor

howardhuanghua commented Dec 23, 2020

Issue

Elasticsearch version (bin/elasticsearch --version): 7.5.1

Plugins installed: []

JVM version (java -version): 1.8.0.181

OS version (uname -a if on a Unix-like system): CentOS Linux 3.1.0

ES supports auto-creating indices and updating mappings while handling bulk requests. We created an index with 100 primary shards, and the cluster receives highly concurrent bulk requests (20k+/s) to that index at the same time, all adding the same new fields, which concurrently triggers mapping updates. The resulting flood of duplicated update mapping tasks makes the master hang for a long time (CPU almost 100% consumed), and then the other nodes disconnect from the current master.

In the related logs below, you can see that a lot of duplicated _doc mapping update tasks were computed, and that they also need to notify a lot of listeners.

[2020-12-18T09:52:42,632][WARN ][o.e.c.s.MasterService    ] [1608186356000606732] took [20.2s], which is over [10s], to compute cluster state update for [put-mapping[_doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc...............


[2020-12-18T10:19:08,583][WARN ][o.e.c.s.ClusterApplierService] [1608186356000606732] cluster state applier task [Publication{term=99, version=706636}] took [9.6m] which is above the warn threshold of [30s]: [running task [Publication{term=99, version=706636}]] took [0ms], [connecting to new nodes] took [0ms], [applying settings] took [0ms], [running applier [org.elasticsearch.indices.cluster.IndicesClusterStateService@6a7b290d]] took [1ms], [running applier [org.elasticsearch.script.ScriptService@5030671a]] took [0ms], [running applier [org.elasticsearch.xpack.ilm.IndexLifecycleService@3f48459b]] took [10ms], [running applier [org.elasticsearch.repositories.RepositoriesService@6d1eb292]] took [0ms], [running applier [org.elasticsearch.snapshots.RestoreService@5e9ecd7d]] took [0ms], [running applier [org.elasticsearch.ingest.IngestService@d456903]] took [0ms], [running applier [org.elasticsearch.action.ingest.IngestActionForwarder@70dc91ad]] took [0ms], [running applier [org.elasticsearch.action.admin.cluster.repositories.cleanup.TransportCleanupRepositoryAction$$Lambda$3326/1448648448@68ce2706]] took [0ms], [running applier [org.elasticsearch.tasks.TaskManager@4013fa47]] took [0ms], [running applier [org.elasticsearch.snapshots.SnapshotsService@66388e37]] took [0ms], [notifying listener [org.elasticsearch.cluster.InternalClusterInfoService@348ccb0d]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.support.SecurityIndexManager@1667a29d]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.support.SecurityIndexManager@dd482ff]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.authc.TokenService$$Lambda$2005/551229951@70943baa]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$$Lambda$2096/960510275@22924216]] took [0ms], [notifying listener [org.elasticsearch.xpack.watcher.support.WatcherIndexTemplateRegistry@9ca6f93]] took [2ms], [notifying listener [org.elasticsearch.xpack.watcher.WatcherLifeCycleService@2020c72f]] took [0ms], [notifying listener [org.elasticsearch.xpack.watcher.WatcherIndexingListener@71b8f1ea]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectProcessManager@410a8f39]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.datafeed.DatafeedManager$TaskRunner@5eece0c1]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.MlAssignmentNotifier@30d1516b]] took [3ms], [notifying listener [org.elasticsearch.xpack.ml.MlInitializationService@3a7e90a0]] took [2ms], [notifying listener [org.elasticsearch.xpack.ilm.IndexLifecycleService@3f48459b]] took [0ms], [notifying listener [org.elasticsearch.xpack.core.slm.history.SnapshotLifecycleTemplateRegistry@2a70f3a8]] took [0ms], [notifying listener [org.elasticsearch.xpack.slm.SnapshotLifecycleService@1323e4cf]] took [0ms], [notifying listener [org.elasticsearch.xpack.ccr.action.ShardFollowTaskCleaner@2943b5c9]] took [0ms], [notifying listener [org.elasticsearch.xpack.transform.TransformClusterStateListener@67f6dbd1]] took [0ms], [notifying listener [org.elasticsearch.plugin.opack.OPackActionFilter$$Lambda$2347/1715343684@4a368934]] took [2ms], [notifying listener [org.elasticsearch.cluster.metadata.TemplateUpgradeService@463f1921]] took [3ms], [notifying listener [org.elasticsearch.node.ResponseCollectorService@72e601d2]] took [0ms], [notifying listener 
[org.elasticsearch.xpack.ml.action.TransportOpenJobAction$OpenJobPersistentTasksExecutor$$Lambda$2844/814630491@231f740c]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$TaskExecutor$$Lambda$2848/598978484@5d9e3672]] took [0ms], [notifying listener [org.elasticsearch.persistent.PersistentTasksClusterService@63d6daf1]] took [0ms], [notifying listener [org.elasticsearch.cluster.routing.DelayedAllocationService@217cbfc8]] took [2ms], [notifying listener [org.elasticsearch.gateway.DanglingIndicesState@799b00ff]] took [0ms], [notifying listener [org.elasticsearch.persistent.PersistentTasksNodeService@551a9418]] took [0ms], [notifying listener [org.elasticsearch.license.LicenseService@4ecd18]] took [0ms], [notifying listener [org.elasticsearch.xpack.ccr.action.AutoFollowCoordinator@75da727a]] took [0ms], [notifying listener [org.elasticsearch.gateway.GatewayService@26ce242c]] took [0ms], [notifying listener [org.elasticsearch.cluster.service.ClusterApplierService$LocalNodeMasterListeners@5ae44db4]] took [1ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1e801e81]] took [35ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1fc01fc1]] took [31ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@2a002a03]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1a201a25]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@6fc06fc5]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@70b070bb]] took [30ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@72e072f4]] took [29ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@2ec02ed5]] took [28ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@77607776]] took [29ms], [notifying listener............


Steps to reproduce:
I have verified the duplicated update mapping tasks in my local cluster.

  1. Create a cluster with 2 nodes.
  2. Create an index with 2 primary shards and no replicas.
  3. Use three clients to send bulk requests concurrently; each bulk request contains a single document to write, and all of them add the same new field (requests shown below).
Client request1:  // field1 is the new field
curl -X POST "/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
'
Client request2: 
curl -X POST "/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "2" } }
{ "field1" : "value2" }
'
Client request3: 
curl -X POST "/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
'

I can see three update mapping tasks for the _doc type on the master node:

[2020-12-22T00:35:03,553][DEBUG][o.e.c.s.MasterService    ] [node-1] executing cluster state update for [put-mapping[_doc, _doc, _doc]]

Proposal

Each shard bulk operation on a primary shard submits a mapping update task to the master if the mapping has changed:

performOnPrimary(request, primary, updateHelper, threadPool::absoluteTimeInMillis,
    (update, shardId, mappingListener) -> {
        assert update != null;
        assert shardId != null;
        mappingUpdatedAction.updateMappingOnMaster(shardId.getIndex(), update, mappingListener);
    },

If one index on a data node has already submitted an update mapping task to the master, we could hold incoming mapping update tasks that have the same mapping update source in a pending queue instead of sending duplicate tasks to the master directly. Once the submitted update mapping task has been processed, we simply complete the queued tasks and let the corresponding bulk writes continue. With this optimization, update mapping tasks would be grouped at the node/index level instead of the shard bulk operation level. A rough sketch of the idea is shown below.
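The following is only an illustrative, simplified sketch of such a per-node pending queue; the class and method names are hypothetical, not existing Elasticsearch APIs:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, local to each data node: only the first bulk operation that
// needs a given mapping change sends a task to the master; identical concurrent
// changes just queue a listener and wait for that in-flight task to finish.
final class PendingMappingUpdates {

    // Waiting listeners, keyed by "index|mappingSource" of the in-flight update.
    private final Map<String, List<Consumer<Exception>>> inFlight = new HashMap<>();

    /**
     * Returns true if the caller should send the update to the master; false means an
     * identical update is already in flight and the listener has been queued instead.
     */
    synchronized boolean register(String index, String mappingSource, Consumer<Exception> listener) {
        String key = index + "|" + mappingSource;
        List<Consumer<Exception>> waiters = inFlight.get(key);
        if (waiters == null) {
            inFlight.put(key, new ArrayList<>());
            return true;            // first task for this mapping source: send it to the master
        }
        waiters.add(listener);      // duplicate: wait for the in-flight task instead
        return false;
    }

    /** Called when the master acknowledges (or fails) the in-flight update. */
    void complete(String index, String mappingSource, Exception failureOrNull) {
        List<Consumer<Exception>> waiters;
        synchronized (this) {
            waiters = inFlight.remove(index + "|" + mappingSource);
        }
        if (waiters != null) {
            for (Consumer<Exception> waiter : waiters) {
                waiter.accept(failureOrNull);   // release the queued bulk operations
            }
        }
    }
}

The key combines the index with the mapping update source, so only genuinely identical updates are collapsed; different dynamic updates to the same index would still be sent separately.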

If this idea is acceptable, I would like to provide a PR. Thanks.

@howardhuanghua added the >bug and needs:triage labels on Dec 23, 2020
@howardhuanghua changed the title from "Huge duplicated update mapping tasks take down master node." to "A large number of duplicated update mapping tasks take down master node." on Dec 23, 2020
@nik9000 added the :Distributed Indexing/Distributed label on Dec 28, 2020
@elasticmachine added the Team:Distributed (Obsolete) label on Dec 28, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

This looks to be a duplicate of #50670 which is addressed by #51038 in 7.7.0. The fix is similar in effect to your proposal. I suggest:

  • upgrade to a more recent version
  • avoid using dynamic mapping updates as much as possible if you are heavily indexing

I'm closing this as a duplicate, but please get back to us if the problem persists even after upgrading to the latest version.

@howardhuanghua
Contributor Author

Thanks @DaveCTurner, the proposal in #51038 mainly focuses on each data node's duplicated update mapping tasks. However, suppose we have 100 data nodes, each holding a primary shard, and 10k concurrent bulk operations that need to apply the same mapping update: even with indices.mapping.max_in_flight_updates set to 1, the master would still have to execute 100 duplicated mapping update tasks, because there are 100 data nodes.

So we probably need to consider a general mechanism to de-duplicate the master's tasks in the future; task types like shard-failed, update mapping, create index, and add template can all accumulate lots of duplicated tasks in high-concurrency scenarios.

@DaveCTurner
Contributor

So we probably need to consider a general mechanism to de-duplicate the master's tasks in the future

This already exists, and has been in place since long before 7.5.1:

if (updated) {
    return ClusterState.builder(currentState).metadata(builder).build();
} else {
    return currentState;
}

A few hundred put-mapping tasks, most of which are skipped by this code, shouldn't be presenting any problems.
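To illustrate the point (this is only a simplified toy model, not the actual Elasticsearch code): put-mapping tasks that share the same executor are processed together as one batch, and a task whose fields are already present leaves the state untouched, so at most one new cluster state is built and published for the whole batch. The updated flag in the snippet above plays the same role as the lazily built set here.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of a batched cluster-state task executor: many queued put-mapping tasks
// are applied in a single pass, and duplicates that add nothing new leave the state
// untouched, so only the first occurrence of a field produces a state change.
final class BatchedPutMappingExample {

    static Set<String> applyBatch(Set<String> currentFields, List<String> queuedNewFields) {
        Set<String> updatedFields = null;                        // built lazily, only if needed
        for (String field : queuedNewFields) {
            Set<String> base = updatedFields != null ? updatedFields : currentFields;
            if (base.contains(field) == false) {
                if (updatedFields == null) {
                    updatedFields = new LinkedHashSet<>(currentFields);
                }
                updatedFields.add(field);                        // first occurrence changes the state
            }
            // duplicates fall through without building anything
        }
        return updatedFields != null ? updatedFields : currentFields;  // unchanged => nothing to publish
    }

    public static void main(String[] args) {
        Set<String> current = new LinkedHashSet<>(List.of("field0"));
        // Three queued tasks all adding the same new field: only one state change results.
        System.out.println(applyBatch(current, List.of("field1", "field1", "field1")));  // [field0, field1]
    }
}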

@howardhuanghua
Contributor Author

Thank you. This seems to cover only mapping update tasks, so what about creating indices and adding templates? 10k concurrent requests would cause 10k cluster state calculations and 10k listener calls (network requests).

@DaveCTurner
Contributor

It's pretty much the same story for creating indices and adding templates. Are you actually seeing problems with these in practice? If so, I suggest you open a separate issue with some more detail of how this is causing an impact.
