Dynamic mapping updates are unboundedly parallel #50670

DaveCTurner · 2020-01-06T16:49:14Z

Before 7.2.0 dynamic mapping updates would block a write thread waiting for the master to acknowledge the new mapping. In #39793 we moved to an asynchronous model, freeing up the write thread to carry on with other indexing tasks.

One feature of the pre-7.2.0 blocking approach was that the number of write threads is limited, which in turn limits the number of parallel dynamic mapping updates pending on the master. The asynchronous model has no such limit and may send a very large number of dynamic mapping updates in a short time, since the shard bulks are processed much faster. Furthermore, many indexing operations may require the same mapping update, and because these updates are now generated much more quickly, many of those sent to the master may be duplicates.

Related discussion thread.

elasticmachine · 2020-01-06T16:49:19Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

DaveCTurner · 2020-01-06T16:51:36Z

We discussed a couple of options in Slack:

- Impose an explicit limit on the number of in-flight dynamic mapping updates on each data node, throttling indexing when that limit is reached.
- Detect the case where the in-flight mapping updates are already sufficient and, if so, wait locally for those updates to complete instead of sending duplicates to the master.
- Apply backpressure at the network level (on the master) by stopping reading from the wire before reaching breaking point.
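
To make the "wait locally instead of sending duplicates" option more concrete, here is a minimal Java sketch of coalescing identical in-flight requests. It is not Elasticsearch code: the class `InFlightDeduplicator`, its `submit` method, and the idea of keying requests by something like index plus mapping source are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical helper (not an Elasticsearch class): coalesces identical in-flight requests. */
final class InFlightDeduplicator<K, V> {
    // One shared future per key that currently has a request outstanding.
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    /**
     * If a request with the same key is already in flight, returns its future so the
     * caller waits locally. Otherwise invokes {@code sender} exactly once for this key.
     */
    CompletableFuture<V> submit(K key, Function<K, CompletableFuture<V>> sender) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing; // duplicate: reuse the pending request
        }
        CompletableFuture<V> sent;
        try {
            sent = sender.apply(key); // e.g. send the update to the master
        } catch (RuntimeException e) {
            inFlight.remove(key, created);
            created.completeExceptionally(e);
            return created;
        }
        sent.whenComplete((value, error) -> {
            // Forget the key first so later requests trigger a fresh send.
            inFlight.remove(key, created);
            if (error != null) {
                created.completeExceptionally(error);
            } else {
                created.complete(value);
            }
        });
        return created;
    }

    /** Number of distinct keys with an outstanding request. */
    int inFlightCount() {
        return inFlight.size();
    }
}
```

With this shape, any number of indexing operations that need the same mapping change share a single outstanding request, and all of them are released together when it completes or fails.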

SpencerLN · 2020-01-09T04:28:31Z

I believe we are running into a very similar issue after an upgrade from 6.8.0 to 7.5.1. In 6.8.0 we would see a couple of minutes where the cluster would apply the new mappings and then recover, while in 7.5.1 we see the number of pending mapping changes reach 30,000+. This quickly results in the master node becoming unresponsive and data nodes repeatedly leaving and rejoining the cluster because they cannot contact the master node in a timely manner.

Since our upgrade, each night at 12 AM (new date based indexes are created at this time) we have had to restart all of our master nodes simultaneously to bring the cluster back to a healthy state.

Is there any timeline on a potential fix for this issue being made available, or a recommended workaround?

DaveCTurner · 2020-01-09T09:18:40Z

@SpencerLN that sounds like this issue indeed.

As a general rule, dynamic mappings should be used sparingly in production since they cause indexing to bottleneck on the master. It's much more efficient to use an index template to set up most of the mappings when the index is created, and this is particularly important at the kind of scale that would result in tens of thousands of pending tasks. Dynamic mappings are more appropriate for handling an occasional unexpected field.

ywelsch · 2020-01-15T11:12:20Z

We discussed various options for solving this issue in the distributed sync (and combinations thereof):

- Reintroduce blocking the writer thread on the data node once it has a certain number of dynamic mapping updates in-flight.
- Deduplicate the mapping updates that are sent to the master, assuming that a large number of these updates would be similar.
- Track the number of semi-processed, but uncompleted requests and start rejecting new requests once uncompleted requests reach a certain bound.
- Bound the number of in-flight mapping updates on the master node, and reject any new requests coming in. Combine this with a retry/backoff mechanism on the data node side.
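
The last option (bound, reject, and retry with backoff) could look roughly like the following Java sketch. Both `BoundedAdmission` and `Backoff` are hypothetical names, and the limit, base delay, and cap are illustrative parameters rather than anything taken from Elasticsearch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical receiver-side gate: admits at most {@code limit} requests at a time. */
final class BoundedAdmission {
    private final int limit;
    private final AtomicInteger inFlight = new AtomicInteger();

    BoundedAdmission(int limit) {
        this.limit = limit;
    }

    /** Returns false (reject) when the bound is reached, instead of queueing. */
    boolean tryAdmit() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Must be called once for every admitted request when it finishes. */
    void complete() {
        inFlight.decrementAndGet();
    }
}

/** Hypothetical sender-side helper: capped exponential backoff between retries. */
final class Backoff {
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        // Clamp the shift so the delay cannot overflow for large attempt counts.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, capMillis);
    }
}
```

A sender that is rejected would wait `Backoff.delayMillis(attempt, ...)` before retrying, so a burst of updates spreads out over time instead of piling up on the receiver.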

To get a fix out quickly, I will look at reintroducing the blocking behavior in a first step.
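
For readers unfamiliar with the blocking approach, the general technique can be sketched with a counting semaphore. This is only an illustration under assumptions, not the code from the eventual PR: the class `BoundedMappingUpdateSender`, the `maxInFlight` parameter, and the `Supplier`-based request are all invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical sketch: caps the number of concurrent outstanding requests. */
final class BoundedMappingUpdateSender {
    private final Semaphore permits;

    BoundedMappingUpdateSender(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Blocks the calling (write) thread until a permit is free, then issues the
     * request. The permit is returned when the request completes or fails.
     */
    <T> CompletableFuture<T> send(Supplier<CompletableFuture<T>> request) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = request.get();
        } catch (RuntimeException e) {
            permits.release(); // nothing was sent, so give the permit back
            throw e;
        }
        return future.whenComplete((result, error) -> permits.release());
    }

    /** Permits currently available, i.e. how many more requests may start. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The blocking is the point here: once the limit is reached, write threads stall, which naturally slows indexing on that node until the master catches up.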

Ensures that there are not too many concurrent dynamic mapping updates going out from the data nodes to the master. Closes #50670

farin99 · 2020-04-27T13:59:22Z

@ywelsch seems like the code isn't part of any release (not 7.6 or above)
https://github.com/elastic/elasticsearch/blob/v7.6.2/server/src/main/java/org/elasticsearch/cluster/action/index/MappingUpdatedAction.java

ywelsch · 2020-04-27T14:10:45Z

Indeed, it looks like I missed the backport to the 7.6 branch, and it also missed the 7.5.2 release (it's on the 7.5 branch, but landed just after that release); it was probably backported around the time the new branches were cut. It was backported to 7.x (i.e. the future 7.7.0), so it will be released as part of that. I will adapt the labels on the PR.

DaveCTurner added the >bug, discuss, and :Distributed Indexing/CRUD labels Jan 6, 2020

ywelsch added team-discuss and removed discuss labels Jan 6, 2020

ywelsch self-assigned this Jan 8, 2020

ywelsch mentioned this issue Jan 15, 2020

Block too many concurrent mapping updates #51038

Merged

ywelsch removed the team-discuss label Jan 15, 2020

ywelsch closed this as completed in #51038 Jan 15, 2020

ywelsch added a commit that referenced this issue Jan 15, 2020

Block too many concurrent mapping updates (#51038)

bfbbd73

Ensures that there are not too many concurrent dynamic mapping updates going out from the data nodes to the master. Closes #50670

ywelsch added a commit that referenced this issue Jan 15, 2020

Block too many concurrent mapping updates (#51038)

dc47b38

Ensures that there are not too many concurrent dynamic mapping updates going out from the data nodes to the master. Closes #50670

ywelsch added a commit that referenced this issue Jan 15, 2020

Block too many concurrent mapping updates (#51038)

7b98184

Ensures that there are not too many concurrent dynamic mapping updates going out from the data nodes to the master. Closes #50670

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

DaveCTurner mentioned this issue Dec 29, 2020

A large number of duplicated update mapping tasks take down master node. #66768

Closed

rwynn mentioned this issue May 12, 2021

Monstache stalling while indexing large collection holding 700K documents rwynn/monstache#509

Open
