Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle index deletions into a single DB request #310

Closed
Piedone opened this issue Dec 14, 2020 · 4 comments
Closed

Bundle index deletions into a single DB request #310

Piedone opened this issue Dec 14, 2020 · 4 comments

Comments

@Piedone
Copy link
Contributor

Piedone commented Dec 14, 2020

When one deletes a document all the index tables corresponding to the document type get a DELETE query. While these can be quite fast (since they use the primary key, many times there won't actually be a matching row, and some of the tables are small) they're issued one by one, incurring the penalty of the roundtrip to the DB (which might be a separate server reached over a network) many times. This can be a serious performance issue when you have a lot of indices (like >50).

Instead of issuing these queries one by one, they could be bundle together into a single DB request (i.e. concatenated into a single query string). That way, the network roundtrip would only happen once.

Also see this Orchard issue: OrchardCMS/OrchardCore#7949

@sebastienros
Copy link
Owner

Agreed. Introduced batching commands recently #228 but I assume I haven't done it for index deletions.

@deanmarcussen
Copy link
Collaborator

deanmarcussen commented Dec 26, 2020

Did a little testing with the profiler to see what's actually going on here.

Focused on the index deletes, just for a simple command to profile.

Batching isn't working.
All it's doing is hitting the dapper MultiExec here https://github.com/StackExchange/Dapper/blob/0561b616baf8d0742f90c44373da40fb90b24c41/Dapper/SqlMapper.Async.cs#L521

Which is then splitting them back out into multiple ExecuteAsync commands, one for each parameter in the enumerable.

So for some numbers.

For an update to two documents (which is what happens publishing on OrchardCore, with Versioning on).
Using a DummyIndexProvider which never indexes anything (so just forces a Delete everytime)

Locally the difference in adding extra indexes is marginal, but still expontential.

Extra Indexes: 1, Elapsed 00:00:00.0371806
Extra Indexes: 100, Elapsed 00:00:00.1317780
Extra Indexes: 200, Elapsed 00:00:00.2826214
Extra Indexes: 500, Elapsed 00:00:00.6213360
Extra Indexes: 1000, Elapsed 00:00:01.2628232

As soon as you put it on a remote sql server (azure), the time it takes simply to execute all these separate delete statements really starts to add up.

Extra Indexes: 1, Elapsed 00:00:00.0838498
Extra Indexes: 100, Elapsed 00:00:02.5644737
Extra Indexes: 200, Elapsed 00:00:05.0814065
Extra Indexes: 500, Elapsed 00:00:12.6558409
Extra Indexes: 1000, Elapsed 00:00:28.6622291

w/1000 indexes 28 seconds to execute all these commands.

Now by taking @Piedone suggestion of wrapping them into a single ExecuteAsync with parameters, the remote execution time drops to 1.1 seconds on average, for 1000 indexes.

So I think batching these into a more complex delete command (and applying the same thing to the other commands), something like a BatchDeleteMapIndexCommand, would have some big gains when saving items.
(how to refactor to achieve that is another story. Worth a chat sometime @sebastienros )

Now when testing this on Orchard Core, it's not just the delete's slowing the process, there are so so many .ExecuteAsync that occur in the session commands, when saving a ContentItem that there would be good opportunity to batch more.

But deletes could be a good place to start.

@Piedone
Copy link
Contributor Author

Piedone commented Dec 26, 2020

Nice findings!

As an example, in our particular app publishing the home page, a Page (the same content type as in the Blog recipe, with several widgets in it) issues 238 individual queries. Out of these, 190 are DELETEs, all of them on index tables. The queries themselves take only between 1-3 ms usually in an Azure App Service with a remote Azure SQL DB but this of course adds up (the variations seem to be solely due to fluctuations in network latency; when the actual execution time is measured by running the queries on the same DB directly then they all seem to be <1ms consistently but I only sampled).

Furthermore, there are 19 UPDATEs, 27 SELECTs, and 16 INSERTs (these add up to more than 238 because some of the SELECTs are inner queries or batched with another query) but of course, these are less of an issue and the most time is spent in DELETEs.

Just saving a draft causes fewer queries, as expected, but still doing 141 (114 DELETEs).

All this with 23 custom ContentItem indices, 48 ContentItem indices in total (i.e. including the ones provided by built-in modules), and 57 index tables altogether.

@Piedone
Copy link
Contributor Author

Piedone commented Mar 21, 2021

Shouldn't this be closed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants