mkolumb/EFCore.Extensions.SaveOptimizer
EFCore.Extensions.SaveOptimizer

Save optimizer extension for EF Core.
It supports multiple EF Core providers and brings serious performance benefits in many scenarios with little effort.

Currently in BETA


| Package | NuGet |
| --- | --- |
| EFCore.Extensions.SaveOptimizer | NuGet SaveOptimizer |
| EFCore.Extensions.SaveOptimizer.Dapper | NuGet SaveOptimizer.Dapper |

Why another library for batch save?

Short version

Because I needed one

Long version

The idea came from one of my commercial projects.

We were working with CockroachDB (an excellent database for multi-region environments) and initially used Entity Framework Core 3.1. It worked fine, but in a multi-region configuration save performance was not so good.

The reason? As many people know, EF generates many single-row INSERT / UPDATE / DELETE statements instead of a smaller number of multi-row statements.

I considered a few solutions (e.g. EFCore.BulkExtensions, linq2db), but all of them had some disadvantages at the time. We needed something with support for CockroachDB and concurrency tokens. We also wanted to avoid rewriting business logic; more code changes than replacing the SaveChanges call could have been a big problem. As there was no good choice, I wrote something simple from scratch. For SQL statement generation I used the excellent SqlKata library. Now I have decided to rewrite the whole solution as something more efficient and better integrated into EF.

EFCore.BulkExtensions is a good choice for most cases, as it has a BulkSaveChanges method and currently supports a few different databases (SQL Server, PostgreSQL and SQLite). Unfortunately, it looks like the API they use for PostgreSQL is incompatible with CockroachDB.

The main difference is the approach - EFCore.BulkExtensions uses copy tools / statements under the hood, while SaveOptimizer uses batched INSERT / UPDATE / DELETE statements. You need to choose which is better in your case. Copy will likely be faster in most cases (it's good to measure this), but the SaveOptimizer approach supports more databases. SaveOptimizer is a simple library, so you should verify it against your requirements.

How to use

Just replace SaveChanges() / SaveChangesAsync() :)

Optimized

using EFCore.Extensions.SaveOptimizer;

await using var transaction = await context.Database.BeginTransactionAsync(IsolationLevel.Serializable, cancellationToken);
await context.AddAsync(entity);
await context.SaveChangesOptimizedAsync();
await transaction.CommitAsync(cancellationToken);

Optimized Dapper

using EFCore.Extensions.SaveOptimizer.Dapper;

await using var transaction = await context.Database.BeginTransactionAsync(IsolationLevel.Serializable, cancellationToken);
await context.AddAsync(entity);
await context.SaveChangesDapperOptimizedAsync();
await transaction.CommitAsync(cancellationToken);

How it works

When you execute SaveChangesOptimized, the following sequence usually happens:

  1. Get entries from ChangeTracker
  2. Build property changes dictionary for each entry
  3. Group changes as much as possible
  4. Generate SQL
  5. Execute
  6. Accept changes

Please note it does not work exactly like SaveChanges, so you should verify that it behaves as expected in your case.
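As an illustration of steps 3-4: assuming a hypothetical Items table, three tracked inserts touching the same columns can be grouped into one multi-row statement (the exact SQL, quoting and parameter naming depend on the provider):

```sql
-- What EF Core would normally send (three single-row statements):
INSERT INTO "Items" ("Id", "Name") VALUES (@p0, @p1);
INSERT INTO "Items" ("Id", "Name") VALUES (@p2, @p3);
INSERT INTO "Items" ("Id", "Name") VALUES (@p4, @p5);

-- What the batched approach generates instead (one multi-row statement):
INSERT INTO "Items" ("Id", "Name")
VALUES (@p0, @p1), (@p2, @p3), (@p4, @p5);
```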

Features

To do

  • Optimize data retrieval from Change Tracker
  • Add unchecked when possible
  • Low level performance optimizations
  • Contributing document
  • Validation before save
  • Build query builder tests from scratch

Limitations

Refresh data after save

The SaveOptimizer approach makes it almost impossible to refresh data after save; that is on your side. If you use an auto-increment primary key, the generated key will not be retrieved from the database. I recommend generating primary key values in code, not in the database. This makes it much easier to refresh data after save if necessary, as you can use these values in a query. Also, DatabaseValues for an entry will not be retrieved from the database when a DBConcurrencyException is thrown.

Basically - after save you should not use this context anymore, as it could be invalid; use a new context for the next operation. However, if you need to, you can experiment with AfterSaveBehavior.

If you start the transaction on your own, I recommend using AfterSaveBehavior.DoNothing in the configuration and, after a successful commit, calling the ProcessAfterSave method on the result with the parameter you want.

Relationships

This limitation is connected with the previous one. If you want to add an entity and some dependent entities within one transaction, you can't use an auto-increment primary key. This will only work with primary key values generated in your code.
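Client-generated keys make this scenario work; a minimal sketch, assuming hypothetical Order / OrderLine entities with Guid keys:

```csharp
// Keys are generated in code (Guid.NewGuid()), so the dependent row can
// reference the parent's key without a round trip to the database.
var order = new Order { Id = Guid.NewGuid(), Number = "A-1" };
var line = new OrderLine { Id = Guid.NewGuid(), OrderId = order.Id, Quantity = 2 };

context.Add(order);
context.Add(line);
await context.SaveChangesOptimizedAsync();
```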

Known issues

Oracle serializable transaction

It looks like serializable transactions produce many errors during execution, especially during inserts (e.g. ORA-08177 & ORA-06512). This is something to investigate; maybe it is a dockerized Oracle Express issue. I don't recommend using this library with Oracle in a production environment without thorough testing. Sometimes decreasing the batch size for inserts can help.

Firebird provider

This is not a SaveOptimizer issue; however, I experienced some problems with the Firebird provider. It looks like the model builder sometimes builds a different model than with other providers.

| Issue | Workaround |
| --- | --- |
| Precision lost for decimal column | Use HasColumnType("DECIMAL(PRECISION,SCALE)") |
| Auto increment column not created | Use HasAnnotation(FbAnnotationNames.ValueGenerationStrategy, FbValueGenerationStrategy.IdentityColumn) |
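Both workarounds can be applied in OnModelCreating; a sketch assuming a hypothetical Product entity (the using directives for the Fb* types depend on the Firebird provider version):

```csharp
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Workaround for lost decimal precision: pin the column type explicitly.
    modelBuilder.Entity<Product>()
        .Property(p => p.Price)
        .HasColumnType("DECIMAL(18,2)");

    // Workaround for the missing auto-increment column: force an identity column.
    modelBuilder.Entity<Product>()
        .Property(p => p.Id)
        .HasAnnotation(FbAnnotationNames.ValueGenerationStrategy,
            FbValueGenerationStrategy.IdentityColumn);
}
```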

Q&A

  1. Why did you write a query builder instead of using SqlKata?
    • The reason is performance. I noticed a few issues and created a pull request. When I started working on this library, my pull request had been in review without a response for a few months. Initially I decided to create a .NET 6 fork, and the performance improvements were really great. Later I realized I don't need most of SqlKata's features, so it's better to write something optimized from scratch - this brings a performance & freedom boost. Currently there is a reference to SqlKata in the test project, just to verify my builder. In the future I plan to drop this dependency and write the builder tests from scratch.
  2. Why did you write a query executor?
    • I noticed a bug with ExecuteSqlRaw from RelationalExtensions. It looks like it cuts precision for decimals. So I created something lightweight using some EF Core features.
  3. What is the purpose of the EFCore.Extensions.SaveOptimizer.Dapper package?
    • The default execution uses the executor described in the previous question. Someone could prefer execution using Dapper. You can compare performance in your case; from my experience the results are usually similar.
  4. Which EF Core versions do you support?
    • I plan to support only the current release and the latest LTS version. As there is only one required dependency (Microsoft.EntityFrameworkCore.Relational), you should be able to quickly prepare a version for older EF if needed. Maybe some small changes in DataContextModelWrapper would be required.

Configuration

| Parameter | Description | Provider default value |
| --- | --- | --- |
| Batch size | Default batch size | All - 1000 |
| Insert batch size | When defined, overrides batch size for insert operations | Firebird - 500; Oracle - 50; Other - NULL |
| Update batch size | When defined, overrides batch size for update operations | All - NULL |
| Delete batch size | When defined, overrides batch size for delete operations | All - NULL |
| Parameters limit | Limits parameters per statement; when exceeded, batch size is decreased for the operation to prevent exceptions | SqlServer - 1024; Firebird - 2048; SqLite - 512; Postgres - 31768; Other - 15384 |
| Concurrency token behavior | When a concurrency token is defined for an entity, it is included in update / delete statements. When the flag is set to throw exception, an exception is thrown when statements affect fewer / more rows than expected | All - throw exception |
| Auto transaction enabled | If enabled, a transaction is started when no transaction is attached to the DbContext | All - true |
| After save behavior | Behavior after a successful save; possible values: ClearChanges, AcceptChanges, DetachSaved, MarkTemporaryAsPermanent, DoNothing | All - DoNothing |
| Auto transaction isolation level | Isolation level for the auto transaction | All - serializable |
| Builder configuration -> case type | Case type used when building statements; if normal, case is not changed to upper / lower | All - normal |
| Builder configuration -> optimize parameters | Optimizes parameter usage in statements; can sometimes lead to unexpected exceptions in the db | All - true |

Migration command

powershell -Command ".\add_migration.ps1 -name [NAME]"

or, from PowerShell:

$name = "[NAME]"
.\add_migration.ps1 -name $name

Benchmarks

Summary

SaveOptimizer is not always better than pure EF Core methods. EF Core contains tons of optimizations, so for small operations and simple workloads it is likely better.

My advice is to compare results in your real environment. Honestly - the best choice for pure performance is to leave EF Core out of write operations entirely and write the statements from scratch for your scenarios. But this library can bring serious performance benefits in many scenarios without much effort.

All benchmarks use the serializable isolation level and run databases within containers. Please note I used free versions (e.g. Oracle Express); results on commercial versions could be much better.

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1826 (21H2)
12th Gen Intel Core i9-12900HK, 1 CPU, 20 logical and 14 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT

EvaluateOverhead=True  OutlierMode=RemoveUpper
InvocationCount=1  IterationCount=20  LaunchCount=3
RunStrategy=ColdStart  UnrollFactor=1  WarmupCount=0

Running

powershell -File run_benchmarks.ps1

or, from PowerShell:

.\run_benchmarks.ps1

Results

Legend

  • Optimized - Using EFCore.Extensions.SaveOptimizer package
  • Optimized Dapper - Using EFCore.Extensions.SaveOptimizer.Dapper package
  • EF Core - Using pure EF Core

CockroachDB - single docker node

CockroachDB - single docker node INSERT

CockroachDB - single docker node UPDATE

CockroachDB - single docker node DELETE

CockroachDB - nine docker nodes

CockroachDB - nine docker nodes INSERT

CockroachDB - nine docker nodes UPDATE

CockroachDB - nine docker nodes DELETE

SQLite

SQLite INSERT

SQLite UPDATE

SQLite DELETE

SQL Server

SQL Server INSERT

SQL Server UPDATE

SQL Server DELETE

Oracle Express 21

Oracle Express 21 INSERT

Oracle Express 21 UPDATE

Oracle Express 21 DELETE

MySQL

MySQL INSERT

MySQL UPDATE

MySQL DELETE

MariaDB

MariaDB INSERT

MariaDB UPDATE

MariaDB DELETE

PostgreSQL

PostgreSQL INSERT

PostgreSQL UPDATE

PostgreSQL DELETE

Firebird 3

Firebird 3 INSERT

Firebird 3 UPDATE

Firebird 3 DELETE

Firebird 4

Firebird 4 INSERT

Firebird 4 UPDATE

Firebird 4 DELETE

Buy Me A Coffee

"Buy Me A Coffee"