Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CryptoStream performs 1-byte reads, leading to performance bottleneck #52651

Closed
LordMike opened this issue May 12, 2021 · 9 comments
Closed

CryptoStream performs 1-byte reads, leading to performance bottleneck #52651

LordMike opened this issue May 12, 2021 · 9 comments
Labels
area-System.Security tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

Comments

@LordMike
Copy link
Contributor

Description

I was using a CryptoStream with a SHA1CryptoServiceProvider to hash a stream as it was being read. Through troubleshooting of some slow reads, I noticed that CryptoStream was reading 1 byte at a time, which results in some immensely slow reads.

The below example code simulates a Stream that provides less than the requested datasize, which is common for many streams, such as network streams and whatnot. The console output shows the following:

Wanted 131072, read 20000
Wanted 1, read 1
Wanted 1, read 1
...

From this I gather that CryptoStream will read in chunks of 128K, but in the event that this buffer isn't filled, it will fill the remainder with 1 byte at a time.

IMO, CryptoStream should never read as little as 1 byte at a time, if more is needed.

Workarounds:

  • Create a custom hashing stream that hashes data as it is being read
  • Optimize MyStream such that each read isn't "slow"
  • Pack MyStream in a BufferingStream

Example

    class Program
    {
        static void Main(string[] args)
        {
            byte[] data = new byte[1 * 1024 * 1024];

            using (Stream myStream = new MyStream(data))
            using (SHA1CryptoServiceProvider hasher = new SHA1CryptoServiceProvider())
            using (Stream stream = new CryptoStream(myStream, hasher, CryptoStreamMode.Read))
            using (Stream target = Stream.Null)
            {
                // Do some read, e.g. network writing
                stream.CopyTo(target);

                Console.WriteLine("Completed");
                Console.WriteLine(BitConverter.ToString(hasher.Hash));
            }
        }
    }

    /// <summary>
    /// This represents some stream I've implemented. I provide data as a stream to use it in other areas of the code.
    /// The important part is that Read() will not be able to provide the full requested data each time
    /// </summary>
    class MyStream : Stream
    {
        private int _position;
        private readonly byte[] _backingData;

        public MyStream(byte[] backingData)
        {
            _backingData = backingData;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            var toRead = Math.Min(count, _backingData.Length - _position);

            // Simulate less data available from source
            toRead = Math.Min(20000, toRead);

            Console.WriteLine("Wanted " + count + ", read " + toRead);

            Array.Copy(_backingData, _position, buffer, offset, toRead);
            _position += toRead;

            return toRead;
        }

        public override bool CanRead => true;
        public override long Length => _backingData.Length;
        public override long Position
        {
            get => _position;
            set => throw new NotImplementedException();
        }

        #region NotImplemented
        public override void Flush()
        {
            throw new NotImplementedException();
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        public override bool CanSeek { get; }
        public override bool CanWrite { get; }
        #endregion
    }
@LordMike LordMike added the tenet-performance Performance related issue label May 12, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label May 12, 2021
@ghost
Copy link

ghost commented May 12, 2021

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq, @GrabYourPitchforks
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I was using a CryptoStream with a SHA1CryptoServiceProvider to hash a stream as it was being read. Through troubleshooting of some slow reads, I noticed that CryptoStream was reading 1 byte at a time, which results in some immensely slow reads.

The below example code simulates a Stream that provides less than the requested datasize, which is common for many streams, such as network streams and whatnot. The console output shows the following:

Wanted 131072, read 20000
Wanted 1, read 1
Wanted 1, read 1
...

From this I gather that CryptoStream will read in chunks of 128K, but in the event that this buffer isn't filled, it will fill the remainder with 1 byte at a time.

IMO, CryptoStream should never read as little as 1 byte at a time, if more is needed.

Workarounds:

  • Create a custom hashing stream that hashes data as it is being read
  • Optimize MyStream such that each read isn't "slow"
  • Pack MyStream in a BufferingStream

Example

    class Program
    {
        static void Main(string[] args)
        {
            byte[] data = new byte[1 * 1024 * 1024];

            using (Stream myStream = new MyStream(data))
            using (SHA1CryptoServiceProvider hasher = new SHA1CryptoServiceProvider())
            using (Stream stream = new CryptoStream(myStream, hasher, CryptoStreamMode.Read))
            using (Stream target = Stream.Null)
            {
                // Do some read, e.g. network writing
                stream.CopyTo(target);

                Console.WriteLine("Completed");
                Console.WriteLine(BitConverter.ToString(hasher.Hash));
            }
        }
    }

    /// <summary>
    /// This represents some stream I've implemented. I provide data as a stream to use it in other areas of the code.
    /// The important part is that Read() will not be able to provide the full requested data each time
    /// </summary>
    class MyStream : Stream
    {
        private int _position;
        private readonly byte[] _backingData;

        public MyStream(byte[] backingData)
        {
            _backingData = backingData;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            var toRead = Math.Min(count, _backingData.Length - _position);

            // Simulate less data available from source
            toRead = Math.Min(20000, toRead);

            Console.WriteLine("Wanted " + count + ", read " + toRead);

            Array.Copy(_backingData, _position, buffer, offset, toRead);
            _position += toRead;

            return toRead;
        }

        public override bool CanRead => true;
        public override long Length => _backingData.Length;
        public override long Position
        {
            get => _position;
            set => throw new NotImplementedException();
        }

        #region NotImplemented
        public override void Flush()
        {
            throw new NotImplementedException();
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        public override bool CanSeek { get; }
        public override bool CanWrite { get; }
        #endregion
    }
Author: LordMike
Assignees: -
Labels:

area-System.Security, tenet-performance, untriaged

Milestone: -

@vcsjones
Copy link
Member

vcsjones commented May 12, 2021

The CryptoStream perf is noted here: #45080. Since InputBlockSize and OutputBlockSize are a single byte, that is why that behavior is observed.

A much better way to hash a stream if possible would be to do something like:

using System.IO;
using System.Security.Cryptography;

using SHA256 sha256 = SHA256.Create();
using Stream stream = Stream.Null; // Your stream you want to hash
byte[] hash = sha256.ComputeHash(stream);

For .NET 5, there ComputeHashAsync for asynchronously reading from the stream.

@vcsjones
Copy link
Member

Ah, if you need to do something with the stream while hashing it, like copy it to another destination, I suppose you could also read from the stream as well. Perhaps something like this:

using System;
using System.Buffers;
using System.IO;
using System.Security.Cryptography;

using IncrementalHash ih = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
using Stream source = Stream.Null; // Some stream
using Stream destination = Stream.Null; // Some stream

byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
int bytesRead;

while ((bytesRead = source.Read(buffer)) > 0)
{
    ReadOnlySpan<byte> read = buffer.AsSpan(0, bytesRead);
    ih.AppendData(read);
    destination.Write(read);
}

ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
byte[] hash = ih.GetHashAndReset();

@LordMike
Copy link
Contributor Author

My workaround involves a HashingStream that wraps and delegates reads to an underlying stream. When reading is done, the injected hasher can be used to find the hash of the read data.

I see that #45080 completely covers this issue, so I'll just close it. Thanks for finding it - I think I didn't search for "perf" :/

@stephentoub
Copy link
Member

stephentoub commented Jun 11, 2021

Just FYI, with #53644, the earlier repro now outputs:

Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 20000
Wanted 81920, read 8576
Wanted 81920, read 0
Wanted 1, read 0
Completed
3B-71-F4-3F-F3-0F-4B-15-B5-CD-85-DD-9E-95-EB-C7-E8-4E-B5-A3

(It's much better, though the last read shouldn't be necessary... I'll see if I can easily fix that before merging.)

@LordMike
Copy link
Contributor Author

Uhm, you probably cannot fix that last read. It's my understanding that Streams' Read() will give you 1..count bytes back. Receiving 0 bytes indicates an end-of-stream.

@LordMike
Copy link
Contributor Author

But yes - way better.

@stephentoub
Copy link
Member

you probably cannot fix that last read

I already did. To be clear, the issue is this:

Wanted 81920, read 0
Wanted 1, read 0

That last line isn't necessary once the previous read had returned 0.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Security tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

3 participants