Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZipArchive in Update mode loads whole ZIP file into memory #94455

Closed
ulrichb opened this issue Nov 7, 2023 · 3 comments
Closed

ZipArchive in Update mode loads whole ZIP file into memory #94455

ulrichb opened this issue Nov 7, 2023 · 3 comments

Comments

@ulrichb
Copy link

ulrichb commented Nov 7, 2023

I want to bring into discussion that ZipArchive holds the entire archive (all entry metadata and content) in memory when opened in ZipArchiveMode.Update mode (as it's documented).

Just for recap a small repro sample with a ZIP containing one 1 GB entry requiring 1 GB of private managed memory during ZipArchive.Dispose():

ZipArchiveMemoryUsageTest.cs
using System.IO.Compression;

const string zipArchiveToTest = "Large.zip";

CreateTestZipFile(zipArchiveToTest, size: 1024 * 1024 * 1024);

using var stream = File.Open(zipArchiveToTest, FileMode.Open);

using (new ZipArchive(stream, ZipArchiveMode.Update))
{
    DumpTotalMem("before ZipArchive dispose");
}

DumpTotalMem("after ZipArchive dispose");
// => increased by 1 GB or OOM with `DOTNET_GCHeapHardLimit=32000000` !

// (... when doing a GC run the 1GB is freed again)

static void CreateTestZipFile(string filePath, long size)
{
    using var stream = new FileStream(filePath, FileMode.OpenOrCreate);
    using var zipArchive = new ZipArchive(stream, ZipArchiveMode.Create);

    using var entryStream = zipArchive.CreateEntry("zeros.bin", CompressionLevel.NoCompression).Open();

    var bufferToWrite = new byte[4096];
    for (var i = 0; i < size / bufferToWrite.Length; i++)
        entryStream.Write(bufferToWrite, 0, bufferToWrite.Length);

    Console.WriteLine($"Created {size / 1024.0 / 1024} MB '{filePath}'.");
}

static void DumpTotalMem(String text) =>
    Console.WriteLine($"Total managed mem {text}: {GC.GetTotalMemory(false) / 1024.0 / 1024:F1} MB");

We're aware that this probably needs a rewrite of ZipArchive resp. ZipArchiveEntry, but at least when the underlying stream is seekable in our opinion the current behavior (i.e. the memory requirement) should be considered as a bug. For reasoning consider the use case we have (just as an example):

a) We're dealing with potentially large user-provided ZIP files (in the GB range),
b) we need to update entries (actually not directly but via System.IO.Packaging.ZipPackage for OPC file processing), and
c) the whole can happen in parallel.

This means with the current ZipArchive implementation we need to reserve dozens of GBs of virtual memory just for System.IO.Packaging.ZipPackage processing, otherwise we're risking container memory limit violations and therefore OOM exceptions.

Related tickets:

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 7, 2023
@ghost
Copy link

ghost commented Nov 7, 2023

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

I want to bring into discussion the property of ZipArchive when in ZipArchiveMode.Update holding the entire archive (incl. entry data) in memory (as it's documented).

Just for recap a small repro sample with a ZIP containing one 1 GB entry requiring 1 GB of private managed memory during ZipArchive.Dispose():

ZipArchiveMemoryUsageTest.cs
using System.IO.Compression;

const string zipArchiveToTest = "Large.zip";

CreateTestZipFile(zipArchiveToTest, size: 1024 * 1024 * 1024);

using var stream = File.Open(zipArchiveToTest, FileMode.Open);

using (new ZipArchive(stream, ZipArchiveMode.Update))
{
    DumpTotalMem("before ZipArchive dispose");
}

DumpTotalMem("after ZipArchive dispose");
// => increased by 1 GB or OOM with `DOTNET_GCHeapHardLimit=32000000` !

static void CreateTestZipFile(string filePath, long size)
{
    using var stream = new FileStream(filePath, FileMode.OpenOrCreate);
    using var zipArchive = new ZipArchive(stream, ZipArchiveMode.Create);

    using var entryStream = zipArchive.CreateEntry("zeros.bin", CompressionLevel.NoCompression).Open();

    var bufferToWrite = new byte[4096];
    for (var i = 0; i < size / bufferToWrite.Length; i++)
        entryStream.Write(bufferToWrite, 0, bufferToWrite.Length);

    Console.WriteLine($"Created {size / 1024.0 / 1024} MB '{filePath}'.");
}

static void DumpTotalMem(String text) =>
    Console.WriteLine($"Total managed mem {text}: {GC.GetTotalMemory(false) / 1024.0 / 1024:F1} MB");

We're aware that this probably needs a rewrite of ZipArchive resp. ZipArchiveEntry, but at least when the underlying stream is seekable IMO the current behavior (memory requirement) should be considered as a bug. For reasoning consider the use case we have (just as an example):

a) We're dealing with potentially large user-provided ZIP files (in the GB range),
b) we need to update entries (actually not directly but via System.IO.Packaging.ZipPackage), and
c) the whole can happen in parallel.

This means with the current ZipArchive implementation we need to reserve dozens of GBs of virtual memory just for System.IO.Packaging.ZipPackage processing, otherwise we're risking container memory limit violations and therefore OOM exceptions.

Related tickets:

Author: ulrichb
Assignees: -
Labels:

area-System.IO.Compression

Milestone: -

@ulrichb ulrichb changed the title ZipArchive in Update loads whole ZIP file into memory ZipArchive in Update mode loads whole ZIP file into memory Nov 7, 2023
@ulrichb
Copy link
Author

ulrichb commented Nov 7, 2023

Oh, just found #1543

@carlossanlop
Copy link
Member

Duplicate of #1543

@carlossanlop carlossanlop marked this as a duplicate of #1543 Dec 6, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Dec 6, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants