Add support for adding new entries to large zip archives #49149
Comments
@madelson I think what you're facing is what I described in this other issue, which is a problem often reported: #35815 The suggested workaround is to change Your suggestions make sense, even though there's a workaround. Another thing we should consider doing is improving the documentation.
@carlossanlop are you saying that EDIT: this does not work. See my comment below.
I could have sworn that the Create mode is for creating or overwriting existing archives if they already exist.
@carlossanlop, why is the
I don't think this would have solved it for us. We were dealing with multiple huge files on a server and, aside from the array length issue, (iirc) the server memory usage was spiking. This would just have allowed it to spike even higher.
I think this example should leverage `ZipArchiveMode.Create` because it is only appending/writing to entries. `Create` is more efficient in that scenario since it does not have to buffer the entire archive in memory. Furthermore, it is confusing that `ZipArchiveMode.Create` supports appending (unlike `File.Create` / `FileMode.Create`), so calling this nuance out in the first code example seems prudent. See dotnet/runtime#49149
For context, see dotnet#49149. The aim is to reduce confusion around the fact that Create mode supports append (unlike `FileMode.Create`).
@carlossanlop I went to update the docs based on your comments but I can't reproduce the behavior you describe:

    var path = "...";

    using (var archive = new ZipArchive(File.Create(path), ZipArchiveMode.Create))
    {
        WriteEntry(archive, "a");
        WriteEntry(archive, "b");
    }
    PrintEntries(path);

    using (var archive = new ZipArchive(File.Open(path, FileMode.Open), ZipArchiveMode.Create))
    {
        WriteEntry(archive, "c");
        WriteEntry(archive, "d");
    }
    PrintEntries(path);

    using (var archive = new ZipArchive(File.Open(path, FileMode.Open), ZipArchiveMode.Create))
    {
        WriteEntry(archive, "e");
    }
    PrintEntries(path);

    void WriteEntry(ZipArchive archive, string name)
    {
        using (var writer = new StreamWriter(archive.CreateEntry(name).Open()))
        {
            writer.Write(name);
        }
    }

    void PrintEntries(string path)
    {
        Console.WriteLine($"Entries for {path}:");
        using var archive = new ZipArchive(File.OpenRead(path), ZipArchiveMode.Read);
        foreach (var entry in archive.Entries)
        {
            using var reader = new StreamReader(entry.Open());
            Console.WriteLine($"* {entry.FullName}: {reader.ReadToEnd()}");
        }
    }

Output:

So I don't think
I doubt it was fixed. Also, I thought it would overwrite it as well and I was right. 😅 It could have been that the one who thought that create could be used was wrong, or you might have to use it only with manually creating But I think a better option would be to consider replacing existing archive code with 7zip's code in System.IO.Compression.Native and then just have the
@AraHaan do you have a code snippet? Not sure I follow. Do you think this will avoid the "buffer everything in memory" issue?
I do not think it will bypass the memory buffer, but I think 7zip's native library for it could.
Background and Motivation
We use `System.IO.Compression.ZipArchive` to manage the creation of large .zip files in a streaming fashion. When creating new archives, this works quite well (with `ZipArchiveMode.Create`, it writes through to the underlying stream so long as you only write to one entry at a time).

However, when we want to append to an existing archive, we have to use `ZipArchiveMode.Update`. According to the doc comments, with this mode the contents of the entire archive must be held in memory! This caused our system to crash due to array length restrictions when working with a particularly large file.

The zip format is designed to support efficient appending of files, so I believe it should be possible for .NET's implementation to support this use-case.
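
For reference, the two existing modes described above can be sketched roughly as follows. This is a minimal illustration using only existing `System.IO.Compression` APIs, not the proposed feature; the helper names `CreateArchive` and `AppendEntry` and the temp-file path are made up for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Illustrative path only; any writable location works.
var path = Path.Combine(Path.GetTempPath(), "append-demo.zip");

CreateArchive(path, new[] { "a", "b" });
AppendEntry(path, "c");

using (var archive = ZipFile.OpenRead(path))
{
    Console.WriteLine(string.Join(", ", archive.Entries.Select(e => e.FullName)));
}

// Create mode: produces a new archive, writing one entry at a time so that
// data can be written through to the underlying stream.
static void CreateArchive(string zipPath, string[] names)
{
    using var archive = new ZipArchive(File.Create(zipPath), ZipArchiveMode.Create);
    foreach (var name in names)
    {
        // Each entry stream is closed before the next entry is created.
        using var writer = new StreamWriter(archive.CreateEntry(name).Open());
        writer.Write(name);
    }
}

// Update mode: the existing way to add an entry to an archive that already
// exists. This is the mode the issue describes as holding the archive in memory.
static void AppendEntry(string zipPath, string name)
{
    using var archive = ZipFile.Open(zipPath, ZipArchiveMode.Update);
    using var writer = new StreamWriter(archive.CreateEntry(name).Open());
    writer.Write(name);
}
```

The second helper is the pattern whose memory cost motivates this issue: nothing in it touches the existing entries, yet `Update` mode is still required to open the existing file.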
Proposed API
This could be addressed using a new `ZipArchiveMode` enum value, perhaps named `ZipArchiveMode.Append` to match `FileMode.Append`. This would be similar to `Create` but would allow for an existing file to be used.
Usage Examples
Alternative Designs
Another approach would be to change the behavior of `Update` such that it would only bring things into memory as needed (e.g. if you change the contents of an existing entry or keep multiple entries open for writing at the same time).

This second approach would have the benefit of improving the performance of all existing programs which use `Update` mode to append to existing zips, which seems likely to be a common use-case for `Update`.

Similarly, it seems that this could also enable `Update` to share the streaming benefits of `Read` mode in many cases.

The downside would be that code could silently go from performant to non-performant if the usage limitations were violated, although that is already the case with `Read` when the underlying stream is not seekable and `Create` when writing to multiple entries at once.

Another potential downside is that today presumably `Update` mode does all writes at the end of the operation, potentially allowing other readers to use the zip until then. This change would alter that behavior.
Risks
With the design approach of optimizing `Update` for specific scenarios, the design might entail `Update` switching from a write-through approach to an in-memory approach partway through an operation. This might add overhead for someone who is actually leveraging the ability to modify existing entries.