Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create script API for reading and writing zip files #34444

Closed
wants to merge 7 commits into from

Conversation

jameswestman
Copy link
Contributor

@jameswestman jameswestman commented Dec 18, 2019

Bugsquad edit: this was salvaged and merged in #65281.
Bugsquad edit: master version of #58900.

This builds on #22554, which only supported writing zip files.

Creates two new classes, ZipReader and ZipWriter, for reading and writing zip files in GDScript using the already bundled minizip library.

Example:

var p := ZIPPacker.new()
p.open("user://test.zip")
p.start_file("hello/world.txt")
p.write_file("Hello, world!".to_utf8_buffer())
p.close()

var r := ZipReader.new()
r.open("user://test.zip")
for filename in r.get_files():
    print(filename, ": ", r.read_file(filename).get_string_from_utf8())

Note: Append modes don't work yet due to a bug in core/io/zip_io.cpp. The zipio_open function doesn't properly handle the ZLIB_FILEFUNC_MODE_EXISTING flag. I'll fix that in another PR.

Bugsquad edit: This closes godotengine/godot-proposals#1069.

@Calinou
Copy link
Member

Calinou commented Feb 17, 2020

It seems the rebase failed, as lots of unrelated commits are included.

Also, I think ZipWriter should be called ZIPPacker to be consistent with the PCKPacker class we currently have. Likewise, ZipReader should be called ZIPReader.

@jameswestman
Copy link
Contributor Author

Ah, oops, I rebased on 3.2 instead of master.

I've renamed the classes and fixed a couple build errors.

@ghost
Copy link

ghost commented Feb 18, 2020

@akien-mga Is this still a feature you'd want to include in some version of 3.2?

@Frontrider
Copy link

What is the status on this?

@Calinou
Copy link
Member

Calinou commented Apr 27, 2020

@Frontrider The master branch is currently undergoing refactoring, which means it can take a while for pull requests to be merged (especially if they introduce large features like this one).

@avril-gh
Copy link
Contributor

@marcelofg55

i tested this zip-module-3.2.1 , works nice, and its very useful,
would love to see it in 3.2 but you pushed to master instead ?

@realkotob
Copy link
Contributor

realkotob commented Apr 27, 2020

@avril-gh New features are added to master branch first and then they're cherry-picked for older branches if the changes are stable enough and don't majorly break compatibility.

@avril-gh
Copy link
Contributor

avril-gh commented Apr 27, 2020

@asheraryam

Thank's for clarifying. Didnt know about this workflow. 👍

then they're cherry-picked for older branches if the changes are stable enough and don't majorly break compatibility.

been wondering, because i already tried it and this PR does not work with 3.2 due to differences in 4.0 .

@Calinou
Copy link
Member

Calinou commented Apr 27, 2020

@avril-gh This pull request could be remade for the 3.2 branch if there's enough demand.

@jameswestman
Copy link
Contributor Author

I've backported this to 3.2 in my branch here: https://github.com/flyingpimonster/godot/tree/zip-module-3.2 Should I open a separate pull request?

Previously, the file mode would be write-only if the write flag was present,
and read-only if it wasn't. However, ADDINZIP mode requires read-write mode,
and CREATE requires the file to be truncated.

Fixed by checking all of the relevant flags individually and constructing a
correct file access mode.
@jameswestman
Copy link
Contributor Author

Rebased.

@qrrk
Copy link

qrrk commented Apr 3, 2022

Very much looking forward to this, as well as the backport to 3.x. It would untie my hands for a lot of optimizations and improvements in my project.

I know it is not likely, but if there is anything I could do to expedite this PR, please let me know.

@mojoyup
Copy link

mojoyup commented May 5, 2022

How would I access these two classes in a script? I am trying to use the filedialog to save a file as a .zip and also be able to open/edit images within the .zip for resaving using the signals if I can.

@akien-mga
Copy link
Member

akien-mga commented Jun 16, 2022

Sorry for the delay reviewing this - we finally discussed and approved the linked proposal, and this seems to be a fairly straightforward implementation for it.

Would be good if @bruvzg could review. Some changes will be needed to adapt to the new RefCounted FileAccess (see #59440). @Faless also had some comments about further work that might be needed to have file streaming for more elaborate use cases (but that's not blocking for this PR and is a general issue with Godot's file APIs).

Finally, I think that this would make sense to have in core/io instead of a module. The implementation is pretty straightforward and we already have core ZIP functionality - the only difference is that this exposes it in a user accessible manner.

The commit history could likely also be rebased as we usually don't use so much granularity in the implementation of features (we prefer a small number of commits - often 1 - that directly implements the finalized feature, not the intermediate WIP history).

@anderlli0053
Copy link

Can this be cherry-picked for Godot 3.5 or at least 3.x ?

@Calinou
Copy link
Member

Calinou commented Jun 21, 2022

Can this be cherry-picked for Godot 3.5 or at least 3.x ?

3.5 is nearing release, so I'm afraid not. There's another pull request remaking this feature for 3.x, but it'll have to wait for 3.6 to land in a stable release.

@anderlli0053
Copy link

Can this be cherry-picked for Godot 3.5 or at least 3.x ?

3.5 is nearing release, so I'm afraid not. There's another pull request remaking this feature for 3.x, but it'll have to wait for 3.6 to land in a stable release.

Thank you Hugo

}

do {
char filename[256];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure about this limit ?

filename contains not only the name of the file, but also it path within the archive (e.g. foo/bar/spam.txt)

So the limit is fine on Windows where path are limited to 250 characters (but I think to recall there is special case in Windows where this might be longer), but on Linux/Mac 255 is the max size of file name (and entire path is at least 1024 bytes).

So I think it would be better to set a bigger buffer here

Typically doing something like

Suggested change
char filename[256];
char filename[256]; // Note filename is a path !
int err = unzGetCurrentFileInfo64(uzf, NULL, filename, sizeof(filename), NULL, 0, NULL, 0);
if (err == UNZ_OK) {
s.push_back(filename);
} else {
// Assume filename buffer was too small
unz_file_info64 file_info;
int err = unzGetCurrentFileInfo64(uzf, &file_info, NULL, 0, NULL, 0, NULL, 0);
char long_filename_buff = memalloc(file_info.size_filename);
int err = unzGetCurrentFileInfo64(uzf, NULL, long_filename_buff, sizeof(filename), NULL, 0, NULL, 0);
s.push_back(long_filename_buff);
memfree(long_filename_buff);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the limit is fine on Windows where path are limited to 250 characters

By default, Windows MAX_PATH is 260 (but it's possible to disable the limit on Windows 10+). Unicode UNC (\\?\ prefixed) paths are limited to 32K UTF-16 characters.

Copy link

@ssokolow ssokolow Jul 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what limits Zip itself imposes, but:

  1. If you use \\?\-format paths (every NT-based Windows that's still in any way relevant) or enable Win32 long paths via group policy or the registry (Windows 10), then you're subject to much longer filesystem-based length limits on Windows.
  2. Various POSIX-specified libc functions are limited to PATH_MAX (limits.h) by a buffer allocation, which is typically 4096, but you can work around that.

Here are the limits I'm aware of:

  • FAT32: 32,760 "Unicode characters" for paths, 255 "Unicode characters" for filenames
  • exFAT: 32,760 "Unicode characters" for paths, 255 "Unicode characters" for filenames
  • NTFS: 32,768 "Unicode characters" for paths, 255 "Unicode characters" for filenames
  • ReFS: 32,768 "Unicode characters" for paths, 255 "Unicode characters" for filenames
  • UDF: 1023 bytes for paths, 255 bytes for filenames
  • ext2/3/4: No limit for paths (As I demonstrate with some Python here), 255 bytes for filenames
  • HFS+: No limit for paths as far as I was able to google up, 255 bytes for filenames

As far as I'm aware, "Unicode characters" means 16-bit code units.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZIP format uses a 16-bit value to store filename length, so it's probably limited to 64K (UTF-8). Not sure if there are extra limitations, but it's probably better to always use unz_file_info64 and allocate space for filename on demand.

Copy link

@ssokolow ssokolow Jul 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also...

So the limit is fine on Windows where path are limited to 250 characters

Actually, if I remember correctly, the limit is 260 characters and they chose that limit based on A:\<some buffer that allocates for 256 chars><NUL>.

ZIP format uses a 16-bit value to store filename length, so it's probably limited to 64K (UTF-8).

Yeah. No clue about the upper limit, but there are plenty of google results with people complaining about this or that unzipping tool erroring out because the Zip file contains paths longer than 260 and the tool isn't using \\?\.

(Apparently long titles are common in science and lawyers like to pack as much of the beginning of the document's contents into the filename as possible as a hack for searchability.)

@touilleMan
Copy link
Member

@akien-mga do you think this PR will be able to make it into Godot 4.0 ?

The asset store currently doesn't play well with Godot-Python (the assets must be stored as a git repo and there is no possibility to provide different asset per platform, so for Godot 3 I have to put ~200mo of data on a git repo for each release... )

Long story short, I wrote a Godot-Python installer addon in GDScript that download just the right data. This works great except I need a way bundle data...

  • I tried using the tar command present on the system but it's not reliable (not available on all versions of Windows)
  • I currently use a pure GDScript implementation of zip, but it is not 100% reliable

So in the end having this PR merged for Godot 4.0 would help me a lot 😃 (of course Godot 4.0 feature freeze might already be there, so I would totally understand if this train as already left ^^)

@fire
Copy link
Member

fire commented Aug 27, 2022

@RevoluPowered You mentioned you wanted to read and write zips.

@MisterMX
Copy link
Contributor

I created a squashed version of this PR's branch and rebased it on the lastest master: https://github.com/MisterMX/godot/tree/zip-module

Not sure, if I should create a second PR or if @jameswestman wants to copy the changes.

@mhilbrunner
Copy link
Member

mhilbrunner commented Aug 27, 2022

@MisterMX You might want to integrate the file path size changes suggested by touilleMan above if you haven't already, and then open a new PR from your repo branch and reference this PR in the description.

For the actual commit, you can still give credit to the original author by using Co-authored-by.

Thanks for salvaging this!

@Anutrix
Copy link
Contributor

Anutrix commented Oct 17, 2022

We can close this since #65281 is merged, right?

@MisterMX
Copy link
Contributor

Yes!

@mhilbrunner
Copy link
Member

Closing as salvaged in #65281.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose support for packing/unpacking ZIP archives to the scripting API