Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors while reading archives from java.util.zip via ZipInputStream #279

Closed
TalgatAkhm opened this issue Jan 22, 2021 · 3 comments
Closed
Assignees
Labels
bug Something isn't working resolved

Comments

@TalgatAkhm
Copy link

There is an error while reading zip archive from standard java.util.zip archiver. Zip4j ZipInputStream successfully read only first header in archive, next it can not find any.

Here is a full code example (with zip generating, and reading via zip4j stream) and reasoning below:

   
    @Test
    public void test() throws IOException {
        // Arrange
        String archivePath = "C:/Users/Public/test.zipByJavaUtilZip";
        File zipArchive = new File(archivePath);

        File folderToZip = new File("C:/Users/Public/FolderToZip");
        int expectedNumberOfEntries;
        try (java.util.zip.ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipArchive))) {
            expectedNumberOfEntries = zipByJavaUtilZip(zos, folderToZip, folderToZip.listFiles());
            zos.flush();
        }

        // Act
        List<String> elements = new ArrayList<>();
        try(InputStream is = new FileInputStream(zipArchive);
                net.lingala.zip4j.io.inputstream.ZipInputStream zis = new ZipInputStream(is)) {

            LocalFileHeader fh;
            while((fh = zis.getNextEntry()) != null)
                elements.add(fh.getFileName());
        }

        // Act-assert
        // Lets prove that there are five elements in archive
        net.lingala.zip4j.ZipFile zf = new net.lingala.zip4j.ZipFile(zipArchive);
        Assertions.assertEquals(expectedNumberOfEntries, zf.getFileHeaders().size());
        Assertions.assertEquals(5, zf.getFileHeaders().size());

        // Assert
        // Check for mistakes in zip4j streams
        Assertions.assertEquals(expectedNumberOfEntries, elements.size()); // (elements.size() == 1) = true
    }

    private static int zipByJavaUtilZip(ZipOutputStream zos, File rootFolder, File[] files) throws IOException {
        int entriesNumber = 0;
        for (File f : files) {
            if (f.isDirectory())
                entriesNumber += zipByJavaUtilZip(zos, rootFolder, f.listFiles());
            else {
                entriesNumber++;
                String path = rootFolder.toPath().relativize(f.toPath()).toString().replaceAll("\\\\", "/");
                ZipEntry entry = new ZipEntry(path);
                zos.putNextEntry(entry);
                try (InputStream fis = new FileInputStream(f)) {
                    IOUtils.copy(fis, zos);
                }
                zos.closeEntry();
            }
        }
        return entriesNumber;
    }

However, comparing archives from zip4j and java.util.zip in hex viewer, I found, that there are not many differences between them. One of them is the general purpose flag, but the 3rd bit of it have been set in both archives. That means, that in LFH compression size is 0, and the actual size is written in the data descriptor (they are identical). So it is normal to dont know the actual size while reading LFH. But the problem is, then I call zis.getNextEntry() second time the result will be null, because in zis.getNextEntry() there is call of readUntilEndOfEntry(), in it I can see this code:

    if (localFileHeader.isDirectory() || localFileHeader.getCompressedSize() == 0) {
      return;
    }
.....

In this condition there is a leaving from function, because localFileHeader.getCompressedSize() is equal to zero. But after creating archive with the same content via zip4j ZipFile or ZipOutputStream, the test above will pass. So I found this as a strange behaviour.

Moreover, if I fix it in my code with the problems will appear in reading directories-lfh:

if (!lfh.isDirectory())
   zis.readAllBytes();

I don’t know exactly why this is happening and how to fix it (maybe it doesn't need to be fixed at all).
The test archive in attach. Thank you!
test.zip

@TalgatAkhm
Copy link
Author

A quick fix is reading content of any element (no matter file or directory) after each call ZipInputStream.getNextEntry(), even if no one is going to use the data, like:

zis.getNextEntry();
zis.readAllBytes();

The reason for it:
3d bit of general purpose flag is set to 1, so compressedSize in LFH is set to zero, and after calling getNextEntry() second time readUntilEndOfEntry() doesn't do anything, because of compressedSize = 0. That's why zip4j try to read second LFH from first element content. So quick fix in library code is:

private void readUntilEndOfEntry() throws IOException {
    if (localFileHeader.isDirectory() || (localFileHeader.getCompressedSize() == 0 && !localFileHeader.isDataDescriptorExists())) {
      return;
    }
....

But, I think this only solve consequence but not cause

@srikanth-lingala
Copy link
Owner

Thanks for the detailed explanation and your analysis. Appreciate it. I have fixed the issue and will include the fix in the next release.

@srikanth-lingala srikanth-lingala added bug Something isn't working resolved and removed in-progress labels Jan 28, 2021
@srikanth-lingala
Copy link
Owner

Issue fixed in v2.7.0 which was released today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working resolved
Projects
None yet
Development

No branches or pull requests

2 participants