Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use Password with Unicode #427

Closed
christianwejwoda opened this issue Mar 6, 2020 · 15 comments · Fixed by #592
Closed

Can't use Password with Unicode #427

christianwejwoda opened this issue Mar 6, 2020 · 15 comments · Fixed by #592
Assignees
Labels
bug encryption zip Related to ZIP file format

Comments

@christianwejwoda
Copy link

Steps to reproduce

Create zipFile using Password "DatenFürDich" which uses the german ü == u with dots.

Expected behavior

the Password should be usefull when extracting files

Actual behavior

When trying zu extract again with Windows or 7Zip --> Wrong Password

Version of SharpZipLib

1.2.0

Obtained from (only keep the relevant lines)

  • Package installed using NuGet
@Numpsy Numpsy added encryption zip Related to ZIP file format labels Mar 8, 2020
@Numpsy
Copy link
Contributor

Numpsy commented Mar 8, 2020

It looks like 7-zip won't allow you to create a .zip file with that password, not sure offhand if that's something decided by 7-zip, or is in the spec somewhere.

Looking at the code, I wonder if there is an issue with the way it handles passwords for ZipCrypto - it currently converts the supplied password to a byte array using the codepage specified in ZipStrings (which currently defaults to UTF-8), and i'm not certain if that is something that should be configurable?
Changing it to use Windows 1252 for the conversion seems to allow the Windows built in zip support to open the file, but i'm not certain if that is the right general approach either.

@piksel
Copy link
Member

piksel commented Mar 27, 2020

I'm not even sure we're talking about ZipCrypto (we should probably start using this name since 7zip calls it that) or about AES encryption passwords.
For the ZipCrypto ones we should actually not depend on the Unicode flag (which we do right now):

D.5 General purpose bit 11 will not imply any encoding of file content or
password.  Values defining character encoding for file content or 
password MUST be stored within the 0x0008 Extended Language Encoding 
Extra Field.

I have not looked into using the extra fields for this though. But if nothing else, we should at least only use CP437/1252 right now.

@piksel piksel added the bug label Mar 27, 2020
@Numpsy
Copy link
Contributor

Numpsy commented Mar 27, 2020

I took the comment about Windows giving you a wrong password error to mean ZipCrypto, as I don't think Explorer will extract AES encrypted files created by 7zip etc anyway (you get an error rather than a password request trying to read such entries I think?)

Perhaps I misunderstood though)

@christianwejwoda
Copy link
Author

christianwejwoda commented Mar 28, 2020 via email

@Numpsy
Copy link
Contributor

Numpsy commented Mar 28, 2020

D.5 General purpose bit 11 will not imply any encoding of file content or
password.  Values defining character encoding for file content or 
password MUST be stored within the 0x0008 Extended Language Encoding 
Extra Field.

I like how it uses MUST there, while also saying that the format of the 0x0008 field is undefined :-(

@Numpsy
Copy link
Contributor

Numpsy commented Mar 29, 2020

I have not looked into using the extra fields for this though. But if nothing else, we should at least only use CP437/1252 right now.

What's the situation with doing that in .NET Standard and/or .NET Core? (you might need to use the System.Text.Encoding.CodePages package to ensure the code pages are available in Core I think?)

@piksel
Copy link
Member

piksel commented Mar 29, 2020

.NET Core supports Unicode and ISO 8859-1 (CP 28591), so we can at least fall back to using that if 437 is not available. The differences can be seen here: https://en.wikipedia.org/wiki/Windows-1252#Character_set
Ideally we would throw an exception if a character that falls outside of the shared ones and we are in ISO 8859-1 fallback mode...

@Numpsy
Copy link
Contributor

Numpsy commented Apr 6, 2020

throw an exception if a character that falls outside of the shared ones

The shared ones being ASCII, from the looks of it?

That's the approach that 7-Zip seems to take anyway:

image

@piksel
Copy link
Member

piksel commented Apr 7, 2020

The shared ones are all but some parts of 0x80-0x9f, they are indicated by the thick border in the linked table.

@Numpsy
Copy link
Contributor

Numpsy commented Apr 7, 2020

I might be confusing myself reading the different tables, but the

ü

character in the ops password looks to map to 0x81 in CP437 and 0xFC in 1252, which is what I meant by not-shared (it isn't in the bordered area in the 1252 table).

(Edit: I took your previous comment to mean throw if a character is one that would be different between CP437 and ISO 8859-1, which looks like a larger range than 0x80-0x9f)

@Numpsy
Copy link
Contributor

Numpsy commented Apr 11, 2020

Anyway, I'll look at doing a PR that makes the basic change to use 437 by default anyway.

@piksel
Copy link
Member

piksel commented Apr 12, 2020

Yeah, sorry, I think I got the code pages mixed somehow. You're right, they don't have that much overlap.

@Numpsy
Copy link
Contributor

Numpsy commented Jun 19, 2020

I did some extra tests a while back but apparently forgot to update the bug, so:

The file WinZipZipCrypto.zip was created with WinZip on an English version of Windows, using the password DatenFürDich. (WinZip allows that password, but suggests that it's a bad idea for compatibility purposes).

On my English Windows 10 system, 7-Zip and Explorer will both decrypt the file, as will SharpZipLib iff the string conversions for the ZipCrypto password are done with codepage 1252 (but not with 437 or utf-8).

I haven't tried to see what happens on an OS with different language settings, it may or may not be different. (I think i read somewhere that 7-Zip uses CP_ACP for doing conversions, but i'm not certain, and that may or may not be meaningful).

So, not really sure where that leaves the default approach that should be used.

@piksel piksel self-assigned this Mar 7, 2021
@piksel
Copy link
Member

piksel commented Mar 7, 2021

In #592 the encoding can be set independently. The default is currently set to SystemDefaultEncoding, which should mimic the behaviour stated above.

@piksel piksel linked a pull request Mar 14, 2021 that will close this issue
@ngray-jnj
Copy link

I see this bug has been fixed but is there any timeline for getting it (i.e. 1.4) out there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug encryption zip Related to ZIP file format
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants