Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong output when number of dex files increases #99

Closed
lwasyl opened this issue Sep 22, 2020 · 7 comments · Fixed by #224
Closed

Wrong output when number of dex files increases #99

lwasyl opened this issue Sep 22, 2020 · 7 comments · Fixed by #224

Comments

@lwasyl
Copy link

lwasyl commented Sep 22, 2020

This is output for diffuse when comparing APK before and after adding a big library with native libs and whatnot (unique numbers are way off because of R8):

          │             compressed             │           uncompressed            
          ├───────────┬───────────┬────────────┼───────────┬───────────┬───────────
 APK      │ old       │ new       │ diff       │ old       │ new       │ diff      
──────────┼───────────┼───────────┼────────────┼───────────┼───────────┼───────────
      dex │   2.6 MiB │     196 B │   -2.6 MiB │   6.6 MiB │      -2 B │  -6.6 MiB 
     arsc │ 440.6 KiB │ 439.6 KiB │     -970 B │ 440.5 KiB │ 439.5 KiB │    -960 B 
 manifest │     3 KiB │     113 B │   -2.9 KiB │  12.2 KiB │      -1 B │ -12.3 KiB 
      res │   1.5 MiB │   1.1 MiB │ -496.3 KiB │   2.1 MiB │ 906.7 KiB │  -1.3 MiB 
   native │  43.8 KiB │   4.6 MiB │   +4.6 MiB │    43 KiB │   4.6 MiB │  +4.5 MiB 
    asset │    49 KiB │     421 B │  -48.6 KiB │ 124.9 KiB │      -3 B │  -125 KiB 
    other │ 170.8 KiB │ 112.4 KiB │  -58.4 KiB │ 361.6 KiB │ 275.6 KiB │ -86.1 KiB 
──────────┼───────────┼───────────┼────────────┼───────────┼───────────┼───────────
    total │   4.8 MiB │   6.2 MiB │   +1.4 MiB │   9.7 MiB │   6.1 MiB │  -3.6 MiB 


         │          raw          │                unique                 
         ├───────┬───────┬───────┼───────┬───────┬───────────────────────
 DEX     │ old   │ new   │ diff  │ old   │ new   │ diff                  
─────────┼───────┼───────┼───────┼───────┼───────┼───────────────────────
   count │     1 │     2 │    +1 │       │       │                       
 strings │ 35815 │ 38915 │ +3100 │ 35815 │ 37708 │ +1893 (+5632 -3739)   
   types │ 11415 │ 12329 │  +914 │ 11415 │ 12031 │  +616 (+4007 -3391)   
 classes │ 10163 │ 10693 │  +530 │ 10163 │ 10693 │  +530 (+3701 -3171)   
 methods │ 42939 │ 46614 │ +3675 │ 42939 │ 45943 │ +3004 (+30436 -27432) 
  fields │ 57046 │ 57083 │   +37 │ 57046 │ 57018 │   -28 (+48758 -48786) 

Note DEX#count row, which shows increased DEX count, yet APK#dex row shows dex size has significantly decreased. Also total compressed file size is much lower than in reality. Here's what AS shows when inspecting new apk file:

image

For comparison, old apk seems to be aligned with diffuse output:
image

@JakeWharton
Copy link
Owner

Yeah something is definitely wrong there, it's clearly not attributing space correctly in the second APK. How are you creating the APK? Are you using an external tool designed for obfuscation?

You can tell it still sees the dex files, but their size is returning 1B somehow (hence the 2B total), potentially due to a corrupted zip structure (at least in the eyes of the JDK ZIP implementation). The 196B compressed size is all the ZIP metadata like the file name, path, timestamps, etc. being added to 2B.

Are you able to share the second APK? Or the tools you're using so I can try to reproduce?

Also what version of Java are you running diffuse with?

@lwasyl
Copy link
Author

lwasyl commented Sep 22, 2020

How are you creating the APK? Are you using an external tool designed for obfuscation?

Just plain ./gradlew app:assembleMyVariantRelease with minifyEnabled true. However the new build had FullStory integration enabled (https://help.fullstory.com/hc/en-us/articles/360040596093-Getting-Started-with-Android-Recording) which might be doing some stuff on the apk/dex after the app has been built. I can't share the apk unsurprisingly, but FullStory integration is pretty straightforward (the only tricky part is that root project needs to have AGP in classpath)

Everything is running the same version of Java:

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.252-b09, mixed mode)

@shamilovstas
Copy link

I have a similar issue when a build is signed using jarsigner (100% reproducibility). Here are the steps to reproduce:

  1. Check that an apk isn't signed and run diffuse info on it:
    image
    The report shows that the build's size is computed correctly.
  2. Sign the build with jarsigner and run diffuse info once again:
    image
    Now the report says that the uncompressed size is -1

Also, the Wikipedia page about ZIP format that you provided in the comments denotes that the local header of a file stored in a zip archive contains information about the file's compressed and uncompressed size only in case if the file's format is not ZIP64:
image
Can it be the source of the issue?

@mateuszkwiecinski
Copy link
Contributor

mateuszkwiecinski commented Apr 16, 2022

I don't fully understand why, but I just learned that negative values (both for size and compressed size) are produced only when using ZipInputStream. If I replace ZipInputStream with ZipFile (as per this SO answer suggested) diffuse starts showing expected values.

Goooler pushed a commit to Goooler/diffuse that referenced this issue Sep 10, 2022
Update dependency org.jetbrains.kotlin.jvm to v1.7.10
@JakeWharton
Copy link
Owner

@shamilovstas Thank you for the reproducer! I see the same behavior.

Indeed the zip format will use -1 as values for both the compressed and uncompressed size when a zip64 record is present in the extras, ZipInputStream already supports parsing those values. The only thing I can think of is that the requisite bits are not being set to indicate zip64 which does not trigger that codepath.

Looking into it.

@JakeWharton
Copy link
Owner

JakeWharton commented Jan 27, 2024

Okay so the problem is that the creation of the zip was streamed, which means the sizes were not known when the local file header was written (as it precedes the data). When you are streaming, you set bit 3 (0x08) of the general flags which indicates that the data will be followed by a data descriptor record containing the final size information.

If we dump a jarsigned zip we see this is the case:

29639D LOCAL HEADER #11      04034B50 (67324752)
2963A1 Extract Zip Spec      14 (20) '2.0'
2963A2 Extract OS            00 (0) 'MS-DOS'
2963A3 General Purpose Flag  0808 (2056)
       [Bits 1-2]            0 'Normal Compression'
       [Bit  3]              1 'Streamed'
       [Bit 11]              1 'Language Encoding'
2963A5 Compression Method    0008 (8) 'Deflated'
2963A7 Last Mod Date/Time    02210821 (35719201) 'Wed Dec 31 20:01:02 1980'
2963AB CRC                   00000000 (0)
2963AF Compressed Size       00000000 (0)
2963B3 Uncompressed Size     00000000 (0)
2963B7 Filename Length       0013 (19)
2963B9 Extra Length          0000 (0)
2963BB Filename              'AndroidManifest.xml'
2963CE PAYLOAD

296F29 DATA DESCRIPTOR       08074B50 (134695760)
296F2D CRC                   022730EE (36122862)
296F31 Compressed Size       00000B5B (2907)
296F35 Uncompressed Size     0000338C (13196)

@JakeWharton
Copy link
Owner

@mateuszkwiecinski Using ZipFile requires that the data be on the system file system which somewhat undermines the abstraction.

All we need to do is skip the zip entry's contents so that the trailing data descriptor can be parsed and the correct values become available. Bonus: this also allows increasing the precision of the size attribution since we know a data descriptor is present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants