Limit Request: librec-auto (300 MB, both PyPI and TestPyPI) #152

Closed
masoudmansoury opened this issue Jan 12, 2020 · 5 comments

@masoudmansoury

Project

librec-auto
https://pypi.org/project/librec-auto/
https://github.com/that-recsys-lab/librec-auto
https://github.com/that-recsys-lab/librec-auto-java

Size of release

300MB

Which indexes

Both

Reasons for the request

The librec-auto project aims to automate recommender system experiments using Librec. The workflow of an experiment involves identifying appropriate data, creating training / test splits, implementing or choosing algorithms, running experiments (possibly with a range of different parameters), and reporting on the results.

@jamadden
Contributor

Thanks for the report. It doesn't seem to answer the most important question though: Why does the release have to be so big?

@masoudmansoury
Author

Thanks for your response. This project uses a Java project, librec, as the engine for running experiments. The librec .jar file has to be installed along with the package, so the whole project, including the .jar file, needs 270MB-300MB.

@jamadden
Contributor

Thanks for the reply! Because such large projects place a burden on PyPI and do not provide a good experience for end users, PyPI moderators are encouraged to help find ways to avoid the need to distribute such large packages. This is especially true if the package isn't actually distributing Python code (PyPI is not built to distribute data sets or non-Python code at scale). At 300MB, moderators are asked to grant limit increases only to established projects, so finding ways to reduce the size is especially important.

The JAR in question is 250MB. It appears to be an amalgamation of many disparate Java libraries. It bundles numerous compiled binary libraries for numerous platforms (Windows x86, Windows x86-64, Linux x86-64, macOS x86-64, Linux Arm, Linux PPC). When expanded, it occupies nearly 1GB of disk space, largely because of all the binary code.

931753243                     27079 files
 83887836  11-23-2019 11:43   org/bytedeco/javacpp/macosx-x86_64/libopenblas.dylib
 42228081  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86_64/libopenblas.dll
 37002668  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86_64/libopenblas.so.0
 26179158  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86/libopenblas.dll
 20630467  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86/libopenblas.so.0
 20523304  11-23-2019 11:43   org/bytedeco/javacpp/linux-ppc64le/libjniopenblas.so
 17761792  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86/jniopenblas.dll
 16453494  11-23-2019 11:43   org/bytedeco/javacpp/linux-ppc64le/libopenblas.so.0
 15765592  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86_64/libjniopenblas.so
 15728252  11-23-2019 11:43   org/bytedeco/javacpp/macosx-x86_64/libjniopenblas.dylib
 14511824  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86/libjniopenblas.so
 14227968  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86_64/jniopenblas.dll
 13589892  11-23-2019 11:43   sentiment/sentiwordnet.txt
 13558619  11-23-2019 11:43   org/bytedeco/javacpp/linux-armhf/libjniopenblas.so
 12868299  11-23-2019 11:43   org/bytedeco/javacpp/linux-armhf/libopenblas.so.0
  8826928  11-23-2019 11:43   lib/x86/libnd4jcpu.so
  8612629  11-23-2019 11:43   org/nd4j/nativeblas/windows-x86_64/libnd4jcpu.dll
  6828164  11-23-2019 11:43   org/nd4j/nativeblas/macosx-x86_64/libnd4jcpu.dylib
  6220840  11-23-2019 11:43   lib/armeabi/libnd4jcpu.so
  5285984  11-23-2019 11:43   org/bytedeco/javacpp/linux-ppc64le/libjnilept.so
  4915812  11-23-2019 11:43   lib/x86/liblept.so
  4854640  11-23-2019 11:43   org/bytedeco/javacpp/linux-ppc64le/liblept.so.5
  4744836  11-23-2019 11:43   lib/x86/libopenblas.so
  4485304  11-23-2019 11:43   org/bytedeco/javacpp/linux-armhf/liblept.so.5
  4395008  11-23-2019 11:42   org/bytedeco/javacpp/windows-x86_64/jnihdf5.dll
  4332992  11-23-2019 11:42   org/bytedeco/javacpp/linux-ppc64le/libhdf5.so.100
  4283392  11-23-2019 11:42   org/bytedeco/javacpp/windows-x86/jnihdf5.dll
  4231940  11-23-2019 11:43   org/bytedeco/javacpp/macosx-x86_64/liblept.5.dylib
  4204304  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86_64/liblept.so.5
  4078944  11-23-2019 11:43   lib/armeabi/liblept.so
  4068984  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86_64/liblept-5.dll
  4007687  11-23-2019 11:43   org/nd4j/nativeblas/linux-x86_64/libnd4jcpu.so
  3997504  11-23-2019 11:43   org/bytedeco/javacpp/macosx-x86_64/libjnilept.dylib
  3979891  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86/liblept-5.dll
  3937548  11-23-2019 11:43   lib/x86/libjnilept.so
  3882024  11-23-2019 11:42   org/bytedeco/javacpp/linux-x86/libhdf5.so.100
  3829216  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86/liblept.so.5
  3814400  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86/jnilept.dll
  3670416  11-23-2019 11:43   org/bytedeco/javacpp/linux-ppc64le/libopencv_dnn.so.3.2
  3659264  11-23-2019 11:43   org/bytedeco/javacpp/windows-x86_64/opencv_dnn320.dll
  3658096  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86/libjnilept.so
  3655760  11-23-2019 11:43   org/bytedeco/javacpp/linux-ppc64le/libjniopencv_core.so
  3640680  11-23-2019 11:42   org/bytedeco/javacpp/linux-x86_64/libhdf5.so.100
  3616760  11-23-2019 11:43   org/bytedeco/javacpp/linux-x86_64/libjnilept.so
  3392784  11-23-2019 11:43   org/bytedeco/javacpp/linux-armhf/libgfortran.so.3
...
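
For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch that totals the uncompressed size of a JAR's entries per platform. It assumes the platform is named by a path segment such as `linux-x86_64` or `windows-x86`, as in the listing above; the function names are illustrative, and directories like `lib/x86` or `lib/armeabi` are not matched by this simple rule and end up under "generic".

```python
import zipfile
from collections import Counter

# Assumption: native libraries live under a path segment that names the
# platform, e.g. "linux-x86_64", "windows-x86", "macosx-x86_64".
PLATFORM_PREFIXES = ("linux-", "windows-", "macosx-")


def platform_of(entry_name):
    """Return the platform segment of an archive path, or "generic"."""
    for segment in entry_name.split("/"):
        if segment.startswith(PLATFORM_PREFIXES):
            return segment
    return "generic"


def size_by_platform(jar_path):
    """Sum uncompressed entry sizes in a JAR, grouped by platform."""
    totals = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            totals[platform_of(info.filename)] += info.file_size
    return totals
```

Calling `size_by_platform` on the JAR and printing `totals.most_common()` would show which platforms account for most of the expanded size.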

The best recommendation I have is to distribute multiple binary wheels, one for each platform you wish to support, using the appropriate platform tags. Only include the needed binary libraries for each platform in that platform's wheel, instead of distributing 5 copies of a library, 4 of which will be unused after download. Perhaps dividing the 250MB jar file into 5 platform-specific variants will let it fit within the size limit.
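
As a rough sketch of that split, the following copies a JAR while dropping native libraries built for other platforms. It again assumes the platform appears as a path segment (`linux-x86_64`, `windows-x86`, ...); the helper names are hypothetical, and whether the Java libraries still load correctly from a stripped JAR has not been verified here.

```python
import zipfile

# Assumption: a path segment starting with one of these names the platform.
PLATFORM_PREFIXES = ("linux-", "windows-", "macosx-")


def belongs_to(entry_name, target_platform):
    """True if the entry is platform-neutral or built for target_platform."""
    for segment in entry_name.split("/"):
        if segment.startswith(PLATFORM_PREFIXES):
            return segment == target_platform
    return True


def write_platform_jar(src_jar, dst_jar, target_platform):
    """Copy src_jar to dst_jar, dropping other platforms' native libraries."""
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if belongs_to(info.filename, target_platform):
                # Reuse the original ZipInfo so names and timestamps carry over.
                dst.writestr(info, src.read(info.filename))
```

Each resulting JAR would then go into a wheel built with the matching platform tag, for example via the `--plat-name` option of `bdist_wheel`.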

Of course, that just sidesteps the issue of using PyPI to distribute non-Python artifacts. So another recommendation is to provide a way to install the dependency from its intended distribution center (e.g., Maven), either at install time (have 'setup.py' execute Maven) or by having the user do so later (often this is done by providing a Python script that does the installation; this could be used to update this dependency later, too). Using the native distribution mechanism in this way might allow using its own dependency resolution such that there's no need to create an amalgamation jar.
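
A minimal sketch of the "Python script that does the installation" option might look like the following. It builds the download URL from Maven coordinates using the standard Maven 2 repository layout on Maven Central and fetches a single JAR. The function names are illustrative, and it does not resolve transitive dependencies; running Maven itself would be needed for that.

```python
import urllib.request
from pathlib import Path

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id, artifact_id, version, repo=MAVEN_CENTRAL):
    """Build the download URL for a JAR in a Maven 2 layout repository."""
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


def fetch_jar(group_id, artifact_id, version, dest_dir):
    """Download the JAR into dest_dir unless it is already there."""
    dest = Path(dest_dir) / f"{artifact_id}-{version}.jar"
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        url = maven_jar_url(group_id, artifact_id, version)
        urllib.request.urlretrieve(url, dest)
    return dest
```

A console script in the package could call something like `fetch_jar("com.example", "example-engine", "1.0.0", Path.home() / ".librec-auto")` on first use; those coordinates and the cache directory are placeholders, not the project's real ones.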

Note that PyPI is currently introducing certain additional scans of uploaded files. In the future that might extend to verifying uploaded binary code is compatible with the platforms it claims, specifically for manylinux, so it is also recommended to build all such binary code on a compatible system.

@masoudmansoury
Author

Thanks for your suggestions.

@pradyunsg changed the title from "librec-auto" to "Limit Request: librec-auto (300 MB, both PyPI and TestPyPI)" on Mar 19, 2020
@di
Member

di commented Apr 30, 2020

Since this package seems to wrap a relatively large JAR file, I'm going to opt to decline this request. If you're able to reduce the size of the JAR file contained in your distribution, please let us know the reduced size and we can reconsider this request.

@di closed this as completed on Apr 30, 2020