ARROW-5082: [Python] Stop exporting copies of shared libraries in wheel #4577
@@ -111,6 +111,7 @@ PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py build_ext \
--bundle-boost \
--boost-namespace=arrow_boost
PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py bdist_wheel
# Source distribution is used for debian pyarrow packages.
Is there any documentation about this?
A pyarrow Debian package doesn't exist.
A source distribution is needed to use pyarrow with the libarrow Debian package.
Could you explain? I'm missing some pieces. My comment was about Uwe's comment on Zulip.
In any case, I'm not sure why this needs to be here. Anyone can build a source distribution from scratch (as the name suggests, it just packages source code together).
This gives us a reproducible and controlled environment to build a source tarball. You still need the right versions of setuptools and friends to actually support Markdown in the description field of the package information.
filename = os.path.basename(lib)
link_name = pjoin(build_lib, 'pyarrow', filename)
if not os.path.exists(link_name):
    os.symlink(lib_filename, link_name)
If the user wants to link a C++ library to the libarrow that is bundled with wheels, will it still work if there is only libarrow.so.14 there?
TensorFlow is doing the same. Downstream users might need to add something to their linker commands, though.
Side note: when I extracted the TensorFlow packages, I noticed they used the versionless path.
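As a rough illustration of the linker adjustment mentioned above, the sketch below builds link arguments for a directory that may contain only a versioned library such as libarrow.so.14. The helper name link_args_for is hypothetical (it is not part of pyarrow or setuptools), and the -l:filename form is a GNU ld feature, so this assumes a GNU toolchain on Linux.

```python
import glob
import os


def link_args_for(lib_dir, name):
    """Return linker arguments for lib<name> found in lib_dir.

    Hypothetical helper for illustration only.
    """
    unversioned = os.path.join(lib_dir, "lib" + name + ".so")
    if os.path.exists(unversioned):
        # Usual case: -l<name> resolves lib<name>.so by itself.
        return ["-L" + lib_dir, "-l" + name]

    # Only versioned files exist (for example libarrow.so.14).
    # Sort alphabetically first so the result is deterministic.
    versioned = sorted(sorted(glob.glob(unversioned + ".*")), key=len)
    if not versioned:
        raise FileNotFoundError("no lib%s.so* in %s" % (name, lib_dir))

    # Assumption: prefer the shortest versioned name, which is typically
    # the ABI-versioned one. GNU ld accepts -l:<filename> to link
    # against an exact file name.
    return ["-L" + lib_dir, "-l:" + os.path.basename(versioned[0])]
```

A build script could pass the returned list as extra link arguments. At run time the loader still needs to find the library, for example through an rpath or LD_LIBRARY_PATH.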
@ursabot crossbow --help
@ursabot crossbow package wheel
AMD64 Conda Crossbow (#19707) builder has succeeded. Revision: 15932d0 Submitted crossbow builds: ursa-labs/crossbow @ ursabot-10
Most of the wheel builds will fail, mainly because of the missing OpenSSL dependency, so OpenSSL should be turned off.
#4494 may also fix the OpenSSL issue, if we can get that in.
@nealrichardson You may try to execute the same ursabot command, see whether the PR fixes the wheel builds (execute
How important is it to have the shared library with the ABI version tag in these wheels? I see two options:
Either of these is OK with me. @xhochy do you have an opinion so we can get this closed out?
I would prefer the ABI version in the name, since that gives the end user a more graceful error than a segmentation fault. We should first test whether users of the wheel (read: turbodbc) need to adjust their build system or whether the ABI-named version is still picked up.
OK, who wants to do that? We're reaching the critical horizon for 0.14, so we need to get this issue closed out and move on to the other backlog items.
I think I've reached my limit of wheels debugging. @xhochy can you pick up the validation of turbodbc?
I can take this for a spin tomorrow (Tuesday) if no one else volunteers, since I have built turbodbc before.
I've tested the wheel with turbodbc locally and turbodbc seems to be fine with linking to the versioned .so file (note that it's saying "not found" because the pyarrow wheel directory is not in my LD_LIBRARY_PATH)
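For readers who want to repeat this kind of check, here is a minimal sketch that parses ldd-style output and reports which libraries resolved. The function name parse_ldd and the sample text are hypothetical; the sample is not the output from this thread, and the parser only handles the common "name => path (address)" and "name => not found" forms.

```python
def parse_ldd(output):
    """Map each library name in ldd-style output to its path, or None."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            # Entries without an arrow (e.g. the dynamic loader) are skipped.
            continue
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            resolved[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. "(0x00007f...)".
            resolved[name.strip()] = target.split(" (")[0]
    return resolved


# Hypothetical sample output, for illustration only.
SAMPLE = (
    "\tlibarrow.so.14 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(parse_ldd(SAMPLE))
```

A "not found" entry for the versioned name means the binary was linked against that name but the loader cannot locate it on the current search path, which matches the LD_LIBRARY_PATH explanation given in the comment.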
+1
This reduces the size from 50 MB to 28 MB. I haven't tested whether this breaks the OSX wheel. Note that the fix is brittle, since currently (on Linux) it links with the full-versioned shared library binary. This works out due to the -len(x) sorting applied. A proper fix would be to keep the one linked (found via ldd or some other method). I suspect that auditwheel and the subsequent install & test will catch this if the contract were to change.
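To make the sorting remark concrete, here is a small sketch of the heuristic as described: order the candidate names by descending length, keep the first as the real file, and treat the rest as names to recreate as symlinks. The function name split_real_and_links is hypothetical and this is not the PR's code.

```python
def split_real_and_links(names):
    """Pick the file to keep and the names to recreate as symlinks.

    Hypothetical helper illustrating the -len(x) sort heuristic.
    """
    # Longest name first, so libarrow.so.14.0.0 sorts ahead of
    # libarrow.so.14 and libarrow.so.
    ordered = sorted(names, key=lambda x: -len(x))
    return ordered[0], ordered[1:]


real, links = split_real_and_links(
    ["libarrow.so", "libarrow.so.14.0.0", "libarrow.so.14"]
)
print(real, links)
```

The heuristic depends only on name length, which is why the comment calls it brittle: it never checks which name the extension modules were actually linked against.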