ARROW-5082: [Python] Stop exporting copies of shared libraries in wheel #4577

Closed

fsaintjacques
Contributor

This reduces the wheel size from 50 MB to 28 MB. I haven't tested whether this breaks the OSX wheel. Note that the fix is brittle: currently (on Linux) it links against the fully versioned shared library binary, and this only works out because of the -len(x) sorting applied. A proper fix would be to keep whichever file is actually linked (found via ldd or some other method). I suspect that auditwheel and the subsequent install & test will catch this if the contract were to change.
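
The -len(x) sorting mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name pick_versioned_libs, the sample paths, and the grouping rule (everything before the first ".so" in the file name, so Linux naming only) are assumptions made for the example.

```python
import os
from collections import defaultdict


def pick_versioned_libs(paths):
    """Keep one file per library: the one with the longest file name.

    Hypothetical helper, not the PR's code. Files are grouped by the part of
    the name before ".so", so libarrow.so, libarrow.so.14 and
    libarrow.so.14.0.0 all land in the "libarrow" group.
    """
    groups = defaultdict(list)
    for path in paths:
        stem = os.path.basename(path).split(".so")[0]
        groups[stem].append(path)
    # Sorting by negative length puts the fully versioned name first.
    return sorted(
        sorted(candidates, key=lambda x: -len(os.path.basename(x)))[0]
        for candidates in groups.values()
    )


# Made-up build directory contents for illustration.
libs = [
    "build/libarrow.so",
    "build/libarrow.so.14",
    "build/libarrow.so.14.0.0",
    "build/libarrow_python.so",
    "build/libarrow_python.so.14",
    "build/libarrow_python.so.14.0.0",
]
print(pick_versioned_libs(libs))
```

The brittleness is visible here: the choice depends only on the length of the file name, not on which file the extension modules actually reference.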

@@ -111,6 +111,7 @@ PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py build_ext \
--bundle-boost \
--boost-namespace=arrow_boost
PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py bdist_wheel
# Source distribution is used for debian pyarrow packages.
Member

Is there any documentation about this?

Member

A pyarrow Debian package doesn't exist.
The source distribution is needed to use pyarrow with the libarrow Debian package.

Contributor Author

Could you explain? I'm missing some pieces. My comment was regarding Uwe's comment on Zulip.

Member

@pitrou pitrou Jun 17, 2019

In any case, I'm not sure why this needs to be here. Anyone can build a source distribution from scratch (as the name suggests, it just packages source code together).

Member

This gives us a reproducible and controlled environment to build a source tarball. You still need the right versions of setuptools and friends to actually support Markdown in the description field of the package information.

filename = os.path.basename(lib)
link_name = pjoin(build_lib, 'pyarrow', filename)
if not os.path.exists(link_name):
    os.symlink(lib_filename, link_name)
Member

If the user wants to link a C++ library to the libarrow that is bundled with wheels, will it still work if there is only libarrow.so.14 there?

Member

TensorFlow is doing the same. Downstream users might need to add something to their linker commands, though.
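
One possible shape of that linker adjustment, sketched under assumptions: with only libarrow.so.14 on disk, a plain -larrow would look for libarrow.so and fail, whereas GNU ld accepts -l:FILENAME to name the file exactly. The helper name link_args_for_versioned_lib and the directory used below are hypothetical, and this is not necessarily what TensorFlow users do.

```python
def link_args_for_versioned_lib(lib_dir, name, abi_version):
    """Build linker arguments for lib<name>.so.<abi_version>.

    Assumption: a GNU-compatible linker, where "-l:FILE" searches the
    library path for a file named exactly FILE instead of lib<name>.so.
    """
    filename = "lib{}.so.{}".format(name, abi_version)
    return [
        "-L{}".format(lib_dir),
        "-l:{}".format(filename),
        # Record the directory so the binary can find the library at run time.
        "-Wl,-rpath,{}".format(lib_dir),
    ]


# "/path/to/site-packages/pyarrow" is a placeholder, not a real location.
args = link_args_for_versioned_lib("/path/to/site-packages/pyarrow", "arrow", 14)
print(" ".join(args))
```

Such a list could be passed, for example, as extra_link_args to a setuptools Extension.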

Contributor Author

Side note: when I extracted the TensorFlow packages, I noticed they used the versionless path.

@wesm
Member

wesm commented Jun 14, 2019

@kszucs can we run the Crossbow wheel builds before merging this? Also want to wait for @xhochy to review

@kszucs
Member

kszucs commented Jun 14, 2019

@ursabot crossbow --help

@ursabot

ursabot commented Jun 14, 2019

Usage: @ursabot crossbow [OPTIONS] COMMAND [ARGS]...

  Trigger crossbow builds for this pull request

Options:
  --help  Show this message and exit.

Commands:
  package  Submit crossbow packaging tasks.
  test     Submit crossbow testing tasks.

@kszucs
Member

kszucs commented Jun 14, 2019

@ursabot crossbow package wheel

@ursabot

ursabot commented Jun 14, 2019

AMD64 Conda Crossbow (#19707) builder has been succeeded.

Revision: 15932d0

Submitted crossbow builds: ursa-labs/crossbow @ ursabot-10

@kszucs
Member

kszucs commented Jun 14, 2019

Most of the wheel builds will fail, mainly because of the missing OpenSSL dependency, so OpenSSL should be turned off.

@nealrichardson
Member

#4494 may also fix the OpenSSL issue, if we can get that in.

@kszucs
Member

kszucs commented Jun 14, 2019

@nealrichardson You can try running the same ursabot command on that PR to see whether it fixes the wheel builds (execute @ursabot crossbow package wheel conda linux to run everything).

@wesm
Member

wesm commented Jun 17, 2019

How important is it to have the shared library with the ABI version tag in these wheels? I see two options:

  • Ship libarrow.so.$ABI_VERSION
  • Ship unversioned libarrow.so

Either of these is OK with me. @xhochy do you have an opinion so we can get this closed out?

@xhochy
Member

xhochy commented Jun 17, 2019

I would prefer the ABI version in the name, as a library that fails to load is a more graceful error for the end user than a segmentation fault. We should first test whether users of the wheel (read: turbodbc) need to adjust their build system or whether the ABI-versioned name is still picked up.
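
A small sketch of the "graceful error" side of this argument, using only the Python standard library: when the dynamic loader cannot find a library under the requested versioned name, the load fails with an ordinary exception. The name libarrow.so.9999 is a made-up ABI version chosen so that the lookup fails, and try_load is a hypothetical helper.

```python
import ctypes


def try_load(soname):
    """Return (True, None) if the loader finds soname, else (False, message)."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, None


# "libarrow.so.9999" is a made-up ABI version used to force a lookup failure.
ok, message = try_load("libarrow.so.9999")
print(ok, message)
```

With an unversioned name, a library with a different ABI could be found and loaded instead, and the mismatch would only surface later.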

@wesm
Member

wesm commented Jun 17, 2019

OK, who wants to do that? We're reaching the critical horizon for 0.14, so we need to get this issue closed out and move on to the other backlog items.

@fsaintjacques
Contributor Author

I think I've reached my limit of wheels debugging. @xhochy can you pick up the validation of turbodbc?

@wesm
Member

wesm commented Jun 18, 2019

I can take this for a spin tomorrow (Tuesday) if no one else volunteers since I have built turbodbc before

@wesm
Member

wesm commented Jun 18, 2019

I've tested the wheel with turbodbc locally and turbodbc seems to be fine with linking to the versioned .so file

(note that it's saying "not found" because the pyarrow wheel directory is not in my LD_LIBRARY_PATH)

$ ldd turbodbc_arrow_support.cpython-37m-x86_64-linux-gnu.so 
	linux-vdso.so.1 (0x00007ffe59771000)
	libturbodbc.so => /home/wesm/code/turbodbc/build/libturbodbc.so (0x00007f72a07aa000)
	libarrow.so.14 => not found
	libarrow_python.so.14 => not found
	libcpp_odbc.so => /home/wesm/code/turbodbc/build/cpp/cpp_odbc/Library/libcpp_odbc.so (0x00007f72a0724000)
	libstdc++.so.6 => /home/wesm/cpp-runtime-toolchain/lib/libstdc++.so.6 (0x00007f72a05e2000)
	libgcc_s.so.1 => /home/wesm/cpp-runtime-toolchain/lib/libgcc_s.so.1 (0x00007f72a05cc000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f72a03e1000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f72a0293000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f72a0949000)
	libodbc.so.2 => /home/wesm/miniconda/envs/turbodbc-dev/lib/libodbc.so.2 (0x00007f72a0212000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f72a020c000)
	libiconv.so.2 => /home/wesm/miniconda/envs/turbodbc-dev/lib/./libiconv.so.2 (0x00007f72a0122000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f72a0101000)
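
For checking output like the above in a script, a minimal sketch that lists the entries ldd reports as "not found". The helper name unresolved_libs is hypothetical, and the sample is a shortened copy of the output above.

```python
def unresolved_libs(ldd_output):
    """Return the library names that ldd reports as "not found"."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Shortened copy of the ldd output shown above.
sample = """\
    linux-vdso.so.1 (0x00007ffe59771000)
    libturbodbc.so => /home/wesm/code/turbodbc/build/libturbodbc.so (0x00007f72a07aa000)
    libarrow.so.14 => not found
    libarrow_python.so.14 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f72a03e1000)
"""
print(unresolved_libs(sample))
```

Both unresolved entries carry the .14 suffix, which is the point of the test: the extension was linked against the versioned names.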

Member

@wesm wesm left a comment

+1
