Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up setup.py #49

Closed
garfieldnate opened this issue Dec 16, 2021 · 18 comments
Closed

Clean up setup.py #49

garfieldnate opened this issue Dec 16, 2021 · 18 comments

Comments

@garfieldnate
Copy link

garfieldnate commented Dec 16, 2021

Hi,

this might be the same as #42 and #44, but the platform is different. I'm unable to load the .so file through fugashi when running from the base Docker image for Python 3.10. Minimal breaking example below:

Dockerfile:

FROM python:3.10.1-bullseye
RUN pip install -vvv fugashi
RUN echo "import fugashi" | python

Output of running docker build 2>&1 ..

If you know a quick workaround for this, I'd be happy to use that for now. I'd be happy to help if you need a hand with this, as well.

@garfieldnate
Copy link
Author

Quick workaround:

RUN apt-get update

This ensures that libmecab-dev can be downloaded directly instead of needing to be compiled.

That fixes the issue for me, though I still wonder why compiling from source did not work properly.

@garfieldnate
Copy link
Author

Actually, I think the more important issue is that installing the module didn't fail; it would have been better if this were an install-time failure, rather than a runtime failure.

@polm
Copy link
Owner

polm commented Dec 20, 2021

I see the error you're getting but I'm also not sure what would cause it. It might be something simple like needing to run ldconfig after compiling the library.

Because shared libs are resolved at runtime in general, there's no reasonable way to make this an install failure instead of a runtime one.

@garfieldnate
Copy link
Author

Sorry to bother you with this; I think now that I was simply running out of memory during the compilation phase. But I'm still unsure why the build completed successfully without finding mecab. Wouldn't it be beneficial to raise an error at the end of check_libmecab in fugashi_util.py instead of returning empty lists?

@polm
Copy link
Owner

polm commented Dec 20, 2021

Wouldn't it be beneficial to raise an error at the end of check_libmecab in fugashi_util.py instead of returning empty lists?

I guess, but I'm not sure because that means you couldn't ever run setup.py without a working installation.

@garfieldnate
Copy link
Author

Sorry to bother you with this if this seems like a silly issue 😅

you couldn't ever run setup.py without a working installation.

Exactly. Maybe I want it to be more black-boxy than you intended, but the way I expect pip install fugashi to work is that 1) before install I do not have mecab and have no idea how to get mecab, and 2) after install everything works and I don't have to worry about whether mecab is installed. I definitely do not expect 3) fugashi installed successfully but actually the mecab install failed so it doesn't work.

@lambdadog
Copy link
Contributor

lambdadog commented Dec 20, 2021

Exactly. Maybe I want it to be more black-boxy than you intended, but the way I expect pip install fugashi to work is that 1) before install I do not have mecab and have no idea how to get mecab, and 2) after install everything works and I don't have to worry about whether mecab is installed. I definitely do not expect 3) fugashi installed successfully but actually the mecab install failed so it doesn't work.

Fugashi already functions this way IMO, this is a bug in the installation process.

In theory, all routes seen in fugashi_util.py attempt to ensure a running configuration is produced upon install. mecab_config_linux_build is passing an rpath arg to extra_link_args which attempts to ensure that the built python extension will be able to find libmecab.so.

It seems broken however, and it looks like mecab_config_linux_build() is a rarely used and untested older codepath, which is why it's only shown up now (in the gap between python 3.10 release and a fugashi 3.10 wheel being released).

@lambdadog
Copy link
Contributor

lambdadog commented Dec 20, 2021

Some notes after digging into this issue:

  • In python 3.9.9, this Dockerfile installs libmecab in ~/.local/lib/mecab (with ~ being /root)
  • In python 3.10.1, this Dockerfile installs libmecab in /usr/local/lib/python3.10/site-packages/root/.local/lib/mecab
    • When pip is run with --user, this becomes ~/.local/lib/python3.10/site-packages/root/.local/lib/mecab
    • When pip is run as non-root with --user, this becomes ~./local/lib/python3.10/site-packages/home/$USER/.local/lib/mecab

So mecab_config_linux_build()'s behavior on 3.10 is apparently extra broken, and its behavior prior to 3.10 was to install a library on the user's behalf in non-python-managed directories, plus since the extra-link-args are erroneous in some way, it doesn't even produce a working configuration. I'm unsure if this has always been the case since its introduction or if it broke on a python or setuptools update at some point.

mecab_config_debian_user() does the same thing, just taking advantage of apt-get download rather than building the library from source.

And finally, mecab_config_debian_root() quite literally just installs a library on the user's machine using apt-get (this was what made it finally work when @garfieldnate ran apt-get update in the Dockerfile).

In my mind, there are two issues here:

  1. mecab_config_linux_build() and mecab_config_debian_user() don't produce working configurations, despite allowing the install to continue.
  2. Prior to Python 3.10, all three of these make unclear-to-the-user modifications to the user's system that aren't contained in python-relevant directories, with mecab_config_debian_root() still doing so even on 3.10. mecab_config_debian_root()'s modifications aren't even undone when pip uninstall is run, as it performs an apt-get installation, which pip has no way to track in .egg-info or .dist-info!

@lambdadog
Copy link
Contributor

lambdadog commented Dec 20, 2021

Honestly I think the solution to this is just to remove all three of these rarely-used codepaths and simply have users install mecab with their system package manager if installing the package from source.

For mecab_config_debian_root() I strongly dislike the idea of a pip install operation installing something using my system package manager.

For mecab_config_linux_build() and mecab_config_debian_user(), they're rarely used enough that this problem wasn't noticed until now (when wheels are not yet released for Python 3.10) and both could be replaced by simply asking users to install mecab with their system package manager.

It should be theoretically possible to fix mecab_config_linux_build() and mecab_config_debian_user(), but they fall into the category of codepaths that are prone to bit rot and aren't used often enough that they're worth it IMO.

@polm
Copy link
Owner

polm commented Dec 21, 2021

For the record, these were all added in #8, so that may be helpful reference. It's true that some of these are rarely used.

What we can do is cut out most of these and leave the responsibility for installing mecab up to the user. As part of that, if no working configuration can be found, we can throw an exception instead of returning empty lists.

My concern with setup.py throwing an exception was that maybe there was some informational command, like setup.py info or something, that could be useful even without a working setup. But I can't think of anything like that and haven't found one poking around so I guess it's not worth worrying about.

@polm
Copy link
Owner

polm commented Dec 21, 2021

Test release with these changes is up as 1.1.2a4.

@garfieldnate
Copy link
Author

If the automation aspect is removed from fugashi, I think it would be a good idea to update the readme with more detailed install instructions (minimum apt-get update && apt-get install libmecab-dev). It might be gross to change the user's environment, but it's also pretty convenient for many users XD. There might also be a middle ground where we prompt or use some optional install parameters to determine if mecab should be installed.

@lambdadog
Copy link
Contributor

lambdadog commented Dec 21, 2021

Fugashi will still install automatically for the vast, vast majority of users, to be clear, at least once the Python 3.10 wheels are pushed to PyPI.

That said, probably still not a terrible idea to have at least some mention of the install failure state and an example of installing mecab, like Nate mentions.

@polm
Copy link
Owner

polm commented Dec 23, 2021

Found an issue with the new change - it breaks the sdist upload. Creating an sdist uses setup.py but doesn't require a working MeCab env. So I'll need to hack a workaround for that.

@polm
Copy link
Owner

polm commented Feb 14, 2022

Hey, I am back working on this again. I should be able to resolve the sdist issues and make a release soon.

One extra issue that came up is getting support for OSX arm64 / M1 processors. I made a test build but I don't have any way of testing it myself; if either of you does, it'd be a big help if you could take a look at #55 and give it a quick test.

@garfieldnate
Copy link
Author

garfieldnate commented Feb 14, 2022

Sorry, I only have old Macs :( Thank you for working on this, though!

@polm polm changed the title ImportError: libmecab.so.2: cannot open shared object file: No such file or directory Clean up setup.py Feb 15, 2022
@polm
Copy link
Owner

polm commented Feb 16, 2022

Changed the title of this issue to reflect what actually ended up happening.

The latest release, 1.1.2, incorporates the changes discussed here, so I'll close this, but if I forgot anything let me know.

@polm polm closed this as completed Feb 16, 2022
@polm
Copy link
Owner

polm commented Feb 16, 2022

Also, for the record, you still can't run python setup.py sdist without a working MeCab installation. But since few people need sdist and it's not super useful without MeCab anyway, I just made sure the CI installs it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants