Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dependency check to project step runs #11226

Merged

Conversation

rmitsch
Copy link
Contributor

@rmitsch rmitsch commented Jul 27, 2022

Goal

Ensure dependencies specified in requirements.txt are installed. Notes:

  • No tests have been added yet. Dedicated tests would be nice, but we'd need to mess with the test venv for that - not sure that's worth it?
  • Documentation will be added once the code review is through.

Description

Uses pkg_resources to check whether all dependencies can be imported without issues. Note that this doesn't check the complete dependency tree - theoretically a sub-dependency might be missing or mismatched without this check noticing.

Types of change

New feature.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@rmitsch rmitsch added enhancement Feature requests and improvements projects spaCy projects and project templates labels Jul 27, 2022
@rmitsch rmitsch self-assigned this Jul 27, 2022
setup.cfg Outdated Show resolved Hide resolved
@adrianeboyd
Copy link
Contributor

Have you looked at the timing for this yet?

@rmitsch
Copy link
Contributor Author

rmitsch commented Jul 27, 2022

Have you looked at the timing for this yet?

Not yet, wanted to confirm this approach is acceptable before.

@adrianeboyd
Copy link
Contributor

If the basic pkg_resources checks are extremely slow we can immediately stop working on it. (I don't think they will be, though.)

@rmitsch
Copy link
Contributor Author

rmitsch commented Jul 27, 2022

If the basic pkg_resources checks are extremely slow we can immediately stop working on it. (I don't think they will be, though.)

So far they felt pretty instantaneous. Will benchmark with more extensive requirement files tomorrow.

…ate_requirements(). Handle file reading in project_run().
@rmitsch
Copy link
Contributor Author

rmitsch commented Jul 28, 2022

Running the validation with this cobbled together list of requirements

scipy
umap
transformers>=3.4.0,<4.3.0
spacy-transformers>=1.0.1,<1.1.0
stanza>=1.3.0,<1.5.0
flair>=0.6.0,<1.0.0
ufal.udpipe
spacy-lookups-data>=1.0.0,<1.1.0
ml_datasets==0.2.0a0
spacy-experimental==0.2.0
typer
numpy
tqdm
wasabi
pydantic>=1.0.0,<2.0.0
fastapi>=0.61.1,<0.62.0
aiofiles
uvicorn>=0.11.6,<0.12.0
spacy-huggingface_hub
plotext
spacy_ray
spacy-streamlit>=1.0.0a0
streamlit
wandb>=0.12.4,<0.20.0
floret>=0.10.1,<0.11.0
natto-py
spacy-streamlit>=1.0.0a0
spacy-lookups-data>=1.0.0,<1.1.0
google-cloud-storage
streamlit
presidio-analyzer==2.2.1
presidio-anonymizer==2.2.1
skweak==0.2.13
hmmlearn==0.2.6
nlpaug

takes - on my local machine - 0.33 seconds in an environment which has all of these dependencies installed and 0.21 seconds in one that's missing most of them. I think that's reasonable.

@rmitsch rmitsch marked this pull request as ready for review July 29, 2022 09:04
spacy/cli/project/run.py Outdated Show resolved Hide resolved
@svlandeg svlandeg requested a review from adrianeboyd August 9, 2022 13:42
Copy link
Contributor

@adrianeboyd adrianeboyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the conflicting dependencies message needs more info. If the conflict is further down in the tree, it's really unclear what's going on. I think we could either skip the conflicting packages check or maybe add more in the message to run pip check or something, but right now it says there are conflicts for a package that seems to be installed correctly.

Like this is the output:

⚠ The following depencency conflicts were detected:
['spacy-streamlit>=1.0.0a0']

The conflict isn't obvious from this output.

And if I put in a non-existent future version, I still get the conflict instead of the requirement message:

⚠ The following depencency conflicts were detected:
['spacy-streamlit>=2.0.0a0']

If you want to reproduce this, the actual conflict is:

streamlit 1.9.2 has requirement click<8.1,>=7.0, but you have click 8.1.3.

If I put my whole working environment into the requirements (350 packages) it adds about 1 s to the running time. With a shorter/realistic list it looks like more like 0.1 s. Compared to the running time of actual steps, I think 0.1 s is acceptable.

spacy/cli/project/run.py Outdated Show resolved Hide resolved
spacy/cli/project/run.py Outdated Show resolved Hide resolved
spacy/cli/project/run.py Outdated Show resolved Hide resolved
@rmitsch
Copy link
Contributor Author

rmitsch commented Aug 16, 2022

I think the conflicting dependencies message needs more info.

Displaying the full message from VersionConflict.report() now. This should show streamlit 1.9.2 has requirement click<8.1,>=7.0, but you have click 8.1.3. in the warning message. Can you verify that?

@adrianeboyd
Copy link
Contributor

The merge undid some other edits?

@rmitsch
Copy link
Contributor Author

rmitsch commented Aug 16, 2022

The merge undid some other edits?

Weird. But only f581955, right? Re-added the change.

rmitsch and others added 3 commits August 16, 2022 14:41
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
spacy/cli/project/run.py Outdated Show resolved Hide resolved
rmitsch and others added 2 commits September 12, 2022 11:03
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final questions & comments, then this should be good to merge!

website/docs/usage/projects.md Outdated Show resolved Hide resolved
spacy/cli/project/run.py Show resolved Hide resolved
rmitsch and others added 2 commits September 14, 2022 12:13
…:rmitsch/spaCy into feature/projects-automatic-install-check
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Copy link
Contributor

@adrianeboyd adrianeboyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the docs need one more round of prettier, but otherwise this looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and improvements projects spaCy projects and project templates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants