Caching yarn and pnpm dependencies in Docker #43329
Conversation
Hey @potiuk, I have started including the yarn dependency installation, very similar to the existing dependency caching. Most probably I am still missing a lot of parts; I wanted to align with you on the approach before going too deep and including every detail :) GitHub Actions also has a couple of generic caching actions, some of them specific to yarn, npm, etc. I don't think we could fully cascade that to the local environment, though, since it's limited to GitHub CI. Maybe we can consider caching in CI and caching locally as separate tasks, which could make using GitHub Actions easier to maintain in the future. I couldn't properly run the pre-commit hook, though. Thanks for your time!
This should happen automatically with pre-commit - you do not have to run it manually.
Yeah. That's something we could do for regular installation of Yarn/NPM dependencies as part of CI jobs, but it will not work well for installing Yarn/NPM deps during the Dockerfile build. And yes, we also have a few places where we install them for local development, as you wrote. But I think it's better to use specific caching in each case:
I think both of them will be difficult to implement with any of the "standard" actions, and the specific approaches are generally "better" - and the same in CI and locally. So yeah, I think your current approach is right.
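The case-specific caching discussed above could be sketched roughly as follows. This is an illustrative assumption, not the PR's actual code: the function names (`lock_file_hash`, `install_with_cache`) and the cache layout keyed by the lock-file hash are made up for the example.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def lock_file_hash(lock_file: Path) -> str:
    # Key the cache on the exact contents of the lock file
    return hashlib.sha256(lock_file.read_bytes()).hexdigest()


def install_with_cache(www_directory: Path, cache_root: Path) -> None:
    # Reuse a cached node_modules keyed by the yarn.lock hash, if present
    cached = cache_root / lock_file_hash(www_directory / "yarn.lock")
    node_modules = www_directory / "node_modules"
    if cached.exists():
        shutil.copytree(cached, node_modules, dirs_exist_ok=True)
    # Run install either way, so any drift between cache and lock file
    # is reconciled by the package manager itself
    subprocess.check_call(
        ["yarn", "install", "--frozen-lockfile"], cwd=www_directory
    )
```

The same hash-keyed scheme works identically in CI and in local builds, which is the advantage over the GitHub-Actions-only cache actions mentioned earlier in the thread.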
Yeah, I found the script itself from pre-commit and ran it multiple times, both via pre-commit and as a raw Python call. I think this is something in my local setup then. I can sort that out. Thanks!
Yeah, it will be difficult with them. I have checked a couple of them, and they are a bit limited. Even though they seem to cache properly, they are still not flexible enough to fit multiple use cases.
One watchout here: in order for the inlining to work with a NEW script, you should make sure to add a comment with the right script path just before the script that is "inlined" (and an ending comment after it) - this is how the script finds where to inline it.
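To illustrate the marker-comment mechanism described above: a minimal sketch of how an inlining script might locate the region between the start and end comments and replace its contents. The exact marker format (`# START:` / `# END:`) is an assumption for the example; the real pre-commit hook may use a different convention.

```python
import re


def inline_script(target_text: str, script_path: str, script_body: str) -> str:
    """Replace the region between marker comments with the script body.

    Assumed (hypothetical) marker format:
        # START: <script_path>
        ...previously inlined content...
        # END: <script_path>
    """
    pattern = re.compile(
        rf"(# START: {re.escape(script_path)}\n).*?(# END: {re.escape(script_path)})",
        re.DOTALL,
    )
    # A lambda avoids backslash-escape surprises in the replacement body
    return pattern.sub(
        lambda m: m.group(1) + script_body + "\n" + m.group(2), target_text
    )
```

This is why a new script with no surrounding marker comments is silently skipped: the regex simply never matches, so nothing gets inlined.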
Looks good, but we also have to handle UI dependencies (with pnpm).
While reviewing the logs to ensure everything was running smoothly, I noticed an issue: I had accidentally reversed the `-e`/`+e` option 🤦♂️. However, I encountered the following error:
This suggests that yarn was either not installed or unavailable in the environment when the script was executed. This made me realize that
Indeed, thanks for pointing that out! I wanted to make sure everything was functioning as expected before implementing a similar solution with pnpm.
There were too many commits, so I rebased rather than merged :) Could you please check again when you have time?
Sorry for missing that one. Rebased it. It looks really good.
if os.getenv("AIRFLOW_PRE_CACHED_PNPM_PACKAGES", "false") == "true":
    # Copy the pre-cached pnpm packages into node_modules
    shutil.copytree(pnpm_node_modules_cache_directory, node_modules_directory, dirs_exist_ok=True)
else:
One small thing here. I think we should ALWAYS run the install step below. This covers the case where the .lock file changed in the meantime and we just need to update it based on the pre-cached node_modules.
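The fix suggested above could look roughly like this: seed `node_modules` from the pre-cache when the flag is set, then run the install unconditionally. This is a sketch, not the PR's final code; only the `AIRFLOW_PRE_CACHED_PNPM_PACKAGES` variable is taken from the diff, while the function name and directory parameters are illustrative.

```python
import os
import shutil
import subprocess
from pathlib import Path


def prepare_pnpm_modules(ui_directory: Path, cache_dir: Path) -> None:
    # Seed node_modules from the pre-cache when enabled
    if os.getenv("AIRFLOW_PRE_CACHED_PNPM_PACKAGES", "false") == "true":
        shutil.copytree(
            cache_dir, ui_directory / "node_modules", dirs_exist_ok=True
        )
    # Install runs in BOTH branches, so a lock file that changed after the
    # cache was built is still reconciled on top of the pre-cached modules
    subprocess.check_call(
        ["pnpm", "install", "--frozen-lockfile"], cwd=ui_directory
    )
```

Moving the install out of the `else` branch is the whole point: the pre-cache becomes a warm starting point rather than a substitute for the install step.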
@@ -68,7 +69,11 @@ def get_directory_hash(directory: Path, skip_path_regexp: str | None = None) ->
    shutil.rmtree(dist_directory, ignore_errors=True)
    env = os.environ.copy()
    env["FORCE_COLOR"] = "true"
    subprocess.check_call(["yarn", "install", "--frozen-lockfile"], cwd=os.fspath(www_directory))
    if os.getenv("AIRFLOW_PRE_CACHED_YARN_PACKAGES", "false") == "true":
Same here.
Just one small comment
Hmm. Actually, I looked closer (something was wrong here), and I think this is a bit different issue (which I had completely forgotten about). What we are currently doing: we do not have node/yarn/pnpm in Breeze at all. What we are using and expecting is that
So we do not even have anything in the image. I think the idea of doing similar caching in the image (my idea) was really wrong. We could move node and all the related tooling into the image, but it's not really needed. I am not really sure we should do much now - the intermittent issues with node have not appeared recently - so maybe just abandoning this for now is a better idea.
Thanks for the review! That's exactly the case, so I installed node/yarn/pnpm in the image. I tested with Breeze too, since Breeze builds Dockerfile.ci in local development. Ah, this only covers the local development case though, which may only speed things up and not solve the entire problem. I agree - I haven't seen the problem with node for a while now. Let's abandon this for now. Also, managing these dependencies in even more places would be an additional burden. Even if we managed the dependencies in the image and removed them from pre-commit, it would increase the image size as well as bring in a lot of vulnerabilities. Keeping this in the pre-commit environment still makes more sense. I missed this one. Awesome catch!
Closing this PR. If the problem appears again multiple times and bothers us, we can follow up and discuss possible solutions. :)
closes: #43167
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in newsfragments.