Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify frontend CI build and container #3118

Open
sarayourfriend opened this issue Oct 3, 2023 · 4 comments
Open

Simplify frontend CI build and container #3118

sarayourfriend opened this issue Oct 3, 2023 · 4 comments
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: frontend Related to the Nuxt frontend 🔒 staff only Restricted to staff members

Comments

@sarayourfriend
Copy link
Collaborator

sarayourfriend commented Oct 3, 2023

Problem

The frontend build relies on an additional build context and attempts to build the frontend in a shimmed pnpm workspace. This has two problems:

  • Complexity due to multiple build contexts
  • Complexity due to juggling manual changes to the pnpm workspace configuration

If we extract any aspects of our application into packages, then the current approach would also require manually adding those packages to the pnpm workspace configuration and shimming a new root package.json that referenced the packages in a way that made it possible to include them.

It all boils down to complexity, because we're building the frontend without leveraging the existing workspace configuration. I believe the reason we did this was to simplify the switch to the monorepo.

Description

#3141 has a first-pass/demo/draft approach. That PR mostly worked at the time (it has merge conflicts now), but the pnpm caching configuration did not. It would be excellent if we could get the pnpm caching to work during the build, as right downloading packages consumes a good deal of time in the build.

Follow the approach started in that PR, which is based on the Docker build described in pnpm's documentation: https://pnpm.io/docker

Additional context

@sarayourfriend sarayourfriend added 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work ✨ goal: improvement Improvement to an existing user-facing feature labels Oct 3, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Oct 3, 2023
@sarayourfriend sarayourfriend changed the title <Replace this with actual title> Simplify frontend CI build and container Oct 3, 2023
@sarayourfriend sarayourfriend added 💻 aspect: code Concerns the software code in the repository 🔒 staff only Restricted to staff members 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: frontend Related to the Nuxt frontend 🟨 priority: medium Not blocking but should be addressed soon and removed 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work ✨ goal: improvement Improvement to an existing user-facing feature labels Oct 3, 2023
@sarayourfriend
Copy link
Collaborator Author

cc @dhruvkb and @krysal to see what y'all think of this. It's related to work we've discussed previously about only including the necessary dependencies in the frontend build. The approach I've described would accomplish that while also allowing us to take advantage of the git context for the Sentry release as described in #3116.

Alternatively, if we do not care about the Sentry release or we are okay with copying the .git directory into the image, we could follow the less complex multi-stage build approach described by pnpm for monorepos. I don't know if all that is necessary is that the .git directory is available during the build though, so I'm not sure if that alone would make the git integration with Sentry work. Overall I think it's simpler to build outside the container and copy the necessary files into it afterwards. This makes the build go much faster on my local.

What do y'all think? The clearest trade-off is losing the supposed stability of a tightly controlled build environment... but I wonder how important that really is for us 🤔.

@dhruvkb
Copy link
Member

dhruvkb commented Oct 3, 2023

Regarding the losses in the current approach:

I'm not sure if building outside the container is a good idea. On one hand, the PoC is considerably simpler which is awesome! But on the other it just feels wrong intuitively because it goes against all my experience so far of working with Docker. I can see how "intuition" is a terrible argument so I wouldn't block this if everyone else feels like this is a good path forward.

PS: are you using the frontend Docker image for local dev? I've never built/used it locally so if you are, that's interesting!

@sarayourfriend
Copy link
Collaborator Author

(2) Was resolved in #2713 (tests shouldn't be included in the Docker image anymore)

👍

(4) We do use Docker's caching so that built artefacts and layers can be cached. That should compensate for the lack of the pnpm cache to some extent?

Unfortunately it doesn't, because any changes to the files in frontend cause the layer that copies those files to cache-bust, meaning pnpm fetch runs again. Maybe we can just move pnpm fetch before that copy line by copying just the package.json's?

(5) How does Sentry recognise Git for the API or ingestion server? Those images are also built inside their respective folders and .git is not copied into the images.

That's a good question. I wasn't sure, but it looks like the GitHub integration does it. There are "code mappings" here: https://openverse.sentry.io/settings/integrations/github/125369/

It turns out that the frontend mappings do work. See this issue, for example: https://openverse.sentry.io/issues/4381152390/?project=5799642&query=is%3Aunresolved&referrer=issue-stream&statsPeriod=14d&stream_index=3

The line of code is correct for the stack trace. However, there's no suspect commit. Maybe there isn't a suspect commit for that issue. Sentry lists several cases where that might happen: https://docs.sentry.io/product/issues/suspect-commits/?original_referrer=https%3A%2F%2Fdocsbot.ai%2F#missing-suspect-commits

So I think I'm incorrect about that point. The feature is working fine, and we can even close the source maps issue I created as invalid.

PS: are you using the frontend Docker image for local dev? I've never built/used it locally so if you are, that's interesting!

I'm not. This wouldn't have been possible with the current Dockerfile until recently, since Docker compose added additional_contexts to the compose file spec. We could try to do it, it would be nice if it worked.

Given the situation with Sentry, I think it's actually better to try to implement the pnpm recommended approaches for monorepo dockerfile building. Their instructions also include how to re-use the host machines module cache, which would improve the other aspects as well. We can pretty easily create multiple stages to retrieve the build artefacts and prod dependencies separately 🎉

@sarayourfriend
Copy link
Collaborator Author

For posterity, here are the pnpm instructions: https://pnpm.io/docker

I'll mark this issue as needing changes, but will close my PR for it and update this issue later to reflect using the approach pnpm has documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: frontend Related to the Nuxt frontend 🔒 staff only Restricted to staff members
Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants