Reduce Artifactory storage and bandwidth use #4533

Open
MarkEWaite opened this issue Feb 7, 2025 · 56 comments
Comments

@MarkEWaite

MarkEWaite commented Feb 7, 2025

Service(s)

Artifactory

Summary

Jenkins artifact storage use has increased to greater than 10 TB and bandwidth use has recently increased significantly, almost to the bandwidth use prior to the work done in:

JFrog sponsors open source without charging the open source project. They have asked us to reduce our storage to less than their 5 TB open source sponsorship threshold. We also want to reduce our bandwidth use to be much closer to the bandwidth we were using in early 2024.

Reproduction steps

  1. Review the storage use and reduce storage use to less than 5 TB
  2. Review the bandwidth use and reduce to be much closer to the January 2024 usage
@MarkEWaite MarkEWaite added triage Incoming issues that need review and removed triage Incoming issues that need review labels Feb 7, 2025
@dduportal dduportal added this to the infra-team-sync-2025-02-11 milestone Feb 7, 2025
@MarkEWaite MarkEWaite added triage Incoming issues that need review and removed artifactory triage Incoming issues that need review labels Feb 7, 2025
@dduportal
Contributor

Let's start with a snapshot of metrics provided by JFrog so we can watch the effect of any change.

  • Outbound bandwidth (i.e. the amount of data downloaded) can be seen in JFrog's portal:
Image Image
  • Storage (i.e. the amount of data stored on disk) can be found in Artifactory administration:
Image

=> The Atlassian mirror `` clearly stands out for both storage and outbound bandwidth, so it is the top-level item to work on
=> Cleaning up the incrementals repository is useful BUT should be secondary

@timja
Member

timja commented Feb 7, 2025

I mean the atlassian one should be fairly easy to resolve, just swap it to includes only.

@dduportal
Contributor

Update: we've granted access to @darinpope to Artifactory so he can drive this topic:

  • Ref. https://groups.google.com/g/jenkinsci-dev/c/6_X0ABdLAgE/m/3XwzSn1xEAAJ for the decision process
  • I've created a secondary (non-LDAP) account for him, darinpope-admin, with a permission scheme named darinpope-helpdesk-4533 for easier tracking (and reverting in the future)
    • He has already accessed the account and set up a strong password
    • @darinpope do you mind enabling 2FA on this account (since it is not an LDAP account)?
    • Despite the account name (darinpope-admin), he does not have administration permission: the permission scheme allows him to manage only 5 repositories, namely all *atlassian* mirrors and the incrementals. That should be sufficient; we'll see if he needs more permissions.

@timja
Member

timja commented Feb 7, 2025

I wonder if we still need the jcenter-cache as well?

@dduportal
Contributor

> I mean the atlassian one should be fairly easy to resolve, just swap it to includes only.

Let's now track work for each subject on associated issues:

=> There might be subsequent follow-up issues.

@dduportal
Contributor

> I wonder if we still need the jcenter-cache as well?

Yes, we do. As far as I remember, it holds artifacts which have since been deleted from the remotes. It is worth checking, but let's not get carried away: if @darinpope is able to solve the atlassian and incrementals topics, we'll see immediate results to share with JFrog.

@darinpope
Collaborator

@dduportal I've dug around and cannot find a way to implement MFA on my admin account. Is that another switch you need to set on my account?

@dduportal
Contributor

> @dduportal I've dug around and cannot find a way to implement MFA on my admin account. Is that another switch you need to set on my account?

My mistake: MFA is only available on the JFrog portal, not on Artifactory. Sorry for wasting your time on this one.

Ref. #4472 (comment):

> @dduportal with my current permissions with the admin user, I don't have access to look at the configuration page for repos.

@darinpope I've set your account as admin, you should have access now

@darinpope
Collaborator

darinpope commented Feb 11, 2025

applied com/atlassian/**/* as an include on atlassian-maven-public at 10:15am Central/1615 UTC on 2025-02-11

Image

Now waiting to see if the repo cleans itself up or not. If not by this evening my time, I'll start purging.

@darinpope
Collaborator

darinpope commented Feb 11, 2025

Well, that was fast. It appears that it's already cleaned up.

Image

but not really. It appears it still exists on disk:

Image

so now to figure out how/if the HDD gets cleaned up

@darinpope
Collaborator

It might be because of "Store Artifacts Locally":

Image

I want to wait on unchecking that box until we meet with JFrog on 2025-02-12.

@MarkEWaite
Author

MarkEWaite commented Feb 11, 2025

Thanks very much! I've started jobs on ci.jenkins.io that I expect will require Atlassian libraries. In case others want to check those jobs, here they are with embeddable build status links:

  • Jira plugin master branch
  • Jira plugin PR
  • Jira ext plugin master branch
  • Jira ext plugin PR
  • Jira trigger plugin master branch
  • Jira integration master branch
  • Atlassian Software Cloud
  • Atlassian Software Cloud PR
  • Jira test result reporter

@timja
Member

timja commented Feb 11, 2025

I think what dduportal normally does is move all artifacts out to a separate archived repository that isn't mirrored.

Then once we are confident we can delete?

We can likely just repopulate the atlassian one by running builds like Mark has.

@dduportal
Contributor

> I think what dduportal normally does is move all artifacts out to a separate archived repository that isn't mirrored.
>
> Then once we are confident we can delete?

Usually, we let Artifactory's internal garbage collector kick in. But we never really checked that artifacts were actually deleted from disk (except in exceptional cases, fewer than 10 occurrences), because JFrog had always advertised storage as "not an issue" until now (the focus was on bandwidth and build safety).
Additionally, some artifacts are present only on our mirror because the upstream copies are now gone, so we used to be really careful.

Unselecting the "Store Artifacts Locally" checkbox seems like a good idea for Atlassian public specifically.

> We can likely just repopulate the atlassian one by running builds like Mark has.

+1 with @timja: it makes sense for this specific repository. @darinpope I believe you can go for this.

@darinpope
Collaborator

darinpope commented Feb 13, 2025

We added the new artifactory-pkg-maven-cache to public today (2025-02-13) and removed the old atlassian entry around 1545 UTC.

At ~1910 UTC, I changed the include pattern from com/atlassian/**/* to just com/atlassian/** to see if that will make the pull through work as we expect.
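
To make the two include patterns concrete, here is a minimal sketch of an Ant-style path matcher. It assumes the usual Ant conventions (`**` as a whole segment matches zero or more directories, `*` and `?` stay within one segment); Artifactory's own matcher may behave differently, and the sample paths are hypothetical.

```python
import re


def ant_to_regex(pattern: str) -> re.Pattern:
    """Compile an Ant-style path pattern (assumed conventions, not Artifactory's code)."""
    segments = pattern.split("/")
    parts = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "**":
            # '**' stands for zero or more whole directories; as the final
            # segment it also covers the file name itself.
            parts.append("(?:[^/]+/)*" + ("[^/]*" if is_last else ""))
            continue
        # Within a segment, '*' and '?' never cross a '/'.
        body = "".join(
            "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
            for ch in segment
        )
        parts.append(body if is_last else body + "/")
    return re.compile("".join(parts))


def is_included(pattern: str, path: str) -> bool:
    """Return True when the whole path matches the pattern."""
    return ant_to_regex(pattern).fullmatch(path) is not None


# Hypothetical artifact paths, for illustration only.
SAMPLE_PATHS = [
    "com/atlassian/maven-metadata.xml",
    "com/atlassian/jira/jira-rest-java-client-api/1.0/jira-rest-java-client-api-1.0.pom",
    "org/example/demo/1.0/demo-1.0.jar",
]

for path in SAMPLE_PATHS:
    print(path, is_included("com/atlassian/**/*", path), is_included("com/atlassian/**", path))
```

Under these assumed rules both patterns accept the same file paths, so any difference observed after the change would come from how Artifactory itself treats the patterns rather than from the glob semantics.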

@MarkEWaite
Author

@yonarbel could you post the most recent usage status report? We have our weekly infra team meeting tomorrow and we want to be sure that storage use is still less than 5 TB. @darinpope is out of the office at the moment, so the next reduction in storage use may be delayed until he returns, but we have a plan for the next reduction.

@MarkEWaite
Author

Latest usage is above 5 TB of storage:

Image

@MarkEWaite
Author

@dduportal will investigate the increased storage use tomorrow. @darinpope is out of the office with a family emergency.

@darinpope
Collaborator

darinpope commented Mar 5, 2025

I'm still out, but I did change the include com/atlassian/connect/** to com/atlassian/connect/ac-play-java_2.10/** because the only plugin using that path is zephyr-for-jira-test-management-plugin. It looks like the last release of that plugin was 7+ years ago, so we might be able to completely drop the connect include. I'm just trying to see if that properly cleans up the connect path in Artifactory.

@darinpope
Collaborator

The repo cleans up, but the cache doesn't, so I'm going through and deleting the leftover directories in cache.
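
One way to find the leftover cache directories is to compare a listing of cached paths against the current includes. The sketch below is only an illustration: it handles the `prefix/**` form used in this thread, the cached paths are hypothetical, and it only reports candidates (it does not call Artifactory or delete anything).

```python
def include_prefix(pattern: str) -> str:
    """Return the directory prefix of a 'some/dir/**' include pattern."""
    if not pattern.endswith("/**"):
        raise ValueError(f"only 'prefix/**' patterns are handled here: {pattern}")
    return pattern[: -len("**")]  # keeps the trailing slash


def leftover_paths(cached_paths, includes):
    """List cached paths that no current include covers any more."""
    prefixes = [include_prefix(pattern) for pattern in includes]
    return [
        path
        for path in cached_paths
        if not any(path.startswith(prefix) for prefix in prefixes)
    ]


# Include taken from this thread; the cached paths are hypothetical.
INCLUDES = ["com/atlassian/connect/ac-play-java_2.10/**"]
CACHED = [
    "com/atlassian/connect/ac-play-java_2.10/1.0/ac-play-java_2.10-1.0.jar",
    "com/atlassian/connect/some-other-artifact/1.0/some-other-artifact-1.0.jar",
]

for path in leftover_paths(CACHED, INCLUDES):
    print("candidate for deletion:", path)
```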

@MarkEWaite
Author

> the only plugin using that path is zephyr-for-jira-test-management-plugin. It looks like the last release of that plugin was 7+ years ago, so we might be able to completely drop the connect include.

I agree that we can drop the connect include.

The plugin has fewer than 200 installations and has 2 unresolved security vulnerabilities. As far as I can tell from the Atlassian documentation, that plugin has been superseded by one of these two plugins:

@darinpope
Collaborator

I'm going to do the same variation for the plugins include. There's a lot we can get rid of. Looking at:

https://github.com/search?q=org%3Ajenkinsci+%22%3CgroupId%3Ecom.atlassian.plugins%3C%2FgroupId%3E%22&type=code

we can add two include paths for plugins (atlassian-plugins-core and atlassian-plugins-osgi) and really trim that down too. I'm starting on that now.
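
The same survey can be done locally on a checked-out `pom.xml`. This is a minimal sketch using Python's standard XML parser; the sample POM is hypothetical, and only dependencies declared directly in the file are found (parent POMs and BOMs pulled in transitively will not show up).

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def direct_dependencies(pom_xml: str, group_id: str):
    """Return sorted (groupId, artifactId) pairs declared under the given groupId."""
    root = ET.fromstring(pom_xml)
    found = set()
    for dep in root.iterfind(".//m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        # Accept the groupId itself and its sub-groups (e.g. com.atlassian.plugins.rest).
        if artifact and (group == group_id or group.startswith(group_id + ".")):
            found.add((group, artifact))
    return sorted(found)


# Hypothetical POM fragment, for illustration only.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>com.atlassian.plugins</groupId>
      <artifactId>atlassian-plugins-osgi</artifactId>
    </dependency>
    <dependency>
      <groupId>com.atlassian.plugins</groupId>
      <artifactId>atlassian-plugins-core</artifactId>
    </dependency>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>unrelated</artifactId>
    </dependency>
  </dependencies>
</project>
"""

for group, artifact in direct_dependencies(SAMPLE_POM, "com.atlassian.plugins"):
    print(group, artifact)
```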

@darinpope
Collaborator

Same with crowd:

https://github.com/search?q=org%3Ajenkinsci+%22%3CgroupId%3Ecom.atlassian.crowd%3C%2FgroupId%3E%22&type=code

headed towards:

  • com/atlassian/crowd/crowd-integration-client/**
  • com/atlassian/crowd/crowd-core/**
  • com/atlassian/crowd/crowd-integration-client-common/**
  • com/atlassian/crowd/crowd-integration-api/**

@darinpope
Collaborator

Same with confluence:

https://github.com/search?q=org%3Ajenkinsci+%22%3CgroupId%3Ecom.atlassian.confluence%3C%2FgroupId%3E%22&type=code

headed towards:

  • com/atlassian/confluence/confluence-rest-client/**

@darinpope
Collaborator

darinpope commented Mar 5, 2025

Here's the rough state before I started deleting connect, plugins, crowd and confluence (I forgot to do a real capture but this is close):

Image

and after:

Image

so we are headed in the right direction. I need to dig through the other top levels, mainly bitbucket and jira, to see how to tighten them up.

@darinpope
Collaborator

darinpope commented Mar 5, 2025

yes, jira is the problem child:

Image

it will probably be tomorrow before I can work through that. I need to make sure I don't break any plugin builds after I put in the includes.

@darinpope
Collaborator

darinpope commented Mar 5, 2025

https://github.com/search?q=org%3Ajenkinsci+%22%3CgroupId%3Ecom.atlassian.jira%3C%2FgroupId%3E%22&type=code

  • com/atlassian/jira/atlassian-jira/**
  • com/atlassian/jira/jira-rest-java-client-api/**
  • com/atlassian/jira/jira-rest-java-client-core/**
  • com/atlassian/jira/jira-rest-java-client/**
  • com/atlassian/jira/jira-rest-plugin/**

Again, I'm holding on cleaning up the cache for jira until I get back to a more stable internet connection later this evening Central time. It'll probably run a good part of my night and then I can verify in the morning.

@darinpope
Collaborator

The cleanup ran pretty fast, so I tested building jira-plugin. I had to add two more includes:

  • com/atlassian/jira/jira-rest-java-client-parent/**
  • com/atlassian/sal/sal-parent/**

I'm not calling this done yet, but definitely better than earlier today.

@darinpope
Collaborator

When building JiraTestResultReporter-plugin, I had to add a handful more includes:

  • com/atlassian/jira/jira-internal-bom/**
  • com/atlassian/jira/jira-bom/**
  • com/atlassian/jira/jira-api-bom/**
  • com/atlassian/jira/jira-deprecated-api-bom/**
  • com/atlassian/plugins/rest/atlassian-rest-exported-libraries/**
  • com/atlassian/plugins/rest/atlassian-rest-parent/**
  • com/atlassian/plugins/rest/atlassian-rest-v2-exported-libraries/**
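
The includes above follow the standard Maven repository layout (dots in the groupId become slashes, followed by the artifactId), so a missing include can be derived from the coordinates that a failing build reports. A minimal sketch, with hypothetical helper names; it compares patterns by exact string, so a broader existing include would not be recognised:

```python
def include_for(group_id: str, artifact_id: str) -> str:
    """Build the 'group/path/artifactId/**' include for a Maven coordinate."""
    return f"{group_id.replace('.', '/')}/{artifact_id}/**"


def missing_includes(coordinates, current_includes):
    """Return includes needed for the coordinates but absent from the current list."""
    current = set(current_includes)
    needed = []
    for group_id, artifact_id in coordinates:
        pattern = include_for(group_id, artifact_id)
        if pattern not in current and pattern not in needed:
            needed.append(pattern)
    return needed


# Coordinates matching includes mentioned in this thread.
unresolved = [
    ("com.atlassian.jira", "jira-bom"),
    ("com.atlassian.plugins.rest", "atlassian-rest-parent"),
    ("com.atlassian.jira", "jira-rest-java-client-api"),
]
already_configured = ["com/atlassian/jira/jira-rest-java-client-api/**"]

for pattern in missing_includes(unresolved, already_configured):
    print("add include:", pattern)
```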

@darinpope
Collaborator

darinpope commented Mar 6, 2025

here's the current usage (timestamp in shot)

Image

so I think it's closer. If a plugin fails, we'll need to determine the correct include to add.

However, we're right on the edge of 4TB, so the sooner we can get jcenter-cache archived, the better.

@darinpope
Collaborator

Looks like we are overall holding reasonably steady after about 9 hours:

Image

@jglick

jglick commented Mar 6, 2025

> lots of "weird" paths (activeobject, adf, alternator, etc) that none of the plugins use

Can the POMs of the Jira-related plugin(s) be improved to exclude large unused transitive deps so the problem does not recur, rather than trying to hack around in repository configuration? (disregard if I failed to follow the discussion correctly)

@darinpope
Collaborator

@jglick I agree that's the right mid/long term solution, but we needed to get back quickly below the storage limit that JFrog requested.

FTR, here's the plugins I tested with mvn clean verify using Maven 3.9.9 and Temurin 17.0.14:

  • JiraTestResultReporter-plugin
  • atlassian-bitbucket-server-integration-plugin
  • atlassian-jira-software-cloud-plugin
  • confluence-publisher-plugin
  • jira-plugin

There are a few others that import com.atlassian or io.atlassian (crowd, artifactory, etc) but I didn't test them right now.
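
For repeating this check after later include changes, a small driver along these lines could help. It is a sketch only: it assumes the five plugins are already cloned as sibling directories under one (hypothetical) checkout root and that `mvn` is on the PATH.

```python
import subprocess
from pathlib import Path

# Plugins listed in this comment.
PLUGINS = [
    "JiraTestResultReporter-plugin",
    "atlassian-bitbucket-server-integration-plugin",
    "atlassian-jira-software-cloud-plugin",
    "confluence-publisher-plugin",
    "jira-plugin",
]


def verify_plugins(checkout_root, plugins=PLUGINS, runner=subprocess.run):
    """Run 'mvn clean verify' in each checkout and map plugin name to success."""
    results = {}
    for name in plugins:
        completed = runner(["mvn", "clean", "verify"], cwd=Path(checkout_root) / name)
        results[name] = completed.returncode == 0
    return results


# Example call (the directory name is an assumption):
#   failures = [n for n, ok in verify_plugins("checkouts").items() if not ok]
```

Artifacts already present in the local Maven repository are not requested from Artifactory again, so pointing `-Dmaven.repo.local` at an empty directory may give a more faithful test of the includes.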

@dduportal
Contributor

> @jglick I agree that's the right mid/long term solution, but we needed to get back quickly below the storage limit that JFrog requested.
>
> FTR, here's the plugins I tested with mvn clean verify using Maven 3.9.9 and Temurin 17.0.14:
>
>   • JiraTestResultReporter-plugin
>   • atlassian-bitbucket-server-integration-plugin
>   • atlassian-jira-software-cloud-plugin
>   • confluence-publisher-plugin
>   • jira-plugin
>
> There are a few others that import com.atlassian or io.atlassian (crowd, artifactory, etc) but I didn't test them right now.

Do you mind if we add https://github.com/jenkins-infra/repository-permissions-updater/blob/c2cd952d3afb2e808ec0e647221c4ce15d4d9c01/pom.xml#L120-L124 to the list of "sanity checks" (to avoid reproducing #4578)?

@darinpope
Collaborator

yeah, that shouldn't be any problem. I see that it's already been added. I didn't think about searching through jenkins-infra for usages 🤦

@dduportal
Contributor

> yeah, that shouldn't be any problem. I see that it's already been added. I didn't think about searching through jenkins-infra for usages 🤦

No worries, we did not check either, and it is a healthy reminder! Thanks for the huge work, @darinpope!

@darinpope
Collaborator

darinpope commented Mar 7, 2025

@dduportal since we seem to have atlassian-pkg-maven-public pretty well dialed in and it's the only one active right now:

Image

would it make sense to delete the other 3 remote repos:

  • atlassian = 109GB
  • atlassian-maven-atlassian-external = 0 bytes
  • private-atlassian = 0 bytes

We're only talking 109GB, but if we don't need it, why keep it around?

@MarkEWaite
Author

Makes sense to me to delete them
