Which GitHub account we should/can use for migration? #4

Closed
mocobeta opened this issue Jun 29, 2022 · 43 comments

@mocobeta
Contributor

mocobeta commented Jun 29, 2022

  1. We cannot preserve the original Jira issue/comment authors, since the GitHub API assumes the caller's account is the author and does not allow callers to change it by any means.

  2. To import/create issues with the GitHub API, you need admin access to the repo, and we developers are not allowed to have it.
    The actual migration will be done by infra; it seems a personal account was used for the import job when the Lucene.NET project migrated their issues to GitHub. See .NETify the public API where appropriate lucenenet#280.

For example, Spring uses an organization account that is not tied to a person (spring-projects/spring-framework#22178). Can we do the same? What organization account is available to us?

@mocobeta mocobeta changed the title from "Which GitHub account should we use for migration?" to "Which GitHub account we should/can use for migration?" Jun 29, 2022
@mocobeta
Contributor Author

mocobeta commented Jul 2, 2022

It seems the ASF organization has a few organization accounts.
https://github.com/orgs/apache/people?query=asf

I think this account may be a perfect fit for the job - I'll ask infra if we can use it as the author account for migrated issues.
https://github.com/asfgit.

cc @uschindler

@mocobeta
Contributor Author

mocobeta commented Jul 3, 2022

This is a two-pass migration.

  1. First pass: import all issues/comments; this has to be done by an infra (admin) account.

I think we can ask infra to use an official ASF bot account (e.g. https://github.com/asfgit) for the first pass. This means the author of all issues/comments will be the bot.

  2. Second pass: iterate issues/comments and update ones that include cross-issue links so that the links are re-mapped with GitHub issue numbers; this can be done by a committer account.

For the second pass, I would like us to do this part ourselves since it is the most time-consuming [1] and risky [2] part of the migration process. This means my account will be noted in each migrated issue/comment header like "edited by mocobeta". I don't want to use my personal account for our migration purpose, but it's more important to avoid any accidents and, if there are accidents, to react to them as quickly as possible; it's a trade-off for me.

[1] It will take > 24h.
[2] It involves lots of REST API calls via the internet with a strict rate limit, and to make matters worse, it is not an idempotent operation.

@mocobeta
Contributor Author

mocobeta commented Jul 3, 2022

If we must use an ASF bot account throughout all processes, I'd break up the second pass into three sub-steps.

  1. dump all issues/comments again from GitHub
  2. convert them locally
  3. pass the converted result to infra and ask to run the update script

It'll make the total time of the migration longer (maybe from 2-3 days to 4-5 days or more) since it involves additional scripts and communication with infra; but it'll make the updating step a bit safer than the current draft plan in #7, and allow us to ask infra to run the second pass with an infra account.

@mocobeta
Contributor Author

mocobeta commented Jul 3, 2022

If there are no comments/objections, I'll decide how we proceed with that.

@madrob

madrob commented Jul 4, 2022

Regarding rate limits, would it be possible to reach out to GitHub directly or through ASF infra to get those temporarily raised?

@mocobeta
Contributor Author

mocobeta commented Jul 4, 2022

I'm not familiar with the relationship or alliance between ASF and GitHub, but the ASF organization accounts could possibly be already counted as Enterprise Accounts (with a higher limit of 15,000 requests per hour).

I think the difficulty here is that we developers cannot test our migration script with a real ASF account, and infra would expect us to provide "tested" scripts. If we misestimate the throttling interval, the script will fall into an unstable state in the middle of processing (maybe one or two hours after starting), and there is no way to roll back.
Maybe as a possible scenario, we could make the interval tunable and ask infra to set it to an appropriate value, if possible.
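
A minimal Python sketch of what a tunable interval might look like. This is not the actual migration script; `interval_for_limit`, `run_throttled`, and the safety factor are hypothetical names and values chosen for illustration, and the 15,000 requests/hour figure is only the limit mentioned above.

```python
import time


def interval_for_limit(requests_per_hour, safety_factor=1.2):
    """Seconds to wait between calls so the hourly limit is not exceeded.

    safety_factor > 1 leaves headroom; the default is an arbitrary assumption.
    """
    return 3600.0 / requests_per_hour * safety_factor


def run_throttled(tasks, interval_sec, sleep=time.sleep):
    """Run each zero-argument task in order, pausing interval_sec between calls.

    `sleep` is injectable so the pacing can be checked without really waiting.
    """
    results = []
    for i, task in enumerate(tasks):
        if i > 0:
            sleep(interval_sec)  # pause before every call except the first
        results.append(task())
    return results
```

With a limit of 15,000 requests per hour and no safety margin this gives 0.24 seconds between calls; whoever runs the job could pass a larger `interval_sec` without touching the script logic.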

Anyway, we'll need infra's help on that. I will open an INFRA issue to ask for advice/information after summing up our draft plan(s).

@mocobeta
Contributor Author

mocobeta commented Jul 15, 2022

There have been no additional comments/requests.
I decided to use my account for the second pass (updating step after importing) since I don't think we should bother infra with running a time-consuming job that can be done by ourselves.

@mikemccand
Member

> It involves lots of REST API calls via the internet with a strict rate limit, and to make matters worse, it is not an idempotent operation.

Hmm why is the script NOT idempotent? It remaps a LUCENE-XXXX link to a #YYYY GitHub link right? If you ran it again, wouldn't it have nothing to remap?

@mocobeta
Contributor Author

mocobeta commented Jul 17, 2022

> Hmm why is the script NOT idempotent? It remaps a LUCENE-XXXX link to a #YYYY GitHub link right? If you ran it again, wouldn't it have nothing to remap?

@mikemccand it was not idempotent at that time (duplicate cross-issue links were created if we applied the update script multiple times), but I made it idempotent with the change in #16, in exchange for additional steps/time.

@mocobeta
Contributor Author

mocobeta commented Jul 17, 2022

> If you ran it again, wouldn't it have nothing to remap?

It's difficult (or at least cumbersome) to correctly determine whether there is already a remapped cross-issue link so that the script "should not do anything", since a cross-issue link is just a string (#XX) in plain text. With the current simple text-replacing logic, the conversion step should be applied only once.
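
To illustrate the idempotency point, here is a minimal Python sketch. It is not the project's migration script: the function names, the `LUCENE-<digits>` pattern, and the sample mapping are assumptions made for illustration. One variant appends the GitHub number next to the Jira key and therefore duplicates links when run twice; the other replaces the key outright, so a second run finds nothing left to remap.

```python
import re

# Assumed shape of a Jira key in plain text, e.g. "LUCENE-1234".
JIRA_KEY = re.compile(r"\bLUCENE-(\d+)\b")


def remap_append(text, mapping):
    """Keep the Jira key and append the GitHub number. NOT idempotent:
    the key is still present afterwards, so a second run appends again."""
    def sub(match):
        key = match.group(0)
        number = mapping.get(key)
        return f"{key} (#{number})" if number is not None else key
    return JIRA_KEY.sub(sub, text)


def remap_replace(text, mapping):
    """Replace the Jira key with the GitHub number. Idempotent:
    after one run no mapped key remains to be matched."""
    def sub(match):
        key = match.group(0)
        number = mapping.get(key)
        return f"#{number}" if number is not None else key
    return JIRA_KEY.sub(sub, text)
```

Keys that have no entry in the mapping are left untouched in both variants, so they can still be picked up by a later run once their GitHub numbers are known.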

@vlsi

vlsi commented Aug 6, 2022

@mocobeta , can you please clarify why you have decided to do two-pass migration?

Why don't you just import the issues with the proper cross-references in the first place?

In other words:

  1. If you import issues in ascending order, then you can statically predict the generated issue numbers
  2. When doing the import, you can wait till the issue is finished, so all of them are generated in order

Here's an example of an issue imported in a single go: https://github.com/vlsi/tmp-jmeter-issues/issues/1188#issue-1327437303
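
The prediction described in the two points above could be sketched as follows. This is a hypothetical helper, not code from either migration tool; it assumes issues are imported strictly in list order and that nothing else (no new issue or pull request) is created in the repository during the import.

```python
def predict_issue_numbers(jira_keys, last_assigned_number):
    """Map each Jira key to the GitHub number it would receive if the keys
    are imported one by one, in order, right after last_assigned_number."""
    return {
        key: last_assigned_number + offset
        for offset, key in enumerate(jira_keys, start=1)
    }
```

Any issue or PR created by someone else during the import shifts every later number by one, which is why the prediction only holds for a repository that is effectively frozen.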

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

We could predict the issue numbers as you pointed out, but we must prioritize safety. Importing takes 24 hours, and if there are short GitHub outages (which are not very uncommon), the numbers will be inconsistent with the predicted issue numbers - it'd be a disaster for us. (We can't stop importing for a few errors since the importing is done by another team.)

In addition, we decided to make GitHub issues available before the migration is finished, so that our issue system is not interrupted while people are prevented from opening new Jira issues; this makes it almost impossible to determine the imported issue numbers in advance.

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

There will also be pull requests (we already use them heavily) - @vlsi I'm just curious if there is a way to correctly determine the issue numbers while new issues/PRs arrive during importing. We won't be able to stop new issues/PRs.

@vlsi

vlsi commented Aug 6, 2022

> There will also be pull requests (we already use them heavily)

Just in case: I'm doing the migration for Apache JMeter (see https://github.com/vlsi/bugzilla2github).

> I'm just curious if there is a way to correctly determine the issue numbers while new issues/PRs arrive during importing

I do not think so. As far as I know, the issues are allocated sequentially, and the latest assigned number can be fetched via https://api.github.com/repos/apache/lucene/issues?per_page=1 API (see number: 1058).
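
A small Python sketch of reading the latest assigned number from that endpoint. The helper names are hypothetical. Note one assumption that goes beyond the URL above: the issues endpoint lists only open items by default, so `state=all` is added here to also see a most recent item that is already closed. The endpoint returns issues and pull requests together, newest first by default.

```python
import json
import urllib.request


def latest_number(items):
    """Return the number of the first (newest) item, or None for an empty list."""
    return items[0]["number"] if items else None


def fetch_latest_number(repo):
    """Fetch the newest issue/PR number of `repo` (e.g. "apache/lucene").

    Unauthenticated request; subject to GitHub's anonymous rate limit.
    """
    url = f"https://api.github.com/repos/{repo}/issues?per_page=1&state=all"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return latest_number(json.load(response))
```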

With JMeter, the flow of issues and PRs is not really high, so we would just make everything read-only during the migration.


> issue numbers while new issues/PRs arrive during importing

I assume you use bulk issue import API (https://gist.github.com/jonmagic/5282384165e0f86ef105), and if you wait for the import to complete, then it does return the assigned issue_number.

However, the key reason for pre-assigning the numbers is that you can generate comments that reference "issues that will be created later". If GitHub breaks, you can just continue from where you stopped.
On the other hand, if you really want to allow creating issues and pull requests during the migration, then you can't easily "reference not-yet-created issues", since every PR created will shift the numbers.

@vlsi

vlsi commented Aug 6, 2022

> We won't be able to stop new issues/PRs

Do you know GitHub has "Temporary interaction limits"?

I think it should be able to prevent creating issues and PRs during the import:
https://docs.github.com/en/communities/moderating-comments-and-conversations/limiting-interactions-in-your-repository#about-temporary-interaction-limits

By the way, I wonder if you have tried contacting GitHub support.
Lucene is a really popular project, and you might be able to get GitHub support to help you with migrating the issues. For instance, when LLVM migrated their issues to GitHub, they first migrated the issues to a temporary GitLab instance, and then GitHub support used a special migration tool to migrate the issues from GitLab to GitHub. That enabled LLVM to preserve the original commenters as authors (see https://github.com/llvm/bugzilla2gitlab).

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

Thanks for your suggestions.

> I think it should be able to prevent creating issues and PRs during the import:

Personally, I wish we could temporarily stop all new issues/PRs during migration, but our community is unlikely to accept it.

@mikemccand
Member

> I think it should be able to prevent creating issues and PRs during the import:

> Personally, I wish we could temporarily stop all new issues/PRs during migration, but our community is unlikely to accept it.

Actually I think that would be fine. We could VOTE on it, but such down time is warranted if it de-risks the migration.

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

The current two-pass migration is carefully considered and safe, and there is no risk, though I admit it's a bit complicated. I don't think we should change the two-pass migration plan just to shorten the total time a bit.

I don't mean we shouldn't introduce downtime and make it just one pass. But the scripts are already ready and well tested - I think it'd be riskier for us to change it now.

@mikemccand
Member

Yeah I'm not proposing we change this approach now.

I am proposing we mark both GitHub and Jira read-only during the migration. I think the community would agree, since/if it can de-risk the migration. A two-day downtime every 10 years or so seems fine ;)

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

Maybe I should have argued strongly that we should allow some downtime (no new issues, PRs and comments for one or two days) so that we could make the migration plan simpler.

But if we follow the two-pass migration plan written in #7, we do not need any downtime.

@vlsi

vlsi commented Aug 6, 2022

Does the second pass generate GitHub notifications for the users mentioned in the issues?
I think those notifications would be too much

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

> Does the second pass generate GitHub notifications for the users mentioned in the issues?

We confirmed the second pass does not cause any notifications. It seems GitHub does not trigger any notifications for updates made via the issue/comment update API.

@mikemccand
Member

Have we confirmed with INFRA that we can quickly make Jira read-only for just our project (Lucene)?

@mikemccand
Member

> But if we follow the two-pass migration plan written in #7, we do not need any downtime.

OK, as long as we feel the risks are all contained (because you had planned on keeping both issue trackers accessible during the migration), then let's stick with that plan. I just want to point out that asking the community to freeze one or both issue trackers is fine IMO.

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

> Have we confirmed with INFRA that we can quickly make Jira read-only for just our project (Lucene)?

We need to update all Jira issues at the last step. If we want Jira to be read-only during migration, we have to ask infra for extra work: make Jira read-only, make Jira writable after the migration, and lastly make Jira read-only again after adding comments to each issue - personally, I don't think we should pursue this way.

Making a project read-only is not an easy configuration; workflow (or a database record?) needs to be changed I think.
https://community.atlassian.com/t5/Jira-questions/Fastest-way-to-make-JIRA-read-only/qaq-p/1261492

@uschindler
Contributor

Just do this live with the Infra team on Slack. They are very cooperative and fix stuff in real time. Just make an appointment with them and you can work together with them. We did this several times with migrations on Jenkins. They are there to help!

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

Yes, but - making a Jira project read-only, writable, and read-only again would not be a quick fix I think? I'm missing something maybe?

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

I'll reach out to INFRA, but if possible, I would like to proceed with the whole process in an async style without real-time conversation on Slack - there is a time difference between me and people in the US/Europe. It's especially critical if we do the migration on business days... (I also have a daytime job.)

@mocobeta
Contributor Author

mocobeta commented Aug 6, 2022

Don't get me wrong - I appreciate suggestions from all of you. However, there are many things you can easily control but I can't, due to several differences (timezone, language fluency, etc.). I have to proceed with this project with my very limited resources and ability if there is no one who is willing to take over this work.

Please feel free to pick any tasks if you see I'm doing them badly.

@uschindler
Contributor

Hi mocobeta,
I can also look into stuff if it is going on. On which machine do you want to start the migration? I'd suggest doing this on some server at ASF (like the Jenkins Slave lucene1/lucene2).

Regarding read-only: don't misunderstand me. I just wanted to say: let's make Jira and GitHub read-only for outsiders. Switching this is easy by changing the permission scheme on the project; that's two mouse clicks. We can't do it ourselves, but I'd really want to enforce this.

I don't think it is a problem to prevent people from opening issues! We just put a message there like "We have some maintenance, you can't open issues at the moment. Please send a message to dev@lucene.apache.org; we will take care to create a new issue once all systems are back up and running."

@uschindler
Contributor

And doing something like changing permission scheme can be done with communication on Slack. That's all. I did this several times.

@mocobeta
Contributor Author

mocobeta commented Aug 7, 2022

> I can also look into stuff if it is going on. On which machine do you want to start the migration? I'd suggest doing this on some server at ASF (like the Jenkins Slave lucene1/lucene2).

Sorry I don't know - I'm not familiar with our ASF infrastructure, but I think the infra team selects a proper machine for the job. Anyway we can't run the import script ourselves.

> Regarding read-only: don't misunderstand me. I just wanted to say: let's make Jira and GitHub read-only for outsiders. Switching this is easy by changing the permission scheme on the project; that's two mouse clicks. We can't do it ourselves, but I'd really want to enforce this.
> I don't think it is a problem to prevent people from opening issues! We just put a message there like "We have some maintenance, you can't open issues at the moment. Please send a message to dev@lucene.apache.org; we will take care to create a new issue once all systems are back up and running."

Thanks, it'd be great if we could make them completely non-writable. I think it'd be fine on the Jira side; people probably wouldn't care much about it.
On the GitHub side, I think we'll need to stop all PRs/reviews as well as issues for two or three days (PRs are a special kind of issue in GitHub). Many people would not expect that. I have to explain why this is needed on the dev@ list - I'll post an email about it later.

We could make Jira/GitHub non-writable only for external contributors with fine-grained access control, but it'd be confusing. I think we should make sure that everyone, including committers, is prevented from opening issues/PRs or adding comments on both Jira and GitHub, if we are going to introduce downtime.

> And doing something like changing permission scheme can be done with communication on Slack. That's all. I did this several times.

Of course, I can use ASF Slack for asynchronous communication. I'll reach out to the infra team and consult on how to arrange the whole process considering the time difference.

@mocobeta
Contributor Author

mocobeta commented Aug 7, 2022

I recognize strong suggestions to make both Jira and GitHub non-writable during migration. Generally, I agree with it - stopping all activities for two or three days would not be a big deal (hope there are no objections since I've seen at least one request not to stop our issue system).

Allowing downtime, the migration plan will be:

  1. Make both Jira and GitHub read-only for everyone including committers except for the account we use in the migration. Make sure no one can open issues, pull requests, or add comments to existing issues/PRs.
  2. Start migration.
  3. Make GitHub issues/PRs available once the migration is completed.
  4. Make Jira writable (only for committers or me, if possible).
  5. Add a comment to each Jira issue saying "This was moved to GitHub".
  6. Make Jira read-only.

For details, I'll reach out to infra via Jira or Slack.

@mocobeta
Contributor Author

mocobeta commented Aug 7, 2022

I sent an email to the dev@ list to share the change in the migration plan (~72 hours of downtime for issues/PRs). I'd be glad if you could give further suggestions there.
If there are no strong responses, I will open an INFRA issue to explain our plans/requests. This is still a work in progress, and there are lots of uncertainties for me about access control in Jira and GitHub; I need to clarify a couple of questions on that.

@vlsi

vlsi commented Aug 7, 2022

> This was moved to GitHub

Are you going to include the link to GitHub issue?
Would that trigger JIRA notifications?

@mocobeta
Contributor Author

mocobeta commented Aug 7, 2022

> Are you going to include the link to GitHub issue?

Yes, this is the main purpose of leaving a comment on each Jira issue. The message will be something like "This was moved to GitHub. [URL]"

> Would that trigger JIRA notifications?

We plan to completely silence JIRA notifications. If this can't be done ourselves, we'd need to ask infra.

@mikemccand
Member

> This was moved to GitHub. [URL]

Could we make it look something like this:

This issue was moved to GitHub issue #517.

I.e. embed the link in the Jira comment?

@mocobeta
Contributor Author

mocobeta commented Aug 7, 2022

It's just a hyperlink, we can use any anchor string.

In Jira syntax it would be:

This issue was moved to GitHub issue [#517|https://link-to-github-issue/].
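
A minimal sketch of generating that comment text in Python. The function name and the default repository are assumptions for illustration; only the `[text|url]` link form is taken from the Jira syntax shown above.

```python
def jira_moved_comment(issue_number, repo="apache/lucene"):
    """Build a Jira comment that links to the migrated GitHub issue,
    using Jira's [anchor text|url] link syntax."""
    url = f"https://github.com/{repo}/issues/{issue_number}"
    return f"This issue was moved to GitHub issue [#{issue_number}|{url}]."
```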

@mocobeta
Contributor Author

mocobeta commented Aug 7, 2022

Refined the comment message for Jira. See:
https://issues.apache.org/jira/browse/LUCENE-10557?focusedCommentId=17576357&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17576357

[Image: Screenshot from 2022-08-07 19-48-19]

commit: 3ad68da

@mocobeta
Contributor Author

mocobeta commented Aug 9, 2022

Hi,
here is the INFRA issue.
https://issues.apache.org/jira/browse/INFRA-23563

Can you please watch the issue and give comments if needed? I tried to explain our intricate requests, but I'm not so confident that I'm doing it well.

@mikemccand
Member

Thanks @mocobeta!

@mocobeta
Contributor Author

mocobeta commented Aug 18, 2022

Infra created this account for issue migration purposes. https://github.com/asfimport.
The profile icon will be set later.

@mocobeta
Contributor Author

mocobeta commented Aug 18, 2022

I tested it and it works.
https://github.com/apache/lucene/issues/1072

I'm closing this. Thank you to everyone who commented on this.
