Which GitHub account should/can we use for migration? #4

mocobeta · 2022-06-29T16:29:44Z

We cannot preserve the original Jira issue/comment authors, since the GitHub API assumes the caller's account is the author and does not allow callers to change it by any means.
To import/create issues with the GitHub API, you need admin access to the repo, and we developers are not allowed to have it.
The actual migration will be done by infra; it seems a personal account was used for the import job when the Lucene.NET project migrated their issues to GitHub. See ".NETify the public API where appropriate" (lucenenet#280).

For example, Spring uses an organization account that is not tied to a person (spring-projects/spring-framework#22178). Can we do the same? What organization account is available to us?

mocobeta · 2022-07-02T03:10:27Z

It seems the ASF organization has a few organization accounts.
https://github.com/orgs/apache/people?query=asf

I think this account may be a perfect fit for the job - I'll ask infra if we can use it as the author account for migrated issues.
https://github.com/asfgit.

cc @uschindler

mocobeta · 2022-07-03T06:18:38Z

This is a two-pass migration.

First pass: import all issues/comments; this has to be done by an infra (admin) account.

I think we can ask infra to use an official ASF bot account (e.g. https://github.com/asfgit) for the first pass. This means the author of all issues/comments will be the bot.

Second pass: iterate issues/comments and update ones that include cross-issue links so that the links are re-mapped with GitHub issue numbers; this can be done by a committer account.
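The link re-mapping in the second pass can be pictured roughly as follows. This is only a minimal sketch, not the project's actual script: the function name `remap_links`, the `issue_map` dictionary, and the sample keys and numbers are hypothetical, and it assumes Jira keys appear in the text in the form LUCENE-1234.

```python
import re

# Matches Jira keys such as LUCENE-1234 (assumed format).
JIRA_KEY = re.compile(r"\bLUCENE-(\d+)\b")


def remap_links(text, issue_map):
    """Replace Jira keys with GitHub issue references where a mapping is known.

    issue_map is a hypothetical dict such as {"LUCENE-1234": 5678}.
    Keys without a mapping are left untouched.
    """
    def repl(match):
        number = issue_map.get(match.group(0))
        return f"#{number}" if number is not None else match.group(0)

    return JIRA_KEY.sub(repl, text)


converted = remap_links("See LUCENE-10 and LUCENE-99", {"LUCENE-10": 42})
```

Only the body text is transformed here; sending the converted text back to GitHub is a separate step that is subject to the rate limit mentioned in [2].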

For the second pass, I would like us to do this part ourselves, since it is the most time-consuming [1] and risky [2] part of the migration process. This means my account will be noted in each migrated issue/comment header, like "edited by mocobeta". I'd rather not use my personal account for the migration, but it's more important to avoid accidents and, if accidents do happen, to react to them as quickly as possible; it's a trade-off for me.

[1] It will take > 24h.
[2] It involves lots of REST API calls via the internet with a strict rate limit, and to make matters worse, it is not an idempotent operation.

mocobeta · 2022-07-03T06:38:19Z

If we must use an ASF bot account throughout all processes, I'd break up the second pass into three sub-steps.

1. dump all issues/comments again from GitHub
2. convert them locally
3. pass the converted result to infra and ask them to run the update script

It'll make the total time of the migration longer (maybe 2-3 days to 4-5 days or more) since it involves additional scripts and communication with infra; but it'll make the updating step a bit safer than the current draft plan in #7, and allows us to ask infra to run the second pass with an infra account.
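The local conversion sub-step could produce a file that contains only the items that actually change, which would keep the update job infra runs as small as possible. The following is a rough sketch under assumptions: the record keys `id` and `body`, the function names, and the JSON-lines output format are all hypothetical and not taken from the actual scripts.

```python
import json


def build_updates(dumped, convert):
    """Return only the items whose body changes after conversion.

    dumped:  list of dicts with hypothetical keys "id" and "body",
             as dumped from GitHub in the first sub-step.
    convert: any function mapping an old body string to a new one.
    """
    updates = []
    for item in dumped:
        new_body = convert(item["body"])
        if new_body != item["body"]:
            updates.append({"id": item["id"], "body": new_body})
    return updates


def write_updates(updates, path):
    """Write one JSON object per line so the file can be processed incrementally."""
    with open(path, "w", encoding="utf-8") as f:
        for update in updates:
            f.write(json.dumps(update, ensure_ascii=False) + "\n")
```

With a file like this, the update script only needs to read each line and send one request per line, so it could be resumed from a known line number after a failure.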

mocobeta · 2022-07-03T06:57:42Z

If there are no comments/objections, I'll decide how we proceed with that.

madrob · 2022-07-04T03:29:43Z

Regarding rate limits, would it be possible to reach out to GitHub directly or through ASF infra to get those temporarily raised?

mocobeta · 2022-07-04T06:59:51Z

I'm not familiar with the relationship or alliance between ASF and GitHub, but the ASF organization accounts could possibly be already counted as Enterprise Accounts (with a higher limit of 15,000 requests per hour).

I think the difficulty here is that we developers cannot test our migration script with a real ASF account, and infra would expect us to provide "tested" scripts. If we misestimate the throttling interval, the script will fall into an unstable state in the middle of processing (maybe one or two hours after starting), and there is no way to roll back.
Maybe, as a possible scenario, we could make the interval tunable and ask infra to set it to an appropriate value.
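A tunable interval could be as simple as the sketch below. The function names and the `safety_factor` parameter are hypothetical, and the actual limit that applies to the account would have to be confirmed with infra.

```python
import time


def interval_for(limit_per_hour, safety_factor=1.2):
    """Seconds to wait between requests to stay under an hourly limit.

    safety_factor > 1 leaves some headroom; its default here is an assumption.
    """
    return 3600.0 / limit_per_hour * safety_factor


def run_throttled(tasks, interval, sleep=time.sleep):
    """Run callables one by one, pausing `interval` seconds between them."""
    results = []
    for i, task in enumerate(tasks):
        if i > 0:
            sleep(interval)
        results.append(task())
    return results
```

With no headroom, a limit of 15,000 requests per hour corresponds to 0.24 seconds between requests, and the standard limit of 5,000 per hour for authenticated requests corresponds to 0.72 seconds. Passing the interval in as a parameter means infra could adjust it without touching the rest of the script.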

Anyway, we'll need infra's help on that. I will open an INFRA issue to ask for advice/information after summing up our draft plan(s).

mocobeta · 2022-07-15T12:40:41Z

There have been no additional comments/requests.
I decided to use my account for the second pass (updating step after importing) since I don't think we should bother infra with running a time-consuming job that can be done by ourselves.

mikemccand · 2022-07-17T10:33:26Z

It involves lots of REST API calls via the internet with a strict rate limit, and to make matters worse, it is not an idempotent operation.

Hmm why is the script NOT idempotent? It remaps a LUCENE-XXXX link to a #YYYY GitHub link right? If you ran it again, wouldn't it have nothing to remap?

mocobeta · 2022-07-17T10:55:12Z

Hmm why is the script NOT idempotent? It remaps a LUCENE-XXXX link to a #YYYY GitHub link right? If you ran it again, wouldn't it have nothing to remap?

@mikemccand it was not idempotent at that time (multiple same cross-issue links were created if we applied the update script multiple times), but I made it idempotent by this change #16 in exchange for additional steps/time.

mocobeta · 2022-07-17T11:09:04Z

If you ran it again, wouldn't it have nothing to remap?

It's difficult (or at least cumbersome) to correctly determine if there is already a remapped cross-issue link so it "should not do anything", since a cross-issue link is just a string (#XX) in plain text. With the current simple text-replacement logic, the conversion step should be applied only once.
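To illustrate how a simple text replacement can fail to be idempotent, here is a hypothetical example. It assumes a conversion that keeps the Jira key and appends the GitHub reference after it; that is only one plausible shape for such a script, not necessarily what the real one does, and all names are made up for the illustration.

```python
import re

JIRA_KEY = re.compile(r"\bLUCENE-(\d+)\b")


def append_links(text, issue_map):
    """Naive version: keep each Jira key and append the GitHub reference."""
    def repl(match):
        number = issue_map.get(match.group(0))
        if number is None:
            return match.group(0)
        return f"{match.group(0)} (#{number})"

    return JIRA_KEY.sub(repl, text)


def append_links_once(text, issue_map):
    """Guarded version: skip keys already followed by their exact reference."""
    def repl(match):
        number = issue_map.get(match.group(0))
        if number is None:
            return match.group(0)
        suffix = f" (#{number})"
        if text.startswith(suffix, match.end()):
            return match.group(0)  # already remapped, leave as is
        return match.group(0) + suffix

    return JIRA_KEY.sub(repl, text)


issue_map = {"LUCENE-10": 42}
once = append_links("See LUCENE-10.", issue_map)
twice = append_links(once, issue_map)  # the reference is appended a second time
```

The guard only recognizes the exact format this sketch produces. Real issue text is messier (for example, a hand-written "#42" elsewhere in the same comment), which is why detecting an already converted link reliably is cumbersome.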

vlsi · 2022-08-06T09:06:04Z

@mocobeta, can you please clarify why you have decided to do a two-pass migration?

Why don't you just import the issues with the proper cross-references in the first place?

In other words:

If you import issues in ascending order, then you can statically predict the generated issue numbers
When doing the import, you can wait till the issue is finished, so all of them are generated in order
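The prediction described above amounts to simple arithmetic, sketched below. The function names and the sample keys and starting number are hypothetical, and the sketch relies on the assumption that no other issue or PR is created while the import runs.

```python
def predict_numbers(jira_keys, last_number):
    """Map Jira keys to the GitHub numbers they would get if imported in order.

    Assumes nothing else (issue or PR) is created during the import.
    """
    return {key: last_number + offset
            for offset, key in enumerate(jira_keys, start=1)}


def verify(key, assigned_number, predicted):
    """Stop early if GitHub assigned a different number than predicted."""
    if predicted[key] != assigned_number:
        raise RuntimeError(
            f"{key}: expected #{predicted[key]}, got #{assigned_number}"
        )


# Sample values only: three Jira issues imported after issue/PR number 1058.
predicted = predict_numbers(["LUCENE-1", "LUCENE-2", "LUCENE-3"], last_number=1058)
```

Calling `verify` after each imported issue would let the import stop at the first mismatch instead of continuing with cross-references that point at the wrong issues.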

Here's an example of an issue imported in a single go: https://github.com/vlsi/tmp-jmeter-issues/issues/1188#issue-1327437303

mocobeta · 2022-08-06T09:31:37Z

We could predict the issue numbers as you pointed out, but we must prioritize safety. Importing takes 24 hours, and if there are short GitHub outages (they are not very uncommon), the assigned numbers will be inconsistent with the predicted issue numbers - it'd be a disaster for us. (We can't stop the import after a few errors since it is run by another team.)

In addition, we decided to keep GitHub issues available during the migration so as not to interrupt our issue system, while preventing people from opening new Jira issues; this makes it almost impossible to predict the imported issue numbers.

mocobeta · 2022-08-06T09:51:36Z

There will also be pull requests (we already use them heavily) - @vlsi I'm just curious if there is a way to correctly determine the issue numbers while new issues/PRs arrive during importing. We won't be able to stop new issues/PRs.

vlsi · 2022-08-06T09:59:58Z

There will also be pull requests (we already use them heavily)

Just in case: I'm doing the migration for Apache JMeter (see https://github.com/vlsi/bugzilla2github).

I'm just curious if there is a way to correctly determine the issue numbers while new issues/PRs arrive during importing

I do not think so. As far as I know, the issues are allocated sequentially, and the latest assigned number can be fetched via https://api.github.com/repos/apache/lucene/issues?per_page=1 API (see number: 1058).
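For reference, reading the latest assigned number from that endpoint could look like the sketch below. It relies on the endpoint's default ordering (newest first) and adds `state=all`, which is an assumption beyond the URL quoted above, so that closed issues and PRs are counted too. The function names are hypothetical.

```python
import json
import urllib.request

# state=all is added here (an assumption beyond the URL quoted above) so that
# closed issues and PRs are counted as well.
URL = "https://api.github.com/repos/apache/lucene/issues?per_page=1&state=all"


def latest_number(payload):
    """Return the number of the first item in an issues API response, or 0 if empty."""
    items = json.loads(payload)
    return items[0]["number"] if items else 0


def fetch_latest_number(url=URL):
    """Fetch the most recently created issue/PR number (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return latest_number(response.read().decode("utf-8"))
```

Since issues and pull requests share one number sequence, the value is only a snapshot: any issue or PR opened afterwards takes the next number.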

With JMeter, the flow of issues and PRs is not really high, so we would just make everything read-only during the migration.

issue numbers while new issues/PRs arrive during importing

I assume you use the bulk issue import API (https://gist.github.com/jonmagic/5282384165e0f86ef105), and if you wait for the import to complete, then it does return the assigned issue_number.

However, the key need for pre-assigning the numbers is that you could generate comments that reference "issues that will be created later". If GitHub breaks, you could just continue from where you stopped.
On the other hand, if you really want to allow creating issues and pull requests during the migration, then you can't easily "reference not-yet-created issues", since every PR created will shift the numbers.

vlsi · 2022-08-06T10:04:11Z

We won't be able to stop new issues/PRs

Do you know GitHub has "Temporary interaction limits"?

I think it should be able to prevent creating issues and PRs during the import:
https://docs.github.com/en/communities/moderating-comments-and-conversations/limiting-interactions-in-your-repository#about-temporary-interaction-limits

By the way, I wonder if you have tried contacting GitHub support.
Lucene is a really popular project, and it might be that you could get GitHub support to help you with migrating the issues. For instance, when LLVM migrated their issues to GitHub, they migrated the issues to a temporary GitLab instance, and then GitHub support used a special migration tool to migrate the issues from GitLab to GitHub. That enabled LLVM to forge commenters (see https://github.com/llvm/bugzilla2gitlab).

mocobeta · 2022-08-06T10:07:43Z

Thanks for your suggestions.

I think it should be able to prevent creating issues and PRs during the import:

Personally, I wish we could temporarily stop all new issues/PRs during migration, but our community is unlikely to accept it.

mikemccand · 2022-08-06T10:41:35Z

I think it should be able to prevent creating issues and PRs during the import:

Personally, I wish we could temporarily stop all new issues/PRs during migration, but our community is unlikely to accept it.

Actually I think that would be fine. We could VOTE on it, but such down time is warranted if it de-risks the migration.

mocobeta · 2022-08-06T10:50:17Z

The current two-pass migration plan is carefully considered and safe, and there is no risk, though I admit it's a bit complicated. I don't think we should change the two-pass migration plan just to shorten the total time a bit.

I don't mean we shouldn't introduce downtime and make it just one pass. But the scripts are already ready and well tested - I think it'd be riskier for us to change the plan now.

mikemccand · 2022-08-06T15:39:51Z

Yeah I'm not proposing we change this approach now.

I am proposing we mark both GitHub and Jira read-only during the migration. I think the community would agree, since/if it can de-risk the migration. A two-day downtime every 10 years or so seems fine ;)

mocobeta · 2022-08-06T15:58:34Z

Maybe I should have strongly argued that we should allow some downtime (no new issues, PRs and comments for one or two days) so that we could make the migration plan simpler.

But if we follow the two-pass migration plan written in #7, we do not need any downtime.

vlsi · 2022-08-06T16:20:36Z

Does the second pass generate GitHub notifications for the users mentioned in the issues?
I think those notifications would be too much

mocobeta · 2022-08-06T16:24:06Z

Does the second pass generate GitHub notifications for the users mentioned in the issues?

We confirmed that the second pass does not cause any notifications. It seems GitHub does not trigger notifications for updates made through the issue/comment update API.

mikemccand · 2022-08-06T16:31:22Z

Have we confirmed with INFRA that we can quickly make Jira read-only for just our project (Lucene)?

mikemccand · 2022-08-06T16:32:21Z

But if we follow the two-pass migration plan written in #7, we do not need any downtime.

OK, as long as we feel the risks are all contained (because you had planned on keeping both issue trackers accessible during the migration), then let's stick with that plan. I just want to point out that asking the community to freeze one or both issue trackers is fine IMO.

mocobeta · 2022-08-06T16:37:26Z

Have we confirmed with INFRA that we can quickly make Jira read-only for just our project (Lucene)?

We need to update all Jira issues at the last step. If we want Jira read-only during migration, we have to ask infra for extra work: make Jira read-only, make Jira writable after the migration, and lastly make Jira read-only again after adding comments to each issue - personally, I don't think we should pursue this approach.

Making a project read-only is not an easy configuration change; a workflow (or a database record?) needs to be changed, I think.
https://community.atlassian.com/t5/Jira-questions/Fastest-way-to-make-JIRA-read-only/qaq-p/1261492

uschindler · 2022-08-06T18:17:47Z

Just do this live with the Infra team on Slack. They are very cooperative and fix stuff in realtime. Just make some appointment with them and you can work together with them. We did this several times with migration on Jenkins. They are there to help!

mocobeta · 2022-08-06T18:31:43Z

Yes, but - making a Jira project read-only, writable, and read-only again would not be a quick fix, I think? Maybe I'm missing something?

mocobeta · 2022-08-06T18:39:56Z

I'll reach out to INFRA, but if possible, I would like to proceed with the whole process asynchronously, without real-time conversation on Slack - there is a time difference between me and people in the US/Europe. It's especially critical if we do the migration on business days... (I also have a daytime job.)

mocobeta · 2022-08-06T19:51:02Z

Don't get me wrong - I appreciate suggestions from all of you. However, there are many things you can easily control but I can't, due to several differences (timezone, language fluency, etc.). I have to proceed with this project with my very limited resources and ability if there is no one who is willing to take over this work.

Please feel free to pick any tasks if you see I'm doing them badly.

uschindler · 2022-08-06T21:58:15Z

Hi mocobeta,
I can also look into stuff if it is going on. On which machine do you want to start the migration? I'd suggest doing this on some server at ASF (like the Jenkins Slave lucene1/lucene2).

Regarding read-only: Don't misunderstand me. I just wanted to say: let's make Jira and GitHub read-only for outsiders. Switching this is easy by changing the permission scheme on the project; that's two mouse clicks. We can't do it ourselves, but I'd really want to enforce this.

I don't think it is a problem to prevent people from opening issues! We just put a message there like "we have some maintenance, you can't open issues at the moment. Please send a message to dev@lucene.apache.org, we will take care to create a new issue once all systems are back up and running."

uschindler · 2022-08-06T21:59:42Z

And doing something like changing permission scheme can be done with communication on Slack. That's all. I did this several times.

mocobeta · 2022-08-07T03:00:48Z

I can also look into stuff if it is going on. On which machine do you want to start the migration? I'd suggest doing this on some server at ASF (like the Jenkins Slave lucene1/lucene2).

Sorry I don't know - I'm not familiar with our ASF infrastructure, but I think the infra team selects a proper machine for the job. Anyway we can't run the import script ourselves.

Regarding read-only: Don't misunderstand me. I just wanted to say: let's make Jira and GitHub read-only for outsiders. Switching this is easy by changing the permission scheme on the project; that's two mouse clicks. We can't do it ourselves, but I'd really want to enforce this.
I don't think it is a problem to prevent people from opening issues! We just put a message there like "we have some maintenance, you can't open issues at the moment. Please send a message to dev@lucene.apache.org, we will take care to create a new issue once all systems are back up and running."

Thanks, it'd be great if we could make them completely non-writable. I think it'd be fine for the Jira side; people probably wouldn't mind much.
For the GitHub side, I think we'll need to stop all PRs/reviews as well as issues for two or three days (PRs are special issues in GitHub). Many people would not expect that. I have to explain why this is needed on the dev@ list - I'll post an email about it later.

We could make Jira/GitHub non-writable only for external contributors with fine-grained access control, but it'd be confusing - I think we should prevent everyone, including committers, from opening issues/PRs or adding comments on both Jira and GitHub if we are going to introduce downtime.

And doing something like changing permission scheme can be done with communication on Slack. That's all. I did this several times.

Of course, I can use ASF Slack for asynchronous communication. I'll reach out to the infra team and consult on how to arrange the whole process considering the time difference.

mocobeta · 2022-08-07T03:27:26Z

I recognize strong suggestions to make both Jira and GitHub non-writable during migration. Generally, I agree with it - stopping all activities for two or three days would not be a big deal (hope there are no objections since I've seen at least one request not to stop our issue system).

Allowing downtime, the migration plan will be:

1. Make both Jira and GitHub read-only for everyone, including committers, except for the account we use in the migration. Make sure no one can open issues or pull requests, or add comments to existing issues/PRs.
2. Start migration.
3. Make GitHub issues/PRs available once the migration is completed.
4. Make Jira writable (only for committers or me, if possible).
5. Add a comment to each Jira issue saying "This was moved to GitHub".
6. Make Jira read-only.

For details, I'll reach out to infra via Jira or Slack.

mocobeta · 2022-08-07T05:31:33Z

I sent an email to the dev@ list to share the change in the migration plan (~72 hours of downtime for issues/PRs). I'd be glad if you could give further suggestions there.
If there are no strong responses, I will open an INFRA issue to explain our plans/requests. This is still a work in progress, and access control in Jira and GitHub involves many uncertainties for me; I need to clarify a couple of questions on that.

vlsi · 2022-08-07T05:39:17Z

This was moved to GitHub

Are you going to include the link to GitHub issue?
Would that trigger JIRA notifications?

mocobeta · 2022-08-07T05:42:52Z

Are you going to include the link to GitHub issue?

Yes, this is the main purpose of leaving a comment on each Jira issue. The message will be something like "This was moved to GitHub. [URL]"

Would that trigger JIRA notifications?

We plan to completely silence JIRA notifications. If this can't be done ourselves, we'd need to ask infra.

mikemccand · 2022-08-07T10:27:56Z

This was moved to GitHub. [URL]

Could we make it look something like this:

This issue was moved to GitHub issue #517.

I.e. embed the link in the Jira comment?

mocobeta · 2022-08-07T10:33:20Z

It's just a hyperlink, we can use any anchor string.

In Jira syntax it would be:

This issue was moved to GitHub issue [#517|https://link-to-github-issue/].
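Generating that comment text for each issue is a one-line string format. The function name below is hypothetical, and the URL is the same placeholder used above.

```python
def jira_moved_comment(number, url):
    """Build a Jira wiki-markup comment linking to the migrated GitHub issue."""
    return f"This issue was moved to GitHub issue [#{number}|{url}]."


comment = jira_moved_comment(517, "https://link-to-github-issue/")
```

In Jira's wiki markup, the part before the `|` is the link text and the part after it is the target URL.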

mocobeta · 2022-08-07T10:51:34Z

Refined the comment message for Jira. See:
https://issues.apache.org/jira/browse/LUCENE-10557?focusedCommentId=17576357&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17576357

commit: 3ad68da

mocobeta · 2022-08-09T09:41:38Z

Hi,
here is the INFRA issue.
https://issues.apache.org/jira/browse/INFRA-23563

Can you please watch the issue and give comments if needed? I tried to explain our intricate requests but am not so confident that I'm doing it well.

mikemccand · 2022-08-09T14:49:54Z

Thanks @mocobeta!

mocobeta · 2022-08-18T07:44:53Z

Infra created this account for issue migration purposes. https://github.com/asfimport.
The profile icon will be set later.

mocobeta · 2022-08-18T07:49:52Z

I tested it and it works.
https://github.com/apache/lucene/issues/1072

I'm closing this. Thank you to everyone who commented on this.

mocobeta added migration discuss labels Jun 29, 2022

mocobeta changed the title ~~Which GitHub account should we use for migration?~~ Which GitHub account should/can we use for migration? Jun 29, 2022

mocobeta removed the migration label Jun 29, 2022

mocobeta self-assigned this Jul 10, 2022

mocobeta mentioned this issue Jul 14, 2022

Add a tool to generate account mapping #34

Merged

mocobeta closed this as completed Aug 18, 2022
