Add a tool to generate account mapping #34

Merged: mocobeta merged 15 commits into main from make-account-map on Jul 15, 2022

mocobeta
Contributor

@mocobeta mocobeta commented Jul 11, 2022

#3

This adds a helper tool that creates a Jira user to GitHub account mapping file; the file is used in the "Convert Jira issues to GitHub issues" step.

We could do this in a smarter way, but I would start with this.

NOTE: there are 2200+ committers and contributors in Jira (this number includes duplicates, since some people seem to have multiple Jira accounts).

$ wc -l work/jira-users.csv 
2266 work/jira-users.csv

@mocobeta
Contributor Author

FYI @mikemccand @dweiss
I will keep this open for a while and do some more extensive tests on it (this is a helper tool that should not block or conflict with the main scripts). If you have suggestions for generating the account mapping, please review this when you have some time. I think there is room for improvement in this simplistic approach.

@mikemccand
Member

Thanks @mocobeta! I was wondering what to pass as the account mapping as I ran the tooling ;) Today all of my migrated issues are all commented / opened by mikemccand lol.

Given how important this mapping file is, maybe we should 1) commit this PR and iterate on it in future PRs, and 2) commit the mapping file, so all of us can scrutinize it, maybe correct / insert our own mapping, etc.? Once we do the migration, the mapping is burned into the GitHub issues, so we really want to try to account for everyone. So we should treat this file as vital source code, I think?

@mocobeta
Contributor Author

mocobeta commented Jul 14, 2022

> Today all of my migrated issues are all commented / opened by mikemccand lol.

As for the "author" of each GitHub issue/comment, we won't be able to preserve/migrate the original Jira author. The author will be the caller's account. Please see #4 for the details.

In short, the author of all issues/comments will be an Infra account.

@mikemccand
Member

> As for the "author" of each GitHub issue/comments, we won't be able to preserve/migrate the original Jira author. The author will be the caller's account. Please see #4 for the details.
>
> In short, the author for all issues/comments will be an Infra's account.

Ahh OK got it.

So it's the @-calls inside issues that we will replace with the corresponding github id?

If we do check in the account mapping file, I suggest we break it into two sections: unverified and verified. This tool would put them all as unverified to start? And those of us who "know" (or just for our own mapping) can commit a change to move an entry to the verified section?
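
To make the question about @-calls concrete, here is a minimal sketch of what such a replacement could look like. It assumes mentions appear in Jira wiki markup as `[~username]`; the function name `replace_mentions` and the sample mapping are hypothetical and not taken from this PR.

```python
import re

# Hypothetical mapping: Jira user name -> GitHub account (illustrative values only).
ACCOUNT_MAP = {"jira_alice": "alice-gh"}

# Jira wiki markup writes a user mention as [~username].
MENTION = re.compile(r"\[~([^\]]+)\]")


def replace_mentions(text: str, account_map: dict[str, str]) -> str:
    """Replace mapped Jira mentions with GitHub @-mentions; leave unmapped ones as-is."""

    def _sub(match: re.Match) -> str:
        github_id = account_map.get(match.group(1))
        # Keep the original Jira markup when no GitHub account is known.
        return f"@{github_id}" if github_id else match.group(0)

    return MENTION.sub(_sub, text)
```

Leaving unmapped mentions untouched is a deliberate choice in this sketch: a missing entry then never pings the wrong GitHub user.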

@mocobeta
Contributor Author

Yes I'll run the tool and commit a result file.

@mocobeta
Contributor Author

mocobeta commented Jul 15, 2022

I committed a candidate mapping file (without any manual checks/editing). e336bdc

grep "yes$" mappings-data/account-map.csv.20220714.234825

would effectively extract committers' accounts.
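
For anyone who prefers to do the same filtering in a script, here is a rough Python equivalent of the grep above. It assumes the file is plain comma-separated text whose last column holds a yes/no flag; the column names and rows in the sample are made up, and the helper name `rows_flagged_yes` is hypothetical.

```python
import csv
import io


def rows_flagged_yes(csv_text: str) -> list[list[str]]:
    """Return the rows whose last column is exactly "yes"."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and row[-1].strip() == "yes"]


# Made-up sample; the real file's columns may differ.
sample = (
    "JiraName,GitHubAccount,Flag\n"
    "jira_alice,alice-gh,yes\n"
    "jira_bob,bob-gh,no\n"
)
```

Unlike `grep "yes$"`, this compares the whole last field, so a value that merely ends in "yes" would not match.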

@mocobeta
Contributor Author

mocobeta commented Jul 15, 2022

As other possible clues, we could

  • list the ASF GitHub organization member accounts and infer committers' accounts that cannot be detected by display/full names
    • about 20 committers have not set their GitHub name to the same string as their Jira full name (or perhaps do not have a GitHub account)
  • list authors' e-mail addresses from the whole commit history and compare them with the candidate GitHub users' e-mail addresses
    • this would be useful for manual checks/disambiguation
    • this is effective only if the contributor has made their e-mail address public on GitHub
  • get all commits via the GitHub Commits API and use the author field
    • this would be useful for manual checks/disambiguation
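
The e-mail comparison idea could be sketched roughly as below. This is only an illustration under assumptions: the input shapes (a list of `(name, email)` pairs from the commit history, and a dict of candidate GitHub logins to their public e-mail or `None`) and the function name `match_by_email` are hypothetical, and fetching the data is left out.

```python
def match_by_email(commit_authors, github_users):
    """Match candidate GitHub logins to commit authors by e-mail address.

    commit_authors: iterable of (name, email) pairs from the commit history.
    github_users: dict of candidate GitHub login -> public e-mail, or None if hidden.
    Returns a dict of login -> author name for case-insensitive e-mail matches.
    """
    by_email = {}
    for name, email in commit_authors:
        # Keep the first author name seen for each normalized address.
        by_email.setdefault(email.strip().lower(), name)

    matches = {}
    for login, email in github_users.items():
        if not email:
            continue  # e-mail is not public, so this clue is unavailable
        key = email.strip().lower()
        if key in by_email:
            matches[login] = by_email[key]
    return matches
```

Accounts with a hidden e-mail simply produce no match, which fits using this as a disambiguation aid rather than as the primary signal.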

@mocobeta
Contributor Author

mocobeta commented Jul 15, 2022

This is the list of accounts that have push access to apache/lucene (i.e., committers' accounts)
b04318e
70 accounts are detected.

There are 95 committers in total according to this page, so 25 people do not associate their GitHub accounts with ASF/Jira accounts.
https://projects.apache.org/committee.html?lucene

This means we can't assume that "committers' GitHub accounts have push access to the apache/lucene repo on GitHub", although they should have write access to Apache's GitBox repo, I think.
E.g., https://github.com/ChrisHegarty does not belong to the ASF GitHub organization and therefore does not have push access to apache/lucene. If this is not intentional, something may have gone wrong in the onboarding procedure.

@mocobeta
Contributor Author

mocobeta commented Jul 15, 2022

For verification, I'll do the following:

  1. Check if the candidate GitHub account has push access to the apache/lucene repo.
  2. Check if the candidate GitHub account has been logged as "author" in the commit history at least once.

For accounts that do not satisfy the above criteria, I would just omit them.

There will be some false negatives (for example, Jira issue reporters are omitted if their possible GitHub accounts were not logged in the commit history). I'd put priority on avoiding false positives.
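
The verification criteria above can be sketched as a simple filter. This is a sketch under assumptions, not the actual tool: it treats an account as verified when it meets either criterion (an assumption; the comment does not say whether both are required), and the input shapes and the name `verify_candidates` are hypothetical.

```python
def verify_candidates(candidates, push_access, commit_authors):
    """Keep only candidate mappings backed by push access or commit authorship.

    candidates: dict of Jira user name -> candidate GitHub login.
    push_access: iterable of logins with push access to the repository.
    commit_authors: iterable of logins seen as "author" in the commit history.
    """
    # GitHub logins are case-insensitive, so compare in lower case.
    push = {login.lower() for login in push_access}
    authors = {login.lower() for login in commit_authors}

    verified = {}
    for jira_name, login in candidates.items():
        key = login.lower()
        # Assumption: satisfying either criterion is enough to be verified.
        if key in push or key in authors:
            verified[jira_name] = login
    return verified
```

Anything that fails the check is dropped rather than guessed at, which matches the stated priority of avoiding false positives at the cost of some false negatives.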

@mikemccand
Member

Wow, the mapping file is massive! 5,793 developers. We've had so many contributors over the years ;) Inspiring.

> I'd put priority on avoiding false positives.

+1

@mocobeta
Contributor Author

Here are the regenerated candidate mapping and the verified mapping (using the above criteria).
b44bd73

  • 5792 candidate mappings
  • 163 verified mappings

@mocobeta mocobeta merged commit 580bf2f into main Jul 15, 2022
@mocobeta mocobeta deleted the make-account-map branch July 15, 2022 12:21