Fix GitHub repository cloning #63

Merged: 2 commits merged into LLNL:main on Aug 19, 2022


mcdonnnj
Contributor

This pull request fixes GitHub repository cloning so that it is functional again.

Per this blog post, and this blog post for background, GitHub has disabled the unencrypted Git protocol for cloning repositories. Because this project was not updated and still uses git:// URLs to clone GitHub repositories, it cannot clone them. It also fails silently: unless the SLOC/labor_hours values are checked, it runs with no error output.
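
A minimal sketch of cloning through an HTTPS URL rather than a git:// one, assuming a repository object that exposes an HTTPS `clone_url` attribute like the one this PR relies on. The `Repo` dataclass and `build_clone_command` helper are hypothetical stand-ins, not scraper's actual code; the `git clone --depth=1 <url> <dir>` form is taken from the command shown in the log output, and the HTTPS-only guard is an extra illustration that is not part of the PR.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class Repo:
    """Hypothetical stand-in for a GitHub repository object.

    Only the HTTPS clone URL is modeled here.
    """

    clone_url: str  # e.g. "https://github.com/cisagov/pshtt.git"


def build_clone_command(repo: Repo, dest: str) -> List[str]:
    """Build a shallow `git clone` command from the repository's HTTPS URL."""
    url = repo.clone_url
    # Illustrative guard (not in the PR): reject anything that is not HTTPS,
    # such as an old git:// URL, so the problem surfaces as an exception
    # rather than as an empty clone directory.
    if urlparse(url).scheme != "https":
        raise ValueError(f"expected an HTTPS clone URL, got: {url}")
    return ["git", "clone", "--depth=1", url, dest]


if __name__ == "__main__":
    repo = Repo(clone_url="https://github.com/cisagov/pshtt.git")
    print(build_clone_command(repo, "/tmp/clone-dir"))
```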

The repository cloning bug is resolved by using the repository.clone_url attribute to get an HTTPS URL for cloning. Additionally, the return code of commands run with the scraper.util.execute() function is now checked, and an error message is logged if it is not 0. Another logging call was changed from logging.debug() to logging.error() to reflect that a failure there impacts core functionality.
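
The return-code check can be sketched with `subprocess.run`. This is a hypothetical reimplementation, not the actual body of `scraper.util.execute()`, whose signature and return value may differ; only the `Error Executing: command=..., returncode=...` message shape is taken from the log output.

```python
import logging
import subprocess
import sys
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


def execute(command: List[str], cwd: Optional[str] = None) -> Tuple[bytes, bytes, int]:
    """Run `command` and log an error if it exits with a non-zero status.

    Hypothetical sketch: the real scraper.util.execute() may differ.
    """
    # Capture stdout and stderr as bytes.
    result = subprocess.run(
        command, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if result.returncode != 0:
        # Surface the failure instead of letting callers continue silently.
        logger.error(
            "Error Executing: command=%s, returncode=%d",
            " ".join(command),
            result.returncode,
        )
    return result.stdout, result.stderr, result.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # A child process that exits with status 128 triggers the error log.
    execute([sys.executable, "-c", "import sys; sys.exit(128)"])
```

Returning the code as well as logging it lets a caller decide whether to skip the SLOC step, rather than running cloc against an empty directory.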

Running against the main branch:

$ scraper --config scraper.json
2022-08-19 14:18:20,668 - INFO: Connected to: https://github.com
2022-08-19 14:18:21,038 - INFO: Processing: cisagov/pshtt
2022-08-19 14:20:34,240 - INFO: SLOC: 0
2022-08-19 14:20:34,240 - INFO: labor_hours: 0
2022-08-19 14:20:34,707 - INFO: Number of Projects: 1
2022-08-19 14:20:34,707 - INFO: Writing output to: code.json

Running against this branch without the repository URL fix:

$ scraper --config scraper.json
2022-08-19 14:21:06,586 - INFO: Connected to: https://github.com
2022-08-19 14:21:06,990 - INFO: Processing: cisagov/pshtt
2022-08-19 14:23:17,839 - ERROR: Error Executing: command=git clone --depth=1 git://github.com/cisagov/pshtt.git /tmp/tmpqsep80l1/clone-dir, returncode=128
2022-08-19 14:23:18,041 - ERROR: Error Decoding: url=git://github.com/cisagov/pshtt.git, out=b'\n1 error:\nUnable to read:  /tmp/tmpqsep80l1/clone-dir\n'
2022-08-19 14:23:18,042 - INFO: SLOC: 0
2022-08-19 14:23:18,042 - INFO: labor_hours: 0
2022-08-19 14:23:18,471 - INFO: Number of Projects: 1
2022-08-19 14:23:18,473 - INFO: Writing output to: code.json

Running against this branch:

$ scraper --config scraper.json
2022-08-19 14:18:00,587 - INFO: Connected to: https://github.com
2022-08-19 14:18:00,979 - INFO: Processing: cisagov/pshtt
2022-08-19 14:18:01,821 - INFO: SLOC: 2380
2022-08-19 14:18:01,822 - INFO: labor_hours: 1159
2022-08-19 14:18:02,208 - INFO: Number of Projects: 1
2022-08-19 14:18:02,208 - INFO: Writing output to: code.json

GitHub deprecated use of the unencrypted Git protocol this year per:
https://github.blog/2021-09-01-improving-git-protocol-security-github/#no-more-unauthenticated-git
This results in a quiet failure when cloning GitHub repositories. The
clone_url property of a GitHub Repository object contains the HTTPS URL
that can be used to clone a repository and will likely continue to be
the source for a valid URL to clone a GitHub repository successfully.

Add a check of the returncode of the command executed in
scraper.util.execute() and output an error message if it is not zero.
Additionally change the logging level from DEBUG to ERROR for failures
to process the JSON output from cloc. These combined will make it more
clear when failures in core functionality are occurring.
IanLee1521 self-requested a review August 19, 2022 18:35
IanLee1521 merged commit 37bfdd3 into LLNL:main Aug 19, 2022
IanLee1521
Member

Thanks @mcdonnnj !

mcdonnnj deleted the bugfix/fix_github_cloning branch August 19, 2022 18:42