Moving go license tools from KFP repo (kubeflow#540)
* Moving go license tools from KFP repo
* Move go license tools to py folder
* Add __init__.py file
* Update setup.py
* Fix pylint format
* Fix lint errors
1 parent b0ec604, commit e94be0c
Showing 7 changed files with 539 additions and 0 deletions.
@@ -0,0 +1,46 @@
# CLI tools to fetch go library's license info

## Why do we need this?

When we release go library images, we are effectively redistributing
third-party binaries. To be compliant, we need to do the following:
* Put license declarations in the image for the licenses of all dependencies and transitive dependencies.
* Mirror source code in the image for code with MPL, EPL, GPL or CDDL licenses.

It's not easy to get the licenses of all (transitive) dependencies of a go
library, so we need these tools to automate the task.

## How to get all dependencies with license and source code?

1. Install the CLI tools from this folder: `python setup.py install`
1. Collect the dependencies and transitive dependencies of a go library. Put them together in a text file called `dep.txt`. Format: one library name per line. Each library name should be a valid golang import module name.

    Example ways to get it:
    * argo uses dep (Gopkg) for package management. It has a [Gopkg.lock file](https://github.com/argoproj/argo/blob/master/Gopkg.lock)
    with all of its dependencies and transitive dependencies. All the name fields in this file are what we need. You can run `parse-toml-dep` to parse it.
    * minio uses [official go modules](https://blog.golang.org/using-go-modules). There's a [go.mod file](https://github.com/minio/minio/blob/master/go.mod) describing its direct dependencies. Run `go list -m all` to get the final versions that will be used in a build for all direct and indirect dependencies ([reference](https://github.com/golang/go/wiki/Modules#daily-workflow)). Parse its output to produce the file we need.

    Reminder: don't forget to put the library itself into `dep.txt`.
1. Run `get-github-repo` to resolve the github repos of golang imports. The script cannot resolve every import; fewer than 2% of libraries need manual help.

    For a library it cannot resolve, manually add it to `dep-repo-mapping.manual.csv`, so the tool knows how to find its github repo the next time.

    By default, it reads dependencies from `dep.txt` and writes to `repo.txt`.
1. Run `get-github-license-info` to crawl the github license info of these libraries. Not all repos have a license that github recognizes; fewer than 2% of libraries need manual help.

    By default, it reads repos from `repo.txt` and writes to `license_info.csv`. You
    need to configure a github personal access token because the tool sends a lot of
    requests to github. Follow the instructions in `get-github-license-info -h`.

    When fetching a license fails for a repo, it's usually because the github repo
    doesn't have a license file that github understands. Check its README, which
    usually mentions the license, and put the correct info into `license_info.csv`.
1. Edit the license info file. Manually check the license file for all repos with a license categorized as "Other" by github, and figure out their true license names.
1. Run `concatenate-license` to crawl the full-text license files for all dependencies and concatenate them into one file.

    By default, it reads license info from `license_info.csv` and writes to `license.txt`.
    Put `license.txt` at `third_party/library/license.txt`, where it is read when building docker images.
1. Manually update the list of dependencies that require source code, and put it into `third_party/library/repo-MPL.txt`.
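
For a go modules project, one way to build `dep.txt` is to keep the first field of each line printed by `go list -m all`. The snippet below is a minimal sketch and is not part of these tools: the function name `module_names_from_go_list` and the sample output are hypothetical, and it assumes every non-empty line starts with a module path, optionally followed by a version.

```python
def module_names_from_go_list(output):
  """Return the module path from each non-empty line of `go list -m all` output.

  Assumption: a line looks like "<module path> [version] [=> replacement]",
  so the first whitespace-separated field is the import module name.
  """
  names = []
  for line in output.splitlines():
    fields = line.split()
    if fields:
      names.append(fields[0])
  return names


# Hypothetical sample; the first line stands for the main module itself.
sample = """example.com/hello
golang.org/x/text v0.3.0
rsc.io/quote v1.5.2
"""
dep_lines = module_names_from_go_list(sample)
print('\n'.join(dep_lines))
```

The main module normally appears in the `go list -m all` output with no version, so keeping every line should also cover the reminder to include the library itself; check the resulting `dep.txt` before moving on.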
Empty file.
82 changes: 82 additions & 0 deletions
py/kubeflow/testing/go-license-tools/concatenate_license.py
@@ -0,0 +1,82 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import sys
import traceback

import requests

parser = argparse.ArgumentParser(
    description='Concatenate the license texts of all dependencies into one file.')
parser.add_argument(
    'license_info_file',
    nargs='?',
    default='license_info.csv',
    help='CSV file with license info fetched from github using get-github-license-info CLI tool. '
    +'(default: %(default)s)',
)
parser.add_argument(
    '-o',
    '--output',
    dest='output_file',
    nargs='?',
    default='license.txt',
    help=
    'Concatenated license file path this command generates. (default: %(default)s)'
)
args = parser.parse_args()

SEPARATOR = '-' * 80


def fetch_license_text(download_link):
  """Download the full license text from the given raw file link."""
  response = requests.get(download_link)
  assert response.ok, 'Fetching {} failed with {} {}'.format(
      download_link, response.status_code, response.reason)
  return response.text


def main():
  with open(args.license_info_file, 'r') as license_info_file, \
      open(args.output_file, 'w') as output_file:
    repo_failed = []
    for line in license_info_file:
      line = line.strip()
      if not line:
        # Skip blank lines instead of failing on the unpacking below.
        continue
      # Line format: org/repo,license_html_url,license_name,license_download_url
      [repo, license_link, license_name,
       license_download_link] = line.split(',')
      try:
        print('Repo {} has license download link {}'.format(
            repo, license_download_link),
              file=sys.stderr)
        license_text = fetch_license_text(license_download_link)
        print(SEPARATOR, file=output_file)
        print('{} {} {}'.format(repo, license_name, license_link),
              file=output_file)
        print(SEPARATOR, file=output_file)
        print(license_text, file=output_file)
      except Exception as e: # pylint: disable=broad-except
        print('[failed]', e, file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        repo_failed.append(repo)
    if repo_failed:
      print('Failed to download license file for {} repos.'.format(len(repo_failed)),
            file=sys.stderr)
      for repo in repo_failed:
        print(repo, file=sys.stderr)


if __name__ == '__main__':
  main()
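
The loop above splits each line on every comma, so a license name that itself contains a comma would break the four-field unpacking. A possible workaround, shown here only as a sketch, is to treat the first two fields and the last field as fixed and join whatever is left as the license name. The helper name `parse_license_info_line` and the sample values are hypothetical, and the approach assumes that the repo name and both URLs never contain commas.

```python
def parse_license_info_line(line):
  """Split one license info line into its four fields.

  Returns (repo, license_link, license_name, download_link), or None for a
  blank line. Assumption: only the license name may contain commas.
  """
  line = line.strip()
  if not line:
    return None
  parts = line.split(',')
  if len(parts) < 4:
    raise ValueError(
        'Expected at least 4 comma-separated fields: {!r}'.format(line))
  repo, license_link = parts[0], parts[1]
  download_link = parts[-1]
  # Anything between the second and the last field belongs to the name.
  license_name = ','.join(parts[2:-1])
  return repo, license_link, license_name, download_link


# Hypothetical row whose license name contains a comma.
row = 'org/repo,https://example.com/LICENSE,Foo, Bar License,https://example.com/raw/LICENSE'
print(parse_license_info_line(row))
```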
107 changes: 107 additions & 0 deletions
py/kubeflow/testing/go-license-tools/get_github_license_info.py
@@ -0,0 +1,107 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import sys
import traceback
from pathlib import Path

import requests

home = str(Path.home())
parser = argparse.ArgumentParser(
    description='Get github license info from github APIs.')
parser.add_argument(
    'repo_list',
    nargs='?',
    default='repo.txt',
    help=
    'Github repo list file with one line per github repo. Format: org/repo. (default: %(default)s)',
)
parser.add_argument(
    '-o',
    '--output',
    dest='output_file',
    nargs='?',
    default='license_info.csv',
    help=
    'Output file with one line per github repo. Line format: '
    +'org/repo,license_html_url,license_name,license_download_url. (default: %(default)s)',
)
parser.add_argument(
    '--github-api-token-file',
    dest='github_api_token_file',
    default='{}/.github_api_token'.format(home),
    help='You need to create a github personal access token at https://github.com/settings/tokens, '
    +'because github has very strict limit on anonymous API usage. (default: %(default)s) Format: a '
    +'text file with one line. '
    +'"<40 characters string shown when a new personal access token is created>"'
)
args = parser.parse_args()


def main():
  token = None
  try:
    with open(args.github_api_token_file, 'r') as token_file:
      token = token_file.read().strip()
      print('Read github API token from {}, length {}.'.format(
          args.github_api_token_file, len(token)),
            file=sys.stderr)
  except FileNotFoundError:
    raise Exception((
        'Please put a github api token file at {}, or specify a different token file path by '
        +'--github-api-token-file. Github API token is needed because anonymous API access limit '
        +'is not enough.'
    ).format(args.github_api_token_file))

  # This tool expects the 40-character github personal access token format.
  assert len(token) == 40, 'Expected a 40-character token, got {} characters.'.format(len(token))
  # reference: https://developer.github.com/v3/#oauth2-token-sent-in-a-header
  headers = {'Authorization': 'token {}'.format(token)}
  with open(args.repo_list, 'r') as repo_list_file, \
      open(args.output_file, 'w') as output_file:
    repo_succeeded = []
    repo_failed = []
    for repo in repo_list_file:
      repo = repo.strip()
      if not repo:
        # Skip blank lines in the repo list.
        continue
      print('Fetching license for {}'.format(repo), file=sys.stderr)
      try:
        url = 'https://api.github.com/repos/{}/license'.format(repo)
        response = requests.get(url, headers=headers)
        if not response.ok:
          print('Error response content:\n{}'.format(response.content), file=sys.stderr)
          raise Exception('Fetching {} failed with {} {}'.format(
              url, response.status_code, response.reason))
        data = response.json()

        download_url = data['download_url']
        license_name = data['license']['name']
        html_url = data['html_url']
        print('{},{},{},{}'.format(repo, html_url, license_name, download_url), file=output_file)
        repo_succeeded.append(repo)
      except Exception as e: # pylint: disable=broad-except
        print('[failed]', e, file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        repo_failed.append(repo)
    print('Fetched github license info, {} succeeded, {} failed.'.format(
        len(repo_succeeded), len(repo_failed)), file=sys.stderr)
    if repo_failed:
      print('The following repos failed:', file=sys.stderr)
      for repo in repo_failed:
        print(repo, file=sys.stderr)


if __name__ == '__main__':
  main()
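
The README asks for a manual check of every repo whose license github categorizes as "Other". The following sketch lists those repos from the lines this script writes. It is not part of the tools: the helper name `repos_needing_review` and the sample rows are hypothetical, and it assumes the license name in the third field is exactly `Other` for such repos.

```python
def repos_needing_review(lines):
  """Return repos whose license name is "Other" in license info lines.

  Assumption: each line has the format
  org/repo,license_html_url,license_name,license_download_url
  and lines with any other number of fields are ignored.
  """
  flagged = []
  for line in lines:
    fields = line.strip().split(',')
    if len(fields) == 4 and fields[2] == 'Other':
      flagged.append(fields[0])
  return flagged


# Hypothetical rows in the output format of this script.
rows = [
    'org/a,https://example.com/a/LICENSE,MIT License,https://example.com/a/raw/LICENSE',
    'org/b,https://example.com/b/LICENSE,Other,https://example.com/b/raw/LICENSE',
    '',
]
print(repos_needing_review(rows))
```

In practice the rows would come from iterating over the opened `license_info.csv` file rather than from an in-memory list.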