-
-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update summary plugin #2914
Update summary plugin #2914
Conversation
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Set summary2 to be new summary plugin * Update tests Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Remove declared values from license_expressions, holders, programming_language summary lists and then call those lists `other_license_expressions, `other_holders`, and `other_programming_languages`, respectivly. Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Move old summarizer code to legacy_summarizer.py Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Remove summarizer2.py Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Update commoncode to v30.1.0 * Update expected test results Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* We have updated commoncode, which added a new field to the headers. We need to regen any test expectations that tests scancode output Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* legacy_summarizer.py has been renamed to summarizer_legacy.py Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Update CHANGELOG.rst Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Determine declared license expression and declared holders from package data * Map more Microsoft common names in COMMON_NAMES Signed-off-by: Jono Yang <jyang@nexb.com>
* Update summarizer test cases using codebases of real code Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Only consider the number of times a programming language occurs in get_primary_language Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Add new summarizer tests for license ambiguity, conflicting license categories * Rename previous simple test to package * Remove full and copyright test Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* In the summary, the primary_programming_language field has been renamed programming_language. The other_programming_languages field has been renamed other_languages Signed-off-by: Jono Yang <jyang@nexb.com>
The remaining pieces left are to figure out what to do when multiple package data is detected at the root of the codebase we are scanning. Currently, we are under the assumption that the codebase we are scanning is a package itself. However, some packages have multiple package manifests at the same root directory. We have some code that picks which package data to use based on the number of code files that are of the same language for those packages, but it is not a reliable heuristic. |
Signed-off-by: Jono Yang <jyang@nexb.com>
* Rename any references to summarizing in copyrights_summary.py to tallying and save as a new file Signed-off-by: Jono Yang <jyang@nexb.com>
946c485
to
ced0cb8
Compare
ced0cb8
to
5afa8f2
Compare
523bbe9
to
b562e3f
Compare
* Remove copyright_summary.py * Update summarizer help text Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* Add entry for summary plugin changes Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
* This allows us to more effectivly remove declared holders from the other holders list * Update holders summarization tests Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
To handle the case of multiple package data present at a root of a directory, we've decided to just combine all the package data and return it as the declared origin information. |
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, yet I do not think this is worth updating the attribute name to tallies in the cluecode tests. Changing just the code should be enough IMHO. What do you think?
e92c8c5
to
c071232
Compare
* Do not rename summaries to tallies Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
c071232
to
72504e5
Compare
@pombredanne In the cluecode tests, I've renamed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great and avoiding the copyright tests YAML means we have more reasonable churn.
Some expected JSON still need fixing though and IMHO we should not report thing without copyrights or holder in tallies but only things that have a value
Reference: #2842 Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
This PR further overhauls the summary plugin in summarycode. The previous summary plugin returned a count of all the detected license expressions, holders, programming languages, and other scan attributes. We are now evolving that, where we have scancode-toolkit automatically deduce the correct origin info for the codebase it scans using the summary plugin.
This PR also updates commoncode to version 30.1.0.
The summary codebase attribute created by the summary plugin now has these fields:
declared_license_expression
license_clarity_score
declared_holder
primary_language
other_license_expressions
declared_license_expression
other_holders
declared_holder
other_languages
primary_language
.