Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update summary plugin #2914

Merged
merged 36 commits into from
Apr 20, 2022
Merged

Update summary plugin #2914

merged 36 commits into from
Apr 20, 2022

Conversation

JonoYang
Copy link
Member

@JonoYang JonoYang commented Apr 11, 2022

This PR further overhauls the summary plugin in summarycode. The previous summary plugin returned a count of all the detected license expressions, holders, programming languages, and other scan attributes. We are now evolving that, where we have scancode-toolkit automatically deduce the correct origin info for the codebase it scans using the summary plugin.

This PR also updates commoncode to version 30.1.0.

The summary codebase attribute created by the summary plugin now has these fields:

declared_license_expression

  • Primary license expression for a scanned codebase

license_clarity_score

  • The score of how well a codebase has declared its origin information

declared_holder

  • Primary holder of the scanned codebase

primary_language

  • The main programming language of the scanned codebase

other_license_expressions

  • Other detected license expressions from the scanned codebase, excluding the declared_license_expression

other_holders

  • Other detected copyright holders from the scanned codebase, excluding the declared_holder

other_languages

  • Other detected programming languages, excluding primary_language.

JonoYang added 20 commits March 24, 2022 10:33
Reference: #2842

Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Set summary2 to be new summary plugin
    * Update tests

Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Remove declared values from license_expressions, holders, programming_language summary lists and then call those lists `other_license_expressions, `other_holders`, and `other_programming_languages`, respectivly.

Reference: #2842

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Move old summarizer code to legacy_summarizer.py

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Remove summarizer2.py

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update commoncode to v30.1.0
    * Update expected test results

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * We have updated commoncode, which added a new field to the headers. We need to regen any test expectations that tests scancode output

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * legacy_summarizer.py has been renamed to summarizer_legacy.py

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update CHANGELOG.rst

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Determine declared license expression and declared holders from package data
    * Map more Microsoft common names in COMMON_NAMES

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update summarizer test cases using codebases of real code

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Only consider the number of times a programming language occurs in get_primary_language

Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Add new summarizer tests for license ambiguity, conflicting license categories
    * Rename previous simple test to package
    * Remove full and copyright test

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * In the summary, the primary_programming_language field has been renamed programming_language. The other_programming_languages field has been renamed other_languages

Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang
Copy link
Member Author

The remaining pieces left are to figure out what to do when multiple package data is detected at the root of the codebase we are scanning. Currently, we are under the assumption that the codebase we are scanning is a package itself. However, some packages have multiple package manifests at the same root directory. We have some code that picks which package data to use based on the number of code files that are of the same language for those packages, but it is not a reliable heuristic.

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Rename any references to summarizing in copyrights_summary.py to tallying and save as a new file

Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the update-summary-plugin branch from 946c485 to ced0cb8 Compare April 13, 2022 01:57
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Combine all detected origin info from multiple package data and use the resulting values in the summary
    * Create new test for multiple package data summarization

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the update-summary-plugin branch from ced0cb8 to 5afa8f2 Compare April 14, 2022 02:04
    * Remove redundant code in summarizer.py

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Remove all references to summarization in tallies tests

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the update-summary-plugin branch from 523bbe9 to b562e3f Compare April 14, 2022 19:24
    * Remove copyright_summary.py
    * Update summarizer help text

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Add entry for summary plugin changes

Signed-off-by: Jono Yang <jyang@nexb.com>
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
    * This allows us to more effectivly remove declared holders from the other holders list
    * Update holders summarization tests

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang
Copy link
Member Author

To handle the case of multiple package data present at a root of a directory, we've decided to just combine all the package data and return it as the declared origin information.

@JonoYang JonoYang requested a review from pombredanne April 15, 2022 18:00
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, yet I do not think this is worth updating the attribute name to tallies in the cluecode tests. Changing just the code should be enough IMHO. What do you think?

@JonoYang JonoYang force-pushed the update-summary-plugin branch from e92c8c5 to c071232 Compare April 20, 2022 00:06
    * Do not rename summaries to tallies

Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the update-summary-plugin branch from c071232 to 72504e5 Compare April 20, 2022 00:15
@JonoYang
Copy link
Member Author

@pombredanne In the cluecode tests, I've renamed the tallies attribute back to summary. I'm still failing a test for the CycloneDX output plugin and I'm not sure why it's doing that. I've opened a ticket for that #2917

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great and avoiding the copyright tests YAML means we have more reasonable churn.
Some expected JSON still need fixing though and IMHO we should not report thing without copyrights or holder in tallies but only things that have a value

JonoYang and others added 2 commits April 20, 2022 09:36
Reference: #2842
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants