Update summary plugin #2914

Merged
merged 36 commits into from
Apr 20, 2022
Commits
4a11db5
Add deprecation warnings to summary plugins
JonoYang Mar 24, 2022
d59d9df
Add deprecation message to headers
JonoYang Mar 28, 2022
de4c250
Rename summary option to summary-legacy #2842
JonoYang Mar 28, 2022
9a77c72
Determine primary programming language in summary
JonoYang Apr 1, 2022
dc0b943
Add counts to declared holders and primary language
JonoYang Apr 2, 2022
976ea12
Run license clarity scoring and summary at once
JonoYang Apr 5, 2022
49f72ed
Split old and new summarizer code
JonoYang Apr 5, 2022
3c80157
Combine license scoring and summary plugins
JonoYang Apr 5, 2022
1cb2c4a
Update expected test results
JonoYang Apr 5, 2022
017a181
Rename legacy_summarizer.py
JonoYang Apr 5, 2022
3cc279f
Determine primary language from detected package
JonoYang Apr 6, 2022
57710b7
Use package data for origin info
JonoYang Apr 7, 2022
c9c0003
Determine declared info from Package data
JonoYang Apr 7, 2022
1f91ffb
Refactor get_primary_language
JonoYang Apr 7, 2022
2cdcb76
Remove unused functions
JonoYang Apr 8, 2022
f91cb46
Pick package in get_origin_info_from_package_data
JonoYang Apr 8, 2022
02e4977
Return empty strings if a package cannot be determined
JonoYang Apr 8, 2022
77adf3b
Fix bug where joined expressions were not returned
JonoYang Apr 9, 2022
f91fc24
Simplify package scan summarization tests
JonoYang Apr 11, 2022
fb012aa
Simplify programming language fields in summary
JonoYang Apr 11, 2022
0abe6f0
Ignore package uuid in tests
JonoYang Apr 12, 2022
ae0aef1
Rename summarizer_legacy to tallies
JonoYang Apr 13, 2022
20cd6c0
Remove legacy summarizer and tests
JonoYang Apr 13, 2022
5afa8f2
Get origin info from multiple package data
JonoYang Apr 14, 2022
7ce3734
Remove references to summary in tallies.py
JonoYang Apr 14, 2022
b562e3f
Rename tallies test files
JonoYang Apr 14, 2022
be6b2a1
Update tests for copyright_tallies.py
JonoYang Apr 14, 2022
47ec08f
Add tests for helper functions
JonoYang Apr 15, 2022
64db2e8
Update CHANGELOG.rst
JonoYang Apr 15, 2022
222014d
Update CLI help text test expectations
JonoYang Apr 15, 2022
a4ea113
Return holders as a list before joining
JonoYang Apr 15, 2022
6163969
Update CHANGELOG.rst
JonoYang Apr 15, 2022
34720c1
Merge branch 'develop' into update-summary-plugin
JonoYang Apr 18, 2022
72504e5
Revert changes to cluecode_test_utils.CopyrightTest
JonoYang Apr 19, 2022
6a9576c
Update tallies test expectations
JonoYang Apr 20, 2022
e9df507
Regen CycloneDX expectation with metadata
pombredanne Apr 20, 2022
87 changes: 27 additions & 60 deletions CHANGELOG.rst
@@ -30,7 +30,7 @@ Important API changes:
return "package_data" package information at the manifest file-level
rather than "packages". This has all the data attributes of a "package_data"
field plus others: "package_uuid", "package_data_files" and "files".

- There is a new top-level "packages" attribute that contains package
instances that can be aggregating data from multiple manifests.

@@ -47,6 +47,14 @@ Important API changes:
- The data structure for CSV output has been changed to rename the Resource
column to "path". The "copyright_holder" has been renamed to "holder".

- The license clarity scoring plugin has been overhauled to show new license
clarity criteria. More details of the new criteria are provided below.

- The functionality of the summary plugin has been changed to provide declared
origin information for the codebase being scanned. The previous summary plugin
functionality has been preserved in the new ``tallies`` plugin. More details
are provided below.


Copyright detection:
~~~~~~~~~~~~~~~~~~~~
@@ -142,7 +150,7 @@ Package detection:
as these are really package data that are being detected, and can be manifests,
lockfiles or other package data. This has all the data attributes of a `package_data`
field plus others: `package_uuid`, `package_data_files` and `files`.


- A new top-level attribute `packages` has been added which contains package
instances created from `package_data` detected in the codebase.
@@ -156,7 +164,7 @@ Package detection:

- There is a new resource-level attribute `for_packages` which refers to packages
through package_uuids (pURL + uuid string).

- The package_data attribute `dependencies` (which is a list of DependentPackages),
now has a new attribute `resolved_package` having a package data mapping.
Also the `requirement` attribute here is renamed to `extracted_requirement`.
@@ -222,64 +230,20 @@ License Clarity Scoring Update
- Scoring Weight = -20
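
A minimal sketch of how the weighted elements in this section could combine into a 0-100 score, using the weights the changelog lists (40, 40, 10, 10, -10, -20). This is an illustration rather than the plugin's actual code: the key names in ``WEIGHTS``, the function ``clarity_score``, the treatment of every element as a plain boolean, and the clamping to the 0-100 range are all assumptions made for the example.

```python
# Weights as listed in the changelog; the key names are hypothetical.
WEIGHTS = {
    "declared_license": 40,
    "identification_precision": 40,
    "license_texts": 10,
    "declared_copyright": 10,
    "ambiguous_compound_licensing": -10,
    "conflicting_license_categories": -20,
}


def clarity_score(elements):
    """Sum the weights of the elements that are true.

    Clamping the result to 0..100 is an assumption of this sketch.
    """
    total = sum(
        weight
        for name, weight in WEIGHTS.items()
        if elements.get(name, False)
    )
    return max(0, min(100, total))


# A package with a precisely identified declared license and license texts,
# no declared copyright, and conflicting categories in lower-level code.
example = {
    "declared_license": True,
    "identification_precision": True,
    "license_texts": True,
    "conflicting_license_categories": True,
}
print(clarity_score(example))  # 40 + 40 + 10 - 20 = 70
```

The four positive weights add up to 100, so the two negative weights can only lower a score.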


License Clarity Scoring Update
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- We are moving away from the license clarity scoring defined by ClearlyDefined
in the license clarity score plugin. The previous license clarity scoring
logic produced a score that was misleading, where it would return a low score
when scanning packages due to the stringent scoring criteria. We are now
using more general criteria to get a sense of what provenance information has
been provided and whether or not there is a conflict in licensing between
what licenses were declared at the top-level key files and what licenses have
been detected in the files under the top-level.

- The license clarity score is a value from 0-100 calculated by combining the
weighted values determined for each of the scoring elements:

- Declared license:

- When true, indicates that the software package licensing is documented at
top-level or well-known locations in the software project, typically in a
package manifest, NOTICE, LICENSE, COPYING or README file.
- Scoring Weight = 40

- Identification precision:

- Indicates how well the license statement(s) of the software identify known
licenses that can be designated by precise keys (identifiers) as provided in
a publicly available license list, such as the ScanCode LicenseDB, the SPDX
license list, the OSI license list, or a URL pointing to a specific license
text in a project or organization website.
- Scoring Weight = 40

- License texts:

- License texts are provided to support the declared license expression in
files such as a package manifest, NOTICE, LICENSE, COPYING or README.
- Scoring Weight = 10

- Declared copyright:

- When true, indicates that the software package copyright is documented at
top-level or well-known locations in the software project, typically in a
package manifest, NOTICE, LICENSE, COPYING or README file.
- Scoring Weight = 10

- Ambiguous compound licensing:

- When true, indicates that the software has a license declaration that
makes it difficult to construct a reliable license expression, such as in
the case of multiple licenses where the conjunctive versus disjunctive
relationship is not well defined.
- Scoring Weight = -10

- Conflicting license categories:

- When true, indicates the declared license expression of the software is in
the permissive category, but that other potentially conflicting categories,
such as copyleft and proprietary, have been detected in lower level code.
- Scoring Weight = -20


Summary Plugin Update
~~~~~~~~~~~~~~~~~~~~~
The summary plugin's behavior has been changed. Previously, it provided a count
of the detected license expressions, copyrights, holders, authors, and
programming languages from a scan. This functionality has been preserved in
full in a new plugin called ``tallies``.

The summary plugin now attempts to determine a declared license expression, a
declared holder, and a primary programming language from a scan. The license
clarity score is included to give context on what origin information the key
files provide. The plugin also returns tallies of the other detected license
expressions, holders, and programming languages. All of this information is
provided in the codebase-level attribute named ``summary``.
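
As a rough illustration of the tallying idea described above, the sketch below counts detected values and separates the most frequent one from the rest. It is not the plugin's implementation: the helper names ``tally`` and ``summarize_languages`` and the output keys are hypothetical, and the commit list indicates the real plugin also draws on detected package data, which this sketch ignores.

```python
from collections import Counter


def tally(values):
    """Count non-empty values, most frequent first."""
    counts = Counter(v for v in values if v)
    return [{"value": v, "count": n} for v, n in counts.most_common()]


def summarize_languages(languages):
    """Pick the most frequent language as primary; tally the others.

    Choosing the primary language purely by frequency is an assumption
    of this sketch.
    """
    tallies = tally(languages)
    if not tallies:
        return {"primary_language": None, "other_languages": []}
    primary, *others = tallies
    return {"primary_language": primary["value"], "other_languages": others}


detected = ["Python", "C", "Python", None, "C", "Python", "Shell"]
result = summarize_languages(detected)
print(result["primary_language"])
print(result["other_languages"])
```

The same ``tally`` helper would apply unchanged to license expressions or holders, since it only needs a list of strings.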


Outputs:
@@ -288,6 +252,9 @@ Outputs:
- Add new outputs for the CycloneDx format.
The CLI now exposes options to produce CycloneDx BOMs in either JSON or XML format

- A new field ``warnings`` has been added to the headers of ScanCode toolkit output
that contains any warning messages that occur during a scan.


Output version
--------------
8 changes: 4 additions & 4 deletions setup.cfg
@@ -177,10 +177,10 @@ scancode_scan =
# module for details and doc.
scancode_post_scan =
summary = summarycode.summarizer:ScanSummary
summary2 = summarycode.summarizer2:ScanSummary
summary-keeping-details = summarycode.summarizer:ScanSummaryWithDetails
summary-key-files = summarycode.summarizer:ScanKeyFilesSummary
summary-by-facet = summarycode.summarizer:ScanByFacetSummary
tallies = summarycode.tallies:Tallies
tallies-with-details = summarycode.tallies:TalliesWithDetails
tallies-key-files = summarycode.tallies:KeyFilesTallies
tallies-by-facet = summarycode.tallies:FacetTallies
license-clarity-score = summarycode.score:LicenseClarityScore
license-policy = licensedcode.plugin_license_policy:LicensePolicy
mark-source = scancode.plugin_mark_source:MarkSource
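
The entries in this hunk follow the ``name = module:attribute`` entry-point convention. The sketch below parses such lines to show what each plugin name maps to after the rename. It assumes the four ``summary*`` lines other than ``summary`` are the removed side of the diff and the four ``tallies*`` lines the added side, which matches the "4 additions & 4 deletions" count. ``parse_entry_points`` is a hypothetical helper written for this example, not part of ScanCode or setuptools.

```python
def parse_entry_points(block):
    """Map each plugin name to a (module, attribute) pair."""
    plugins = {}
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, target = line.partition("=")
        module, _, attr = target.strip().partition(":")
        plugins[name.strip()] = (module, attr)
    return plugins


# Post-change entries, copied from the hunk above (assumed "added" side).
POST_SCAN = """
summary = summarycode.summarizer:ScanSummary
tallies = summarycode.tallies:Tallies
tallies-with-details = summarycode.tallies:TalliesWithDetails
tallies-key-files = summarycode.tallies:KeyFilesTallies
tallies-by-facet = summarycode.tallies:FacetTallies
"""

plugins = parse_entry_points(POST_SCAN)
for name, (module, attr) in plugins.items():
    print(f"{name} -> {module}.{attr}")
```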
4 changes: 2 additions & 2 deletions src/licensedcode/data/licenses/4suite-1.1.yml
@@ -7,11 +7,11 @@ spdx_license_key: LicenseRef-scancode-4suite-1.1
ignorable_copyrights:
- Copyright (c) 2000 Fourthought, Inc.
- Copyright (c) 2000 The Apache Software Foundation
ignorable_authors:
- Fourthought, Inc. (http://www.fourthought.com)
ignorable_holders:
- Fourthought, Inc.
- The Apache Software Foundation
ignorable_authors:
- Fourthought, Inc. (http://www.fourthought.com)
ignorable_urls:
- http://www.fourthought.com/
ignorable_emails:
4 changes: 2 additions & 2 deletions src/licensedcode/data/licenses/accellera-systemc.yml
@@ -10,11 +10,11 @@ ignorable_copyrights:
- (c) 1996- current year here
- (c) 1996- current year here by all Contributors
- Copyright (c) 1996- current year here by all Contributors
ignorable_authors:
- through the Accellera working group process
ignorable_holders:
- here
- here by all Contributors
ignorable_authors:
- through the Accellera working group process
ignorable_urls:
- http://www.accellera.org/
ignorable_emails:
4 changes: 2 additions & 2 deletions src/licensedcode/data/licenses/ace-tao.yml
@@ -13,8 +13,8 @@ minimum_coverage: 30
ignorable_copyrights:
- copyrighted by Douglas C. Schmidt and his research group at Washington University, University
of California, Irvine, and Vanderbilt University, Copyright (c) 1993-2009
ignorable_authors:
- the DOC Group at the Institute for Software Integrated Systems (ISIS) and the Center
ignorable_holders:
- Douglas C. Schmidt and his research group at Washington University, University of California,
Irvine, and Vanderbilt University
ignorable_authors:
- the DOC Group at the Institute for Software Integrated Systems (ISIS) and the Center
2 changes: 1 addition & 1 deletion src/licensedcode/data/licenses/acroname-bdk.yml
@@ -12,4 +12,4 @@ ignorable_holders:
ignorable_urls:
- https://libusb.info/
ignorable_emails:
- support@acroname.com
- support@acroname.com
2 changes: 1 addition & 1 deletion src/licensedcode/data/licenses/adapt-1.0.yml
@@ -6,10 +6,10 @@ owner: OSI - Open Source Initiative
homepage_url: http://www.opensource.org/licenses/apl1.0.php
notes: Per SPDX.org, this license is OSI certified.
spdx_license_key: APL-1.0
osi_license_key: APL-1.0
text_urls:
- http://www.opensource.org/licenses/apl1.0.php
osi_url: http://www.opensource.org/licenses/apl1.0.php
other_urls:
- http://www.opensource.org/licenses/APL-1.0
- https://opensource.org/licenses/APL-1.0
osi_license_key: APL-1.0