Add new licenses and new detection rules #2765

Merged
merged 68 commits into from
Jan 6, 2022
Commits
82b7187
Improve license metadata
pombredanne Oct 15, 2021
2af8e50
Merge branch 'develop' into omnibus-fall2-license-improvements
pombredanne Nov 20, 2021
9e0c205
Do not report template as license #270
pombredanne Nov 21, 2021
db306db
Add new and improved license tests
pombredanne Nov 21, 2021
2cf3117
Add new copyleft licenses
pombredanne Nov 21, 2021
09d1601
Add new permissive licenses rules
pombredanne Nov 21, 2021
1aadc1c
Add new miscellaneous license rules
pombredanne Nov 21, 2021
0956537
Add new license detection tests
pombredanne Nov 21, 2021
2630654
Promote rule as new license
pombredanne Nov 22, 2021
7bbaf3a
Promote rule as new license
pombredanne Nov 22, 2021
36e3491
Improve license metadata
pombredanne Nov 22, 2021
6c94508
Add new licenses
pombredanne Nov 22, 2021
4a5f99b
Add new license detection tests
pombredanne Nov 22, 2021
dc08c93
Add new license tests
pombredanne Nov 22, 2021
63df817
Add new license detection rules
pombredanne Nov 22, 2021
36cbe98
Update licenses to SPDX license list 3.15
pombredanne Nov 22, 2021
1330682
Improve license sync
pombredanne Nov 22, 2021
4ba7111
Add new proprietary license rule
pombredanne Nov 22, 2021
93bd9f3
Update license metadata
pombredanne Nov 22, 2021
22f578c
Fix typo in test data file
pombredanne Nov 23, 2021
cc30ed2
Update SPDX tests with latest list version
pombredanne Nov 23, 2021
97a1a57
Do not use defaultdict for Query unknowns
pombredanne Nov 25, 2021
a8a75b7
Correct failing license detection tests
pombredanne Nov 25, 2021
5e3bfa7
Correct license detection test
pombredanne Nov 25, 2021
dfc0b69
Format functions arguments, black-style
pombredanne Nov 25, 2021
d4820f2
Ensure that all single word rule are references
pombredanne Nov 26, 2021
73e0376
Remove unused arguments and format code
pombredanne Nov 26, 2021
1ce8879
Check for Rule stored text first in Rule.text()
pombredanne Nov 26, 2021
72faf0c
Do not detect some solo word license in binaries
pombredanne Nov 26, 2021
829fc52
Merge remote-tracking branch 'upstream/develop' into omnibus-fall3-li…
pombredanne Nov 26, 2021
bb54959
Add test for continuous detection #2769
pombredanne Nov 27, 2021
ff7c148
Rename only_known_words to continuous #2769
pombredanne Nov 27, 2021
274d116
Deprecate license
pombredanne Dec 20, 2021
cc10916
Refine license matches filtering. Add new rules
pombredanne Dec 23, 2021
2691925
Improve French and German copyright detection
pombredanne Dec 23, 2021
e249dee
Format regex code for readability.
pombredanne Dec 23, 2021
131b056
Update CHANGELOG
pombredanne Dec 23, 2021
cfc2eab
Do not cap dependent versions
pombredanne Dec 23, 2021
4b82fb8
Align test expectations with latest code
pombredanne Dec 23, 2021
3e36927
Merge latest develop branch
pombredanne Dec 26, 2021
61bf43d
Add new and improved license detection rules
pombredanne Dec 26, 2021
75082b9
Refine unknown and stop handling
pombredanne Dec 26, 2021
7330914
Improve LicenseMatch filtering
pombredanne Dec 29, 2021
d9789ac
Remove old "rule templates" markers from tests
pombredanne Dec 29, 2021
b5972e1
Improve tests of LicenseMatch
pombredanne Dec 29, 2021
f84a23d
Improve LicenseMatch fields documentation
pombredanne Dec 29, 2021
5f92c93
Use license_expression.combine_expressions()
pombredanne Dec 29, 2021
a824594
Rename Rule.compute_relevance to set_relevance
pombredanne Dec 29, 2021
840527f
Format docstrings
pombredanne Dec 29, 2021
e138421
Use new cache.build_spdx_license_expression()
pombredanne Dec 29, 2021
0595e65
Update CHANGELOG.rst
pombredanne Dec 29, 2021
8d57889
Merge remote-tracking branch 'upstream/develop' into omnibus-fall3-li…
pombredanne Dec 29, 2021
0a1a562
Use license expression strings, not objects
pombredanne Dec 29, 2021
107eea7
Restore combine_expression behavior
pombredanne Dec 30, 2021
b7a7593
Remove SPDX license lists false positive rules
pombredanne Dec 30, 2021
5637c82
Validate and use only lower case license keys
pombredanne Dec 31, 2021
37db46f
Create shared get_licenses_by_spdx_key function
pombredanne Jan 1, 2022
10eadcc
Improve rule generation
pombredanne Jan 1, 2022
b9ad65f
Add new filter for false positive license lists
pombredanne Jan 3, 2022
f4ddd14
Add new licenses
pombredanne Jan 3, 2022
56deead
Remove unused test options for Python2
pombredanne Jan 3, 2022
0adb718
Add more license detection rules
pombredanne Jan 3, 2022
92a1dbb
Add missing test file
pombredanne Jan 3, 2022
84a5ede
Improve handling of continuous and key phrases
pombredanne Jan 5, 2022
012800f
Refine license sync
pombredanne Jan 5, 2022
ba249d1
Refine short Apache license rules and tests
pombredanne Jan 5, 2022
ad4dfff
Refine license tests
pombredanne Jan 5, 2022
fd628e6
Add new draft LicenseDetection
pombredanne Jan 5, 2022
The diff you're trying to view is too large. We only load the first 3000 changed files.
26 changes: 25 additions & 1 deletion CHANGELOG.rst
@@ -5,6 +5,7 @@ Changelog
-----------------------



Important API changes:
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -36,7 +37,9 @@ Copyright detection:

- The data structure in the JSON is now using consistently named attributes as
opposed to a plain value.
- Several copyright detection bugs have been fixed.
- French and German copyright detection is improved.
- Some spurious trailing dots in holders are now stripped.


License detection:
@@ -53,6 +56,27 @@ License detection:
`{{` and `}}`. When defined a RULE will only match when the key phrases match
exactly.

- The rule attribute "only_known_words" has been renamed to "is_continuous" and its
meaning has been updated and expanded. A rule tagged as "is_continuous" can only
be matched if there are no gaps between matched words, whether those gaps are
stopwords or extra unknown or known words. This fixes several false positive
license detections.

- When scanning binary files, matches to single-word rules are filtered out when
the matched word is surrounded by gibberish or uses mixed case. For instance,
$#%$GpL$ is a false positive and is no longer reported.

- Several rules were incorrectly tagged as is_license_notice although they were
references; they have been requalified as is_license_reference. All rules made
of a single word have been requalified as is_license_reference if they were not
already qualified this way.

- Matches to small license rules (with small defined as under 15 words)
that are scattered over too many lines are now filtered as false matches.

- Small, two-word matches that overlap the previous or next match on
the word "license" or a similar word are now filtered as false matches.
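
The "is_continuous" entry above can be illustrated with a minimal sketch. It assumes a match is represented by the list of query token positions that were matched, and that stopwords and unknown words also occupy positions in the query. The helper name `is_continuous_match` and this representation are hypothetical and are not ScanCode's actual API; the sketch only shows the "no gaps" idea.

```python
def is_continuous_match(matched_positions):
    """
    Return True if the matched query token positions form an unbroken run,
    i.e. no stopword, unknown word or unmatched known word sits in between.

    Hypothetical helper for illustration only; not ScanCode's implementation.
    """
    positions = sorted(set(matched_positions))
    if not positions:
        return False
    # N distinct consecutive integers span exactly N - 1 from first to last.
    return positions[-1] - positions[0] == len(positions) - 1


# Query tokens 4 to 7 all matched: no gap, so the match would be kept.
print(is_continuous_match([4, 5, 6, 7]))
# Position 6 was not matched: a rule tagged is_continuous must not match here.
print(is_continuous_match([4, 5, 7]))
```

Under these assumptions the first call reports a continuous match and the second does not, because the unmatched token at position 6 is a gap.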


Package detection:
~~~~~~~~~~~~~~~~~~

8 changes: 0 additions & 8 deletions conftest.py
@@ -57,14 +57,6 @@ def pytest_addoption(parser):
"""
group = parser.getgroup('scancode', 'Test suite options for ScanCode')

-group.addoption(
-'--force-py3',
-dest='force_py3',
-action='store_true',
-default=False,
-help='[DEPRECATED and ignored] Python 3 port is completed.',
-)
-
group.addoption(
'--test-suite',
action='store',
2 changes: 1 addition & 1 deletion etc/scripts/licenses/buildrules.py
@@ -227,7 +227,7 @@ def cli(licenses_file):
rulerec = models.Rule(**rd)

# force recomputing relevance to remove junk stored relevance for long rules
-rulerec.compute_relevance(_threshold=18.0)
+rulerec.set_relevance()

rulerec.data_file = base_loc + '.yml'
rulerec.text_file = base_loc + '.RULE'
142 changes: 142 additions & 0 deletions etc/scripts/licenses/gen_spdx_lists.py
@@ -0,0 +1,142 @@
# -*- coding: utf-8 -*-
#
# Copyright (c) nexB Inc. and others. All rights reserved.
# ScanCode is a trademark of nexB Inc.
# SPDX-License-Identifier: Apache-2.0
# See http://www.apache.org/licenses/LICENSE-2.0 for the license text.
# See https://github.com/nexB/scancode-toolkit for support or download.
# See https://aboutcode.org for more information about nexB OSS projects.
#

import click

from licensedcode.cache import get_licenses_by_spdx_key

import synclic

"""
A script to generate license detection rules from lists of SPDX
licenses for their name or id/name combos.

It is common to see SPDX license names and ids used for licensing documentation.

Here we fetch the latest SPDX licenses list and generate rules for each
license id/name, name and a few other related combinations.
"""

TRACE = False

template = '''----------------------------------------
license_expression: {key}
relevance: 100
{is_license}: yes
minimum_coverage: 100
is_continuous: yes
notes: Rule based on an SPDX license identifier and name
---
{text}
'''


@click.command()
@click.argument(
    # 'A buildrules-formatted file used to generate new licenses rules.')
    'output', type=click.Path(), metavar='FILE')
@click.help_option('-h', '--help')
def cli(output):
    """
    Generate ScanCode license detection rules from a list of SPDX
    licenses. Save these in FILE for use with buildrules.

    The `spdx` directory is used as a temp store for fetched SPDX licenses.
    """

    licenses_by_spdx_key = get_licenses_by_spdx_key(
        licenses=None,
        include_deprecated=False,
        lowercase_keys=False,
        include_other_spdx_license_keys=True,
    )

    spdx_source = synclic.SpdxSource(external_base_dir=None)
    spdx_data = list(spdx_source.fetch_spdx_licenses())

    messages = []
    with open(output, 'w') as o:
        for spdx in spdx_data:
            is_exception = 'licenseExceptionId' in spdx
            spdx_key = spdx.get('licenseId') or spdx.get('licenseExceptionId')
            name = spdx['name']
            lic = licenses_by_spdx_key.get(spdx_key)
            if not lic:
                print('--> Skipping SPDX license unknown in ScanCode:', spdx_key)
                continue
            for rule in build_rules(lic.key, spdx_key, name, is_exception):
                o.write(rule)

        # write a final delimiter line after the last rule
        o.write('----------------------------------------\n')

    for msg in messages:
        print(*msg)


def build_rules(key, spdx_key, name, is_exception=False):
    yield template.format(
        key=key,
        is_license='is_license_reference',
        text=name,
    )

    yield template.format(
        key=key,
        is_license='is_license_reference',
        text=f'name: {name}',
    )

    yield template.format(
        key=key,
        is_license='is_license_reference',
        text=f'{spdx_key} {name}',
    )

    yield template.format(
        key=key,
        is_license='is_license_reference',
        text=f'{name} {spdx_key}',
    )

    yield template.format(
        key=key,
        is_license='is_license_tag',
        text=f'{spdx_key} {name}',
    )

    yield template.format(
        key=key,
        is_license='is_license_tag',
        text=f'license: {spdx_key}',
    )

    yield template.format(
        key=key,
        is_license='is_license_tag',
        text=f'license: {name}',
    )

    if is_exception:
        yield template.format(
            key=key,
            is_license='is_license_tag',
            text=f'licenseExceptionId: {spdx_key}',
        )
    else:
        yield template.format(
            key=key,
            is_license='is_license_tag',
            text=f'licenseId: {spdx_key}',
        )


if __name__ == '__main__':
    cli()
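
To make the output of `build_rules` concrete, here is a small standalone sketch that renders one rule from a copy of the template in the script above. The values used (a ScanCode key `mit` for the SPDX id `MIT`) are illustrative assumptions, and the sketch does not import or run the script itself.

```python
# Copy of the rule template from gen_spdx_lists.py, reproduced here so the
# sketch runs on its own.
template = '''----------------------------------------
license_expression: {key}
relevance: 100
{is_license}: yes
minimum_coverage: 100
is_continuous: yes
notes: Rule based on an SPDX license identifier and name
---
{text}
'''

# Illustrative values only: these mirror the last `else` branch of
# build_rules() for a license that is not an exception.
rule = template.format(
    key='mit',
    is_license='is_license_tag',
    text='licenseId: MIT',
)
print(rule)
```

Each rendered rule starts with a delimiter line, carries its metadata as YAML-like fields, and ends with the rule text, which is the layout the script writes to FILE for later processing by buildrules.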
172 changes: 0 additions & 172 deletions etc/scripts/licenses/gen_spdx_lists_fp.py

This file was deleted.

2 changes: 1 addition & 1 deletion etc/scripts/licenses/genrulevariants.py
@@ -64,7 +64,7 @@ def cli(source, replacement):
rulerec = models.Rule(**rd)

# force recomputing relevance to remove junk stored relevance for long rules
-rulerec.compute_relevance(_threshold=18.0)
+rulerec.set_relevance()

rulerec.data_file = base_loc + '.yml'
rulerec.text_file = base_loc + '.RULE'