
scancode-toolkit-31.0.2 returns an unknown-license-reference just before the x11-lucent text #3079

Open
DennisClark opened this issue Sep 1, 2022 · 1 comment

DennisClark commented Sep 1, 2022

I scanned doris-1.1.1-rc03 (available at https://github.com/apache/doris/archive/refs/tags/1.1.1-rc03.tar.gz)
using scancode-toolkit-31.0.2. Although it detected most of the licenses in the rather complex notice (attached) in
doris-1.1.1-rc03/dist/LICENSE-dist.txt,
it returns both unknown-license-reference and x11-lucent for this chunk of text:

be/src/gutil/utf/*: licensed under the following terms:

  UTF-8 Library

  The authors of this software are Rob Pike and Ken Thompson.
      Copyright (c) 1998-2002 by Lucent Technologies.

  Permission to use, copy, modify, and distribute this software for any purpose without
  fee is hereby granted, provided that this entire notice is included in all copies of any
  software which is or includes a copy or modification of this software and in all copies
  of the supporting documentation for such software.  THIS SOFTWARE IS BEING PROVIDED "AS
  IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY.  IN PARTICULAR, NEITHER THE AUTHORS NOR
  LUCENT TECHNOLOGIES MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE
  MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.

Lines 9102 through 9189 of the attached scan results show both detection instances.

Apparently the "licensed under the following terms:" snippet misled the scan logic, even though the scan correctly found the x11-lucent license right after it. There is no reason to return unknown-license-reference for this introductory sentence, which only serves to orient the reader of the file.

LICENSE-dist.txt.zip

doris-1.1.1-rc03-results.json.zip

@AyanSinhaMahapatra

@DennisClark, this is already fixed in the LicenseDetection branch for the upcoming release: https://github.com/nexB/scancode-toolkit/tree/add-license-detection.

As with Issue 2 in #3069 (comment) and the similar issue reported by the Eclipse Foundation in #2878 (comment), this is solved as follows:

Here the detection rule is "unknown-intro-followed-by-match", i.e., an unknown license intro is followed by a proper license match, so the unknown can be dropped from the detected license expression. This is achieved by tagging specific rules with is_license_intro set to True.
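To make the rule concrete, here is a minimal sketch of how an "unknown intro followed by a match" check could work. It is not ScanCode's actual implementation: the `Match` class, `is_unknown_intro`, and `detection_expression` are hypothetical names, and combining expressions with `AND` is a simplifying assumption. The intro match stays in the group, but it is left out of the final expression when a proper match follows it.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical, simplified stand-in for a ScanCode license match.
    license_expression: str
    start_line: int
    end_line: int
    is_license_intro: bool = False


UNKNOWN = "unknown-license-reference"


def is_unknown_intro(match: Match) -> bool:
    # An intro such as "licensed under the following terms:" that, on its
    # own, only resolves to an unknown license reference.
    return match.is_license_intro and match.license_expression == UNKNOWN


def detection_expression(matches: list[Match]) -> tuple[str, list[str]]:
    """Return (license_expression, detection_rules) for one group of matches.

    Sketch only: if the group starts with unknown intro matches that are
    followed by proper (non-unknown) matches, the intros are kept in the
    group but excluded from the combined expression.
    """
    leading_intros = 0
    while leading_intros < len(matches) and is_unknown_intro(matches[leading_intros]):
        leading_intros += 1
    rest = matches[leading_intros:]
    if leading_intros and rest and all(m.license_expression != UNKNOWN for m in rest):
        rules = ["unknown-intro-followed-by-match"]
        considered = rest
    else:
        rules = []
        considered = matches
    # Combine distinct expressions, in order of appearance, with AND.
    seen: list[str] = []
    for m in considered:
        if m.license_expression not in seen:
            seen.append(m.license_expression)
    return " AND ".join(seen), rules
```

For the doris notice, an intro on line 1 tagged with `is_license_intro=True` followed by the x11-lucent match on lines 8 to 14 yields `("x11-lucent", ["unknown-intro-followed-by-match"])`. Without the tag, the same two matches yield `"unknown-license-reference AND x11-lucent"`, which roughly mirrors the reported symptom.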

New license detection looks like this:

      "detected_license_expression": "x11-lucent",
      "detected_license_expression_spdx": "LicenseRef-scancode-x11-lucent",
      "license_detections": [
        {
          "license_expression": "x11-lucent",
          "detection_rules": [
            "unknown-intro-followed-by-match"
          ],
          "matches": [
            {
              "score": 100.0,
              "start_line": 1,
              "end_line": 1,
              "matched_length": 5,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "unknown-license-reference",
              "rule_identifier": "license-intro_4.RULE",
              "referenced_filenames": [],
              "is_license_text": false,
              "is_license_notice": false,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": true,
              "rule_length": 5,
              "rule_relevance": 100,
              "matched_text": "licensed under the following terms:",
              "licenses": [
                {
                  "key": "unknown-license-reference",
                  "name": "Unknown License file reference",
                  "short_name": "Unknown License reference",
                  "category": "Unstated License",
                  "is_exception": false,
                  "is_unknown": true,
                  "owner": "Unspecified",
                  "homepage_url": null,
                  "text_url": "",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/unknown-license-reference",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.yml",
                  "spdx_license_key": "LicenseRef-scancode-unknown-license-reference",
                  "spdx_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE"
                }
              ]
            },
            {
              "score": 100.0,
              "start_line": 8,
              "end_line": 14,
              "matched_length": 93,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "x11-lucent",
              "rule_identifier": "x11-lucent_1.RULE",
              "referenced_filenames": [],
              "is_license_text": true,
              "is_license_notice": false,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 93,
              "rule_relevance": 100,
              "matched_text": "Permission to use, copy, modify, and distribute this software for any purpose without\n  fee is hereby granted, provided that this entire notice is included in all copies of any\n  software which is or includes a copy or modification of this software and in all copies\n  of the supporting documentation for such software.  THIS SOFTWARE IS BEING PROVIDED \"AS\n  IS\", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY.  IN PARTICULAR, NEITHER THE AUTHORS NOR\n  LUCENT TECHNOLOGIES MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE\n  MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.",
              "licenses": [
                {
                  "key": "x11-lucent",
                  "name": "X11-Style (Lucent)",
                  "short_name": "X11-Style (Lucent)",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Alcatel-Lucent",
                  "homepage_url": null,
                  "text_url": "",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/x11-lucent",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/x11-lucent.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/x11-lucent.yml",
                  "spdx_license_key": "LicenseRef-scancode-x11-lucent",
                  "spdx_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/x11-lucent.LICENSE"
                }
              ]
            }
          ]
        }
      ],
      "license_clues": [],
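A consumer of this output may want to see which intro matches were set aside in each detection. The sketch below assumes only the field names visible in the excerpt above. The excerpt is trimmed to a few fields and wrapped in braces so it parses as standalone JSON. `summarize` is a hypothetical helper, not part of ScanCode.

```python
import json

# Trimmed copy of the detection excerpt above, wrapped in braces so it
# parses as a standalone JSON object; most per-match fields are omitted.
SCAN_EXCERPT = """
{
  "detected_license_expression": "x11-lucent",
  "license_detections": [
    {
      "license_expression": "x11-lucent",
      "detection_rules": ["unknown-intro-followed-by-match"],
      "matches": [
        {"start_line": 1, "end_line": 1,
         "license_expression": "unknown-license-reference",
         "rule_identifier": "license-intro_4.RULE",
         "is_license_intro": true},
        {"start_line": 8, "end_line": 14,
         "license_expression": "x11-lucent",
         "rule_identifier": "x11-lucent_1.RULE",
         "is_license_intro": false}
      ]
    }
  ],
  "license_clues": []
}
"""


def summarize(file_result: dict) -> list[tuple[str, list[str], list[str]]]:
    """For each detection, return its expression, its detection rules, and
    the rule identifiers of its license intro matches."""
    summary = []
    for detection in file_result.get("license_detections", []):
        intros = [
            m["rule_identifier"]
            for m in detection.get("matches", [])
            if m.get("is_license_intro")
        ]
        summary.append(
            (detection["license_expression"], detection.get("detection_rules", []), intros)
        )
    return summary


for expression, rules, intros in summarize(json.loads(SCAN_EXCERPT)):
    # prints: x11-lucent ['unknown-intro-followed-by-match'] ['license-intro_4.RULE']
    print(expression, rules, intros)
```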

There was also a bug in how we group matches into a LicenseDetection; I have fixed it so that license intros are factored in when grouping.
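One way to picture intro-aware grouping is the sketch below. It is an illustration under stated assumptions, not the fix that was actually made: `group_matches`, the `max_gap` line threshold, and the rule that an intro always attaches to the match after it are all hypothetical. With a purely proximity-based rule, the 7-line gap between the intro (line 1) and the license text (line 8) would split them into two detections; letting an intro attach forward keeps them together.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical, simplified stand-in for a ScanCode license match.
    license_expression: str
    start_line: int
    end_line: int
    is_license_intro: bool = False


def group_matches(matches: list[Match], max_gap: int = 4) -> list[list[Match]]:
    """Group matches (sorted by start line) into candidate detections.

    Sketch only: a match joins the current group when it starts within
    `max_gap` lines of the previous match's end, or when the previous match
    is a license intro (an intro always attaches to whatever follows it).
    An intro never joins an earlier group; it always opens a new one.
    """
    groups: list[list[Match]] = []
    for match in sorted(matches, key=lambda m: m.start_line):
        if groups and not match.is_license_intro:
            previous = groups[-1][-1]
            near = match.start_line - previous.end_line <= max_gap
            if near or previous.is_license_intro:
                groups[-1].append(match)
                continue
        groups.append([match])
    return groups
```

Attaching intros forward, rather than simply widening `max_gap`, keeps unrelated notices that happen to sit close together from being merged.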

Here are the scan results for you to look at:

Old scan, this issue only:
doris-issue-3079.json.txt

New scan, this issue only:
doris-add-license-detection-issue-3079.json.txt

Old scan, entire file:
doris-v31.1.1-LICENSE-dist.json.txt

New scan, entire file:
doris-add-license-detection-LICENSE-dist.json.txt
