Accommodate updated Scancode attribute names

Scancode v31.0.0 includes changes[1] to JSON output attribute names
that caused KeyErrors when Tern processed Scancode results. Scancode
v32.0.0 also includes changes[2] to the license detection output, which
likewise caused KeyErrors when Tern parsed the results. This commit adds
code that accepts the new attribute names used by newer Scancode
versions as well as the older names (for users still running older
Scancode versions). At some point, it probably makes sense to revisit
these changes and decide whether to keep supporting older Scancode
versions.
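Instead of probing for each key, a reader could pick the attribute names once from the Scancode version that produced the scan. The sketch below maps a version string to the names this commit handles, following the commit message: renames from v31.0.0, new license detection from v32.0.0. (The code comment in `get_scancode_file()` labels the old license layout as `<= 32.0.0`; this sketch treats 32.0.0 as the first new-layout release.) It assumes the JSON records the version as `tool_version` in the first `headers` entry; `ScancodeKeys`, `keys_for_version` and `tool_version_of` are hypothetical helpers, not Tern code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScancodeKeys:
    file_packages: str    # per-file package list key
    copyright_field: str  # field holding the copyright text
    author_field: str     # field holding the author text
    file_licenses: str    # per-file license list key


def keys_for_version(tool_version: str) -> ScancodeKeys:
    """Map a Scancode version string to the attribute names used in that release."""
    # Only the major number matters for the changes described in the commit:
    # v31.0.0 renamed several fields, v32.0.0 reworked license detection.
    major = int(tool_version.split(".")[0])
    renamed = major >= 31
    new_licenses = major >= 32
    return ScancodeKeys(
        file_packages="package_data" if renamed else "packages",
        copyright_field="copyright" if renamed else "value",
        author_field="author" if renamed else "value",
        file_licenses="license_detections" if new_licenses else "licenses",
    )


def tool_version_of(scan: dict) -> str:
    """Return tool_version from the first header entry (assumed layout)."""
    return scan["headers"][0]["tool_version"]


scan = {"headers": [{"tool_name": "scancode-toolkit", "tool_version": "31.2.6"}]}
keys = keys_for_version(tool_version_of(scan))
print(keys.copyright_field, keys.file_licenses)  # copyright licenses
```

Probing keys, as the commit does, avoids depending on the header layout; reading the version instead makes the supported range explicit, which may help with the future clean-up the message mentions.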

This commit also updates the README instructions for installing newer
Scancode versions on M1/ARM hardware and fixes a small bug that caused
purl generation to fail when Scancode doesn't detect a package format.

[1]https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#v3100---2022-08-17

[2]https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#license-detection

Resolves #1202

Signed-off-by: Rose Judge <rjudge@vmware.com>
rnjudge committed Jul 13, 2023
1 parent 852af8c commit d3dd148
Showing 3 changed files with 55 additions and 14 deletions.
6 changes: 6 additions & 0 deletions README.md
````diff
@@ -348,6 +348,8 @@ NOTE: Neither the Docker container nor the Vagrant image has any of the extensio
 ## Scancode<a name="scancode">
 [scancode-toolkit](https://github.com/nexB/scancode-toolkit) is a license analysis tool that "detects licenses, copyrights, package manifests and direct dependencies and more both in source code and binary files". Note that Scancode currently works on Python 3.6 to 3.9. Be sure to check what python version you are using below.
 
+**NOTE** Installation issues have been [reported](https://github.com/nexB/scancode-toolkit/issues/3205) on macOS on M1 and Linux on ARM for Scancode>=31.0.0. If you are wanting to run Tern + Scancode in either of these environments, you will need to install `scancode-toolkit-mini`.
+
 1. Install system dependencies for Scancode (refer to the [Scancode GitHub repo](https://github.com/nexB/scancode-toolkit) for instructions)
 
 2. Setup a python virtual environment
@@ -360,6 +362,10 @@ $ source bin/activate
 ```
 $ pip install tern scancode-toolkit
 ```
+<br> If you are using macOS on M1 or Linux on ARM, run:</br>
+```
+$ pip install tern scancode-toolkit-mini
+```
 4. Run tern with scancode
 ```
 $ tern report -x scancode -i golang:1.12-alpine
````
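The README's platform advice can be expressed as a small check. This is a sketch, not part of Tern or Scancode: it assumes `platform.machine()` reports `arm64` on macOS on M1 and `aarch64` on Linux on ARM, and `scancode_package` is a hypothetical helper name.

```python
import platform
from typing import Optional

# Assumed platform.machine() values on the two platforms the README names.
ARM_MACHINES = {"arm64", "aarch64"}


def scancode_package(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the pip package to install alongside tern on this (or the given) platform."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system in ("Darwin", "Linux") and machine in ARM_MACHINES:
        return "scancode-toolkit-mini"
    return "scancode-toolkit"


if __name__ == "__main__":
    print(f"pip install tern {scancode_package()}")
```

An x86_64 Python running under Rosetta on an M1 Mac reports `x86_64`, so this check would pick the full package there.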
57 changes: 46 additions & 11 deletions tern/extensions/scancode/executor.py
```diff
@@ -58,15 +58,38 @@ def get_scancode_file(file_dict):
         file_dict['name'], fspath, file_dict['date'], file_dict['file_type'])
     fd.short_file_type = get_file_type(file_dict)
     fd.add_checksums({'sha1': file_dict['sha1'], 'md5': file_dict['md5']})
-    if file_dict['licenses']:
-        fd.licenses = [li['short_name'] for li in file_dict['licenses']]
-    fd.license_expressions = file_dict['license_expressions']
+    try:
+        # For scancode versions <= 32.0.0
+        if file_dict['licenses']:
+            fd.licenses = [li['short_name'] for li in file_dict['licenses']]
+        fd.license_expressions = file_dict['license_expressions']
+    except KeyError:
+        # License detection changed for scancode version >= 32.0
+        ## https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#license-detection
+        if file_dict['license_detections']:
+            fd.licenses = [li['license_expression'] for li in file_dict['license_detections']]
+        fd.license_expressions = file_dict['detected_license_expression']
+    ## Several of the scancode attribute names have changed. See:
+    # https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#important-api-changes-1
+    # The following try/except statements accomodate metadata from scancode versions
+    # prior to this scancode JSON output change as well as after the change was made.
     if file_dict['copyrights']:
-        fd.copyrights = [c['value'] for c in file_dict['copyrights']]
+        try:
+            # For scancode versions <=30.*
+            fd.copyrights = [c['value'] for c in file_dict['copyrights']]
+        except KeyError:
+            # Data structure fields changed in scancode >= 31.0.0
+            fd.copyrights = [c['copyright'] for c in file_dict['copyrights']]
     if file_dict['urls']:
         fd.urls = [u['url'] for u in file_dict['urls']]
-    fd.packages = file_dict['packages']
-    fd.authors = [a['value'] for a in file_dict['authors']]
+    try:
+        fd.packages = file_dict['packages']
+    except KeyError:
+        fd.packages = file_dict['package_data']
+    try:
+        fd.authors = [a['value'] for a in file_dict['authors']]
+    except KeyError:
+        fd.authors = [a['author'] for a in file_dict['authors']]
     if file_dict['scan_errors']:
         # for each scan error make a notice
         for err in file_dict['scan_errors']:
```
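The per-file fallbacks in `get_scancode_file()` can also be written with membership tests instead of `try`/`except KeyError`. A minimal sketch, assuming the key names used in this commit; the helper names and the sample records are invented for illustration and are not real Scancode output.

```python
def file_licenses(file_dict: dict) -> list:
    """License names from pre-32 'licenses' or 32+ 'license_detections'."""
    if "licenses" in file_dict:
        return [li["short_name"] for li in file_dict["licenses"]]
    return [li["license_expression"] for li in file_dict.get("license_detections", [])]


def field_values(entries: list, new_name: str) -> list:
    """Each entry's text, accepting the pre-31 'value' key or the 31+ field name."""
    return [e["value"] if "value" in e else e[new_name] for e in entries]


def file_packages(file_dict: dict) -> list:
    """Per-file packages: 'packages' before v31, 'package_data' from v31 on."""
    return file_dict.get("packages", file_dict.get("package_data", []))


# Invented minimal records in the old and new layouts.
old = {"licenses": [{"short_name": "MIT License"}],
       "copyrights": [{"value": "Copyright (c) Example"}],
       "authors": [], "packages": []}
new = {"license_detections": [{"license_expression": "mit"}],
       "copyrights": [{"copyright": "Copyright (c) Example"}],
       "authors": [{"author": "Jane Doe"}], "package_data": []}

print(file_licenses(old), file_licenses(new))  # ['MIT License'] ['mit']
print(field_values(new["authors"], "author"))  # ['Jane Doe']
```

Checking `in` before indexing means a KeyError from some other missing field (for example an entry without `short_name`) is not mistaken for a schema difference, which the `try` blocks in the diff cannot distinguish.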
```diff
@@ -112,12 +135,18 @@ def get_scancode_package(package_dict):
     object with the results'''
     package = Package(package_dict['name'])
     package.version = package_dict['version']
-    package.pkg_license = filter_pkg_license(package_dict['declared_license'])
+    try:
+        package.pkg_license = filter_pkg_license(package_dict['declared_license'])
+        package.licenses = [package_dict['declared_license'],
+                            package_dict['license_expression']]
+    except KeyError:
+        ## https://github.com/nexB/scancode-toolkit/blob/e3099637b195daca54942df9f695f58990097896/CHANGELOG.rst#license-detection
+        package.pkg_license = filter_pkg_license(package_dict['extracted_license_statement'])
+        package.licenses = [li['license_expression'] for li in package_dict['license_detections']]
+        package.licenses.append(package_dict['extracted_license_statement'])
     package.copyright = package_dict['copyright']
     package.proj_url = package_dict['repository_homepage_url']
     package.download_url = package_dict['download_url']
-    package.licenses = [package_dict['declared_license'],
-                        package_dict['license_expression']]
     return package
 
 
```
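For packages, the commit reads either `declared_license`/`license_expression` (older output) or `extracted_license_statement`/`license_detections` (v32 and later). A sketch of the same selection as a pure function, returning the raw statement that `get_scancode_package()` passes to `filter_pkg_license()` plus the license list in the same order the commit builds it; `package_license_fields` and the sample dicts (simple strings only) are hypothetical.

```python
def package_license_fields(package_dict: dict) -> tuple:
    """Return (raw license statement, list of license strings) for either layout."""
    if "declared_license" in package_dict:
        # Older layout: declared_license plus a single license_expression.
        declared = package_dict["declared_license"]
        return declared, [declared, package_dict["license_expression"]]
    # v32+ layout: detected expressions first, then the extracted statement.
    statement = package_dict["extracted_license_statement"]
    licenses = [d["license_expression"] for d in package_dict["license_detections"]]
    licenses.append(statement)
    return statement, licenses


old_pkg = {"declared_license": "MIT", "license_expression": "mit"}
new_pkg = {"extracted_license_statement": "MIT",
           "license_detections": [{"license_expression": "mit"}]}
print(package_license_fields(old_pkg))  # ('MIT', ['MIT', 'mit'])
print(package_license_fields(new_pkg))  # ('MIT', ['mit', 'MIT'])
```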
```diff
@@ -160,8 +189,14 @@ def collect_layer_data(layer_obj):
        for f in data['files']:
            if f['type'] == 'file' and f['size'] != 0:
                files.append(get_scancode_file(f))
-               for package in f['packages']:
-                   packages.append(get_scancode_package(package))
+               try:
+                   for package in f['packages']:
+                       packages.append(get_scancode_package(package))
+               except KeyError:
+                   # See comment in get_scancode_file() above about attribute name changes
+                   # in newer Scancode versions
+                   for package in f['package_data']:
+                       packages.append(get_scancode_package(package))
     return files, packages
 
 
```
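In the loop above, a KeyError raised inside `get_scancode_package()` while iterating `f['packages']` would also land in the `except` branch, where the lookup of `f['package_data']` could then fail with a less informative KeyError. A sketch that resolves the key before looping, for a loop shaped like the one in `collect_layer_data()`; `package_entries`, `collect_packages` and `parse_package` (a stand-in for `get_scancode_package()`) are hypothetical names, and the sample entries are invented.

```python
def package_entries(file_entry: dict) -> list:
    """Return the file's package list under whichever key this schema uses."""
    for key in ("packages", "package_data"):
        if key in file_entry:
            return file_entry[key]
    return []


def parse_package(package_dict: dict) -> str:
    # Hypothetical stand-in: the real code builds a tern Package object.
    return f"{package_dict['name']}@{package_dict['version']}"


def collect_packages(files: list) -> list:
    packages = []
    for f in files:
        if f["type"] == "file" and f["size"] != 0:
            # Errors raised by parse_package() now propagate unchanged.
            for package in package_entries(f):
                packages.append(parse_package(package))
    return packages


scan_files = [
    {"type": "file", "size": 10, "packages": [{"name": "a", "version": "1"}]},
    {"type": "file", "size": 20, "package_data": [{"name": "b", "version": "2"}]},
    {"type": "directory", "size": 0, "package_data": []},
]
print(collect_packages(scan_files))  # ['a@1', 'b@2']
```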
6 changes: 3 additions & 3 deletions tern/formats/spdx/spdx_common.py
```diff
@@ -241,10 +241,10 @@ def get_purl(package_obj):
             purl_namespace = package_obj.pkg_supplier.split(' ')[1].lower()
         else:
             purl_namespace = package_obj.pkg_supplier.split(' ')[0].lower()
-    # TODO- this might need adjusting for alpm. Currently can't test on M1
-    purl = PackageURL(purl_type, purl_namespace, package_obj.name.lower(), package_obj.version,
-                      qualifiers={'arch': package_obj.arch if package_obj.arch else ''})
     try:
+        # TODO- this might need adjusting for alpm. Currently can't test on M1
+        purl = PackageURL(purl_type, purl_namespace, package_obj.name.lower(), package_obj.version,
+                          qualifiers={'arch': package_obj.arch if package_obj.arch else ''})
         return purl.to_string()
     except ValueError:
         return ''
```
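The purl fix works because the `PackageURL(...)` constructor call, which judging from this fix raises `ValueError` when no package format (purl type) is available, now sits inside the `try`. A stdlib-only sketch of the same pattern; `make_purl` is a hypothetical stand-in for `PackageURL(...).to_string()` that omits qualifiers and percent-encoding, and `get_purl_sketch` is likewise hypothetical.

```python
from typing import Optional


def make_purl(purl_type: Optional[str], namespace: str, name: str, version: str) -> str:
    """Hypothetical stand-in: reject a missing type or name, else format a purl."""
    if not purl_type or not name:
        raise ValueError("purl type and name are required")
    ns = f"{namespace}/" if namespace else ""
    return f"pkg:{purl_type}/{ns}{name}@{version}"


def get_purl_sketch(purl_type: Optional[str], namespace: str, name: str, version: str) -> str:
    # Construction happens inside the try block, so a missing type yields ''
    # instead of an uncaught ValueError (the bug this commit fixes).
    try:
        return make_purl(purl_type, namespace, name.lower(), version)
    except ValueError:
        return ""


print(get_purl_sketch("deb", "debian", "Bash", "5.1"))  # pkg:deb/debian/bash@5.1
print(repr(get_purl_sketch(None, "", "bash", "5.1")))   # ''
```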
