Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DiscoveredDependency model #486

Merged
merged 47 commits into from
Aug 31, 2022
Merged

Conversation

JonoYang
Copy link
Member

@JonoYang JonoYang commented Aug 4, 2022

This PR adds the DiscoveredDependency model to scancode.io. New views, based off the existing DiscoveredPackage views, have been added for DiscoveredDependency.

This is the second PR split from #482 and is built on top of #485

@JonoYang JonoYang requested a review from tdruez August 4, 2022 21:06
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch from a5594ae to 6fe6052 Compare August 12, 2022 22:33
@JonoYang
Copy link
Member Author

@tdruez What would be a better way to write this migration query? https://github.com/nexB/scancode.io/blob/e2c9bcdae3530c379f17478f40303a94a9939659/scanpipe/migrations/0023_migrate_dependencies.py#L8

I am trying to populate the DiscoveredDependency table from the dependency data stored in the JSON fields on DiscoveredPackage before I remove the field.

@tdruez
Copy link
Contributor

tdruez commented Aug 15, 2022

What would be a better way to write this migration query?

@JonoYang unfortunatly, I don't think there's an easy way here to use update() since the old dependencies field uses a complex data structure (JSON field as list of dict).
Since the amount of Packages is much less than Resources, I think we can go ahead with your approach.
Also, dependencies__isnull=False is not the proper lookup as it will return [] empty list value as well, and dependencies_data is not a field on the model, dependencies is.

I would refactor the code into something along:

def migrate_dependencies_to_discovereddependencies(apps, schema_editor):
    DiscoveredPackage = apps.get_model('scanpipe', 'DiscoveredPackage')
    DiscoveredDependency = apps.get_model('scanpipe', 'DiscoveredDependency')

    package_with_dependencies = DiscoveredPackage.objects.exclude(dependencies=[])
    
    for package in package_with_dependencies:
        for dependencies_data in package.dependencies:
            # Remove non-supported fields from the data dict
            dependencies_data.pop("extra_data", None)
            dependencies_data.pop("resolved_package", None)
            DiscoveredDependency.objects.create(project=package.project, **dependencies_data)

scanpipe/models.py Outdated Show resolved Hide resolved
scanpipe/models.py Outdated Show resolved Hide resolved
scanpipe/models.py Outdated Show resolved Hide resolved
scanpipe/models.py Outdated Show resolved Hide resolved
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch 3 times, most recently from 7413e38 to f301e27 Compare August 15, 2022 21:56
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch from 2cc57ed to 250a7ce Compare August 25, 2022 02:05
JonoYang and others added 17 commits August 25, 2022 11:32
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
This reverts commit c9b8bed.

Sorting Packages, Dependencies, and Resources from DatafileHandler.assemble() will never work. The code needs to be changed in scancode-toolkit.

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update test expectations

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Create new dependency list and detail views
    * Update assemble_packages() to create DiscoveredDependencies
    * Update test expectations

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Remove for_package_uid and replace with ForeignKey for_package
    * Remove datafile_path and replace with ForeignKey datafile_resource
    * Create properties for the two removed fields
    * Update dependency views to link to datafile_resource
    * Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Add strip_datafile_path_root to DiscoveredDependency.create_from_data
    * This argument strips the root path segment from `datafile_path` before using the path to look up the corresponding CodebaseResource
    * This is used in the case where we are importing a scan from scancode-toolkit, where the root path segments are not stripped by default
    * Update expected test results

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Used cached_property for DiscoveredDependency properties

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Order DiscoveredDependencies by is_runtime, is_optional, is_resolved, and dependency_uid
    * Do not show dependency_uid value in DiscoveredDependency list view

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch 2 times, most recently from 9c54a9b to b63981c Compare August 26, 2022 01:17
    * Use updated table header include
    * Update dependency presentation in package detail view
    * Show package uid on hover on for package tab

Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch from b63981c to 633a656 Compare August 26, 2022 01:19
    * Update DiscoveredDependency ordering

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update DiscoveredDependency ordering
    * Update daglib test expectations

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update test expectations

Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch from 00f803e to 40862fe Compare August 27, 2022 00:16
tdruez added 5 commits August 29, 2022 10:45
Signed-off-by: Thomas Druez <tdruez@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@tdruez
Copy link
Contributor

tdruez commented Aug 29, 2022

@JonoYang What's the status on your side? Could you provide a list of what's left todo on this one?

  1. I've made some changes to have a consistent "discovereddependencies" related name, this solves issues with the filter system, see 5d8cd43#diff-d71c8a8d98cec2c888bbf875d6a34a481041ba8a56656ec516aa53b0b7f6f194R1978

  2. I've notice a UI issue for both "For package" and "Datafile resource" tab when there's no values to display:
    Screenshot 2022-08-29 at 10 53 03

  3. Lastly, I've refactored the update_from_data method into a Mixin to remove the duplicated code, see dba4891#diff-d71c8a8d98cec2c888bbf875d6a34a481041ba8a56656ec516aa53b0b7f6f194R1013
    The only difference with the DiscoveredDependency.update_from_data is the following skip exception:
    field_name in PURL_FIELDS. Is it something we do not want to check on the dependency side?
    If that's the case, we can add a skip_purl_fields kwarg to the update_from_data method to make it usable for both models.
    We should also add a unit test for dependencies update similar to test_scanpipe_discovered_package_model_update_from_data

tdruez and others added 3 commits August 29, 2022 11:44
Signed-off-by: Thomas Druez <tdruez@nexb.com>
    * Only show links in dependency for_package tab or dependency datafile_resource tab if there is a value

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Use UpdateFromDataMixin in DiscoveredDependency
    * Create test for DiscoveredDependency.update_from_data()

Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang JonoYang force-pushed the implement-discovereddependency-model branch from f4fc4d1 to 0fa6783 Compare August 30, 2022 23:25
@JonoYang
Copy link
Member Author

@tdruez

What's the status on your side? Could you provide a list of what's left todo on this one?

I think we're ready to go for now as the important stuff is here:

  • model created
  • pipelines updated to create DiscoveredDependencies and relate them to the Package and Resource they're from
  • view updated to show DiscoveredDependency results and their relations

I was taking a look at #445 with this branch and I run into an issue with rubygem packages and their dependencies being reported multiple times. This is an issue with scancode-toolkit, and not scancode.io (aboutcode-org/scancode-toolkit#3072). I've also run into the license scan errors that you reported earlier.

I've made some changes to have a consistent "discovereddependencies" related name, this solves issues with the filter system, see 5d8cd43#diff-d71c8a8d98cec2c888bbf875d6a34a481041ba8a56656ec516aa53b0b7f6f194R1978

Thanks! I've updated the verbose_name and the verbose_plural_name and I was wondering why the related name didn't change.

The only difference with the DiscoveredDependency.update_from_data is the following skip exception:
field_name in PURL_FIELDS. Is it something we do not want to check on the dependency side?

I skipped this initially because we did not have purl fields in the DiscoveredDependency model. We can have this check again now that we're using the PURL field mixin in DiscoveredDependencies.

Signed-off-by: Thomas Druez <tdruez@nexb.com>
@tdruez tdruez merged commit 6c26874 into main Aug 31, 2022
@tdruez tdruez deleted the implement-discovereddependency-model branch August 31, 2022 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants