Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of Concept: Dependency Graph-Based Update Process for npm and Yarn #11262

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

kbukum1
Copy link
Contributor

@kbukum1 kbukum1 commented Jan 9, 2025

What are you trying to accomplish?

This PR introduces a Proof of Concept (PoC) for using a Directed Acyclic Graph (DAG) structure in dependabot-core to manage dependencies more effectively. The focus is on the npm_and_yarn ecosystem, which includes three package managers—npm, pnpm, and yarn. These package managers have distinct lockfile formats and functionalities, making it challenging to manage them effectively using the current flat list dependency structure.

Why:

  • The current flat list structure loses the hierarchical relationships between main and transitive dependencies, leading to inefficiencies, redundant updates, and errors like "No files updated!" or "No Change Error."
  • The DAG structure serves as a unified representation to hold dependencies and their relationships, making updates more accurate and reducing unnecessary operations.
  • The responsibility for building the DAG lies with the package manager parsers, ensuring compatibility with the unique characteristics of each package manager.

Anything you want to highlight for special attention from reviewers?

  • This PoC applies to the npm_and_yarn ecosystem, focusing on handling updates for:
    • npm (package.json, package-lock.json)
    • pnpm (pnpm-lock.yaml)
    • yarn (yarn.lock)
  • The DAG structure is designed to store dependencies and their relationships, while the package manager parsers are responsible for constructing the DAG based on manifest and lockfile data.
  • Feedback is requested on:
    • The implementation of the DAG structure and its integration into the update process.
    • The division of responsibilities between the DAG and the package manager parsers.

How will you know you've accomplished your goal?

  • Dependencies in the npm_and_yarn ecosystem will be accurately represented in the DAG structure.
  • Updates will be processed efficiently, focusing on dependencies that require changes.
  • Errors like "No files updated!" or "No Change Error" will be significantly reduced.
  • Tests will validate the correctness of the DAG representation and the update process for npm, pnpm, and yarn.
  • Performance improvements and better error handling will be observed during the update process.

Checklist

  • I have run the complete test suite to ensure all tests and linters pass.
  • I have thoroughly tested my code changes to ensure they work as expected, including adding additional tests for new functionality.
  • I have written clear and descriptive commit messages.
  • I have provided a detailed description of the changes in the pull request, including the problem it addresses, how it fixes the problem, and any relevant details about the implementation.
  • I have ensured that the code is well-documented and easy to understand.

add generated dependencies graphs into dependency snapshot
# frozen_string_literal: true

module Dependabot
class DependencyGraph
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the graph from the dependency-graph team if they have such a concept?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can use it. Currently waiting them reply. I think it is going to be something similar to what I am doing. Just need to modify places related to the structure such as add dependencies and so on but lock files parsings are going to be in this way just will need to add main/transtive dependencies somehow to the structure.

@elireisman
Copy link

👋 Hello @kbukum1 and @abdulapopoola (cc'ing @brrygrdn who is working on the new DG -> Dependabot APIs)

In DG's legacy services we do not maintain a dependency tree we store flat lists as you mention. The new service is storing edge pair connections to model the dependency tree more accurately alongside those lists, but there are some caveats to the representation we capture at the moment:

  1. Each lockfile format is lossy in it's own way, so captures are not perfect compared to a build-time SBOM (for instance) generated by a CI or build tool
  2. The current data model we're using has some caveats the DG team needs to address to express more detailed tree relationships vs. the simpler notion of "reachability" of other project packages from any one entry. Details here.

Also worth noting but may not be a blocker for you - the edge-pair representation of the connections between the packages in a project comes with some access pattern concerns we're still sussing out use case by use case. For instance we should be able to support the following access patterns fairly well:

  • Extract all dependencies/dependents reachable from a given starting package in the project
  • Extract all dependency relationships for a given project in full (similar to listing all dependencies for the project)

Here, a "project" means a single NPM package.json and it's immediately related lockfiles (NPM, YARN, PNPM.) The new system does not yet support connections such as references to repo-local "workspaces" only dependencies associated with a published package registry (public or private) are recorded.

Hope that's helpful for a start. I assume Barry will have thoughts as well, but happy to answer more questions for you if I can 👍

@kbukum1
Copy link
Contributor Author

kbukum1 commented Jan 10, 2025

2. nships vs. the simpler notion of "reach

Thank you @elireisman for all the information. That's was really helpful. I will go over the ADR that you shared and will see if this approach going to be ok for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants