Implement Scorecard action for dependencies #1070

Closed
naveensrinivasan opened this issue Jan 13, 2023 · 17 comments
@naveensrinivasan
Member

naveensrinivasan commented Jan 13, 2023

Currently, Scorecard has scanned over 1,200,000 repositories, which are available in the API. However, when a GitHub project adds a new dependency, customers don't know about the health of the dependency. This can lead to potential issues in the project.

GitHub provides an API to get differences in dependencies between commits (https://docs.github.com/en/rest/dependency-graph/dependency-review?apiVersion=2022-11-28#get-a-diff-of-the-dependencies-between-commits). By utilizing this API in conjunction with the Scorecard API, we can provide the health of the dependency when there is a pull request to the repository. This will help consumers be aware of the health of the dependency and make informed decisions.

Proposed solution:

  • Implement an action that utilizes the GitHub API to get the differences in dependencies between commits.
  • Use the Scorecard API to check the health of the new dependencies.
  • Display the results of the health check in the pull request.
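
The three steps above can be sketched roughly as follows. This is an illustration, not the implementation discussed in this issue. The endpoint paths follow the public GitHub dependency review REST API and the Scorecard API at api.securityscorecards.dev as I understand them. The function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional

GITHUB_API = "https://api.github.com"
SCORECARD_API = "https://api.securityscorecards.dev"


def dependency_diff_url(owner: str, repo: str, base: str, head: str) -> str:
    """URL of GitHub's 'diff of the dependencies between commits' endpoint."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/dependency-graph/compare/{base}...{head}"


def scorecard_url(owner: str, repo: str) -> str:
    """URL of the Scorecard API result for a GitHub-hosted project."""
    return f"{SCORECARD_API}/projects/github.com/{owner}/{repo}"


def added_dependencies(diff: List[dict]) -> List[dict]:
    """Keep only the entries the pull request adds (assumes a 'change_type' field)."""
    return [dep for dep in diff if dep.get("change_type") == "added"]


def fetch_json(url: str, token: Optional[str] = None):
    """GET a URL and decode the JSON body; the token is only needed for GitHub."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

An action would call `fetch_json(dependency_diff_url(...), token)`, pass the result through `added_dependencies`, and then look up each added dependency with `fetch_json(scorecard_url(...))` before rendering the results in the pull request.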

Benefits:

  • Helps customers make informed decisions about new dependencies
  • Increases transparency and trust in the project
  • It can potentially prevent issues caused by unhealthy dependencies.

Additional Information:

The reason for using the API https://docs.github.com/en/rest/dependency-graph/dependency-review?apiVersion=2022-11-28#get-a-diff-of-the-dependencies-between-commits instead of using other OSS projects for getting dependencies is that GitHub provides a mechanism to map dependencies to a GitHub project, which is its unique feature. This is a more reliable way to identify dependencies as it links the dependency to its source. Using this API, we can ensure that we get the correct information about the dependencies and their health.
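
To illustrate the mapping, here is a small sketch that turns the source repository reported for a dependency into the owner/repo pair a Scorecard lookup needs. It assumes each diff entry carries a `source_repository_url` field, as the dependency review API documentation describes. The helper name `github_project` is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def github_project(source_repository_url: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) for a GitHub source URL, or None if it cannot be mapped."""
    if not source_repository_url:
        return None  # GitHub could not link the dependency to a repository
    parsed = urlparse(source_repository_url)
    if parsed.netloc.lower() != "github.com":
        return None  # this sketch only handles GitHub-hosted sources
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

Dependencies for which this returns `None` would simply be reported without a score.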

It could be something like this naveensrinivasan#1 (comment)


@naveensrinivasan
Member Author

naveensrinivasan commented Jan 13, 2023

I am planning to work on this, and I would like feedback. I presented this in the Bi-Weekly meeting.

@naveensrinivasan naveensrinivasan self-assigned this Jan 13, 2023
@laurentsimon
Contributor

laurentsimon commented Jan 13, 2023

related to API rate limiting: #1071

@laurentsimon
Contributor

laurentsimon commented Jan 13, 2023

IIRC, some of the things to consider:

  1. Where do we implement this:
    a. In this Action:
      • pros: all existing users get it for free if they update the trigger
      • cons: we have rate limit problems. Is that only a problem for our repos, or do other people also hit it? What is the number of PRs / commits per hour (?) at which we hit this problem in practice? Can anything be done in Scorecard to remediate this problem? NOTE: scorecard runs only file-based checks on PRs.
    b. In another Action: pros and cons are the reverse of the above
  2. What do you want the UX to look like when displaying the results? Can anyone with UX design expertise help? Someone at OSSF?
  3. In the result, can we link to a richer UX on a website, and avoid duplicating effort? deps.dev, scorecard.dev? I know deps.dev is working on improving scorecard result visualization once the structured results have landed.

@naveensrinivasan
Member Author

  1. What do you want the UX to look like when displaying the results? Can anyone with UX design expertise help? Someone at OSSF?
  2. In the result, can we link to a richer UX on a website, and avoid duplicating effort? deps.dev, scorecard.dev? I know deps.dev is working on improving scorecard result visualization once the structured results have landed.

To start off we can use deps.dev which has a better UI and then move to others when we have a better UI.
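
As a rough sketch of what that could look like, the snippet below renders a Markdown table for a pull request comment and links each dependency to deps.dev. The deps.dev project URL pattern is an assumption based on how its project pages appear to be addressed. The function names and the table layout are hypothetical.

```python
from typing import Iterable, Tuple
from urllib.parse import quote


def deps_dev_url(owner: str, repo: str) -> str:
    """Link to the deps.dev project page (URL pattern is an assumption)."""
    return "https://deps.dev/project/github/" + quote(f"{owner}/{repo}", safe="")


def render_comment(rows: Iterable[Tuple[str, str, float]]) -> str:
    """Render (owner, repo, score) rows as a Markdown table for a PR comment."""
    lines = [
        "| Dependency | Scorecard score | Details |",
        "| --- | --- | --- |",
    ]
    for owner, repo, score in rows:
        link = deps_dev_url(owner, repo)
        lines.append(f"| {owner}/{repo} | {score:.1f} | [deps.dev]({link}) |")
    return "\n".join(lines)
```

Keeping the comment to a score plus a link leaves the detailed visualization to deps.dev, which matches the goal of avoiding duplicated UI work.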

@naveensrinivasan
Member Author

IIRC, some of the things to consider:

  1. Where do we implement this:
    a. In this Action:
      • pros: all existing users get it for free if they update the trigger
      • cons: we have rate limit problems. Is that only a problem for our repos, or do other people also hit it? What is the number of PRs / commits per hour (?) at which we hit this problem in practice? Can anything be done in Scorecard to remediate this problem? NOTE: scorecard runs only file-based checks on PRs for SARIF results, but runs all the checks for the JSON results uploaded to the API server.
    b. In another Action: pros and cons are the reverse of the above

Storing them in a separate folder, like the one at https://github.com/github/codeql-action, has long-term benefits. We can release them separately, and assign different permission models. The greatest advantage is that we won't ever encounter rate-limiting issues.

@laurentsimon
Contributor

Since existing Action users don't run on pull_request (it's not supported, so very rare), we don't gain much by adding this new functionality inside the existing Action: users will need to enable it manually.

So how about the following:

  1. We keep the existing Action as-is, so that our current users don't break
  2. We create a new Action under a new folder, e.g. dependency-analysis, that users will call as ossf/scorecard-action/dependency-analysis. This should avoid breaking existing users (rate limiting, etc.) and provide the other benefits described in previous comments, like decoupling features and letting users decide what they want to run without a complicated config file.

Wdyt?

@naveensrinivasan
Member Author

Since existing Action users don't run on pull_request (it's not supported, so very rare), we don't gain much by adding this new functionality inside the existing Action: users will need to enable it manually.

So how about the following:

  1. We keep the existing Action as-is, so that our current users don't break
  2. We create a new Action under a new folder, e.g. dependency-analysis, that users will call as ossf/scorecard-action/dependency-analysis. This should avoid breaking existing users (rate limiting, etc.) and provide the other benefits described in previous comments, like decoupling features and letting users decide what they want to run without a complicated config file.

Wdyt?

Sounds good. Thanks 👍

@naveensrinivasan naveensrinivasan changed the title from "Issue: Implement Scorecard action for dependencies" to "Implement Scorecard action for dependencies" Jan 30, 2023
naveensrinivasan added a commit to ossf-tests/scorecard-action-new that referenced this issue Feb 23, 2023
 New scorecard action ossf#1070

- Add workflow to publish dependency analysis Docker image
- Add a new filter function to filter slices
- Add a GetScorecardChecks function to get scorecard checks
- Add a GetScore function to get score of a repo
- Add a Validate function to validate token, owner, repo, commitSHA, and PR
- Add a new action file for OSSF Scorecard dependency analysis
- Add structs for ScorecardResult, Check, DependencyDiff, and V

Signed-off-by: naveensrinivasan <172697+naveensrinivasan@users.noreply.github.com>
naveensrinivasan added a commit to ossf-tests/scorecard-action-new that referenced this issue Feb 24, 2023
 New scorecard action ossf#1070

- Add workflow to publish dependency analysis Docker image
- Add a new filter function to filter slices
- Add a GetScorecardChecks function to get scorecard checks
- Add a GetScore function to get score of a repo
- Add a Validate function to validate token, owner, repo, commitSHA, and PR
- Add a new action file for OSSF Scorecard dependency analysis
- Add structs for ScorecardResult, Check, DependencyDiff, and V

Signed-off-by: naveensrinivasan <172697+naveensrinivasan@users.noreply.github.com>
@azeemshaikh38
Contributor

I like the idea; my concern is maintainability. We have so many different offerings, and the issues/bugs on these offerings are piling up since we maintainers don't have the bandwidth to tackle them. Should we consider adding this in the future instead?

Apart from that, a few questions on the design:

  • Is the user journey for this GitHub Action actionable? E.g., the action shows that an added dependency has binary artifacts. Should this affect a user who is using this dependency as a library? What is the expected action we want the developer to take here?
  • Have we given some thought to whether Scorecard data for the version being imported is more significant vs. Scorecard at HEAD?
  • Should we consider starting with just the Maintained check? Warning the user that they are about to depend on an unmaintained project and should consider an alternative might be both actionable and impactful.

@naveensrinivasan
Member Author

naveensrinivasan commented Feb 28, 2023

I like the idea; my concern is maintainability. We have so many different offerings, and the issues/bugs on these offerings are piling up since we maintainers don't have the bandwidth to tackle them. Should we consider adding this in the future instead?

Thank you for your question. I understand your concern about the maintainability of adding a new feature when you already have a large number of offerings to maintain. However, with the recent addition of two new maintainers, Raghav (see: ossf/scorecard#2663) and Spencer (see: ossf/scorecard#2269), it seems that the team has expanded, which could provide additional support and resources to tackle the issues/bugs that are currently piling up.

Moreover, the additional 500k in funding from AWS can be used to improve the project, including adding new features such as the one under consideration.

Regarding code complexity, the proposed feature involves only one API call to GitHub and a similar call to the Scorecard API, which should not be complicated to implement. The overall complexity of the feature is therefore relatively low, so it should not require a significant amount of maintenance resources.

Finally, the primary reason for adding this feature is to provide customers with information about their dependencies. This information is essential for customers to make informed decisions about managing their dependencies and avoiding potential security risks. By adding this feature, we can help customers to understand their dependencies better and minimize their exposure to potential security threats.

In summary, while adding a new feature can increase the workload, the recent additions to the team, additional funding, low complexity of the feature, and the benefits of addressing customer needs make it a viable option to consider.

Apart from that, a few questions on the design:

  • Is the user journey for this GitHub Action actionable? E.g., the action shows that an added dependency has binary artifacts. Should this affect a user who is using this dependency as a library? What is the expected action we want the developer to take here?

I understand that we are considering adding a new feature to our GitHub Action that can help customers define dependency policies similar to the Envoy project (see: envoyproxy/envoy#14334 and https://docs.google.com/document/d/1HbREo7pv7rgeIIjQn6mNpySzQE5rx2Yv9dXm5NqR2N8/edit#heading=h.qqlbt6betxi7).

We recognize that to do this effectively, we need a policy engine that can enforce success and failure, which we currently do not have. However, we still believe that providing information to customers about potential issues with their dependencies, such as binary artifacts, is essential, even in the absence of a policy engine.

By using our GitHub Action, customers can be made aware of any issues with their dependencies and make informed decisions about how to manage them. For example, they may decide to fork or send a patch to upstream or decide to accept the risk or reject the PR. Without this information, customers would be unaware of these issues and unable to make informed decisions.

We understand that the current user journey for the GitHub Action may not be fully actionable in the sense of enforcing policy. However, we believe that this is a valuable first step towards providing customers with the information they need to manage their dependencies effectively.

Moving forward, the next step in this progression would be to integrate our GitHub Action with a policy engine when it becomes available. This would allow us to provide customers with a more comprehensive solution that could enforce success and failure based on their defined policies. However, in the meantime, we believe that providing information about potential issues with dependencies is a valuable step forward.
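
To make the idea of enforcing success and failure concrete, here is a purely hypothetical sketch of the simplest possible policy: a minimum score per check. It is not an existing Scorecard feature. It assumes check results shaped like the Scorecard API output (a `name` and a `score` per check) and that a negative score means the check was inconclusive.

```python
from typing import Dict, List


def evaluate_policy(checks: List[dict], policy: Dict[str, int]) -> List[str]:
    """Return one message per check whose score is below the policy minimum."""
    scores = {check["name"]: check["score"] for check in checks}
    failures = []
    for name, minimum in policy.items():
        score = scores.get(name)
        if score is None or score < 0:
            # Missing or inconclusive data: stay informational, do not fail.
            continue
        if score < minimum:
            failures.append(f"{name}: score {score} is below the required {minimum}")
    return failures
```

An empty list would let the pull request check pass. A non-empty list could fail it or only add a warning, depending on how strict the project wants to be.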

  • Have we given some thought to whether Scorecard data for the version being imported is more significant vs. Scorecard at HEAD?

We understand that Scorecard data for the version being imported might not always be present. Moreover, we believe that the Scorecard at HEAD is critical because one of the significant factors that determine whether a project is likely to be a VULNERABLE PROJECT (as reported by Sonatype at https://www.sonatype.com/state-of-the-software-supply-chain/project-quality-metrics) is Code Review.

For instance, suppose a customer depends on a version of the project that is six months old, and since then the code hasn't been maintained or the code review score has fallen. In that case, the Scorecard at HEAD will provide more up-to-date information and be more relevant to the customer's needs.

In addition, we would state in the results that they refer to HEAD, so that customers know which one they are looking at. This would ensure that customers have the information they need to make informed decisions about how to manage their dependencies effectively.

In summary, while Scorecard data for the version being imported is important, we believe that the Scorecard at HEAD is more critical in many cases. This is because it provides more up-to-date information, including the Code Review score, which is a crucial factor in determining whether a project is likely to be vulnerable.

  • Should we consider starting with just the Maintained check? Warning the user that they are about to depend on an unmaintained project and should consider an alternative might be both actionable and impactful.

Regarding the question of whether we should consider starting with just the Maintained check, I think it is worth noting that the present implementation already provides an option for customers to choose which checks they would like to report.

While warning the user that they are about to depend on an unmaintained project and suggesting alternative options might be impactful, I believe that including the Critical and High checks is important as well. This is because these checks are designed to highlight potential issues that could significantly impact the project's security posture.

Therefore, even if we were to start with just the Maintained check, I would still recommend including the Critical and High checks as the default options if customers choose not to customize their checks. This would ensure that customers have the information they need to make informed decisions about their dependencies and minimize their exposure to potential security risks.

In summary, while including the Maintained check is important, I believe that including the Critical and High checks is essential as well. By providing this information, customers can make informed decisions about their dependencies and avoid potential security threats.
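
A minimal sketch of that default behaviour, assuming each check result has a `name` field: report the checks the customer asked for, otherwise fall back to the Critical and High risk checks. The risk table below is a partial, illustrative subset based on my reading of the Scorecard check documentation and should be verified against it. `select_checks` is a hypothetical name.

```python
from typing import Iterable, List, Optional

# Partial risk table for illustration only (assumption, verify against the docs).
CHECK_RISK = {
    "Dangerous-Workflow": "Critical",
    "Binary-Artifacts": "High",
    "Code-Review": "High",
    "Maintained": "High",
    "Vulnerabilities": "High",
    "Pinned-Dependencies": "Medium",
    "License": "Low",
}


def select_checks(checks: List[dict], requested: Optional[Iterable[str]] = None) -> List[dict]:
    """Filter check results to the requested names, defaulting to Critical/High."""
    if requested:
        wanted = set(requested)
    else:
        wanted = {name for name, risk in CHECK_RISK.items() if risk in ("Critical", "High")}
    return [check for check in checks if check["name"] in wanted]
```

With this shape, starting with only the Maintained check is just a matter of passing `requested=["Maintained"]`.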

@spencerschrock
Member

Re discussion in weekly sync: my vote is a "yes", but with a question around how versioning will be done. If both actions live in the same repo (scorecard-action), tagged releases will apply to both actions.

Is the user journey for this GitHub Action actionable?

I don't see a user journey problem. Instead of a developer running the Scorecard CLI individually on the dependencies a PR introduces, this action provides that information automatically. Since it's a separate action, it's opt-in: any developer who writes a workflow to use this action has expressed a desire to see the information.

E.g., the action shows that an added dependency has binary artifacts. Should this affect a user who is using this dependency as a library? What is the expected action we want the developer to take here?

You could say the same about maintainers who run Scorecard manually to evaluate potential library dependencies.

Therefore, the overall complexity of the feature is relatively low, which means that it should not require a significant amount of maintenance resources.

I agree that it's a relatively straightforward feature that is not likely to require significant changes. My only thoughts are:

  • changes in either API
  • potential changes required if/when structured results land

I also concur with what Naveen brought up in the sync that realistically a maintainer who adds the feature tends to be the one primarily supporting it. Of course others help out when needed and there is a small reviewing effort.

However, with the recent addition of two new maintainers, Raghav (see: ossf/scorecard#2663) and Spencer (see: ossf/scorecard#2269), it seems that the team has expanded, which could provide additional support and resources to tackle the issues/bugs that are currently piling up.

While I think there are plenty of eyes on ossf/scorecard, it often takes a conscious effort for me to remember to look at scorecard-action and scorecard-webapp.

@raghavkaul

My opinion is that we should move forward with the feature; it seems useful. To reduce the maintenance burden:

  • We can tag related open issues with dependency-diff
  • We can keep our eye out for good first issues in the backlog for this feature

Since the feature is purely informational (and still experimental) we don't have enough data to say what the user journeys are right now, but we may in a few months once there are enough users who've built a workflow around this.

As mentioned above, the feature is opt-in, so I'm not as worried that the rest of scorecard-action or scorecard would suffer if the code falls behind.

@laurentsimon
Contributor

Might be a good idea to not publicly advertise it and start rolling in stages on repos we own for a first round of feedback

@azeemshaikh38
Contributor

Seems like we have a majority to go forward. So looks good to me.

@naveensrinivasan
Member Author

Seems like we have a majority to go forward. So looks good to me.

Thank you!

@naveensrinivasan
Member Author

Might be a good idea to not publicly advertise it and start rolling in stages on repos we own for a first round of feedback

I agree. That was my plan! Thanks

@laurentsimon
Contributor

laurentsimon commented Jan 5, 2024

I was talking to @josepalafox from GitHub today and he's interested in demo'ing the Action to the community and customers when it's ready. Jose mentioned we can use the GitHub PR diff API to get the diff between old and new dependencies. I don't recall how our earlier PoC worked. We were already using it I suppose?

Is someone working on the Action atm?

@spencerschrock
Member

For historical context before closing:

Work on this started shortly afterwards, in May 2023, in the https://github.com/ossf/scorecard-dependencyanalysis repo. The repo has since been archived in favor of a GitHub first-party (1P) action:

Update, June 2024: This repo is no longer maintained. Please use actions/dependency-review-action which can show Scorecard API data as of v4.2.3.
