Investigate alternative version ideas for manifest-sourced dependencies #129

Closed
jonabc opened this issue Jan 22, 2019 · 4 comments · Fixed by #164
Labels
enhancement New feature or request

Comments

@jonabc
Contributor

jonabc commented Jan 22, 2019

Currently, the manifest source uses the most recent Git commit hash of the files in each dependency to determine the version.

This mostly works well, but it has an annoying edge case when licensed is run on an updated file before the file is committed. Once the file change is committed, the dependency has a new latest Git commit hash, and licensed status then reports the cached dependency content as out of date.
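To make the edge case concrete, here is a minimal sketch of commit-based versioning. It assumes the version is whatever `git rev-list -1 HEAD -- <paths>` prints; the function names are hypothetical and licensed's actual implementation may differ.

```python
import subprocess
from typing import List, Sequence


def latest_commit_command(paths: Sequence[str]) -> List[str]:
    """Build the git command that prints the newest commit touching any path."""
    # `-1` limits the output to a single commit; `--` separates paths from revisions.
    return ["git", "rev-list", "-1", "HEAD", "--", *paths]


def commit_version(repo_dir: str, paths: Sequence[str]) -> str:
    """Return the latest commit SHA for the given paths ("" if none is committed).

    Hypothetical helper: it only sees committed history, so an edited but
    uncommitted file still reports the previous commit.
    """
    result = subprocess.run(
        latest_commit_command(paths),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Because the working tree is never consulted, caching metadata from an edited file records the old SHA, and the SHA changes as soon as that edit is committed.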

It would be great to have an alternative, reproducible method for finding a dependency's "version".

One suggestion was to create a hash of the file contents for each file in the dependency. This would work well for a small number of small files, but if the dependency contains a large number of files or any very large files, calculating the hash could become expensive. Are there cheap ways to calculate hashes for large files and/or large numbers of files?
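As a rough illustration of the content-hash idea, the sketch below folds every file into one SHA-256 digest. The function name, the choice of SHA-256, and the chunk size are all assumptions made for the example, not what licensed does.

```python
import hashlib
from pathlib import Path
from typing import Iterable

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so memory use stays bounded


def contents_version(root: Path, relative_paths: Iterable[str]) -> str:
    """Return one hex digest covering the names and contents of the given files."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order the files were listed in.
    for rel in sorted(relative_paths):
        # Include the path, so renaming a file changes the version.
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        with open(root / rel, "rb") as handle:
            for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
                digest.update(chunk)
        digest.update(b"\0")
    return digest.hexdigest()
```

The result depends only on file names and bytes, so it is the same before and after a commit. Every byte is still read once, so the cost grows linearly with the total size of the dependency; chunked reads limit memory, not time.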

Any other ideas?

@jonabc jonabc added the enhancement New feature or request label Jan 22, 2019
@mlinksva
Member

Another idea would be to emit a warning if licensed is run on a modified file, at the cost of licensed having to invoke git status each run if it has manifest-sourced dependencies? Not saying it is the right idea, just an idea.
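A minimal sketch of that check, assuming the caller has already captured the output of `git status --porcelain` (v1 format) and knows which files the manifest covers. Both function names are hypothetical, and paths that git quotes because of special characters are not handled.

```python
from typing import Iterable, List


def modified_paths(porcelain_output: str) -> List[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        # Each line is two status characters, a space, then the path.
        path = line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def uncommitted_manifest_files(porcelain_output: str, manifest_files: Iterable[str]) -> List[str]:
    """Return the manifest-listed files that have uncommitted changes."""
    tracked = set(manifest_files)
    return [path for path in modified_paths(porcelain_output) if path in tracked]
```

A non-empty result would be the trigger for the warning.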

@jonabc
Contributor Author

jonabc commented May 6, 2019

@mlinksva 👋 apologies for the essay that you're about to read 😆 😬

Another idea would be to emit a warning if licensed is run on a modified file, at the cost of licensed having to invoke git status each run if it has manifest-sourced dependencies? Not saying it is the right idea, just an idea.

I'm a little hesitant about this approach because it wouldn't work well for automation or if the logs from running licensed were unavailable down the line.

I have something coded up and (mostly) working that gets a hash from the file contents for each source. I'm running into a block though and would appreciate your thoughts while I continue investigation on a fix.

The versioning strategy would be used by the Go source. However, I'm not sure if I can accurately get all the files exclusive to or related to a Go dependency.

Go packages tend to be structured where a single Git repo can contain multiple packages, or at least multiple importable components.

For example, https://github.com/hashicorp/golang-lru has two packages with two import paths:

  • github.com/hashicorp/golang-lru (package lru)
  • github.com/hashicorp/golang-lru/simplelru (package simplelru)

Licensed doesn't pay much attention to the package name, instead opting to use the package import path as the dependency name since those are the units of code that are imported between packages.

The problem I'm finding is that it's common for all packages in a repo to share a single set of metadata files (README, LICENSE) in the root of the repo. If I'm looking at the simplelru package above, a content hash of that package has no knowledge of the LICENSE file at the root of the repo, so the version would not be updated if the repo LICENSE changes. licensed is able to find this file through specialized logic that looks up the parent hierarchy, but that's done at license enumeration time and (if possible) I'd prefer not to trigger that logic to figure out a version string.
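For illustration only, a parent-hierarchy lookup could look something like the sketch below. The function name and the list of license file names are assumptions; the thread does not show licensed's real lookup logic.

```python
from pathlib import Path
from typing import List

# Assumed file names for the example; a real tool may recognize more variants.
LICENSE_NAMES = ("LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING")


def license_files_up_to_root(package_dir: Path, repo_root: Path) -> List[Path]:
    """Collect license files from package_dir upward, stopping at repo_root."""
    package_dir = package_dir.resolve()
    repo_root = repo_root.resolve()
    if package_dir != repo_root and repo_root not in package_dir.parents:
        raise ValueError("package_dir must be inside repo_root")

    found = []
    current = package_dir
    while True:
        for name in LICENSE_NAMES:
            candidate = current / name
            if candidate.is_file():
                found.append(candidate)
        if current == repo_root:
            break
        current = current.parent
    return found
```

For the simplelru package, a walk like this would pick up the LICENSE at the repo root. Files found this way could be fed into a version calculation alongside the package's own files, at the cost of running the lookup at versioning time.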

Thoughts? For comparison, versioning based on Git uses the most recent Git commit SHA from the import path directory - it suffers from the same problem where an update to the repo LICENSE file won't trigger a new version for any packages imported from subfolders.

@mlinksva
Member

mlinksva commented May 6, 2019

I'm a little hesitant about this approach because it wouldn't work well for automation or if the logs from running licensed were unavailable down the line.

When would an automated scenario create/modify files and expect licensed to test against those (ephemeral?) files?

The block/problem you describe in the penultimate paragraph in the comment above (LICENSE in parent directory) is a separate issue from the "annoying edge case when licensed is run on an updated file before the file is committed" though, right? I'm not sure how to address it using either the Git commit or licensed's own source hashing without some custom logic that takes into account files outside of the module being pertinent to versioning.

@jonabc
Contributor Author

jonabc commented May 6, 2019

The block/problem you describe in the penultimate paragraph in the comment above (LICENSE in parent directory) is a separate issue from the "annoying edge case when licensed is run on an updated file before the file is committed" though, right?

Yep

I'm not sure how to address it using either the Git commit or licensed's own source hashing without some custom logic that takes into account files outside of the module being pertinent to versioning.

Given it's a pre-existing issue I think I'm going to move forward with the source hashing PR and leave that for another day.
