
use restore keys to restore cache from partial key matches + cache manifest install directory #94

Closed
wants to merge 2 commits into from


Be-ing commented Aug 6, 2021

From the GitHub Actions documentation:
https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows#matching-a-cache-key

You can provide a list of restore keys to use when there is a
cache miss on key. You can create multiple restore keys ordered
from the most specific to least specific. The cache action
searches for restore-keys in sequential order. When a key doesn't
match directly, the action searches for keys prefixed with the
restore key. If there are multiple partial matches for a restore
key, the action returns the most recently created cache.
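The lookup described in the quoted documentation can be modeled as follows. This sketch is a hypothetical in-memory stand-in for the GitHub cache service, not the @actions/cache implementation. It ignores branch scoping and eviction, and the names CacheEntry and matchCache are made up for illustration.

```typescript
// Hypothetical in-memory model of one saved cache entry.
interface CacheEntry {
  key: string;
  createdAt: number; // creation time, e.g. milliseconds since the epoch
}

// Returns the entry that would be restored, or undefined on a complete miss.
function matchCache(
  entries: CacheEntry[],
  primaryKey: string,
  restoreKeys: string[] // ordered from most specific to least specific
): CacheEntry | undefined {
  // An exact match on the primary key is a cache hit.
  const hit = entries.find((e) => e.key === primaryKey);
  if (hit !== undefined) {
    return hit;
  }
  // Otherwise, try the restore keys in the order given.
  for (const restoreKey of restoreKeys) {
    const exact = entries.find((e) => e.key === restoreKey);
    if (exact !== undefined) {
      return exact;
    }
    // Partial match: keys prefixed with the restore key; the newest one wins.
    const prefixed = entries.filter((e) => e.key.startsWith(restoreKey));
    if (prefixed.length > 0) {
      return prefixed.reduce((newest, e) => (e.createdAt > newest.createdAt ? e : newest));
    }
  }
  return undefined;
}
```

A restored partial match is not a cache hit, so the job still saves a new entry under the full primary key. That new entry is what allows only the missing packages to be built and cached.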

Be-ing marked this pull request as draft August 6, 2021 04:19
Author

Be-ing commented Aug 6, 2021

I am testing this here: https://github.com/tenacityteam/tenacity/pull/228

Be-ing force-pushed the restore_keys branch 8 times, most recently from 643f73f to f9e61bb, August 6, 2021 07:17
Be-ing marked this pull request as ready for review August 6, 2021 07:25
Be-ing force-pushed the restore_keys branch 4 times, most recently from 5400eaf to 0193e0f, August 6, 2021 15:01
Contributor

vadi2 commented Aug 6, 2021

The force-pushing makes GitHub notifications useless for this, since one can't see any of the changes :(

Author

Be-ing commented Aug 6, 2021

Sorry for all the force pushing. I think I finally have the package caching working right.

Comment on lines 185 to 179
if (isWindows) {
paths.push(process.env.LOCALAPPDATA + "\\vcpkg");
} else {
paths.push(process.env.HOME + "/.cache/vcpkg");
}
Author


I tried caching both VCPKG_ROOT/packages and CMAKE_BINARY_DIR/vcpkg_installed, but neither worked. Only caching these locations, outside both the vcpkg root and the CMake build directory, works.

Owner


Please use the approach described in the other comment about VCPKG_DEFAULT_BINARY_CACHE.

Owner


If I read this right, the variable VCPKG_DEFAULT_BINARY_CACHE is not being set by the action.
My suggestion is that this action set the environment variable in order to explicitly control where vcpkg stores the libraries it builds.

Author

Be-ing commented Aug 9, 2021


Overriding that variable does not seem right. I think the most robust way to handle this would be checking if VCPKG_DEFAULT_BINARY_CACHE is set, and if so cache that location, otherwise use the default. https://vcpkg.io/en/docs/users/binarycaching.html#configuration

Owner


There are many default locations where vcpkg can write the binary cache, so the run-vcpkg action cannot know with certainty where vcpkg put it. This is why the action must control the location and be certain of it.

I agree that, as you stated, if VCPKG_DEFAULT_BINARY_CACHE is set and valid (where valid means existing and writable, something the action must check), the action must not override it. On the other hand, if it is not set, the action should set it to a sensible and valid value.
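The policy agreed on in this thread can be sketched as follows, assuming a Node.js runtime. If VCPKG_DEFAULT_BINARY_CACHE is set and points to an existing, writable directory, it is kept. Otherwise a fallback directory is created and the variable is set. The names resolveBinaryCacheDir and isValidCacheDir are hypothetical. The caller supplies the fallback path (for example $VCPKG_ROOT/cache, as suggested later in this review). Throwing on a set-but-invalid value is a choice made here; the discussion does not settle that case.

```typescript
import * as fs from "fs";
import * as path from "path";

// True when `dir` exists, is a directory, and is writable by the current process.
function isValidCacheDir(dir: string): boolean {
  try {
    if (!fs.statSync(dir).isDirectory()) {
      return false;
    }
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch {
    return false; // stat or access failed: missing or not writable
  }
}

// Hypothetical helper: decide where vcpkg writes its binary archives.
function resolveBinaryCacheDir(
  env: Record<string, string | undefined>,
  fallbackDir: string
): { dir: string; setByAction: boolean } {
  const userDir = env["VCPKG_DEFAULT_BINARY_CACHE"];
  if (userDir) {
    if (!isValidCacheDir(userDir)) {
      // Assumption: fail loudly instead of silently replacing the user's setting.
      throw new Error(`VCPKG_DEFAULT_BINARY_CACHE is not an existing, writable directory: ${userDir}`);
    }
    return { dir: userDir, setByAction: false }; // never override a valid user value
  }
  // Not set: create a location the action controls and point vcpkg at it.
  const dir = path.resolve(fallbackDir);
  fs.mkdirSync(dir, { recursive: true });
  env["VCPKG_DEFAULT_BINARY_CACHE"] = dir;
  return { dir, setByAction: true };
}
```

The sketch only mutates the env object it is given. In an actual action, later workflow steps (such as a CMake configure step) would see the value only if it were also exported, for example with core.exportVariable from @actions/core.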

Be-ing changed the title from "use restore keys to restore cache from partial key matches" to "use restore keys to restore cache from partial key matches + fix package caching" Aug 6, 2021
}
else if (userProvidedCommitId) {
core.info(`Using user provided vcpkg's Git commit id='${userProvidedCommitId}', adding it to the cache's key.`);
key += "localGitId=" + Utils.hashCode(userProvidedCommitId);
key += `_vcpkgGitCommit=${userProvidedCommitId}`;
restoreKeys.push(key);
}
else {
core.info(`No vcpkg's commit id was provided, does not contribute to the cache's key.`);
}
Author


I would like to put the hash of the vcpkg.json file before the commit hash (#95) in the cache key. Once a project is set up, I would expect the vcpkg.json file to change infrequently. Updating the packages would then simply be a matter of changing the commit hash or updating the submodule. When doing so, the cache should hit the most recent cache with the same vcpkg.json, to avoid rebuilding every package. But this needs to wait for the hashFiles function to be published in the @actions/glob NPM package (actions/toolkit#830).

Owner


Is the hashFiles function really necessary, or would any hash function be a good fit?

Author


I suppose another hash function could do 🤷

Owner


I still see that your point about using exactly the same hash function has its benefits: it would not invalidate existing keys. On the other hand, this PR already invalidates existing keys because the key format changed, so that is not a concern here. Any hash function would be a good fit.
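The key layout discussed in this thread, with the vcpkg.json hash placed before the vcpkg commit id, can be sketched as follows. This is a hypothetical illustration, not the code of this PR or of run-vcpkg. The runvcpkg- prefix, the segment names, and SHA-256 are assumptions; SHA-256 is used only because, as noted above, any stable hash works. Each shorter prefix of the primary key becomes a restore key, listed from most specific to least specific, as the GitHub documentation quoted at the top requires.

```typescript
import { createHash } from "crypto";

// Any stable hash works for this purpose; a truncated SHA-256 hex digest is used here.
function shortHash(text: string): string {
  return createHash("sha256").update(text).digest("hex").slice(0, 16);
}

interface CacheKeys {
  primaryKey: string;
  restoreKeys: string[]; // ordered from most specific to least specific
}

// Compose the key from the least volatile component to the most volatile one:
// runner OS, then the vcpkg.json hash, then the vcpkg commit id.
function computeCacheKeys(runnerOs: string, vcpkgJsonText: string, vcpkgCommitId: string): CacheKeys {
  const segments = [
    `runvcpkg-${runnerOs}-`,
    `vcpkgJson=${shortHash(vcpkgJsonText)}-`,
    `vcpkgGitCommit=${vcpkgCommitId}`,
  ];
  // Accumulate prefixes: "os", "os+json", "os+json+commit".
  const prefixes: string[] = [];
  let acc = "";
  for (const segment of segments) {
    acc += segment;
    prefixes.push(acc);
  }
  const primaryKey = prefixes[prefixes.length - 1];
  // Every shorter prefix is a restore key; reverse so the most specific one is tried first.
  const restoreKeys = prefixes.slice(0, -1).reverse();
  return { primaryKey, restoreKeys };
}
```

With this layout, bumping only the vcpkg commit id misses the primary key but matches the first restore key, so the most recent cache built from the same vcpkg.json is restored. The trailing "-" separators prevent one segment value from accidentally prefix-matching a longer one.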

Be-ing mentioned this pull request Aug 6, 2021
Be-ing force-pushed the restore_keys branch 2 times, most recently from f9e61bb to 914f819, August 6, 2021 21:31
__tests__/basic.test.ts (resolved)
src/vcpkg-action.ts (outdated, resolved)
src/vcpkg-utils.ts (resolved)
src/vcpkg-utils.ts (outdated, resolved)
src/vcpkg-utils.ts (outdated, resolved)
"!" + vcpkgRootDir + directorySeparator + "buildtrees",
"!" + vcpkgRootDir + directorySeparator + "downloads" ];

if (isWindows) {
Owner


Instead of pushing a hardcoded path, let's add an input that has a default value and can also be set by the user.
Based on that input, the environment variable VCPKG_DEFAULT_BINARY_CACHE must be set to the path where the cache is going to be stored. E.g., $VCPKG_ROOT/cache could be a valid default path.

Author


I don't think there's a need to add more complexity to this action. vcpkg already lets the user control the sources used for binary caching with the VCPKG_BINARY_SOURCES environment variable. I think this action should unconditionally cache the default binary caching directories. If the user wants a different path for whatever reason, that path can be specified with VCPKG_BINARY_SOURCES and passed to the additionalCachedPaths parameter of this action. In that case, I think the default location would be empty, so there would be no harm in caching it. If vcpkg does still fall back to the default cache path, the user can add clear at the beginning of their VCPKG_BINARY_SOURCES environment variable (clear removes all previously configured sources, so it has to come first).
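The VCPKG_BINARY_SOURCES route described above can be sketched as follows, assuming a single filesystem cache directory. According to the vcpkg binary caching documentation, clear drops every previously configured source, including the default location, which is why it comes first. The files,&lt;path&gt;,readwrite source then adds a directory-backed cache. The helper name customBinaryCache and its return shape are hypothetical. The sketch does not show how the path would be passed to the additionalCachedPaths input.

```typescript
// Hypothetical return shape: what to export and what to cache.
interface BinaryCacheConfig {
  env: Record<string, string>; // environment variables for later steps
  cachedPaths: string[]; // directories the action should save and restore
}

// Route vcpkg's binary cache to `cacheDir` only, bypassing the default location.
function customBinaryCache(cacheDir: string): BinaryCacheConfig {
  // ',', ';' and '`' have special meaning in VCPKG_BINARY_SOURCES; escaping is out of scope here.
  if (/[,;`]/.test(cacheDir)) {
    throw new Error(`unsupported character in cache path: ${cacheDir}`);
  }
  return {
    // "clear" first removes all previously configured sources, including the default;
    // "files,<path>,readwrite" then reads from and writes to cacheDir.
    env: { VCPKG_BINARY_SOURCES: `clear;files,${cacheDir},readwrite` },
    cachedPaths: [cacheDir],
  };
}
```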

Owner


I think my other comment about VCPKG_DEFAULT_BINARY_CACHE applies here as well, although the GitHub view is not very helpful in highlighting the exact lines that comment referred to.

src/vcpkg-utils.ts (outdated, resolved)
Be-ing changed the title from "use restore keys to restore cache from partial key matches + fix package caching" to "use restore keys to restore cache from partial key matches + cache manifest install directory" Aug 6, 2021
Author

Be-ing commented Aug 6, 2021

After more experimentation, I found that the previous approach, without caching the vcpkg binary cache archives, was indeed slightly better. Please take another look at the code. The issue was that the action was not caching the vcpkg_installed directory in manifest mode. I don't think users should have to specify that with the additionalCachedPaths argument.

Owner

lukka commented Aug 6, 2021

> After more experimentation I found that the way it was before without caching the vcpkg binary cache archives was indeed slightly better. Please take another look at the code. The issue was that the action was not caching the vcpkg_installed directory for manifest mode. I don't think users should have to specify that with the additionalCachedPaths argument.

This contribution is great, thanks for submitting it! I will certainly take a look, and I totally agree on avoiding the additionalCachedPaths input.

I may be a bit slower to answer over the next couple of days (hopefully not), but I look forward to finalizing this work.
The final steps would be to run all the validation workflows by pointing every run-vcpkg usage at this branch (or I could do it after the PR is completed).

Comment on lines 170 to 171
const vcpkg_installed_path = path.resolve(vcpkgRootDir + "/../vcpkg_installed");
paths.push(vcpkg_installed_path);
Author


This would be better placed in the getOrdinaryCachedPaths function in runvcpkglib, but I do not know how to edit that library and use that branch in this action for testing. This is my first time using Node, other than quick tests in the REPL, and my first time writing TypeScript. So maybe you could take care of changing the library and cherry-pick just the first commit of this branch. In the meantime, I will keep using this branch for Tenacity.
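A cached-paths list that includes vcpkg_installed, in the spirit of what this comment proposes for getOrdinaryCachedPaths, could look like the hypothetical sketch below. It does not reproduce that function. It follows the diff excerpts above: the vcpkg root is cached, its buildtrees and downloads subdirectories are excluded with "!" patterns, and vcpkg_installed is assumed to sit next to the vcpkg root, as in path.resolve(vcpkgRootDir + "/../vcpkg_installed").

```typescript
import * as path from "path";

// Hypothetical sketch, not run-vcpkg's getOrdinaryCachedPaths: patterns for the cache action.
function cachedPathsSketch(vcpkgRootDir: string): string[] {
  const root = path.resolve(vcpkgRootDir);
  return [
    root,
    // A leading "!" marks an exclude pattern.
    "!" + path.join(root, "buildtrees"),
    "!" + path.join(root, "downloads"),
    // Manifest mode with a plain `vcpkg install`: assumes vcpkg.json (and thus
    // vcpkg_installed) sits next to the vcpkg root, e.g. vcpkg as a Git submodule.
    path.resolve(root, "..", "vcpkg_installed"),
  ];
}
```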

Author

Be-ing commented Aug 6, 2021

> The final steps would be to run all the validation workflows by pointing all run-vcpkg to this branch (or i could do it after PR completion).

Is this something you would take care of? I don't know how to do that.

Author

Be-ing commented Aug 7, 2021

Upon further testing, it seems that caching the vcpkg_installed directory doesn't actually help. My tests earlier were just running vcpkg install in manifest mode. When the vcpkg_installed directory is restored from cache, vcpkg install is indeed very fast. However, then the CMake configure step rebuilds every package. I confirmed this locally by deleting my CMake build directory then reconfiguring CMake. When this occurs, vcpkg searches for packages in its binary cache and builds them if they are not found. It does not seem to check vcpkg_installed. I do not know if this is a bug or by design.

Author

Be-ing commented Aug 7, 2021

I have opened an issue for vcpkg to clarify if searching the binary cache before vcpkg_installed is by design or not. Refer to the issue for how to reproduce that locally.

vcpkg rebuilds all packages for a new CMake build directory when
caching only vcpkg_installed in manifest mode unless they are in
the binary cache.
microsoft/vcpkg#19424
Owner

lukka commented Aug 10, 2021

> After more experimentation I found that the way it was before without caching the vcpkg binary cache archives was indeed slightly better. Please take another look at the code. The issue was that the action was not caching the vcpkg_installed directory for manifest mode. I don't think users should have to specify that with the additionalCachedPaths argument.

I see two improvements/changes in this PR:
1- run-vcpkg automatically caches the directory containing the archived binaries (i.e., one of the directories described here).
2- The cache key is split to allow partial matches, so that an older cache is reused even though it is not exactly the one required. While the workflow continues, vcpkg builds and installs the missing packages, and they are stored in a new cache entry in the GitHub cache service.

In my testing, caching <vcpkg_root>/installed or vcpkg_installed has always been faster than caching the archives. This makes sense: the archives are not the final product but an intermediate artifact, while the installed directory is the final product.

Regarding point 2, there are many variables, and its value may be low. Once a project has established its dependencies, they may change about once a month, and whether rebuilding the whole cache from scratch takes six hours or three, it may not be worth it.

I am open to discussion and especially interested in seeing results from workflow runs that validate the benefits of points 1 and 2.

Owner

lukka commented Aug 10, 2021

> Upon further testing, it seems that caching the vcpkg_installed directory doesn't actually help. My tests earlier were just running vcpkg install in manifest mode. When the vcpkg_installed directory is restored from cache, vcpkg install is indeed very fast. However, then the CMake configure step rebuilds every package. I confirmed this locally by deleting my CMake build directory then reconfiguring CMake. When this occurs, vcpkg searches for packages in its binary cache and builds them if they are not found. It does not seem to check vcpkg_installed. I do not know if this is a bug or by design.

The reason is that two different locations of the vcpkg_installed directory are being used:

  • when using vcpkg install, vcpkg_installed is a sibling of vcpkg.json.
  • when using CMake to start vcpkg (that is, when vcpkg.json and the vcpkg toolchain are being used), the location is CMAKE_BINARY_DIR/vcpkg_installed.

Basically, we should not mix the "vcpkg install" scenario with the "CMake runs vcpkg" scenario, or, if we do, we need to enforce the same location for both (e.g., via environment variables).
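The two locations listed above, and one way to make them agree, can be sketched as follows. Instead of the environment variables suggested above, this sketch passes the same directory to both tools: vcpkg's experimental --x-install-root option and the VCPKG_INSTALLED_DIR variable read by the vcpkg CMake toolchain. Both should be checked against the vcpkg version in use. The helper names and the Scenario type are hypothetical.

```typescript
import * as path from "path";

// The two ways of running vcpkg in manifest mode described above.
type Scenario =
  | { kind: "vcpkg-install"; manifestDir: string } // `vcpkg install` run next to vcpkg.json
  | { kind: "cmake-toolchain"; cmakeBinaryDir: string }; // vcpkg started by the CMake toolchain

// Default location of vcpkg_installed for each scenario.
function defaultInstalledDir(s: Scenario): string {
  return s.kind === "vcpkg-install"
    ? path.join(s.manifestDir, "vcpkg_installed")
    : path.join(s.cmakeBinaryDir, "vcpkg_installed");
}

// Arguments that point both scenarios at one shared install root.
function sharedInstallRootArgs(installedDir: string): { vcpkgArgs: string[]; cmakeArgs: string[] } {
  return {
    vcpkgArgs: ["install", `--x-install-root=${installedDir}`],
    cmakeArgs: [`-DVCPKG_INSTALLED_DIR=${installedDir}`],
  };
}
```

When both commands are given the same directory, the CMake configure step should find the vcpkg_installed tree that the earlier vcpkg install step (or a cache restore) produced, instead of using a fresh directory under CMAKE_BINARY_DIR.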

Author

Be-ing commented Aug 10, 2021

> The reason is that two different location of the vcpkg_installed directory are being set

That is really confusing.

> Basically we should not mix the "vcpkg install" scenario with "cmake to run vcpkg" scenario, or if we do, we need to drive it correctly enforcing same location (e.g. via env vars)

This action does not. I added a vcpkg install step in my GH Actions workflow separate from this action.

Owner

lukka commented Aug 27, 2021

@Be-ing thanks for this great contribution! I have not had the time to validate or test the changes, nor have I gotten answers to all of my previous questions (see this comment). Anyhow, the idea of using fallback keys is great, as is the idea of automatically hashing the vcpkg.json file. I am going to incorporate those changes ASAP directly into the NPM packages and add test coverage as well.
Looking forward to feedback!

Owner

lukka commented Oct 17, 2021

@Be-ing this feature has been implemented in #104
Thanks for your contribution, let me know if run-vcpkg@v10 suits your needs!

lukka closed this Oct 17, 2021