Chocolatey Package Caching and Checksums for Performance and Security #397

Closed
CMCDragonkai opened this issue Jul 2, 2022 · 27 comments · Fixed by #394
Labels
development Standard development r&d:polykey:supporting activity Supporting core activity

Comments

@CMCDragonkai
Member

CMCDragonkai commented Jul 2, 2022

Specification

The build:windows and subsequent integration:windows jobs use chocolatey to install the packages they need on Windows, in particular nodejs and python.

The current configuration in .gitlab-ci.yml redownloads and reinstalls the packages on every run. This is slow, error prone and results in our CI/CD getting rate limited by chocolatey (#394 (comment)).

On top of that, the chocolatey community repository at https://community.chocolatey.org/packages, which is the default source of packages, is not entirely secure, or at least not recommended for organisational usage: https://docs.chocolatey.org/en-us/community-repository/community-packages-disclaimer.

Furthermore, when downloading and installing chocolatey packages, we are simply relying on the checksum specified by the package maintainer. This is also the case with nixpkgs, but with nixpkgs we can pin our package set from nixpkgs-overlay; with chocolatey we cannot. We can get halfway there by doing TOFU (trust on first use): acquire the checksum for a trusted version of the package, and then enforce that installing the package again must produce the same checksum.

This is particularly important because some packages download things over the internet during installation (this is due to chocolatey lacking distribution rights, thus the "package" is just an instruction set on how to download & install, but not the actual software runtime being installed). Having a checksum specified at our CI/CD level just ensures some level of end-to-end trust of immutability.
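
The trust-on-first-use idea can be sketched as follows. This is a minimal Python illustration, not part of our CI/CD scripts: the function names `sha512_base64` and `verify_pinned` are hypothetical, and it assumes the pinned value is a base64-encoded SHA-512 digest of the downloaded file, recorded once on a trusted machine.

```python
import base64
import hashlib
from pathlib import Path


def sha512_base64(path: Path) -> str:
    """Return the base64-encoded SHA-512 digest of the file at `path`."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read in chunks so large installers do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def verify_pinned(path: Path, pinned: str) -> bool:
    """True only if the file still matches the digest recorded on first use."""
    return sha512_base64(path) == pinned
```

The first trusted download would record `sha512_base64(...)` into the repository, and every later run would refuse to install when `verify_pinned` returns `False`.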

Additional context

  • https://docs.chocolatey.org/en-us/faqs#what-is-a-trusted-package - verify that our packages are "trusted" first
  • https://docs.chocolatey.org/en-us/information/security#security-for-the-community-package-repository - this details the process of chocolatey packaging

    Packages are pushed to the site over HTTPS. The site grabs a SHA512 checksum of the package, then forwards it on to where packages are stored securely. You can see this package checksum in 0.9.10+ if you call choco info <packageName>.

  • https://docs.chocolatey.org/en-us/features/host-packages#local-folder-unc-share-cifs
  • Based on https://docs.chocolatey.org/en-us/choco/commands/install, use the --checksum and below parameters and find out which we can use to ensure integrity of our installed packages on chocolatey.
         --checksum, --downloadchecksum, --download-checksum=VALUE
         Download Checksum - a user provided checksum for downloaded resources 
           for the package. Overrides the package checksum (if it has one).  
           Defaults to empty. Available in 0.10.0+.
    
         --checksum64, --checksumx64, --downloadchecksumx64, --download-checksum-x64=VALUE
         Download Checksum 64bit - a user provided checksum for 64bit downloaded 
           resources for the package. Overrides the package 64-bit checksum (if it 
           has one). Defaults to same as Download Checksum. Available in 0.10.0+.
    
         --checksumtype, --checksum-type, --downloadchecksumtype, --download-checksum-type=VALUE
         Download Checksum Type - a user provided checksum type. Overrides the 
           package checksum type (if it has one). Used in conjunction with Download 
           Checksum. Available values are 'md5', 'sha1', 'sha256' or 'sha512'. 
           Defaults to 'md5'. Available in 0.10.0+.
    
         --checksumtype64, --checksumtypex64, --checksum-type-x64, --downloadchecksumtypex64, --download-checksum-type-x64=VALUE
         Download Checksum Type 64bit - a user provided checksum for 64bit 
           downloaded resources for the package. Overrides the package 64-bit 
           checksum (if it has one). Used in conjunction with Download Checksum 
           64bit. Available values are 'md5', 'sha1', 'sha256' or 'sha512'. 
           Defaults to same as Download Checksum Type. Available in 0.10.0+.
    
  • There are also VERIFICATION.txt files available in the installation packages. Note that we are often using meta packages that then point to a particular package under dependencies. These dependencies then have Files, one of them being VERIFICATION.txt. This can be used to acquire checksums as well, however you have to check whether this is actually different from or related to the package checksum. If it is possible to rely on the package checksum, which itself covers checksums of its contents, then prefer to use the package checksum.
    verification
  • https://stackoverflow.com/questions/34268673/how-do-i-configure-a-local-chocolatey-repository
  • https://4sysops.com/archives/install-internalized-chocolatey-packages-from-your-offline-repository/

Tasks

  1. Iterate on this using our matrix-win-1 computer first before applying to gitlab cicd
  2. Set up private hosting of chocolatey packages, probably in a directory that is cached by gitlab - https://docs.chocolatey.org/en-us/features/host-packages#local-folder-unc-share-cifs
  3. Follow this guide https://docs.chocolatey.org/en-us/guides/create/recompile-packages to internalise the community packages that we are using (nodejs and python) into that directory
  4. Acquire the checksum of the trusted packages using choco info <packageName> and specify these checksums when performing an installation
  5. Ensure that choco install is installing from our local directory source by using the --source option, and not from the upstream community source; this should remove any 429 Too Many Requests rate limiting
@CMCDragonkai
Member Author

We cannot bootstrap the cache directory in gitlab directly. It has to be populated by the first run of the job on CI/CD. So the job first has to check whether the cache directory has the relevant contents. If not, it acquires the package from the community upstream, but verified against a predefined checksum that we recorded locally from our own development machines and matrix-win-1. On subsequent runs, if the cached directory (the local source) has the package, it is installed from there using --source.

Ideally this would be done automatically if we add our --source pointing to the local directory, such that if it doesn't exist in there, it will look it up in the chocolatey community repository. Try and see if the choco install command can do that with variations of the --source option. And google around on the issue boards to see if anybody is asking about this.

@CMCDragonkai
Member Author

You may need my help on the windows machine due to admin access to work with chocolatey. Let's do this in the office.

@emmacasolin
Contributor

This tutorial https://docs.chocolatey.org/en-us/features/host-packages#local-folder-unc-share-cifs for internalising a package doesn't work for nodejs or python because they're both missing files when I download them.

It should look like this (the tools folder is needed in particular)
image

But they both look like this
image

@CMCDragonkai
Member Author

Can you search around why?

@emmacasolin
Contributor

I think it's because nodejs/python don't have/require a native installer to use, so they're treated slightly differently by Chocolatey https://docs.chocolatey.org/en-us/faqs#what-distinction-does-chocolatey-make-between-an-installable-and-a-portable-application. There's a command you can run on the business version of Chocolatey to do the whole process automatically and they give an example of it with nodejs, so there should be a way to do it manually, I'll just need to work out how.

@CMCDragonkai
Member Author

CMCDragonkai commented Jul 5, 2022 via email

@emmacasolin
Contributor

I've found a version that has the tools subfolder now - needed to download the "install" version https://community.chocolatey.org/packages/nodejs.install/16.14.2

@emmacasolin
Contributor

Now that I'm using the correct version I've successfully created a package for nodejs

PS C:\Users\Emma Casolin\Downloads> choco pack
Chocolatey v1.1.0
Attempting to build package from 'nodejs.install.nuspec'.
Successfully created package 'C:\Users\Emma Casolin\Downloads\nodejs.install.16.14.2.nupkg' 

I haven't tested it yet though since I can't do so without overriding the currently installed version, but this .nupkg is the internalised package that can be stored.

@emmacasolin
Contributor

What version of python are we using? I can't see it in the CI/CD.

@emmacasolin
Contributor

Package checksums:

nodejs.install 16.14.2: 'hDCI5y4Xv1SlvW6cjl3ysKweD9YmVy608OK6oOZKRwSalmmHZpsEvCNCF2m8R3xPB1/+s3SQGlx31BNZ9BQkJg==' (SHA512)
python3 3.9.12: 'USRJHHgIwZiiAPb0bFgQBxrNlUnq4mqQVqma12sP8WO89sfo9t7cKNzi5oqIZuPV/AbHchTDXsPzYbahZuIHEA==' (SHA512)

@emmacasolin
Contributor

I've internalised the packages for both nodejs and python now, as well as found the checksums for the versions of both packages that we're using. All that's left is configuring a folder as a source, but I'm not sure where this folder should go. I also still haven't installed either of the packages to test that they work - I think it would be better to do this on the CI/CD rather than on matrix-win-1 since they would override the installations there and I don't know if that can be undone if anything is broken.

@emmacasolin
Contributor

> You'd think it would be easier to "internalize" a portable application and harder to "internalize" a package with native installer. This is quite strange...

In the end it was indeed easier to internalise nodejs and python compared to packages with native installers since the installers were already bundled with the downloaded package rather than needing to be downloaded separately. So I didn't need to make any changes other than deleting the files that get regenerated when calling choco pack.

@CMCDragonkai
Member Author

This internalizing process will need to occur in the CICD in order to prime the cache. Basically, on the first run it would need to check the folder source and, if the package does not exist there, download it and internalise it into the directory. Subsequent runs then install from the directory.

You would check whether the package exists in the folder and install from there if it does; otherwise, use an error check or a conditional to switch to downloading from chocolatey.

@emmacasolin
Contributor

I think with the conditionals and internalisation we might need a PowerShell script to do everything. Since we're not using Chocolatey Pro we don't have access to Chocolatey commands and options like download and install --download-cache that would make it easier to do these things without a script.

@CMCDragonkai
Member Author

CMCDragonkai commented Jul 6, 2022 via email

@emmacasolin
Contributor

Some notes while I'm working on the script:

  1. The internalisation is done for the *.nupkg file, not the actual program/program installer, so that's the only thing we care about caching
  2. A lot of options aren't available to us without the licensed version of Chocolatey, for example, internalising packages from the command line and changing the install location for packages
  3. Because of the above point, we'll need to work out the default location for the *.nupkg files. On matrix-win-1 I think it's C:\ProgramData\chocolatey\lib\[packageName], however, the nupkg files there don't include the version number (the ones I downloaded manually do), so I need to double-check they're the right ones.
  4. The source can be specified directly during the choco install command. There's also a choco source command - I'm not sure if there's a benefit to using the source command over just doing it in the install command so I still need to look into this
  5. Also related to the above point, I'm not sure if choco install will fall back on using the default source if the specified one fails. If it doesn't then we'll need a second install command if the first one fails. Alternatively we could do a check for the existence of the nupkg files before attempting to install and only specifying the path as a source if we know it exists. If it doesn't exist then we also run the logic for internalising it after we download everything.
  6. I'm not sure if the checksum actually needs to be specified on the command line - using --checksum seems to me like it overrides the existing checksum, whereas --requirechecksum just requires packages to have a checksum, which I think is all we need. I would assume that if we require the package to have a checksum then it will use the checksum/perform a check against it, but I can look into this.
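
Point 5 above (check for the cached nupkg first, and only pass the local folder as a source when it exists) could be sketched like this. It is a Python illustration under stated assumptions rather than the script itself: `build_install_args` and its `cache_root` parameter are hypothetical, the cache layout `<cache_root>/<id>/<id>.<version>.nupkg` is assumed from the file names of the manually packed packages, and the flags are `choco install` options that appear in this thread.

```python
from pathlib import Path
from typing import List


def build_install_args(package_id: str, version: str, cache_root: Path) -> List[str]:
    """Build the `choco install` argument list for one package.

    If `<cache_root>/<package_id>/<package_id>.<version>.nupkg` exists, install
    from that folder; otherwise fall back to the default source.
    """
    source_dir = cache_root / package_id
    nupkg = source_dir / f"{package_id}.{version}.nupkg"
    args = ["choco", "install", package_id, f"--version={version}", "-y"]
    if nupkg.is_file():
        # The source is the folder that contains the .nupkg, not the file itself.
        args.append(f"--source={source_dir}")
    else:
        # No cached package yet: use the default source and insist on checksums.
        args.append("--require-checksums")
    return args
```

The caller would hand the returned list to a process runner; when the cache is missing, the internalisation step still has to run after the install.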

@CMCDragonkai
Member Author

Does internalizing the nupkg file also internalize all the package contents? The goal is to avoid redownloading things over the internet.

Also, I suspect that adding a folder source via a separate command would enable falling back to the chocolatey community source. That's how most package managers work. Not sure about the source option in install though.

@CMCDragonkai
Member Author

I also provided some options required during install where you can pass the desired checksum. I posted it up somewhere.

@emmacasolin
Contributor

> Does internalizing the nupkg file also internalize all the package contents? The goal is to avoid redownloading things over the internet.
>
> Also, I suspect that adding a folder source via a separate command would enable falling back to the chocolatey community source. That's how most package managers work. Not sure about the source option in install though.

It internalises the installer. So nothing needs to be redownloaded over the internet but it does still need to re-create and reinstall the package contents.

> I also provided some options required during install where you can pass the desired checksum. I posted it up somewhere.

Yeah --checksum is one of those, but all of them say they override the package checksum.

@emmacasolin
Contributor

I've got the internalising part working on the ci/cd, now I just need to move the created file into our ./tmp directory to be cached. According to the gitlab logs the file is located here after it's created: C:\GitLab-Runner\builds\MatrixAI\open-source\js-polykey\nodejs.install.16.14.2.nupkg, but running the command Move-Item -Path "C:\GitLab-Runner\builds\MatrixAI\open-source\js-polykey\nodejs.install.16.14.2.nupkg" -Destination "..\tmp\chocolatey\nodejs.install" results in the following error:

Move-Item : Could not find a part of the path.
At C:\GitLab-Runner\builds\MatrixAI\open-source\js-polykey\scripts\choco-install.ps1:11 char:3
+   Move-Item -Path "C:\GitLab-Runner\builds\MatrixAI\open-source\js-po ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : WriteError: (C:\GitLab-Runne...l.16.14.2.nupkg:FileInfo) [Move-Item], DirectoryNotFoundE 
   xception
    + FullyQualifiedErrorId : MoveFileInfoItemIOError,Microsoft.PowerShell.Commands.MoveItemCommand

Would it be possible to use a gitlab variable there instead of writing out the path?

@CMCDragonkai
Member Author

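Isn't it possible to directly internalize into that directory in the first place? As for env variables, yea just use $CI_PROJECT_DIR or something (that one is the directory the repository is checked out into; $CI_PROJECT_PATH is only the namespace and project name). There's a list of predefined variables you can find in the gitlab docs.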

I'd suggest avoiding embedding any gitlab variables into our scripts. Our scripts should work without the presence of those variables. So all paths should be made relative to the script's directory.

In bash, I have a snippet that acquires the script's directory. Powershell will probably have their own mechanism. Google around for it.
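
For reference, PowerShell exposes the running script's directory through the automatic variable `$PSScriptRoot` (PowerShell 3.0 and later). The same idea, resolving cache paths relative to the script rather than to a CI variable, is sketched below in Python. `cache_dir_for` is a hypothetical helper, and the `scripts/` and `tmp/chocolatey/` layout is assumed from the paths mentioned in this thread.

```python
from pathlib import Path


def cache_dir_for(script_path: Path, package_id: str) -> Path:
    """Locate the per-package cache folder relative to the script's own location.

    Assumes the script lives in `<repo>/scripts/` and the cache lives in
    `<repo>/tmp/chocolatey/<package_id>/`.
    """
    repo_root = script_path.parent.parent
    return repo_root / "tmp" / "chocolatey" / package_id
```

A script would call it with its own absolute path, for example `cache_dir_for(Path(__file__).resolve(), "nodejs.install")`, so the result does not depend on the current working directory or on any GitLab variable.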

@emmacasolin
Contributor

Yes it looks like I can internalise directly into the directory we want using

--out, --outdir, --outputdirectory, --output-directory=VALUE
OutputDirectory - Specifies the directory for the created Chocolatey
  package file. If not specified, uses the current directory.

when running choco pack. I'll see if that works.

@emmacasolin
Contributor

I've got the nodejs.install.16.14.2.nupkg now being created directly inside ./tmp/chocolatey/nodejs.install and that works on the CI/CD. I'll have to wait for the job to finish and upload the cache before I can check if we can rebuild from it.

Side note that the default mode for caching on the CI/CD is to only upload the cache if the job succeeds, but this means that the Jest cache (and also Homebrew/Chocolatey) would only be kept if all of the tests passed. Failed test data is still useful for running the tests on the next run, and we still want to keep other cache data even if tests fail, so I've set when to always for the cache of jobs generated by the build-platforms-generate.sh script, and I'll probably do the same for check-test-generate.sh.

# Cached directories shared between jobs & pipelines per-branch per-runner
cache:
  key: $CI_COMMIT_REF_SLUG
  when: 'always'
  paths:
    - ./tmp/npm/
    - ./tmp/ts-node-cache/
    # Homebrew cache is only used by the macos runner
    - ./tmp/Homebrew
    # Chocolatey cache is only used by the windows runner
    - ./tmp/chocolatey/
    # `jest` cache is configured in jest.config.js
    - ./tmp/jest/

@emmacasolin
Contributor

The cache works and the CI/CD will attempt to build from it if it exists! There are some errors that I need to look into but they can most likely be fixed by adding command line options.

$ .\scripts\choco-install.ps1
The use of .nupkg or .nuspec in for package name or source is known to cause issues. Please use the package id from the nuspec `<id />` with `-s .` (for local folder where nupkg is found).
Chocolatey v0.10.15
Installing the following packages:
nodejs.install
By installing you accept licenses for the packages.
nodejs.install v12.10.0 already installed.
 Use --force to reinstall, specify a version to install, or try upgrade.
Chocolatey installed 0/1 packages. 
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
Warnings:
 - nodejs.install - nodejs.install v12.10.0 already installed.
 Use --force to reinstall, specify a version to install, or try upgrade.
The use of .nupkg or .nuspec in for package name or source is known to cause issues. Please use the package id from the nuspec `<id />` with `-s .` (for local folder where nupkg is found).
Chocolatey v0.10.15
Installing the following packages:
python3
By installing you accept licenses for the packages.
python3 not installed. The package was not found with the source(s) listed.
 Source(s): 'C:\GitLab-Runner\builds\MatrixAI\open-source\tmp\chocolatey\python3'
 NOTE: When you specify explicit sources, it overrides default sources.
If the package version is a prerelease and you didn't specify `--pre`,
 the package may not be found.
Please see https://chocolatey.org/docs/troubleshooting for more 
 assistance.
Chocolatey installed 0/1 packages. 1 packages failed.
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
Failures
 - python3 - python3 not installed. The package was not found with the source(s) listed.
 Source(s): 'C:\GitLab-Runner\builds\MatrixAI\open-source\tmp\chocolatey\python3'
 NOTE: When you specify explicit sources, it overrides default sources.
If the package version is a prerelease and you didn't specify `--pre`,
 the package may not be found.
Please see https://chocolatey.org/docs/troubleshooting for more 
 assistance.

For nodejs I think I just need to add the --version=16.14.2 option, and for python I may just have the wrong filename.

@CMCDragonkai
Member Author

CMCDragonkai commented Jul 7, 2022 via email

@emmacasolin
Contributor

Chocolatey caching is now set up and working! All I needed to fix the errors above was to specify the version when installing (this has a similar effect to --force) and to fix the source path for the installations. They need the path to the folder containing the package, not the path to the package.

When a source is specified it overrides the default source, so because of this (and also since we only want to do the internalisation if it hasn't been done already), I'm running a different install command depending on whether the cache exists.

# Install from the cached local source if the internalised package already exists
if ( Test-Path -Path ".\tmp\chocolatey\nodejs.install\nodejs.install.16.14.2.nupkg" -PathType Leaf ) {
  choco install nodejs.install --version="16.14.2" --source=".\tmp\chocolatey\nodejs.install" -y
} else {
  # Otherwise install from the default source, then internalise the package into the cache
  choco install nodejs --version="16.14.2" --require-checksums -y
  # Expand-Archive only accepts .zip files, so rename the .nupkg first
  Rename-Item -Path "C:\ProgramData\chocolatey\lib\nodejs.install\nodejs.install.nupkg" -NewName "nodejs.install.nupkg.zip"
  Expand-Archive -LiteralPath "C:\ProgramData\chocolatey\lib\nodejs.install\nodejs.install.nupkg.zip" -DestinationPath "C:\ProgramData\chocolatey\lib\nodejs.install" -Force
  # Delete the files that choco pack regenerates
  Remove-Item "C:\ProgramData\chocolatey\lib\nodejs.install\_rels" -Recurse
  Remove-Item "C:\ProgramData\chocolatey\lib\nodejs.install\package" -Recurse
  # -LiteralPath stops the square brackets from being treated as a wildcard pattern
  Remove-Item -LiteralPath "C:\ProgramData\chocolatey\lib\nodejs.install\[Content_Types].xml"
  New-Item -Path ".\tmp\chocolatey\nodejs.install" -ItemType "directory"
  choco pack "C:\ProgramData\chocolatey\lib\nodejs.install\nodejs.install.nuspec" --outdir ".\tmp\chocolatey\nodejs.install"
}

# Same steps for python3
if ( Test-Path -Path ".\tmp\chocolatey\python3\python3.3.9.12.nupkg" -PathType Leaf ) {
  choco install python3 --version="3.9.12" --source=".\tmp\chocolatey\python3" -y
} else {
  choco install python --version="3.9.12" --require-checksums -y
  Rename-Item -Path "C:\ProgramData\chocolatey\lib\python3\python3.nupkg" -NewName "python3.nupkg.zip"
  Expand-Archive -LiteralPath "C:\ProgramData\chocolatey\lib\python3\python3.nupkg.zip" -DestinationPath "C:\ProgramData\chocolatey\lib\python3" -Force
  Remove-Item "C:\ProgramData\chocolatey\lib\python3\_rels" -Recurse
  Remove-Item "C:\ProgramData\chocolatey\lib\python3\package" -Recurse
  Remove-Item -LiteralPath "C:\ProgramData\chocolatey\lib\python3\[Content_Types].xml"
  New-Item -Path ".\tmp\chocolatey\python3" -ItemType "directory"
  choco pack "C:\ProgramData\chocolatey\lib\python3\python3.nuspec" --outdir ".\tmp\chocolatey\python3"
}
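
The unpack-and-strip part of the script above (rename, expand, then delete `_rels`, `package` and `[Content_Types].xml`) can also be done in one pass, because a .nupkg is a zip archive. The sketch below shows that in Python as an illustration only; `extract_for_repack` is a hypothetical helper, and the list of entries to skip is taken from the Remove-Item calls above rather than from any Chocolatey documentation.

```python
import zipfile
from pathlib import Path
from typing import List

# Entries regenerated by `choco pack`, taken from the Remove-Item calls above.
REGENERATED_DIRS = ("_rels/", "package/")
REGENERATED_FILES = ("[Content_Types].xml",)


def extract_for_repack(nupkg: Path, dest: Path) -> List[str]:
    """Extract a .nupkg (a zip archive), skipping files that packing recreates."""
    kept = []
    with zipfile.ZipFile(nupkg) as archive:
        for name in archive.namelist():
            if name.startswith(REGENERATED_DIRS) or name in REGENERATED_FILES:
                continue
            archive.extract(name, dest)
            kept.append(name)
    return kept
```

After extraction, `dest` holds the .nuspec and the tools folder, which is what `choco pack` needs to rebuild the package.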

@emmacasolin
Contributor

I've now switched to using choco source so that we only have one install command for each package. Adding a source is done as choco source add --name="nodejs" --source=".\tmp\chocolatey\nodejs.install" --priority=1 (setting the priority to 1 ensures that it is used before the default source, which has priority 0). Now, the install command is simply choco install nodejs.install --version="16.14.2" --require-checksums -y. Regardless of whether the package was already cached, we internalise it on every run. Most of the procedures needed for this are no-ops, and those that are not just emit an error, which I'm silencing by setting -ErrorAction:SilentlyContinue on those commands (they're all PowerShell commands).
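
The priority rule can be modelled as below, assuming the ordering described in the `choco source` help text: a priority of 0 means "no priority", sources with a priority above 0 are evaluated first (lower number first), and priority-0 sources follow in configuration order. This Python sketch models only that ordering; `Source` and `evaluation_order` are hypothetical names, not part of Chocolatey, and the source names in the usage line are examples.

```python
from typing import List, NamedTuple


class Source(NamedTuple):
    name: str
    priority: int  # 0 means "no priority"


def evaluation_order(sources: List[Source]) -> List[str]:
    """Order sources: priority > 0 ascending first, then priority 0 in config order."""
    # sorted() is stable, so sources with equal keys keep their original order.
    ranked = sorted(sources, key=lambda s: (s.priority == 0, s.priority))
    return [s.name for s in ranked]
```

For example, `evaluation_order([Source("chocolatey", 0), Source("nodejs", 1)])` puts the local `nodejs` source ahead of the default one, which is the behaviour relied on here.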

emmacasolin added a commit that referenced this issue Jul 11, 2022
Optimised Windows CI/CD setup by internalising and caching chocolatey packages

Fixes #397
emmacasolin added a commit that referenced this issue Jul 12, 2022
Optimised Windows CI/CD setup by internalising and caching chocolatey packages

Fixes #397
@CMCDragonkai CMCDragonkai added the r&d:polykey:supporting activity Supporting core activity label Jul 10, 2023