Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor out reproducibility flags for tar and gzip #6884

Merged
merged 8 commits into from
Aug 5, 2020
Merged

Factor out reproducibility flags for tar and gzip #6884

merged 8 commits into from
Aug 5, 2020

Conversation

aherrmann-da
Copy link
Contributor

@aherrmann-da aherrmann-da commented Jul 27, 2020

The results of tar and gzip are not reproducible by default due to file ordering or metadata like timestamps, see https://reproducible-builds.org/docs/archives/ for details. These issues can be avoided with the right set of command-line flags to tar and gzip. This PR factors these flags out into dedicated sh_binary targets that can be invoked by genrules and other rules producing archives. This way we don't need to repeat the same set of command line flags over and over and don't run the risk of accidentally forgetting a flag in some place.

Additionally, package-app is converted into a sh_binary target as well to ensure that runfiles are handled correctly.

I have compared the SDK release tarball before and after this change. The only differences were found in daml-sdk/daml-sdk.jar and studio/daml-bundled.vsix due to file ordering and differences in source maps in the navigator bundle. I've manually verified that the navigator and vscode extension work by going through the steps under point 13 in release/RELEASE.md. (daml-sdk-head and sdk-version: 0.0.0)

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Include changelog additions in one or more commit message bodies between the CHANGELOG_BEGIN and CHANGELOG_END tags
  • Normal production system change, include purpose of change in description

NOTE: CI is not automatically run on non-members pull-requests for security
reasons. The reviewer will have to comment with /AzurePipelines run to
trigger the build.

@aherrmann-da aherrmann-da force-pushed the reprotar branch 5 times, most recently from 232390b to 926f1e6 Compare July 30, 2020 08:04
Comment on lines +30 to +40
case "$(uname -s)" in
Darwin)
abspath() { python -c 'import os.path, sys; sys.stdout.write(os.path.abspath(sys.argv[1]))' "$@"; }
canonicalpath() { python -c 'import os.path, sys; sys.stdout.write(os.path.realpath(sys.argv[1]))' "$@"; }
;;
*)
abspath() { realpath -s "$@"; }
canonicalpath() { readlink -f "$@"; }
;;
esac
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to NixOS/nixpkgs#94222 readlink -f doesn't work on MacOS within ctx.actions.run.

Comment on lines -219 to -225
# Remove once we upgrade to Bazel >=3.0. Until then `nix-build` output
# confuses the JAR query in `daml-sdk-head`.
quiet = True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now using Bazel 3.3.1.

Copy link
Contributor

@cocreature cocreature left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Sorry had somehow forgotten to click approve.

--owner=0 --group=0 --numeric-owner --mtime=2000-01-01\\ 00:00Z --sort=name \\
| gzip -n > $@
# Avoid file path too long errors on Windows of the form
# .../tar_dev_env/usr/bin/tar: node_modules/.cache/terser-webpack-plugin/...: file name is too long (cannot be split); not dumped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here? Shouldn’t the paths still be the same as before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a consequence of --format=ustar. Looking at IEEE Std 1003.1-1988 the filename is limited to 256 characters split into two components separated by /: a prefix of at most 155 characters and the filename of at most 100 characters. In this case the filename 97e9576e421fc74fc3b3eed0f6f0ab3be937d0f110bacca2151119236a1a6fc2a5262b044ec008be7300e3489eb49631d1f0572d08ca99384f2f9182ab57 exceeds the 100 character limit.

If need longer file names at some point then reproducible-builds.org recommends

To avoid this, either unset POSIXLY_CORRECT (only works with tar>1.32) or add to the tar call --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime or --format=gnu (both only available in GNU tar) or use --format=ustar if the limitations in that format are not a problem.

to avoid non-deterministic PAX headers. However, it places some restrictions on the tar version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the explanation!

As stated in the comment this is no longer required with Bazel >= 3.0.
This way Bazel will manage the runtime dependencies tar, gzip, mktgz,
and patchelf.

package-app.sh changes directory so it needs to make sure that all paths
are absolute and that the runfiles tree/manifest location is forwarded
to programs called by package-app.sh.
changelog_begin
changelog_end
@mergify mergify bot merged commit cf24597 into master Aug 5, 2020
@mergify mergify bot deleted the reprotar branch August 5, 2020 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants