(core): Normalize hashing algorithm between different OSs #14555
The size is part of the hash.

@TRANTANKHOA let me know if the advice given by jogold was helpful!
Thanks @jogold and @peterwoodworth. I wasn't interested in finding out why they are different since there may be more than one reason. Ideally for our use case, the hash should be based on the textual content of the files only, not OS-related factors including the LF/CRLF stuff. Apparently this is an issue across many hashing algos I've checked, and people seem to have the same trouble across many languages too. For now what I've settled on is building a hash dictionary on my dev machine and checking that dictionary into git so that my CDK app can use it on another OS. For anyone having the same problem, the gist of the hack is as follows:
```ts
// Requires the `os-name` and `folder-hash` npm packages.
import * as fs from 'fs';
import * as path from 'path';
const osName = require('os-name');
const { hashElement } = require('folder-hash');

test('Update hash dictionary', async () => {
  const osNameString = osName();
  if (!osNameString.includes('Linux')) {
    // This snippet only runs on dev machines, not in the Linux CI container
    const hashOptions = {
      folders: {
        exclude: ['.*', '**.*', '**node_modules', '**test', '**build', '**dist', '*env', 'bin', 'lib'],
      },
      files: {
        exclude: ['**.d.ts'],
        include: ['**.ts', '**.json', '**.py', '**.sh', 'Dockerfile', 'requirements.txt'],
      },
      algo: 'md5',
    };
    await hashElement('.', path.join(__dirname, '..'), hashOptions)
      .then((hash: any) => {
        const data = JSON.stringify(hash, null, 2);
        fs.writeFile('hash.json', data, (err: NodeJS.ErrnoException | null) => {
          if (err) {
            console.log(err);
          }
        });
      })
      .catch((error: Error) => {
        console.error('hashing failed:', error);
      });
  }
});
```
```ts
interface CustomProps extends ecr.DockerImageAssetProps {
  assetHashType: AssetHashType; // AssetHashType.CUSTOM, see https://github.com/aws/aws-cdk/pull/12984
  assetHash: string; // must be specified
}

class CustomDockerImage extends ecr.DockerImageAsset {
  private props: CustomProps;
  constructor(scope: Construct, id: string, props: CustomProps) {
    super(scope, id, props);
    this.props = props;
  }
}
```

so that you can declare the docker image as

```ts
new CustomDockerImage(scope, 'Image', {
  directory: dockerDir,
  ...
  assetHash: dockerHash,
  assetHashType: AssetHashType.CUSTOM,
});
```

Then you can use snapshot tests across OS platforms. This is a bit of a hack, but I wonder if you can build something similar into the core library for this?
@eladb do you know why the file size is included in the hash?
@eladb I'd like to hear your take on this.
I am not sure why the file size is included in the hash, and I generally agree with @TRANTANKHOA that the hashing algorithm should normalize the inputs across OSs (for example, subdirectory paths come to mind as another thing that might be different, CRLF, etc.). We should revisit the hashing algorithm and remove/normalize any potential OS-specific elements. @jogold is this something you'd be interested in looking at?
I will have a look at this, yes. Changing the hash algorithm will re-upload all assets to S3/ECR and update all stacks. Not really breaking for S3 assets, but potentially breaking for ECR/Docker assets that have
Is this going to be addressed with v2 CDK? We are facing the same issue with the NodeJS CDK.
Subscribed, interested in when this is addressed. Thanks for the work.
Hey @jogold, I'm curious if you've made any progress on this or if you've hit any roadblocks.
Hi @peterwoodworth, not yet. I should have time to address this before the end of August.
Replace CRLF with LF so asset hashes are identical across platforms. The hash still includes the size but it is now the size after converting line endings. Addresses #14555 (closes it?) ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
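The fix described above boils down to normalizing line endings before the bytes are fed to the hash. A minimal sketch of the idea, using Node's built-in `crypto` module (this is an illustration of the technique, not the actual CDK implementation):

```typescript
import * as crypto from 'crypto';

// Replace CRLF with LF before hashing, so the same text file produces the
// same digest whether it was checked out with Windows or Unix line endings.
function normalizedHash(content: Buffer): string {
  const normalized = content.toString('utf8').replace(/\r\n/g, '\n');
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

const unix = Buffer.from('line one\nline two\n');
const windows = Buffer.from('line one\r\nline two\r\n');
console.log(normalizedHash(unix) === normalizedHash(windows)); // true
```

Note that after this conversion the byte size that goes into the hash is the size of the LF-normalized content, which is why the PR says the hash "still includes the size but it is now the size after converting line endings".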
@jogold should this issue be closed?
Yes, I think so. @TRANTANKHOA feel free to reopen if v1.121.0 doesn't solve your issue.
I have a similar issue, except I'm seeing different asset hashes on Arch Linux and Ubuntu. I've updated to v1.121.0 but I am still seeing this issue; we do snapshot tests, so I'm having to manually update the snapshots to get it working. I've compared with a coworker, and the lambda resource file sizes are all the same. Any recommendations of other things to check to understand why the asset hashes are different between Arch Linux and Ubuntu?
@jmhossler can you share a minimal repro? Are you pulling the same files from a git repo and getting different hashes?
@jogold I'll try to do that this evening; this is currently happening in a private repo that I can't share. It's a bit hard for me because I don't have an Ubuntu machine to validate with, but I'll see if I can make a sample with a git runner to replicate the issue. The files are in the git repo, yes, but I'm not pulling them down. The lambda files are in the same git repo as the CDK application.
I have a similar issue, between Ubuntu 20.04 and Amazon Linux 2. I am using CDK 1.122.0 (build ae09c16) with Node 14.17.1.
@alfianabdi what kind of asset? Can you share some code?
The assets for a lambda function. I am able to reproduce with this sample.
We have the same issue with function resources.
I think I am seeing this also in
@alfianabdi I cannot reproduce this. I'm doing the following both on Ubuntu and Amazon Linux (in Docker containers):

```sh
$ git clone https://github.com/alfianabdi/cdk-lambda.git
$ cd cdk-lambda/
$ npm i
$ cd functions/
$ npm i
$ cd ..
$ npm run cdk synth
$ ls cdk.out/
```

In both cases I'm getting
The hash for a specific file in a directory includes its relative path. This gives different results on Linux vs Windows because of the different path separator. The solution is to normalize the relative path using forward slashes. Affects directory assets with subdirectories. Closes #14555 Closes #16928 ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
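This second fix can be sketched the same way: when hashing a directory, each file's relative path is part of the hash input, so Windows backslashes must be rewritten to forward slashes before hashing. The `hashEntry` helper below is a hypothetical illustration of the technique, not the CDK source:

```typescript
import * as crypto from 'crypto';

// Combine a file's relative path and its content hash into one digest,
// normalizing win32 separators to posix first so the result is OS-independent.
function hashEntry(relativePath: string, contentHash: string): string {
  const normalizedPath = relativePath.split('\\').join('/');
  return crypto.createHash('sha256')
    .update(normalizedPath)
    .update(contentHash)
    .digest('hex');
}

// Windows-style and posix-style paths now hash identically.
console.log(hashEntry('sub\\file.txt', 'abc') === hashEntry('sub/file.txt', 'abc')); // true
```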
The Question
How can I get the same asset hash when building the stack from different OSs?
I'm very reluctant to implement a custom hash for the source content, but the snapshot test we have fails when run in the CI container, since the snapshot was created on Windows.
Environment
Other information
For example, I have a PythonFunction as follows,
and the asset hash for this is stable between repeated builds on Windows, despite cleaning out the caches and docker images etc.
However, the hash produced in a CentOS container is different.