Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(core): Normalize hashing algorithm between different OSs #14555

Closed
TRANTANKHOA opened this issue May 6, 2021 · 23 comments · Fixed by #16945
Closed

(core): Normalize hashing algorithm between different OSs #14555

TRANTANKHOA opened this issue May 6, 2021 · 23 comments · Fixed by #16945
Labels
@aws-cdk/core Related to core CDK functionality effort/small Small work item – less than a day of effort feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. feature-request A feature should be added or improved. p1

Comments

@TRANTANKHOA
Copy link

The Question

How can I have the same asset hash when building the stack from different OS?
I'm very reluctant to implement a custom hash for the source content but the snapshot test we have will fail when run in the CI container since the snapshot was created on Windows.

Environment

  • CDK CLI Version: 1.100.0
  • Module Version: 1.100.0
  • Node.js Version: v14.16.0
  • OS: Windows 10 Pro 19042.928
  • Language (Version): ~3.9.7

Other information

For example I have a PythonFunction as belows

new PythonFunction(this, 'TaskMonitoringLambda', {
    entry: path.join(__dirname, '..', 'lambda'),
    index: 'failed_task_notification.py',
    handler: 'push',
    runtime: Runtime.PYTHON_3_7,
    retryAttempts: this.node.tryGetContext('max_retries'),
    memorySize: 128,
    functionName: `${machineID}TaskMonitoringLambda`,
    assetHashType: AssetHashType.SOURCE, // see https://github.com/aws/aws-cdk/pull/12984
    environment: {
        'FAILED_TASK_TOPIC': failedTaskTopic.topicArn,
        'REGION': this.node.tryGetContext('region'),
        'ENV': env
    }
})

and the asset hash for this is stable between repeated builds in Windows, despite cleaning out the caches and docker images etc..
However the hash produced in Centos container is different.

@TRANTANKHOA TRANTANKHOA added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels May 6, 2021
@github-actions github-actions bot added the @aws-cdk/core Related to core CDK functionality label May 6, 2021
@jogold
Copy link
Contributor

jogold commented May 6, 2021

What's in your lambda folder? Can you check if the file sizes are identical on Windows and CentOS?

The size is part of the hash:

return `${stat.size}:${hash.digest('hex')}`;

@peterwoodworth
Copy link
Contributor

@TRANTANKHOA let me know if the advice given by jogold was helpful!

@peterwoodworth peterwoodworth added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels May 6, 2021
@TRANTANKHOA
Copy link
Author

TRANTANKHOA commented May 7, 2021

thanks @jogold and @peterwoodworth . I wasn't interested in finding out why they are different since there may be more than one reasons. Ideally for our use case, the hash should be based on the textual content of the files only, not os-related factors including the LF/CRLF stuffs. Apparently this is an issue across many hashing algos I've checked and people seem having the same trouble across many languages too.

For now what I've settle with is building a hash dictionary on my dev. machine and check that dictionary into git so that my cdk app can use it when in another os.

For anyone having the same problem, the gist of the hack is as follows

  1. First build the docker/lambda directory hash as part of unit test
test('Update hash dictionary', async () => {
  const osNameString = osName();
  // console.log(`OS name: ${osNameString}`)
  if (osNameString.includes('Linux') == false) {
      // This snippet only run in dev machines
      const hashOptions = {
          folders: {
              exclude: ['.*', '**.*', '**node_modules', '**test', '**build', '**dist', '*env', 'bin', 'lib']
          },
          files: {
              exclude: ['**.d.ts'],
              include: ['**.ts', '**.json', '**.py', '**.sh', 'Dockerfile', 'requirements.txt']
          },
          algo: 'md5'
      }
      await hashElement('.', path.join(__dirname, '..'), hashOptions)
          .then(hash => {
              const fs = require('fs');
              const data = JSON.stringify(hash, null, 2);
              fs.writeFile("hash.json", data, function(err: NodeJS.ErrnoException) {
                  if (err) {
                      console.log(err);
                  }
              });
              // console.log(hash.children)
          })
          .catch(error => {
              return console.error('hashing failed:', error);
          });
  }
  })
  1. Lambda already allows custom hash as of v1.100.0 but not DockerImageAsset so you'd need
interface CustomProps extends ecr.DockerImageAssetProps {
    assetHashType: AssetHashType; // AssetHashType.CUSTOM see https://github.com/aws/aws-cdk/pull/12984
    assetHash: string; // must be specified
}

class CustomDockerImage extends ecr.DockerImageAsset{
    private props: CustomProps
    constructor(scope: Construct, id: string, props: CustomProps){
        super(scope, id, props);
        this.props = props;
    }
}

so that you can declare the docker image as

new CustomDockerImage(scope, 'Image', {
    directory: dockerDir,
    ...
    assetHash: dockerHash,
    assetHashType: AssetHashType.CUSTOM
})

then you can use snapshot test across os platforms. This is a bit of a hack but I wonder if you can build something similar to the core library for this?

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 7, 2021
@jogold
Copy link
Contributor

jogold commented May 7, 2021

@eladb do you know why the file size is included in the hash?

@peterwoodworth
Copy link
Contributor

@eladb I'd like to hear your take on this

@eladb
Copy link
Contributor

eladb commented May 26, 2021

I am not sure why we need the file size is included in the hash and I generally agree with @TRANTANKHOA that the hashing algorithm should normalize the inputs across OSs (for example, subdirectory paths come to mind as another thing that might be different, CRLF, etc).

We should revisit the hashing algorithm and remove/normalize any potential OS-specific elements.

@jogold is this something you'd be interested to look at?

@eladb eladb added effort/small Small work item – less than a day of effort p1 labels May 26, 2021
@jogold
Copy link
Contributor

jogold commented May 26, 2021

@jogold is this something you'd be interested to look at?

I will have a look at this, yes.

Changing the hash algorithm will reupload all assets to S3/ECR and update all stacks. Not really breaking for S3 assets but potentially breaking for ECR/Docker assets that have FROM instructions that are not pinned to a specific version. Is this for v1?

@peterwoodworth peterwoodworth added feature-request A feature should be added or improved. feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. and removed guidance Question that needs advice or information. labels May 26, 2021
@peterwoodworth peterwoodworth changed the title (core): AssetHashType.SOURCE gives different hash between Windows vs. Linux (core): Normalize hashing algorithm between different OSs May 26, 2021
@Hi-Fi
Copy link
Contributor

Hi-Fi commented Jun 7, 2021

Is this going to be addressed with V2 CDK? We are facing same issue with NodeJS CDK.

  • Mac and Linux generate same asset IDs
  • With Windows, Docker (with Windows cloned repo), Docker (with Docker cloned repo) and Docker (with Docker cloned repo to directory that's not mounted from Windows) each generate different ID than *nix (and also different with each other).

In latter cases npm install is made in Docker as Projen is not working with Windows.

@ayozemr
Copy link

ayozemr commented Jul 29, 2021

Subscribed, interested on when this is addressed. Thanks for the work.

@peterwoodworth
Copy link
Contributor

Hey @jogold, I'm curious if you've made any progress on this or if you've hit any roadblocks

@jogold
Copy link
Contributor

jogold commented Aug 6, 2021

Hi @peterwoodworth, not yet. I should have time to address this before the end of August.

mergify bot pushed a commit that referenced this issue Aug 30, 2021
Replace CRLF with LF so asset hashes are identical across platforms.

The hash still includes the size but it is now the size after converting
line endings.

Addresses #14555 (closes it?)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@peterwoodworth
Copy link
Contributor

@jogold should this issue be closed?

@jogold
Copy link
Contributor

jogold commented Sep 6, 2021

@jogold should this issue be closed?

Yes, I think so.

@TRANTANKHOA feel free to reopen if v1.121.0 doesn't solve your issue.

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Sep 6, 2021
Replace CRLF with LF so asset hashes are identical across platforms.

The hash still includes the size but it is now the size after converting
line endings.

Addresses aws#14555 (closes it?)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
david-doyle-as24 pushed a commit to david-doyle-as24/aws-cdk that referenced this issue Sep 7, 2021
Replace CRLF with LF so asset hashes are identical across platforms.

The hash still includes the size but it is now the size after converting
line endings.

Addresses aws#14555 (closes it?)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@jmhossler
Copy link

I have a similar issue, except I'm seeing different asset hashes on arch linux and ubuntu. I've updated to v1.121.0 but I am still seeing this issue, we do snapshot tests so I'm having to manually update the snapshots to get it working.

I've compared with a coworker, and the lambda resource file sizes are all the same. Any recommendations of other things to check to understand why the asset hashes are different between arch linux and ubuntu?

@jogold
Copy link
Contributor

jogold commented Sep 9, 2021

@jmhossler can you share a minimal repro? Are you pulling the same files from a git repo and getting different hashes?

@jmhossler
Copy link

@jogold I'll try to do that this evening, this is currently happening in a private repo that I can't share. It's a bit hard for me because I don't have an ubuntu machine to validate it with, but if I can try to make a sample with a git runner to replicate the issue.

The files are in the git repo, yes, but I'm not pulling them down. The lambda files are in the same git repo as the CDK application.

@alfianabdi
Copy link

I have similar issue, between Ubuntu 20.04 and AmazonLinux 2. I am using cdk 1.122.0 (build ae09c16) with node 14.17.1.
I tried to clean up and clone the repo in both Ubuntu and AmazonLinux2 then install dependecies again.
Both introduce new change in assets.
Old hash: asset.f35a2260b4975da
New hash in Ubuntu: asset.d3221d8a94463af
New hash in Amazon Linux: asset.0f326c9258aa679

@jogold
Copy link
Contributor

jogold commented Sep 15, 2021

@alfianabdi what kind of asset? can you share some code?

@alfianabdi
Copy link

alfianabdi commented Sep 16, 2021

The assets for lambda function, I am able to reproduce with this sample
https://github.com/alfianabdi/cdk-lambda.git

@mlippens
Copy link

We have the same issue with function resources NodejsFunction and PythonFunction.

@ayozemr
Copy link

ayozemr commented Sep 27, 2021

I think I am seeing this also in DockerImageFunction. Doing diff in a mac throws asset change vs deployed with CI that uses ubuntu, when no code change is present. Code is a Java lambda

@jogold
Copy link
Contributor

jogold commented Oct 1, 2021

The assets for lambda function, I am able to reproduce with this sample alfianabdi/cdk-lambda.git

@alfianabdi I cannot reproduce this.

I'm doing the following both on Ubuntu and Amazon Linux (in Docker containers):

$ git clone https://github.com/alfianabdi/cdk-lambda.git
$ cd cdk-lambda/
$ npm i
$ cd functions/
$ npm i
$ cd ..
$ npm run cdk synth
$ ls cdk.out/

In both cases I'm getting asset.63c890dcd34187d75a5a75d0b691f96674cafe096cc9fe10f39168ff790a703d.

jogold added a commit to jogold/aws-cdk that referenced this issue Oct 13, 2021
The hash for a specific file in a directory include its relative path.
This gives different results on Linux vs Windows because of the
different path separator. The solution is to normalize the relative path
using forward slashes.

Affects directory assets with subdirectories.

Closes aws#14555
Closes aws#16928
@mergify mergify bot closed this as completed in #16945 Oct 13, 2021
mergify bot pushed a commit that referenced this issue Oct 13, 2021
The hash for a specific file in a directory include its relative path.
This gives different results on Linux vs Windows because of the
different path separator. The solution is to normalize the relative path
using forward slashes.

Affects directory assets with subdirectories.

Closes #14555
Closes #16928


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
The hash for a specific file in a directory include its relative path.
This gives different results on Linux vs Windows because of the
different path separator. The solution is to normalize the relative path
using forward slashes.

Affects directory assets with subdirectories.

Closes aws#14555
Closes aws#16928


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/core Related to core CDK functionality effort/small Small work item – less than a day of effort feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. feature-request A feature should be added or improved. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants