(aws-lambda-python-alpha): bundling fails with .venv dir in "entry" dir #22585

Closed
jamesoff opened this issue Oct 20, 2022 · 12 comments · Fixed by #23670
Labels
@aws-cdk/aws-lambda-python bug This issue is a bug. effort/small Small work item – less than a day of effort p2

Comments

@jamesoff

Describe the bug

If a .venv directory is present in the function source directory (i.e. the value of entry to PythonFunction), the asset bundling fails after apparently trying to find the Python binary inside it. This presumably occurs because in my virtualenv, bin/python is a symlink to the original Python used as part of the virtualenv creation.
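To check whether this is what is happening in a given project, here is a minimal sketch that lists symlinks under a directory whose targets resolve outside that directory. The function name `find_escaping_symlinks` is hypothetical and this is not part of CDK; it only assumes the entry directory is the sole thing visible to the bundling step, as the `/asset-input/...` paths in the log suggest.

```python
import os
from pathlib import Path
from typing import List


def find_escaping_symlinks(root: str) -> List[str]:
    """Return symlinks under `root` (relative paths) whose target resolves outside `root`."""
    root_path = Path(root).resolve()
    escaping = []
    # os.walk does not descend into symlinked directories by default,
    # but it still reports them in `dirnames`, so both lists are checked.
    for dirpath, dirnames, filenames in os.walk(root_path):
        for name in dirnames + filenames:
            candidate = Path(dirpath) / name
            if not candidate.is_symlink():
                continue
            target = candidate.resolve()
            inside = target == root_path or root_path in target.parents
            if not inside:
                escaping.append(str(candidate.relative_to(root_path)))
    return sorted(escaping)


if __name__ == "__main__":
    # "lambda" is the entry directory used in the reproduction steps of this issue.
    for link in find_escaping_symlinks("lambda"):
        print(f"symlink points outside the entry directory: {link}")
```

Any path reported here would have no reachable target once only the entry directory is mounted.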

I often create a virtualenv in my function's directory to allow for code completion etc while working on the code.

I don't believe this used to happen; this project was successfully deployed on 2.35.0a0 and fails with 2.{46,47}.0a0. I have not bisected the library versions to verify.

Expected Behavior

cdk synth would succeed.

Current Behavior

% cdk synth
#1 [internal] load build definition from Dockerfile
#1 sha256:2cd2e2070c44df558ddddd66c54d2d34cc57055097780ab0811ad693df39bb6e
#1 transferring dockerfile: 1.28kB done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:452d082e50a39ce5b9ce6ad02ec379aac436b131d15ce9c1bd04817a1aab4273
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for public.ecr.aws/sam/build-python3.9:latest
#3 sha256:c037d438954a63dd514169e8b70bd8c1f25f7620154f690e39b0cde3ca7a28d9
#3 DONE 0.5s

#4 [1/2] FROM public.ecr.aws/sam/build-python3.9@sha256:0c64875b97960e69e8f17a1c780e2efd2db3a80ca50bc1abc5967f6ee9b73e4d
#4 sha256:59d114c6ecb4d0f4ce048aa96caa848d13b68f73dee357a97d26f844713fe9aa
#4 DONE 0.0s

#5 [2/2] RUN     python -m venv /usr/app/venv &&     mkdir /tmp/pip-cache &&     chmod -R 777 /tmp/pip-cache &&     pip install --upgrade pip &&     mkdir /tmp/poetry-cache &&     chmod -R 777 /tmp/poetry-cache &&     pip install pipenv==2022.4.8 poetry &&     rm -rf /tmp/pip-cache/* /tmp/poetry-cache/*
#5 sha256:f19856fe4384001478c5a68440ec33198ae19c73da3a1809ae18bd0258b16b66
#5 CACHED

#6 exporting to image
#6 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#6 exporting layers done
#6 writing image sha256:1fa9791ea44d050b3f7fd06a80efc36eb27ecad35727b902c659fce52cefd431 done
#6 naming to docker.io/library/cdk-8a5e4a750532e5be099ef9223a66e2fb24a66b4837382d6e05feb49d7c9dfdf1 done
#6 DONE 0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
Bundling asset CdkAppconfigChangeCalendarExtnStack/check_fn/Code/Stage...
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
cp: cannot stat ‘/asset-input/.venv/bin/python’: No such file or directory
cp: cannot stat ‘/asset-input/.venv/bin/python3’: No such file or directory
cp: cannot stat ‘/asset-input/.venv/bin/python3.9’: No such file or directory
jsii.errors.JavaScriptError:
  Error: Failed to bundle asset CdkAppconfigChangeCalendarExtnStack/check_fn/Code/Stage, bundle output is located at /Users/sewardjm/src/cdk-appconfig-change-calendar-extn/cdk.out/asset.eb48a51f2a1be51e1217c2c5c80976983ad77c46566c31399d2a69357bbd1499-error: Error: docker exited with status 1
      at AssetStaging.bundle (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/core/lib/asset-staging.js:2:614)
      at AssetStaging.stageByBundling (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:4506)
      at stageThisAsset (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:1867)
      at Cache.obtain (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/core/lib/private/cache.js:1:242)
      at new AssetStaging (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:2262)
      at new Asset (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/aws-s3-assets/lib/asset.js:1:736)
      at AssetCode.bind (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/aws-lambda/lib/code.js:1:4628)
      at new Function (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/aws-cdk-lib/aws-lambda/lib/function.js:1:2803)
      at new PythonFunction (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/jsii-kernel-Q5NaOz/node_modules/@aws-cdk/aws-lambda-python-alpha/lib/function.js:40:9)
      at Kernel._create (/private/var/folders/s7/_09xb8j906j6kxr9cd_dl9840000gr/T/tmps0mbv5hk/lib/program.js:7485:29)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/app.py", line 10, in <module>
    CdkAppconfigChangeCalendarExtnStack(
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/.venv/lib/python3.10/site-packages/jsii/_runtime.py", line 109, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/cdk_appconfig_change_calendar_extn/cdk_appconfig_change_calendar_extn_stack.py", line 28, in __init__
    function = aws_python.PythonFunction(
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/.venv/lib/python3.10/site-packages/jsii/_runtime.py", line 109, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/.venv/lib/python3.10/site-packages/aws_cdk/aws_lambda_python_alpha/__init__.py", line 599, in __init__
    jsii.create(self.__class__, self, [scope, id, props])
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/.venv/lib/python3.10/site-packages/jsii/_kernel/__init__.py", line 321, in create
    response = self.provider.create(
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/.venv/lib/python3.10/site-packages/jsii/_kernel/providers/process.py", line 350, in create
    return self._process.send(request, CreateResponse)
  File "/Users/sewardjm/src/cdk-appconfig-change-calendar-extn/.venv/lib/python3.10/site-packages/jsii/_kernel/providers/process.py", line 332, in send
    raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: Failed to bundle asset CdkAppconfigChangeCalendarExtnStack/check_fn/Code/Stage, bundle output is located at /Users/sewardjm/src/cdk-appconfig-change-calendar-extn/cdk.out/asset.eb48a51f2a1be51e1217c2c5c80976983ad77c46566c31399d2a69357bbd1499-error: Error: docker exited with status 1

Reproduction Steps

CDK code:

from aws_cdk import Stack
from aws_cdk import aws_lambda
from aws_cdk import aws_lambda_python_alpha as aws_python
from constructs import Construct

class CdkAppconfigChangeCalendarExtnStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        function = aws_python.PythonFunction(
            self,
            "check_fn",
            index="index.py",
            runtime=aws_lambda.Runtime.PYTHON_3_9,
            handler="lambda_handler",
            entry="lambda",
        )

In the lambda/ directory,

python -m venv .venv

Then synth.

Possible Solution

I was hoping there would be an option to exclude files/directories from the build but couldn't see one in the docs.

Or, CDK should not attempt to use the existing virtualenv during the build, or emit a warning if one is found, which would have aided my troubleshooting.
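As a rough illustration of the "emit a warning" idea, the check could look something like the sketch below. It runs outside CDK (for example at the top of app.py), the function names `find_virtualenvs` and `warn_if_virtualenv` are hypothetical, and it assumes a virtualenv can be recognised by the `pyvenv.cfg` file that `python -m venv` writes into the environment directory.

```python
import warnings
from pathlib import Path
from typing import List


def find_virtualenvs(entry: str) -> List[str]:
    """Return names of immediate subdirectories of `entry` that contain a pyvenv.cfg."""
    entry_path = Path(entry)
    return sorted(
        child.name
        for child in entry_path.iterdir()
        if child.is_dir() and (child / "pyvenv.cfg").is_file()
    )


def warn_if_virtualenv(entry: str) -> List[str]:
    """Warn once per virtualenv-looking directory found directly inside `entry`."""
    found = find_virtualenvs(entry)
    for name in found:
        warnings.warn(
            f"{entry}/{name} looks like a virtualenv; bundling may try to copy it"
        )
    return found
```

Calling `warn_if_virtualenv("lambda")` before constructing the stack would surface the problem before Docker is involved.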

Additional Information/Context

No response

CDK CLI Version

2.46.0 (build 5a0595e)

Framework Version

No response

Node.js Version

v18.10.0

OS

macOS 12.6

Language

Python

Language Version

Python 3.10.4

Other information

No response

@jamesoff jamesoff added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 20, 2022
@pharindoko

same issue here...

@corymhall
Contributor

I think the issue may be with the -L option here. (Would need to test to confirm).

bundlingCommands.push(`cp -rTL ${options.inputDir}/ ${options.outputDir}`);

Since we are running this in a docker container and only have the entry directory mounted inside the container, it probably doesn't make sense to try to follow the symlink (which is what -L does), because its target isn't available there.
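The effect can be reproduced without Docker. The sketch below uses Python's `shutil.copytree` as a stand-in for `cp`; it is not what the bundler runs, and the helper names are hypothetical. With `symlinks=False` the copy follows links (comparable in spirit to `-L`) and reports an error for a link whose target is missing, while `symlinks=True` recreates the link without reading its target.

```python
import os
import shutil
import tempfile
from pathlib import Path


def make_entry_with_dangling_link(base: Path) -> Path:
    """Create a fake entry dir whose .venv/bin/python points at a missing target."""
    entry = base / "lambda"
    (entry / ".venv" / "bin").mkdir(parents=True)
    (entry / "index.py").write_text("def lambda_handler(event, context):\n    return 'ok'\n")
    # Absolute target that does not exist, like a host interpreter path
    # that is not visible from inside the container.
    os.symlink(str(base / "not-mounted" / "python3"), entry / ".venv" / "bin" / "python")
    return entry


def copy_dereferencing(src: Path, dst: Path) -> bool:
    """Copy while following symlinks; returns False if any entry could not be copied."""
    try:
        shutil.copytree(src, dst, symlinks=False)
    except shutil.Error:
        return False
    return True


def copy_keeping_links(src: Path, dst: Path) -> bool:
    """Copy while recreating symlinks as symlinks, without reading their targets."""
    shutil.copytree(src, dst, symlinks=True)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        entry = make_entry_with_dangling_link(Path(tmp))
        print("dereferencing copy succeeded:", copy_dereferencing(entry, Path(tmp) / "out-deref"))
        print("link-preserving copy succeeded:", copy_keeping_links(entry, Path(tmp) / "out-links"))
```

Preserving the link avoids the error but still copies the rest of .venv, so it would not address bundle size on its own.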

@corymhall corymhall added p2 effort/small Small work item – less than a day of effort and removed needs-triage This issue or PR still needs to be triaged. labels Oct 21, 2022
@corymhall corymhall removed their assignment Oct 21, 2022
@pharindoko

@corymhall thanks for updating this issue.
The main reason we use the .venv folder inside the source directory is because it makes it very easy to set the interpreter for debugging python in visual studio code.

@ryanandonian
Contributor

Just chiming in to say I also have an issue with the build process including the .venv. The way it manifests for my project is slightly different: it doubles my synthesized asset bundle size by copying .venv (and with it all installed packages). I believe the same root problem is causing both issues, so I didn't create a new issue.

My setup is the same, using poetry config virtualenvs.in-project true so I can use VSCode with any environment controlled by the repo itself (only the lambda codebase runs 3.9, other projects run 3.10+, etc).

├── asset.f259b5b5d475e70bd0aba226d3d5af04c09dea9e8728664e7a17437581997a1a
│   └── python
│       ├── .DS_Store
│       ├── .dockerignore
│       ├── .gitignore
│       ├── .venv # <-- very large wasted space, 2x asset bundle size
│       ├── README.md
│       ├── __pycache__ # <-- also probably wasted space, see _Note_ at the end of this comment
│       ├── anyio
│       ├── anyio-3.6.2.dist-info

I believe a "most user friendly" solution that would solve both of these two issues would be to provide an exclusion list at the Construct level. The only problem with that is cp doesn't have a clean means to do that, it would have to switch to rsync to do it "correctly". With that list of excludes we can just append --exclude foo to rsync.

I noticed that pip already does vaguely similar stuff inline but when I tried to do that with Poetry it actually just removed my source .venv 🤦

https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-lambda-python/lib/packaging.ts#L54

exportCommand: `PIPENV_VENV_IN_PROJECT=1 pipenv lock -r > ${DependenciesFile.PIP} && rm -rf .venv`,

and I was able to "fix" (re: hack) my problem by altering the locally installed node_modules/@aws-cdk/aws-lambda-python-alpha/lib/bundling.js with a manual .venv exclusion using rsync as follows:

// from: node_modules/@aws-cdk/aws-lambda-python-alpha/lib/bundling.js#createBundlingCommand
// use rsync instead of cp to provide exclusion list
console.log(`<!!Manual hack!!>: "rsync -av --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}"`);
bundlingCommands.push(`rsync -v --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}`);
// bundlingCommands.push(`cp -rTL ${options.inputDir}/ ${options.outputDir}`);

I'd be happy to throw a PR together if that helps, I just haven't contributed yet so it'll be a bit before I get set up.

Proposed fix: Changing from cp to rsync allows us to provide a list of exclusions, and we can put .venv in there to circumvent both of these issues entirely.
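To make the exclusion-list idea concrete, here is a small sketch of how such a command string could be assembled from a list of patterns. It is written in Python purely for illustration (the bundling code itself is TypeScript), `build_rsync_command` is a hypothetical helper, and `/asset-output` is an assumed destination path; only `/asset-input` appears in the log above. It uses the `-r` and `--exclude` rsync options already shown in the hack.

```python
import shlex
from typing import Iterable


def build_rsync_command(input_dir: str, output_dir: str, excludes: Iterable[str] = ()) -> str:
    """Build an rsync command that copies input_dir's contents, skipping excluded patterns."""
    parts = ["rsync", "-r"]
    # One --exclude flag per pattern, shell-quoted so globs such as *.pyc
    # reach rsync unexpanded.
    parts += [f"--exclude={shlex.quote(pattern)}" for pattern in excludes]
    # The trailing slash makes rsync copy the directory's contents rather
    # than the directory itself.
    parts.append(shlex.quote(input_dir.rstrip("/") + "/"))
    parts.append(shlex.quote(output_dir))
    return " ".join(parts)


if __name__ == "__main__":
    print(build_rsync_command("/asset-input", "/asset-output", [".venv", "__pycache__"]))
```

With an empty list the command degrades to a plain recursive copy, which is what keeps the existing behavior for users who don't pass any exclusions.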

Note: Separately, I realized that if bundling is provided to Code.fromAsset, it ignores the exclude parameter entirely (per comments in core/lib/fs/options.ts), so I'm seeing lots of *.pyc files copied over from __pycache__ etc. as well in my output.

@dobeerman

The issue still persists in 2.50.0. Any updates? 🤔

@pharindoko

@ryanandonian - nice approach

// from: node_modules/@aws-cdk/aws-lambda-python-alpha/lib/bundling.js#createBundlingCommand
// use rsync instead of cp to provide exclusion list
console.log(`<!!Manual hack!!>: "rsync -av --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}"`);
bundlingCommands.push(`rsync -v --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}`);
// bundlingCommands.push(`cp -rTL ${options.inputDir}/ ${options.outputDir}`);

So best way would be an additional property to exclude directories, right ?

@ryanandonian
Contributor

ryanandonian commented Nov 10, 2022

So best way would be an additional property to exclude directories, right ?

To me that feels like a reasonable solution that keeps existing behavior so as not to break clients, but also lets users customize it without too much mental overhead/complexity. I believe this will also fix @jamesoff 's original issue: since the bundler will disregard .venv entirely, there'll be no symlinks to miss ;)

The only two minor concerns that I have with this as a long-term solution are:

  1. It is predicated on the fact that rsync will be available to the images moving forward since it's not part of the default executables in some images
  2. I have not tested this in Windows environments, and I know there are some issues (usually around cygwin+NTFS) that might need additional testing (I'm on a Mac, so I can't do rigorous testing there). I typically use rsync with archive mode (-a) for this very reason, but I cannot assert it won't be an issue in a production environment when using an NTFS source volume.

Besides that, I have already "tested" this flow locally on OSX and it worked fairly cleanly with deployments to real production environments. (note: in the snippet I pasted, I didn't use -a but in practice I did use -a)

@ghost

ghost commented Nov 12, 2022

I found a workaround for this issue by adjusting the directory structure.

Pointing entry to a folder without a venv fixes the problem. So I just placed the poetry package one folder deeper into the folder structure, while keeping the venv on the original folder level. This way you retain the use of a local venv for IDE functionality. The only downside is that you have to navigate through one more folder in your IDE.

The directory structure that is giving me the error:

.
└── test_lambda
    ├── .venv
    ├── README.md
    ├── index.py
    ├── poetry.lock
    └── pyproject.toml

The directory structure that solved the error:

.
└── test_lambda
    ├── .venv
    └── test_lambda
        ├── README.md
        ├── index.py
        ├── poetry.lock
        └── pyproject.toml

@pharindoko

Hey - any update on this topic ?

@ryanandonian
Contributor

Hey @pharindoko , I finally got around to getting some time to work on submitting what I feel is a proper fix for our issues. I opened #23670 , so hopefully I was able to align with the contributing guidelines and can get this fix approved by the team.

@pharindoko

Thanks @ryanandonian
Have seen your PR and like this approach.

@mergify mergify bot closed this as completed in #23670 Feb 18, 2023
mergify bot pushed a commit that referenced this issue Feb 18, 2023
…meter (#23670)

A summary of this change is: change from use of `cp` to `rsync --exclude='x'` in the bundling commands for `poetry` based lambdas.


The intention of this PR is to enable the bundling code for Poetry projects to exclude certain files and/or folders from the bundled assets. Currently, if developing a python lambda using either `virtualenv` itself or a toolchain that leverages virtual environments (re: `poetry`, specifically with `virtualenvs.in-project = true`, which is strongly recommended for leveraging python tools in VSCode), the bundling code will copy the entire folder passed in. This leads to copying the entire `.venv` directory into the bundled assets, even though the directory is ignored. Ultimately this leads to inflating the assets by the size of unzipped dependencies (`numpy`, for instance, is 50 MB by itself).

I verified this concept works in another project I maintain which leverages `@aws-cdk/aws-lambda-python-alpha` by manually editing the bundling file (I mentioned it in more detail in the linked issue #22585), but this temporary approach requires manually editing files from `node_modules`, so it is not a proper fix.



fixes #22585 


----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Construct Runtime Dependencies:

~* [ ] This PR adds new construct runtime dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-construct-runtime-dependencies)~

### New Features

* [x] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [x] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

~~**Note**: I was unable to implement an integration test. My plan was to verify an asset was ignored from the existing poetry integration test sample directory `packages/@aws-cdk/aws-lambda-python/test/lambda-handler-poetry` by passing in `['.ignorefile']` and confirming that asset was excluded, but I was unable to get the test working due to SSM parameters missing. I wasn't sure if I could bootstrap this and get it working.~~


*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*