Remove AWS SDK from Lambda Layer to reduce package size by 90.5% #1164
Comments
hey @mccauleyp - Thanks a lot for creating this. Could you +1 this feature request on Boto Modularization to help us solve this properly? We've been thinking of two strategies here but haven't had a chance to write them up properly, so I'll write off the top of my head and expand on them later :):
I initially thought 1 was a great option over something like implicit namespaces, but then it struck me that a Lambda Layer/SAR App would still get to the same end result. Optimizing the Lambda Layer/SAR App could eventually lead to known unknowns - e.g., boto X supports a feature we're using that isn't available in the Lambda runtime, or Data Wrangler also depends on boto. This led me to think we could add multiple SAR Apps, as they're easier for us to create and maintain than public Layers, but the closer I look the more I see them as workarounds. Maybe I'm thinking about this the wrong way and there's a better solution - I'd appreciate any ideas anyone has on this topic. In the meantime, I pinged the Data Wrangler team to have a chat about this, and will ping the Lambda team on whether 2 could be a reality if we have enough customers asking for it. cc @cakepietoast @am29d
hey @mccauleyp - Thanks again for creating the issue. I pondered this question while working on the layer. The biggest concern I had was dependency fragmentation, which is also in the nature of the layer functionality itself. Removing botocore is an easy fix, but it would mean we a) break existing code, b) need to provide an exact version match between the botocore version and powertools (i.e. 1.22.0 works with <1.23.x and >=1.18.x), and c) need to document it well enough that developers can spot missing dependencies or a version mismatch early. It might sound like we are reinventing a dependency management solution here. One short-term solution I see is to verify that powertools works with the botocore version shipped in the Lambda runtime and remove our shipped botocore package. This would couple the powertools layer to the Lambda runtime, and we would need to verify what impact that has on the different powertools utilities. I see option 2 mentioned by Heitor as the ideal solution: having powertools embedded in the runtime.
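The version-match concern in point b) could be checked at import time with something like the sketch below. It is only an illustration: the bounds are taken from the example range quoted above (>=1.18.x and <1.23.x), and `parse_version`, `is_supported` and `check_runtime_botocore` are hypothetical helper names, not part of Powertools.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.20.3' (or '1.20.3rc1') into a comparable tuple like (1, 20, 3)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def is_supported(installed, minimum="1.18.0", below="1.23.0"):
    """True when minimum <= installed < below (bounds are illustrative only)."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


def check_runtime_botocore():
    """Return (version, supported) for the botocore found on the path, or None."""
    try:
        installed = metadata.version("botocore")
    except metadata.PackageNotFoundError:
        return None  # botocore is not installed in this environment
    return installed, is_supported(installed)
```

A real implementation would more likely lean on a proper version parser; the point here is only that a mismatch can be surfaced early instead of failing deep inside a request.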
Spoke with the Data Wrangler team: they remove boto3/botocore, along with other data, before building the Lambda Layer so it fits within the Lambda code artifact limit - however, this is not being tested, so boto/botocore changes or new features might not work when using Lambda Layers. I wrote a quick script[1] to check the size impact of removing boto, and of cutting a 2.0 with optional dependencies only, as is already planned:
If we were to do a post-build optimization before publishing a Lambda Layer/SAR App, we would need a mechanism to be certain this wouldn't impact existing customers, otherwise we would need a major version -- integration testing within the Lambda runtime could give us that safety net.
[1] Shell script to test size impact
#!/bin/bash
declare -a BOTO_DEPS
declare -a XRAY_DEPS
declare -a POWERTOOLS_DEPS
DEST_DIR="/tmp/powertools-no-deps"
BOTO_DEPS=("boto*" "urllib3*" "*dateutil*" "s3transfer*" "*jmespath*" "*six*")
XRAY_DEPS=("*wrapt*" "*future*" "*aws_xray_sdk*" "libfuturize" "libpasteurize" "past" "bin")
POWERTOOLS_DEPS=("fastjsonschema")
function main() {
pip install aws-lambda-powertools -t "${DEST_DIR}"
pushd "${DEST_DIR}" || exit 1
# ${d} is left unquoted on purpose so the glob patterns expand
for d in "${BOTO_DEPS[@]}"; do rm -rf ${d}; done
for d in "${XRAY_DEPS[@]}"; do rm -rf ${d}; done
for d in "${POWERTOOLS_DEPS[@]}"; do rm -rf ${d}; done
popd
du -hs ${DEST_DIR}
}
main
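For anyone who wants the same measurement without deleting anything, here is a rough Python sketch of the idea. It reuses the boto glob patterns from the script above; `tree_size` and `size_report` are hypothetical names, and it sums file sizes rather than disk blocks, so the numbers will not match `du` exactly.

```python
import fnmatch
from pathlib import Path

# Same glob patterns as BOTO_DEPS in the shell script above
BOTO_PATTERNS = ["boto*", "urllib3*", "*dateutil*", "s3transfer*", "*jmespath*", "*six*"]


def tree_size(path):
    """Total size in bytes of a file, or of all files under a directory."""
    path = Path(path)
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def size_report(site_dir, patterns):
    """Return (total_bytes, removable_bytes) for the top-level entries of site_dir."""
    total = 0
    removable = 0
    for entry in Path(site_dir).iterdir():
        size = tree_size(entry)
        total += size
        if any(fnmatch.fnmatch(entry.name, pattern) for pattern in patterns):
            removable += size
    return total, removable
```

Calling `size_report("/tmp/powertools-no-deps", BOTO_PATTERNS)` right after the `pip install` step (before any `rm -rf`) would show how much of the install the boto dependencies account for.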
Hey @heitorlessa & @am29d, thanks for your replies! I've +1'd the boto modularization ticket as requested above. To me it seems like removing boto* from the layer bundle is the most straightforward and easy thing to do for now, as Alex suggested. That's consistent with what the Data Wrangler team is doing, and naively it seems reasonable that Lambda Powertools should guarantee compatibility with the Lambda runtime versions of boto*. What's the motivation for not sticking with the built-in versions to begin with? I guess they're not updated as regularly? Another near-term solution might be releasing an optional layer archive with boto* removed, similar to the "extra dependencies" layer version that you already have. Then perhaps make that the default if/when it can be verified that Powertools is fully compatible with the built-in boto* versions.
It has certainly been frustrating for me in the past that the Lambda runtime is not updated frequently with the current botocore/boto3 versions. I'm not sure of the current state, but there have been times when the versions were years out of date. That certainly makes testing un-fun: the code works locally but then fails when deployed, unless you also package boto into the lambda. :( Just an observation. There are probably good ways for this project to handle the change, ensuring coverage for the services/methods it uses. I figure it will likely involve the user being a bit more aware of and responsible for their packaging and layer configuration.
100% @lorengordon - I think we can meet in the middle. If we can run integration tests upon merging code in I thought of us using LambCI for that but it seems no longer maintained. It's one of those that requires going back to the "whiteboarding" to figure out a good balance - we should download the latest code from GitHub, run integration tests within the Lambda runtime, and report back if a functionality isn't working. @mccauleyp as @lorengordon mentioned, these are updated irregularly - could be weeks, months, or even 1 year+. This can have unexpected side effects, where your Lambda function works fine locally (unit tests, etc.) but not when deployed -- the Data Wrangler team isn't testing for that; it's something they want to fix too, and whatever we come up with would be helpful to both teams (and whoever comes next).
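A minimal sketch of what "run within the Lambda runtime and report back" could look like: a handler that tries to import a list of modules against whatever the runtime ships and returns the failures. This is an assumption about the shape of such a test, not the project's actual E2E framework; `smoke_test` and the `modules` event key are hypothetical.

```python
import importlib


def smoke_test(module_names):
    """Try importing each module; return {name: error} for the ones that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time failure, not just ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def lambda_handler(event, context):
    """Hypothetical entry point: the event may list which modules to check."""
    modules = event.get("modules", ["boto3", "botocore"])
    failures = smoke_test(modules)
    return {"ok": not failures, "failures": failures}
```

Deployed with the layer attached and no bundled boto, invoking this after each merge would at least catch import-level breakage; exercising actual service calls would need more than this.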
@heitorlessa I've voted for the boto modularization - but I'm curious if there are any internal feature requests for increasing the max Lambda size? Seems like another pretty good solution would be an increase of the max size of Lambdas - any PFRs we could throw our weight behind on that side of things? It's been 250MB as long as I can remember.
Hey @mwarkentin - Container Image packaging supports up to 10 GB, unlike Zip (250 MB). Container Image however has a perf impact, and depending on how big the zip is it could incur a perf impact too. I'll add a +1 to the PFR to increase the zip limit as I already know your company name ;-) You can also use #awswishlist on Twitter for it to get automatically added/bumped.
@heitorlessa Care to share the specific PFR? I can add weight to that one also.
👋 @heitorlessa just been looking into this a bit and bumped into https://github.com/aws/aws-lambda-base-images/tree/python3.9 . Not sure if it is like-for-like, but maybe you can have a chat with the maintainers? (Sadly they are not very responsive on GH issues, so it's hard to tell from outside.)
@lorengordon the PFR is not public - Lambda doesn't have a public roadmap intake. As AWS staff, we can add customer names to it and use any details you can share - e.g. is this a blocker? a critical improvement? a nice to have? @DanyC97 thanks for the link! These would be for running OCI images on Lambda. They're not like-for-like per se with what the provided runtime has.
The latest versions of the AWS Data Wrangler and Lambda Powertools Layers now seem to be compatible. Wrangler version 2.13.0 was released a few days ago, and I'm able to deploy that with the latest Powertools layer without hitting the size limit. I'm using the Wrangler SAR option and the Powertools region-specific layer ARN:4. I didn't actually dig into the size differences to see what exactly changed. I think it's still worth trying to trim out the Boto dependencies if possible because there's not much headroom left for additional dependencies, but my original motivation for opening this ticket is resolved for now.
EDIT: Clarify the idea of removing boto from the Lambda Layer, not PyPI. I'm moving this to our official roadmap and providing some updates for posterity. The Lambda team confirmed runtime SDKs are being updated a few times a year, and the Runtime docs are being updated to reflect the latest pinned versions: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html This gives us confidence it's no longer 6-12 months outdated as we experienced in the past. It also means we can safely remove boto3/botocore from Lambda Layers/SAR App given our feature set; cc @am29d.
For this reason, we will work on two major pieces of work: 1/ an E2E test mechanism upon merge on In the ideal world, I'd like to invest in modularization by investing in Implicit Namespace Packages (PEP 420). However, I'm concerned about the operational cost of getting this working smoothly on releases, and the potential attack vectors since anyone could publish a micropackage with our namespace -- I'd estimate 6 months of work to experiment, automate, and keep the maintenance costs low. If anyone reading this later has experience with Python microlibs and Poetry, please do reach out - @heitor_lessa. Thank you everyone!
Update: @am29d is working on migrating our internal Lambda Layers pipeline to CodePipeline in a Powertools AWS account. In parallel, @mploski is working on our E2E test framework. Both will give us the confidence we need to remove Boto at runtime. You can follow progress here: github.com/orgs/awslabs/projects/51/views/11
We found a solution! @mploski will be working on some rigorous tests to ensure we don't break anyone when including a ~500K botocore in the final PyPI asset and Lambda Layer, but I'm super super excited we've made progress here with the help from the AWS Python SDK team <3
We're enabling E2E at the merge level and will be able to test this more confidently as we increase coverage. In the meantime, @mploski is finishing another project in his day-to-day role and will resume exploratory tests on the new squashed boto to see whether we accidentally cause any conflict with the Lambda runtime's boto dependency.
We ran some load testing to measure the impact of the layer size on the cold start. Findings: 1/ Layer size doesn't seem to have a meaningful impact on the cold start time (at least when the layer doubles in size and it's < 5MB). Based on this we're launching the v2 layer with compiled native code for each architecture. Current Lambda Layer size: 2.6MB
@mccauleyp it took us a while but you'll be pleased to see V2 results this month ;) We now have an E2E framework in place to detect regressions in case the Lambda runtime changes a dependency ahead of a release. Thank you for sticking with us all this time!
For reference, here are the results of the benchmark we ran to measure the cold start impact of loading the different libraries:
arm64, 128mb
Baseline (empty handler, no powertools):
Just logger:
Just parser:
Logger + Parser:
Just tracer:
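As a rough local stand-in for this kind of comparison (not a Lambda cold start measurement), the cost of loading a library can be estimated by timing a fresh interpreter with and without the import. The function names below are hypothetical and the approach is an assumption, not how the numbers above were produced.

```python
import subprocess
import sys
import time


def time_fresh_run(statement="pass"):
    """Seconds taken by a brand-new interpreter to execute `statement`."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", statement], check=True)
    return time.perf_counter() - start


def import_overhead(statement, repeats=3):
    """Best-of-N time for `statement` minus best-of-N for an empty baseline."""
    baseline = min(time_fresh_run("pass") for _ in range(repeats))
    loaded = min(time_fresh_run(statement) for _ in range(repeats))
    return loaded - baseline
```

For example, `import_overhead("from aws_lambda_powertools import Logger")` would approximate the "Just logger" case, assuming Powertools is installed locally. A fresh process is used because a module already cached in `sys.modules` imports almost instantly.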
Changing it to Coming soon as we expect to launch V2 with these optimizations by EOW. Updates: #1459 (comment)
Is your feature request related to a problem? Please describe.
In trying to upgrade to v1.22.0, I ran into the deployment error "Layers consume more than the available size of 262144000 bytes". This is for a Lambda function that uses both the Lambda Powertools Layer and the latest AWS Data Wrangler Layer. I think the new Powertools layer is only a little larger than the previous version but enough to tip over the limit when combined with the AWS Data Wrangler layer. I pulled down the Powertools zip contents following the "Get the Layer .zip contents" instructions, and it looks like most of the size (~70 MB unzipped) comes from botocore (~63 MB).
Is it necessary to include botocore and boto3 in the layer's zip file given that these are available by default in the Lambda runtime? If not, it would be helpful to remove them from the pre-built layer to help avoid hitting the overall size limit.
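To make the arithmetic concrete: the 262144000 bytes in the error message is exactly 250 MiB, and the check is on the combined unzipped size. The sketch below uses the ~70 MB and ~63 MB figures from above; the 185 MiB for the Data Wrangler layer is a made-up placeholder, not a measured value.

```python
MIB = 1024 * 1024
LAYER_LIMIT_BYTES = 262_144_000  # value quoted in the deployment error (250 MiB)


def remaining_bytes(unzipped_sizes, limit=LAYER_LIMIT_BYTES):
    """Headroom left under the limit; a negative result means the deploy fails."""
    return limit - sum(unzipped_sizes)


powertools_with_boto = 70 * MIB          # ~70 MB unzipped, per the issue text
powertools_without_boto = (70 - 63) * MIB  # botocore accounts for ~63 MB of that
other_layer = 185 * MIB                  # hypothetical size for the second layer

over = remaining_bytes([powertools_with_boto, other_layer])
under = remaining_bytes([powertools_without_boto, other_layer])
```

With these placeholder numbers, `over` is negative (5 MiB past the limit) while `under` leaves 58 MiB of headroom.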
Describe the solution you'd like
Remove botocore and boto3 from the Lambda layer zip archive.
Describe alternatives you've considered
I could build my own layer that doesn't include botocore and boto3. I tried this already by downloading the existing layer .zip, deleting those packages, zipping it back up and deploying it. That seems to work in my application, but it would be great if creating my own layer wasn't necessary.
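The manual workaround (download the zip, delete the packages, zip it back up) could be scripted roughly as follows. This is a sketch, assuming the layer keeps its packages directly under a top-level `python/` folder; the helper names and that `root` default are assumptions to verify against the actual archive.

```python
import zipfile

EXCLUDED_PREFIXES = ("boto3", "botocore")


def is_excluded(entry_name, root="python/", prefixes=EXCLUDED_PREFIXES):
    """True when the entry's top-level package directory (under root) matches."""
    if not entry_name.startswith(root):
        return False
    top_level = entry_name[len(root):].split("/", 1)[0]
    return top_level.startswith(prefixes)


def strip_packages(src, dst, root="python/", prefixes=EXCLUDED_PREFIXES):
    """Copy the zip at src to dst, skipping excluded packages; return entries kept."""
    kept = 0
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if is_excluded(info.filename, root, prefixes):
                continue
            zout.writestr(info, zin.read(info.filename))
            kept += 1
    return kept
```

Only the top-level directory is matched, so nested paths such as `aws_xray_sdk/ext/boto3/` are left alone, while `boto3-*.dist-info` style metadata folders are dropped along with the packages.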
Additional context
You could replicate the error by trying to deploy a function with both the pre-built AWS Lambda Powertools and AWS Data Wrangler layers included. The code size coming from my application is negligible compared to the layer sizes.
P.S. Thanks for the great resource! This has been a very helpful package for me and my team :)