A simple proof of concept (PoC) for lightweight Parquet writes on AWS Lambda.
Since no library can write Parquet files in pure Python, deployment is not as straightforward as uploading a zip file to Lambda. Such a zip file contains .pyx and .pxd files (Cython sources for C extensions), which fail to run unless they are given the proper environment.
Therefore, this project packages the function as a Docker container image.
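For orientation, a handler packaged in such an image might look like the sketch below. It assumes pyarrow is installed in the image. The event shape (a `rows` list of flat records), the helper `rows_to_columns`, and the `/tmp/output.parquet` path are hypothetical and are not taken from this project.

```python
import os


def rows_to_columns(rows):
    """Pivot a list of flat records into a dict of column lists."""
    # Collect column names in first-seen order so the output is deterministic.
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    # Missing keys become None, which pyarrow stores as nulls.
    return {name: [row.get(name) for row in rows] for name in names}


def handler(event, context):
    # Imported here so the helper above can be exercised without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    rows = event["rows"]  # hypothetical event shape
    if not rows:
        return {"path": None, "num_rows": 0}

    table = pa.table(rows_to_columns(rows))  # schema inferred from the values

    # /tmp is the writable scratch space in the Lambda execution environment.
    out_path = os.path.join("/tmp", "output.parquet")
    pq.write_table(table, out_path)
    return {"path": out_path, "num_rows": table.num_rows}
```

Pivoting to columns first keeps the pyarrow call to a single `pa.table(...)`, since Parquet is a columnar format.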
Create a repository in ECR and get its URL. For example:
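# Log in to ECR (replace <registry-url> with the registry host from your repository URL)
aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin <registry-url>
# Build image
docker build -t pyarrow_lambda .
# Tag image (replace <repository-url> with the URL of the repository you created)
docker tag pyarrow_lambda:latest <repository-url>:latest
# Push image
docker push <repository-url>:latest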
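The target passed to `docker tag` and `docker push` is the repository URL followed by a tag. As a rough illustration, the sketch below assembles that value for the standard commercial AWS regions. The function name `ecr_image_uri` and the account ID `123456789012` are placeholders invented for this example.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the fully qualified image URI used for tagging and pushing."""
    # The registry host is derived from the account ID and the region.
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# 123456789012 is a placeholder account ID, not a real one.
print(ecr_image_uri("123456789012", "eu-west-2", "pyarrow_lambda"))
# prints 123456789012.dkr.ecr.eu-west-2.amazonaws.com/pyarrow_lambda:latest
```

The part before the first `/` is the registry host, which is also what `docker login` authenticates against.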
Create a new Lambda function
Select "Container image" as the creation option.
Give it a relevant name.
Use "Browse images" to find the image you pushed above.
Select x86_64 architecture.
Perform additional configuration as necessary.
Create Function.
Write a test and execute.
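You can start the container locally and send it events to test it. If you want to inspect the output file it creates, mount the directory the file is written to as a volume.
# Publish container port 8080 (the default for AWS Lambda base images, assumed here) on localhost:9000
$ docker run -p 9000:8080 pyarrow_lambda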
# In a separate terminal
$ curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"key": "value"}'
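If you prefer to send test events from a script, the curl call above can be approximated with the Python standard library. This is a minimal sketch: the helper names `build_request` and `invoke` and the 30-second timeout are choices made for this example, and it assumes the container is reachable on localhost:9000 and returns a JSON response.

```python
import json
import urllib.request

# Same endpoint as the curl command above.
URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_request(event, url=URL):
    """Build a POST request carrying the event as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def invoke(event, url=URL, timeout=30):
    """Send the event to the running container and return the decoded response."""
    with urllib.request.urlopen(build_request(event, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `invoke({"key": "value"})` sends the same event as the curl command. Keeping `build_request` separate lets you check the payload without a running container.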