Python Example #34

Open
bvandermeersch opened this issue Mar 23, 2020 · 3 comments

Comments

@bvandermeersch

It seems the Python example located here no longer works:

https://github.com/vladgolubev/serverless-libreoffice/blob/master/STEP_BY_STEP.md

It seems tar and zip are no longer available in the Amazon Linux 2 python3.8 runtime.

I've also attempted to make it a layer, but I get this error when running it:

sh: /opt/instdir/program/soffice: No such file or directory

Even though it clearly is there.

Anyone else get this working?

@bvandermeersch

I had to use the AWS Lambda Python 3.6 runtime to make this work.

The Python 3.8 runtime for AWS Lambda no longer includes curl, tar, or zip, among other packages.
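
Since the python3.8 runtime lacks the tar and zip binaries, one workaround is to unpack archives with Python's standard library instead of shelling out. A minimal sketch, assuming the archive is a gzip-compressed tar. The function name and the paths in the usage comment are hypothetical, and a brotli-compressed archive would need a third-party decompressor, since the standard library does not provide one:

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir without calling the tar binary."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        archive.extractall(dest_dir)
    return dest_dir


# Hypothetical usage inside a Lambda function:
# extract_tar_gz("/tmp/lo.tar.gz", "/tmp")
```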

@vcrusselle

vcrusselle commented Apr 9, 2020

I think I got to the same spot you did. I changed /opt/instdir/program/soffice to /opt/instdir/program/soffice.bin, but then I got:

sh: instdir/program/soffice.bin: Permission denied

While this is probably laughable, here is my code:

import boto3
import os

s3_bucket = boto3.resource("s3").Bucket("************")
convertCommand = "instdir/program/soffice.bin --headless --invisible --nodefault --nofirststartwizard --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp"

client = boto3.client('s3')
resource = boto3.resource('s3')

def download_dir(client, resource, dist, local='/tmp', bucket='your_bucket'):
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        if result.get('CommonPrefixes') is not None:
            for subdir in result.get('CommonPrefixes'):
                download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        for file in result.get('Contents', []):
            dest_pathname = os.path.join(local, file.get('Key'))
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
            resource.meta.client.download_file(bucket, file.get('Key'), dest_pathname)

def lambda_handler(event, context):
    print("Starting Process")
    print("Starting Download")
    download_dir(client, resource, 'instdir/', '/tmp', bucket='********')
    print("Download Complete")
    
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        
        print("Starting Conversion")
        print(os.system("cd /tmp && ls"))
        print(os.system("cd /tmp/instdir && ls"))
        print(os.system("cd /tmp/instdir/program && ls"))
        # Execute libreoffice to convert input file
        
        
        os.system(f"cd /tmp && sudo {convertCommand} {key}")
        print("Conversion Complete")

        # Save converted object in S3
        print("Starting Save")
        outputFileName, _ = os.path.splitext(key)
        outputFileName = outputFileName  + ".pdf"
        f = open(f"/tmp/{outputFileName}","rb")
        s3_bucket.put_object(Key=outputFileName,Body=f,ACL="private")
        print("Saving Complete")
        f.close()

I was not able to figure out how to get around the missing curl, tar, and other dependencies, so I decompressed the file and uploaded it to an S3 bucket. I work with a company that has to keep dependencies to a minimum because we work with sensitive data all the time. So I went through all the steps on here but have hit a snag with the 3.8 solution. I guess I will have to settle for the 3.6 solution for now.
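
The `Permission denied` error above is consistent with a missing executable bit: S3 does not store POSIX file modes, so a binary fetched with `download_file` lands in /tmp without execute permission. A minimal sketch of restoring it from Python. The helper name is hypothetical, and the path in the usage comment is the one the code above downloads to:

```python
import os
import stat


def make_executable(path):
    """Add the owner-execute bit to path, keeping its other mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    # Return the resulting permission bits so callers can log or check them.
    return stat.S_IMODE(os.stat(path).st_mode)


# Hypothetical usage after the download step:
# make_executable("/tmp/instdir/program/soffice.bin")
```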

@vcrusselle

While I could have worked with the NPM module to get tar and brotli to decompress the file, I decided to decompress it locally on my machine (using peazip) and upload just a zip file of it to an S3 bucket separate from the drop bucket and the output bucket.

Note: The zip file is held in memory and nothing is written to disk until extractall("/tmp"), so you may need to allocate roughly 200-300 MB more memory to this function if you are under the max. I used 1600 MB for this code example.
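
If the extra memory is a concern, one alternative is to download the zip to a file under /tmp and extract it from disk, so the whole archive is never held in memory at once. A rough sketch, assuming `s3_client` is a boto3 S3 client (only its `download_file` method is used) and that /tmp has room for both the zip and its extracted contents. The function name and default paths are hypothetical:

```python
import os
from zipfile import ZipFile


def fetch_and_extract(s3_client, bucket, key, dest_dir="/tmp",
                      zip_path="/tmp/archive.zip"):
    """Download a zip from S3 to disk, extract it, then delete the zip."""
    s3_client.download_file(bucket, key, zip_path)
    try:
        with ZipFile(zip_path) as archive:
            archive.extractall(dest_dir)
    finally:
        # Free the /tmp space taken by the archive itself.
        os.remove(zip_path)
    return dest_dir
```

Note that `ZipFile.extractall` does not restore Unix permission bits, so the executable bit still has to be set after extraction.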

Below is my working Python 3.8 code:

import boto3
import os
from zipfile import ZipFile
from io import BytesIO

s3_bucket = boto3.resource("s3").Bucket("************-output") #output bucket
zip_obj = boto3.resource("s3").Object(bucket_name="*********-pdf", key="instdir.zip") #bucket that has your zip file in
buffer = BytesIO(zip_obj.get()["Body"].read())
z = ZipFile(buffer)
z.extractall("/tmp")

convertCommand = "instdir/program/soffice.bin --headless --norestore --invisible --nodefault --nofirststartwizard --nolockcheck --nologo --convert-to 'pdf:writer_pdf_Export' --outdir /tmp"
resource = boto3.resource('s3')

def lambda_handler(event,context):

    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key'].replace("+"," ")
    
        # Execute libreoffice to convert input file
        print("Elevating Permissions")
        os.system("chmod u+x /tmp/instdir/program/soffice.bin")
        print("Permissions Elevated")
        print("Downloading File")
        resource.meta.client.download_file(bucket, f"{key}", f"/tmp/{key}")
        print("File Downloaded")
        print("Starting Conversion")
        #not sure why you have to run this twice but it works on the second one consistently
        os.system(f"cd /tmp && {convertCommand} '{key}'")
        os.system(f"cd /tmp && {convertCommand} '{key}'")
        print("Conversion Complete")

        # Save converted object in S3
        print("Starting Save")
        outputFileName, path = os.path.splitext(key)
        outputFileName = outputFileName  + ".pdf"
        f = open(f"/tmp/{outputFileName}","rb")
        s3_bucket.put_object(Key=outputFileName,Body=f,ACL="private")
        print("Saving Complete")
        f.close()
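
As a variation on the code above, the conversion could go through `subprocess.run` with an argument list rather than `os.system`, which sidesteps shell quoting for keys that contain spaces or quotes, and the second run can be made conditional on the PDF not appearing. This is a sketch under assumptions: the helper names are hypothetical, the flags are copied from the command above, and the retry only mirrors the observed "works on the second run" behaviour without explaining it. S3 event keys are URL-encoded, so `unquote_plus` handles `%XX` escapes as well as `+`:

```python
import os
import subprocess
from urllib.parse import unquote_plus


def decode_s3_key(raw_key):
    """Decode the URL-encoded object key found in an S3 event record."""
    return unquote_plus(raw_key)


def convert_to_pdf(soffice_cmd, input_path, outdir="/tmp", attempts=2):
    """Run the converter until the PDF exists, up to `attempts` times.

    soffice_cmd is a list, e.g. ["/tmp/instdir/program/soffice.bin"].
    Returns the path of the generated PDF, or raises RuntimeError.
    """
    base, _ext = os.path.splitext(os.path.basename(input_path))
    pdf_path = os.path.join(outdir, base + ".pdf")
    args = list(soffice_cmd) + [
        "--headless", "--norestore", "--invisible", "--nodefault",
        "--nofirststartwizard", "--nolockcheck", "--nologo",
        "--convert-to", "pdf:writer_pdf_Export",
        "--outdir", outdir, input_path,
    ]
    for _attempt in range(attempts):
        # A list of arguments is passed directly, with no shell involved.
        subprocess.run(args, cwd=outdir, check=False)
        if os.path.exists(pdf_path):
            return pdf_path
    raise RuntimeError(f"no PDF produced after {attempts} attempts: {pdf_path}")
```

In the handler, `key = decode_s3_key(record['s3']['object']['key'])` would replace the `.replace("+", " ")` call, and `convert_to_pdf(["/tmp/instdir/program/soffice.bin"], f"/tmp/{key}")` would replace the two `os.system` lines.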
