achimgaedke/py-aws-vault-auth

py-aws-vault-auth

This is a wrapper for the aws-vault command.
This is not an interface to the AWS (glacier) vault or AWS secrets manager.

Introduction

(Re-)authenticate for AWS services using aws-vault within a Python session (e.g. Jupyter notebooks):

import py_aws_vault_auth
import boto3

boto_auth = py_aws_vault_auth.authenticate("DataScience", return_as="boto")
c = boto3.client("s3", **boto_auth)
c.list_objects_v2(Bucket="your-bucket")

In a JupyterLab notebook the dialogue looks like this:

[Screenshot: py_aws_vault_auth dialogue in JupyterLab notebook]

or in a VSCode notebook:

[Screenshot: py_aws_vault_auth dialogue in VSCode notebook]

This works by virtue of a context-adjusted version of the builtin input function, which JupyterLab and VSCode provide auto-magically.

Credentials for S3 access

That's probably the most prominent data-science use case...

With boto3 (that works of course for all other services):

boto_auth = py_aws_vault_auth.authenticate("DataScience", return_as="boto")
import boto3
c = boto3.client("s3", **boto_auth)
c.list_objects_v2(Bucket="your-bucket")

With s3fs

s3fs_auth = py_aws_vault_auth.authenticate("DataScience", return_as="s3fs")
import s3fs
fs = s3fs.S3FileSystem(**s3fs_auth)
fs.ls("s3://my-bucket/")

With pandas (implicitly via fsspec and s3fs)

s3fs_auth = py_aws_vault_auth.authenticate("DataScience", return_as="s3fs")
import pandas
pandas.read_csv(
    "s3://my-bucket/my_file",
    storage_options=s3fs_auth,
)

Credentials as environment variables

Just add the credentials to the environment of a subprocess

environ_auth = py_aws_vault_auth.authenticate("DataScience", return_as="environ")
import os, subprocess
subprocess.call(
    ["aws", "s3", "ls", "my-bucket"],
    env=os.environ | environ_auth
)

or simply update the running process environment with the (fresh) credentials

environ_auth = py_aws_vault_auth.authenticate("DataScience", return_as="environ")
import os
os.environ.update(environ_auth)
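Note that the `|` operator on os.environ needs Python 3.9 or later. On older interpreters, dict unpacking gives the same merged environment. The following is a minimal sketch: the credential values are placeholders standing in for the result of authenticate, and merged_environ is a hypothetical helper, not part of py_aws_vault_auth.

```python
import os
import subprocess
import sys

# Placeholder values standing in for the result of
# py_aws_vault_auth.authenticate("DataScience", return_as="environ").
environ_auth = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_SESSION_TOKEN": "example-token",
}


def merged_environ(extra):
    """Return a copy of the current environment with `extra` layered on top."""
    return {**os.environ, **extra}


# The child process sees the credentials; os.environ itself is not modified.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['AWS_ACCESS_KEY_ID'])"],
    env=merged_environ(environ_auth),
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
```

Passing a merged copy keeps the credentials scoped to the one subprocess, whereas os.environ.update makes them visible to everything started from the session afterwards.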

Credentials Handling

Without return_as, the function authenticate returns all environment variables starting with AWS_ that aws-vault sets. These include the credentials, their expiration time and the region of the profile.

The expiration time can be converted to a datetime object using py_aws_vault_auth.expiration_time (this requires dateutil for python<3.11).
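To illustrate why the Python version matters for the expiration_time helper, here is a rough sketch of parsing such a timestamp by hand and checking whether the credentials are about to expire. It assumes the expiry is stored under the key AWS_CREDENTIAL_EXPIRATION as an RFC 3339 string ending in "Z"; the key name and the two functions are assumptions for illustration, not the package's own implementation.

```python
from datetime import datetime, timedelta, timezone


def parse_expiration(timestamp):
    """Parse an RFC 3339 timestamp such as '2030-01-01T12:00:00Z'."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset for older interpreters.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def needs_refresh(credentials, margin=timedelta(minutes=5), now=None):
    """Return True if the credentials expire within `margin` or carry no expiry."""
    expires = credentials.get("AWS_CREDENTIAL_EXPIRATION")  # assumed key name
    if expires is None:
        return True
    now = now or datetime.now(timezone.utc)
    return parse_expiration(expires) - now <= margin
```

In a long-running notebook, a check like needs_refresh(credentials) before an expensive call could decide whether to run authenticate again.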

The functions to_boto_auth, to_environ_auth and to_s3fs_auth create the relevant authentication parameters. These can be imported from py_aws_vault_auth in order to use the same credentials for boto and s3fs, e.g.

import boto3
from py_aws_vault_auth import authenticate, to_boto_auth
ds_credentials = authenticate("DataScience")
athena_client = boto3.client("athena", **to_boto_auth(ds_credentials))
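Conceptually, these converters rename environment variables to the keyword arguments each library expects. The sketch below shows what such a mapping could look like. The sketch_* functions are hypothetical stand-ins, not the package's code, and the presence of AWS_REGION in the returned variables is an assumption. The keyword names themselves (aws_access_key_id, aws_secret_access_key, aws_session_token, region_name for boto3; key, secret, token for s3fs) are the ones those libraries accept.

```python
# Environment variable -> boto3 client/session keyword argument.
BOTO_KWARGS = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SESSION_TOKEN": "aws_session_token",
    "AWS_REGION": "region_name",  # assumed to be among the returned variables
}

# Environment variable -> s3fs.S3FileSystem keyword argument.
S3FS_KWARGS = {
    "AWS_ACCESS_KEY_ID": "key",
    "AWS_SECRET_ACCESS_KEY": "secret",
    "AWS_SESSION_TOKEN": "token",
}


def _rename(credentials, mapping):
    """Keep only the variables named in `mapping`, under their new names."""
    return {new: credentials[old] for old, new in mapping.items() if old in credentials}


def sketch_boto_auth(credentials):
    """Hypothetical equivalent of a boto3-style conversion."""
    return _rename(credentials, BOTO_KWARGS)


def sketch_s3fs_auth(credentials):
    """Hypothetical equivalent of an s3fs-style conversion."""
    return _rename(credentials, S3FS_KWARGS)
```

Keeping the raw AWS_ variables around and converting on demand means one authentication can serve boto3, s3fs and subprocesses alike.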

Installation

No dependencies, just python3... and of course aws-vault

pip install -U git+https://github.com/achimgaedke/py-aws-vault-auth.git
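Since aws-vault is installed separately, a quick way to confirm that it is visible from the Python session is to look it up on PATH. This is a small sketch using only the standard library; find_executable is a hypothetical helper name.

```python
import shutil


def find_executable(name="aws-vault"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


aws_vault_path = find_executable()
if aws_vault_path is None:
    print("aws-vault not found on PATH")
else:
    print(f"aws-vault found at {aws_vault_path}")
```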

Project Scope

Make AWS authentication with the command line tool aws-vault easy in interactive contexts other than a terminal, e.g. a Jupyter notebook.

This project does:

  • help with AWS authentication with aws-vault mid-session, i.e.
      • make it easy for data-scientists to avoid copying credentials into a notebook
      • avoid starting jupyter with aws-vault exec XXX -- jupyter lab (or VSCode...)
  • return the AWS credentials directly usable with popular data-science tools
  • request the MFA token via python's input context, i.e. the input built-in function
  • aim to work on Linux/MacOS (and hopefully MS Windows) without extra dependencies, supporting a variety of python3 versions

If you prefer another window popping up somewhere, you can use prompt="osascript" (on MacOS) or similar. This won't use python's input function.

This project does not:

  • use all features of aws-vault
  • capture the input dialogues for various key-chain/password managers

To avoid too many password manager input dialogues, have a look at the aws-vault documentation.

Project Maturity

Please star this repository if you like it or use the issue-tracker to share some feedback (bug reports or use cases).

This project was born out of the need for a smoother integration of devops tools/requirements with data-science tools. At the moment, it is simply factoring out some code I use privately.

The project is developed on MacOS, python-3.11 and tested on Linux, python-3.9.

Yes, the thread-based polling of the terminal communication is kind of awkward. Once upon a time this was the most portable way of waiting on output - or it was 6 years ago. I might revisit this part another time, as OS support and backwards compatibility have improved. (I'm aware of select, async, or setting streams to non-blocking mode)
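For readers unfamiliar with the pattern, thread-based polling generally looks like the sketch below: a background thread does the blocking reads and hands lines to the main thread through a queue, so the main thread can wait with a timeout. This is a generic illustration with hypothetical function names, not the code used in this project.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, lines):
    """Read lines from `stream` into the queue; None marks end of stream."""
    for line in iter(stream.readline, ""):
        lines.put(line)
    lines.put(None)


def read_lines_with_timeout(cmd, timeout=5.0):
    """Run `cmd` and collect its stdout lines, giving up after `timeout` seconds of silence."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    lines = queue.Queue()
    reader = threading.Thread(target=_pump, args=(proc.stdout, lines), daemon=True)
    reader.start()
    collected = []
    try:
        while True:
            try:
                line = lines.get(timeout=timeout)
            except queue.Empty:
                # No output for `timeout` seconds. A wrapper could decide here
                # that the child is waiting for input; this sketch just stops it.
                proc.kill()
                break
            if line is None:  # end of stream
                break
            collected.append(line.rstrip("\n"))
    finally:
        proc.wait()
        reader.join(timeout=1.0)
    return collected


output = read_lines_with_timeout([sys.executable, "-c", "print('one'); print('two')"])
```

The appeal of this approach is that it relies only on blocking reads and the standard library, so it behaves the same on Linux, MacOS and Windows; select or non-blocking streams avoid the extra thread but are less uniform across platforms.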

Ah, and tests are missing...