initial commit
FrancoisChaumont committed Mar 23, 2021
0 parents commit 7d40b81
Showing 51 changed files with 4,811 additions and 0 deletions.
18 changes: 18 additions & 0 deletions .env
@@ -0,0 +1,18 @@
PROFILE=default
VERSION=latest
REGION=
CATALOG=AwsDataCatalog
WORKGROUP=primary
QUERY_OUTPUT=
AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL1QUERIES=5
AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL1QUERIES=10
AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL2QUERIES=5
AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL2QUERIES=20
AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL3QUERIES=20
AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL3QUERIES=40
AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL4QUERIES=20
AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL4QUERIES=80
AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL5QUERIES=100
AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL5QUERIES=200
AWS_DEFAULT_SIMULTANEOUS_DDL_QUERIES=20
AWS_DEFAULT_SIMULTANEOUS_DML_QUERIES=20
21 changes: 21 additions & 0 deletions LICENSE.md
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2021 Francois Chaumont

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
122 changes: 122 additions & 0 deletions README.md
@@ -0,0 +1,122 @@
# Toolkit for AWS Athena API

[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/FrancoisChaumont/aws-athena-api-tools/issues)
![GitHub release](https://img.shields.io/github/release/FrancoisChaumont/aws-athena-api-tools.svg)
[![GitHub issues](https://img.shields.io/github/issues/FrancoisChaumont/aws-athena-api-tools.svg)](https://github.com/FrancoisChaumont/aws-athena-api-tools/issues)
[![GitHub stars](https://img.shields.io/github/stars/FrancoisChaumont/aws-athena-api-tools.svg)](https://github.com/FrancoisChaumont/aws-athena-api-tools/stargazers)
![Github All Releases](https://img.shields.io/github/downloads/FrancoisChaumont/aws-athena-api-tools/total.svg)

## Introduction
**What does it do?** It lets you do the following from the command line:
- create/drop a database
- execute a single query
- execute multiple queries simultaneously while staying within your max rate limits
- create partitions on Hive or non-Hive formatted data
- get the current state of one or more queries
- stop a running query
- delete metadata files
- create a named query
- list & detail named queries
- list & detail databases
- list & detail database tables

## Requirements
- [PHP](https://www.php.net/releases/7_4_0.php) ^7.4
- [aws/aws-sdk-php](https://github.com/aws/aws-sdk-php) ^3.175
- [vlucas/phpdotenv](https://github.com/vlucas/phpdotenv) ^5.3
- [Composer](https://getcomposer.org)
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html)
- AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY¹

> ¹ The SDK should detect the credentials from environment variables (via AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), an AWS credentials INI file in your HOME directory, AWS Identity and Access Management (IAM) instance profile credentials, or credential providers.

## Installation
Download a copy of this repository and run the following:
```
composer install
```

## Configuration
Edit the following variables in the [.env](.env) file to set the default values used when the related options are omitted:
- `PROFILE`: AWS profile from ~/.aws/credentials
- `VERSION`: AWS web service version
- `REGION`: AWS region to connect to
- `CATALOG`: Athena data source catalog
- `WORKGROUP`: Athena workgroup
- `QUERY_OUTPUT`: S3 bucket for query results
- `AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL1QUERIES`¹: level 1 queries max calls per second
- `AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL1QUERIES`¹⁺⁰: level 1 queries max burst capacity
- `AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL2QUERIES`²: level 2 queries max calls per second
- `AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL2QUERIES`²⁺⁰: level 2 queries max burst capacity
- `AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL3QUERIES`³: level 3 queries max calls per second
- `AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL3QUERIES`³⁺⁰: level 3 queries max burst capacity
- `AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL4QUERIES`⁴: level 4 queries max calls per second
- `AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL4QUERIES`⁴⁺⁰: level 4 queries max burst capacity
- `AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL5QUERIES`⁵: level 5 queries max calls per second
- `AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL5QUERIES`⁵⁺⁰: level 5 queries max burst capacity
- `AWS_DEFAULT_SIMULTANEOUS_DDL_QUERIES`⁶: max simultaneous DDL queries
- `AWS_DEFAULT_SIMULTANEOUS_DML_QUERIES`⁷: max simultaneous DML queries

¹BatchGetNamedQuery, ListNamedQueries, ListQueryExecutions
²CreateNamedQuery, DeleteNamedQuery, GetNamedQuery
³BatchGetQueryExecution
⁴StartQueryExecution, StopQueryExecution
⁵GetQueryExecution, GetQueryResults - `a value higher than 2 will exceed the max rate limit`
⁶create table, create table add partition
⁷select, create table as (CTAS)

⁰max burst capacity not yet implemented
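
To illustrate how a calls-per-second setting can translate into pacing, here is a minimal shell sketch. It is not the toolkit's implementation: `env_value`, `min_interval_ms`, and `throttle` are hypothetical helpers, and the approach (a fixed pause after each call, with no burst handling) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of this toolkit.

# Print the value of KEY ($1) from a KEY=VALUE file ($2).
env_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Smallest pause in milliseconds that keeps calls at or below $1 per second
# (rounded up, so the limit is never exceeded).
min_interval_ms() {
    echo $(( (1000 + $1 - 1) / $1 ))
}

# Run command $2 once for each remaining argument, pausing after each call.
throttle() {
    local rate="$1" cmd="$2" delay_ms arg
    shift 2
    delay_ms=$(min_interval_ms "$rate")
    for arg in "$@"; do
        "$cmd" "$arg"
        # GNU sleep accepts fractional seconds
        sleep "$(printf '%d.%03d' $((delay_ms / 1000)) $((delay_ms % 1000)))"
    done
}
```

For example, `throttle "$(env_value AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL4QUERIES .env)" echo id-1 id-2` would read the level 4 default of 20 calls per second and wait 50 ms after each call.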

## Important
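- Query files are passed through `sprintf`, so write every literal `%` as `%%`. Only the placeholders for parameters passed to the query (such as `%1$s`) keep a single `%`; any other single `%` would be interpreted by `sprintf` as a format specifier.

Example passing the year and month to build the table name: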
```sql
SELECT DATE_FORMAT(FROM_UNIXTIME(1614716423), '%%Y-%%m-%%d %%H:%%i:%%S')
FROM database.table_name_%1$s%2$s
LIMIT 1
```
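The escaping rule can be tried out in a shell, since `printf` shares the `%%` convention with PHP's `sprintf`. This is only an analogy, not how the toolkit renders queries: bash's `printf` does not support the positional `%1$s` form, so plain `%s` placeholders stand in for `%1$s` and `%2$s`, and `render_query` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: shell printf stands in for PHP's sprintf here.
# Plain %s placeholders replace %1$s and %2$s, which bash's printf does not support.
template="SELECT DATE_FORMAT(FROM_UNIXTIME(1614716423), '%%Y-%%m-%%d %%H:%%i:%%S') FROM database.table_name_%s%s LIMIT 1"

# Hypothetical helper: substitute year ($1) and month ($2) into the template.
render_query() {
    # The template is deliberately used as the format string.
    printf "$template" "$1" "$2"
}

render_query 2021 03
```

This prints `SELECT DATE_FORMAT(FROM_UNIXTIME(1614716423), '%Y-%m-%d %H:%i:%S') FROM database.table_name_202103 LIMIT 1`: each `%%` collapses to a literal `%` and the two placeholders receive the year and month.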

## The tools
See tools [documentation](READMEs/README.tools.md) for more details.

## Testing
See tests [documentation](READMEs/README.tests.md) for more details.

## AWS documentation
AWS documentation:
- AWS SDK [Basic Usage](https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/getting-started_basic-usage.html)
- AWS SDK [API documentation for Athena](https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Athena.html)
- AWS SDK for PHP v3 [Getting Started](https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/getting-started_index.html)
- AWS Athena [Service Limits](https://docs.aws.amazon.com/athena/latest/ug/service-limits.html)
- List of [AWS regions](http://docs.aws.amazon.com/general/latest/gr/rande.html)
- Data [Partitioning](https://docs.aws.amazon.com/athena/latest/ug/partitions.html)

## TODO
Methods:
- BatchGetNamedQuery
- BatchGetQueryExecution
- CreateDataCatalog
- CreatePreparedStatement
- CreateWorkGroup
- DeleteDataCatalog
- DeletePreparedStatement
- DeleteWorkGroup
- GetDataCatalog
- GetPreparedStatement
- GetQueryResults
- GetWorkGroup
- ListDataCatalogs
- ListEngineVersions
- ListPreparedStatements
- ListQueryExecutions
- ListTagsForResource
- ListWorkGroups
- TagResource
- UntagResource
- UpdateDataCatalog
- UpdatePreparedStatement
- UpdateWorkGroup

Others:
- implement burst capacity?
45 changes: 45 additions & 0 deletions READMEs/README.tests.md
@@ -0,0 +1,45 @@
## Tests
This [test](../tests/test.sh) script exercises every tool in this library.

Make sure to read **Requirements**, **Installation** and **Configuration** first.
**For safety, the script asks for confirmation at startup before it deletes data on S3 and drops the database and tables.**

Tested on Ubuntu 20.04 running PHP 7.4.

It requires the database `sampledb` and performs the following steps:
1. list the database `sampledb`
2. create a new database
3. create test data by extracting from the `sampledb.elb_logs` table and creating tables with daily data
4. create a table for test data spanning multiple days
5. create day partitions on the test data tables
6. select data from the multi-day table
7. select several days of data from the single-day table
8. display the query result files on S3
9. delete the metadata files
10. display the query result files on S3 again, now without metadata files
11. detail the tables in the database
12. drop the database and all its tables
13. delete the data from S3
14. create a named query
15. detail the named query
16. delete the named query

Usage:
```shell
/bin/bash test.sh \
-d DATABASE_TO_CREATE \
-y YEAR_OF_DATA_TO_EXTRACT \
-m MONTH_OF_DATA_TO_EXTRACT
```

Example:
```shell
/bin/bash test.sh \
-d aws_athena_api_tools_tests \
-y 2015 \
-m 01
```

Output:
See expected [output](../tests/output.txt) for more details.

101 changes: 101 additions & 0 deletions READMEs/README.tools.md
@@ -0,0 +1,101 @@
## The tools
**Create/Drop a database**
See [usage](../tools/usage/database.usage.php) or
```
php database.php -h/--help
```

**Execute a single query**
select | create table [as] | create view | create database | drop table ...

See [usage](../tools/usage/query.usage.php) or
```
php query.php -h/--help
```

**Execute queries for each day in the given date range within the max rate limit**
select | create table [as] | create view | create database | drop table ...

Examples: [query-daily.sql](../examples/query-daily.sql)

See [usage](../tools/usage/query-daily.usage.php) or
```
php query-daily.php -h/--help
```

**Execute queries for each month in the given date range within the max rate limit**
select | create table [as] | create view | create database | drop table ...

Examples:
[create-table.sql](../examples/create-table.sql),
[create-table-partitioned.sql](../examples/create-table-partitioned.sql),
[create-table-partitioned-hive.sql](../examples/create-table-partitioned-hive.sql),
[drop-table.sql](../examples/drop-table.sql),
[query-monthly.sql](../examples/query-monthly.sql)

See [usage](../tools/usage/query-monthly.usage.php) or
```
php query-monthly.php -h/--help
```

**Create day partitions on a table (non-Hive formatted data)**
See [usage](../tools/usage/partitions-daily.usage.php) or
```
php partitions-daily.php -h/--help
```
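For data that is not laid out in Hive format, each partition has to be registered with an explicit S3 location. The sketch below generates one `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION` statement per day. It does not reproduce `partitions-daily.php`: the function name, the partition columns (`year`, `month`, `day`), and the `<location>/YYYY/MM/DD/` layout are all assumptions, and it relies on GNU `date`.

```shell
#!/usr/bin/env bash
# Sketch only: the partition columns (year, month, day) and the S3 layout
# <location>/YYYY/MM/DD/ are assumptions, not this tool's actual behavior.
# Requires GNU date (as shipped with Ubuntu).

# Print one ADD PARTITION statement per day from $3 to $4 (YYYY-MM-DD, inclusive).
daily_partition_ddl() {
    local table="$1" location="$2" day="$3" end="$4"
    local end_key key y m d
    end_key=$(date -u -d "$end" +%Y%m%d)
    while key=$(date -u -d "$day" +%Y%m%d) && [ "$key" -le "$end_key" ]; do
        y=${key%????}             # first four digits
        m=${key#????}; m=${m%??}  # middle two digits
        d=${key#??????}           # last two digits
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (year='%s', month='%s', day='%s') LOCATION '%s/%s/%s/%s/';\n" \
            "$table" "$y" "$m" "$d" "$location" "$y" "$m" "$d"
        # Advance by 86400 seconds; in UTC every day has exactly that length.
        day=$(date -u -d "@$(( $(date -u -d "$day" +%s) + 86400 ))" +%F)
    done
}
```

Calling `daily_partition_ddl db.logs s3://bucket/logs 2015-01-30 2015-02-01` (placeholder names) prints three statements, one each for January 30, January 31, and February 1.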

**Create day partitions on a table (Hive formatted data)**
See [usage](../tools/usage/partitions-daily-hive.usage.php) or
```
php partitions-daily-hive.php -h/--help
```

**Get the execution state of a query (running, failed, succeeded, ...)**
See [usage](../tools/usage/state.usage.php) or
```
php state.php -h/--help
```

**Get the execution state of queries listed in a file (running, failed, succeeded, ...)**
Example: [query-ids-list.txt](../examples/query-ids-list.txt)

See [usage](../tools/usage/state-from-list.usage.php) or
```
php state-from-list.php -h/--help
```
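The same state lookups can be sketched with the AWS CLI, which is already listed under the requirements. This does not show how `state.php` or `state-from-list.php` work internally: the function names are hypothetical, and the assumption is that the list file holds one query execution ID per line.

```shell
#!/usr/bin/env bash
# Sketch only, using the AWS CLI rather than this toolkit's PHP scripts.
# Function names are hypothetical; the one-ID-per-line file format is an assumption.

# Print the current state of the query execution with ID $1.
query_state() {
    aws athena get-query-execution \
        --query-execution-id "$1" \
        --query 'QueryExecution.Status.State' \
        --output text
}

# Succeed when the state is final, i.e. the query is no longer queued or running.
is_final_state() {
    case "$1" in
        SUCCEEDED|FAILED|CANCELLED) return 0 ;;
        *) return 1 ;;   # QUEUED or RUNNING
    esac
}

# Print "<id> <state>" for every query ID listed in file $1.
states_from_list() {
    local id
    while IFS= read -r id || [ -n "$id" ]; do
        if [ -n "$id" ]; then
            printf '%s %s\n' "$id" "$(query_state "$id")"
        fi
    done < "$1"
}
```

`is_final_state "$(query_state "$id")"` can serve as the exit condition of a polling loop that waits for a query to finish.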

**Stop a running query**
See [usage](../tools/usage/stop.usage.php) or
```
php stop.php -h/--help
```

**Delete metadata files recursively from an S3 location (bucket/prefixes)**
See [usage](../tools/usage/delete-metadata-files.usage.sh) or
```
delete-metadata-files.sh -h/--help
```
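Athena stores a `.metadata` file next to each query result file. A rough AWS CLI equivalent of a recursive cleanup is sketched below. It is not the script shipped with this toolkit: `delete_metadata_files` is a hypothetical function, and matching on `*.metadata` is an assumption about which files should go.

```shell
#!/usr/bin/env bash
# Sketch only, not the script shipped with this toolkit. It assumes the files to
# remove are Athena's "*.metadata" result companions.

# Remove *.metadata objects under S3 location $1; extra arguments go to "aws s3 rm".
delete_metadata_files() {
    local location="$1"
    shift
    aws s3 rm "$location" --recursive --exclude '*' --include '*.metadata' "$@"
}

# Preview the deletions without removing anything (bucket name is a placeholder):
# delete_metadata_files s3://my-bucket/athena-results/ --dryrun
```

The `--exclude '*'` filter is applied first and `--include '*.metadata'` then re-adds only the metadata files, so result files are left alone. Running with `--dryrun` first is a cheap way to check the match before deleting.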

**Create or delete a named query**
See [usage](../tools/usage/named-query.usage.php) or
```
php named-query.php -h/--help
```

**Detail one or all named queries and output in JSON format**
See [usage](../tools/usage/list-named-queries.usage.php) or
```
php list-named-queries.php -h/--help
```

**Detail one or all databases and output in JSON format**
See [usage](../tools/usage/list-databases.usage.php) or
```
php list-databases.php -h/--help
```

**Detail one or all tables of a database and output in JSON format**
See [usage](../tools/usage/list-tables.usage.php) or
```
php list-tables.php -h/--help
```
20 changes: 20 additions & 0 deletions composer.json
@@ -0,0 +1,20 @@
{
"name": "francoischaumont/aws-athena-api-tools",
"description": "Toolkit for AWS Athena using AWS SDK for PHP v3",
"authors": [
{
"name": "Francois Chaumont",
"role": "main developer"
}
],
"require": {
"php": "^7.4",
"aws/aws-sdk-php": "^3.175",
"vlucas/phpdotenv": "^5.3"
},
"autoload": {
"psr-4": {
"FC\\AWS\\": "src/AWS/"
}
}
}