Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add information about Amazon Elastic MapReduce Connection #26687

Merged
merged 7 commits into from
Oct 2, 2022

Conversation

Taragolis
Copy link
Contributor

@Taragolis Taragolis commented Sep 26, 2022

Right now there is not information how to use and Amazon Elastic MapReduce Connection.
Personally for me this connection a bit odd because it only contain parameters for single boto3 method.

However this connection exists for a long time and might be use for some one, so I add:

  • Connection page
  • Additional informatio how emr_conn_id and job_flow_overrides work together
  • Keep only Extra field in the Connection UI
  • Warnings if connection not exists or connection has unexpected conn type
  • Overwrite test_connection method, otherwise in the UI/API will test aws connection with default boto3 credential strategy

image

@Taragolis
Copy link
Contributor Author

cc: @ferruzzi @o-nikolas @vincbeck

@Taragolis Taragolis changed the title Added information about Amazon Elastic MapReduce Connection Add information about Amazon Elastic MapReduce Connection Sep 26, 2022
@ferruzzi
Copy link
Contributor

Fighting with a system test at the moment, I'll check it out tomorrow (Tuesday) 👍

@Taragolis
Copy link
Contributor Author

@ferruzzi very appreciate! Almost all of the changes related to documentation or inform users about something that we do previously silently. Like if emr_conn_id not exists that we just assume that we should use empty config without notify a user.

Also today I overwrite test_connection method, just because right now it tried to use AwsBaseHook/AwsGenericHook method and test actually ability to connect to AWS itself by default credentials.


image


image

@Taragolis Taragolis force-pushed the doc-amazon-emr-connection branch from cfa6f5f to a3576f7 Compare September 27, 2022 12:03
@ferruzzi
Copy link
Contributor

Almost all of the changes related to documentation or inform users about something that we do previously silently.

I am personally a big fan of transparency. 👍

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, great stuff as always! Just some minor changes requested 👍

airflow/providers/amazon/aws/hooks/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/hooks/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/hooks/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/hooks/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/hooks/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/emr.py Outdated Show resolved Hide resolved
docs/apache-airflow-providers-amazon/connections/emr.rst Outdated Show resolved Hide resolved
docs/apache-airflow-providers-amazon/connections/emr.rst Outdated Show resolved Hide resolved
@Taragolis
Copy link
Contributor Author

Thanks for the contribution, great stuff as always! Just some minor changes requested 👍

Cool. I will add this suggestion in the morning.

@Taragolis Taragolis force-pushed the doc-amazon-emr-connection branch from 2136aa6 to a36e4de Compare September 28, 2022 08:46
@Taragolis Taragolis force-pushed the doc-amazon-emr-connection branch from a36e4de to 5b72026 Compare September 29, 2022 13:43
@Taragolis Taragolis force-pushed the doc-amazon-emr-connection branch from 5b72026 to afe3a6e Compare September 29, 2022 23:38
Copy link
Contributor

@josh-fell josh-fell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid.

@@ -29,8 +30,8 @@
mock_emr = None


@unittest.skipIf(mock_emr is None, 'moto package not present')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Love the conversion to pytest here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I try to replace unittests by pytest in most cases this is not required huge afford.
However still a lot of unittests tests

# Total number of files which use `unittest.TestCase `
❯ grep -rl 'unittest.TestCase' ./tests | wc -l
528

# By tests packages
❯ grep -rl 'unittest.TestCase' ./tests | cut -d"/" -f3 | sort | uniq -c | sort -nr
    379 providers
     36 charts
     23 utils
     20 cli
     10 ti_deps
     10 api_connexion
      6 sensors
      6 operators
      6 executors
      6 core
      5 www
      5 always
      4 models
      3 api
      2 task
      2 kubernetes
      1 security
      1 plugins
      1 macros
      1 hooks
      1 dag_processing

# By provider ("apache", "microsoft", "common" has subpackages which is separate provider)
❯ grep -rl 'unittest.TestCase' ./tests/providers/ | cut -d"/" -f5 | sort | uniq -c | sort -nr
    113 google
    101 amazon
     32 apache
     26 microsoft
      6 databricks
      4 redis
      4 mysql
      4 alibaba
      3 trino
      3 tableau
      3 qubole
      3 oracle
      3 jenkins
      3 http
      3 docker
      3 atlassian
      3 arangodb
      3 airbyte
      2 yandex
      2 vertica
      2 telegram
      2 sqlite
      2 snowflake
      2 sftp
      2 segment
      2 salesforce
      2 presto
      2 postgres
      2 opsgenie
      2 neo4j
      2 mongo
      2 jdbc
      2 influxdb
      2 imap
      2 grpc
      2 ftp
      2 exasol
      2 discord
      2 dingding
      2 datadog
      2 common
      2 cncf
      2 asana
      1 ssh
      1 singularity
      1 sendgrid
      1 samba
      1 papermill
      1 openfaas
      1 elasticsearch
      1 cloudant
      1 celery

except AirflowNotFoundException:
config = {}
config = {}
if self.emr_conn_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat-picking: Should we have a separate function ?

def _validate_params_emr_conn_id(emr_conn_id: str):

IMHO it does increase readability by ditching a few level of indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually for this purpose usually use get_conn() however we can not overwrite this method because it uses for obtain AWS credentials (aws_conn_id).

We could create this method, but only use in one place. Current implementation not contain any complex logic, so personally I do not see any benefits with this separate method.

@potiuk potiuk merged commit f3ad164 into apache:main Oct 2, 2022
@Taragolis Taragolis deleted the doc-amazon-emr-connection branch January 14, 2023 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants