S3KeySensor 'bucket_key' instantiates as a nested list when rendered as a templated_field #28272

Closed
2 tasks done
sungwy opened this issue Dec 9, 2022 · 2 comments · Fixed by #28340
Labels
area:providers, kind:bug, provider:amazon

sungwy commented Dec 9, 2022

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==6.2.0

Apache Airflow version

2.5.0

Operating System

Red Hat Enterprise Linux Server 7.6 (Maipo)

Deployment

Virtualenv installation

Deployment details

Simple virtualenv deployment

What happened

bucket_key is a template_field in S3KeySensor, which means that it is expected to be rendered as a template.

The supported types for the attribute are both 'str' and 'list'. The __init__ function of the class also contains a conditional that depends on the type of the input value and converts the attribute to a list of strings. If a list of strings is passed in through a Jinja template, self.bucket_key ends up as a doubly nested list of strings rather than a list of strings.

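This happens because, when bucket_key is used as a template field, the value received by __init__ can only be a string: the unrendered template string. Template fields are converted to their actual values only later, when the task instance runs and its templates are rendered. Since the template string is a str at __init__ time, it is wrapped in a list, and the list it later renders to ends up inside that outer list.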
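To make the mechanism concrete, here is a minimal stdlib-only sketch that mimics the two stages without Airflow. `LegacySensor`, `render_native`, and the `XCOMS` dict are hypothetical stand-ins: `LegacySensor` copies the wrap-in-a-list behaviour described for `__init__`, and `render_native` assumes that native rendering replaces a template string with the Python object it resolves to (here a plain dict lookup) and renders list elements one by one.

```python
from typing import Any, Dict, List, Union

# Hypothetical XCom store standing in for ti.xcom_pull(task_ids='t1')
XCOMS: Dict[str, Any] = {
    "{{ ti.xcom_pull(task_ids='t1') }}": [
        "s3://test_bucket/test_key1",
        "s3://test_bucket/test_key2",
    ],
}


def render_native(value: Any) -> Any:
    """Hypothetical stand-in for native template rendering.

    A known template string is replaced by the object it resolves to,
    lists are rendered element by element, anything else is returned as-is.
    """
    if isinstance(value, str):
        return XCOMS.get(value, value)
    if isinstance(value, list):
        return [render_native(item) for item in value]
    return value


class LegacySensor:
    """Mimics the described __init__: a str is wrapped in a list up front."""

    template_fields = ("bucket_key",)

    def __init__(self, bucket_key: Union[str, List[str]]) -> None:
        # At construction time bucket_key is still the raw template string,
        # so it is treated as a single key and wrapped in a list.
        self.bucket_key = [bucket_key] if isinstance(bucket_key, str) else bucket_key

    def render_template_fields(self) -> None:
        # Rendering happens later, when the task instance runs.
        for field in self.template_fields:
            setattr(self, field, render_native(getattr(self, field)))


sensor = LegacySensor(bucket_key="{{ ti.xcom_pull(task_ids='t1') }}")
print(sensor.bucket_key)  # → ["{{ ti.xcom_pull(task_ids='t1') }}"]
sensor.render_template_fields()
print(sensor.bucket_key)  # → [['s3://test_bucket/test_key1', 's3://test_bucket/test_key2']]
```

The two printed values line up with the __init__ and poke logs below: the list produced by rendering lands inside the list created in __init__.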

Example log from the __init__ function:
scheduler | DEBUG | type: <class 'list'> | val: ["{{ ti.xcom_pull(task_ids='t1') }}"]

Example log from the poke function:
poke | DEBUG | type: <class 'list'> | val: [["s3://test_bucket/test_key1", "s3://test_bucket/test_key2"]]

This causes the poke function to raise an exception: each individual key must be a string so that its URL can be parsed, but here it is passed as a list (because self.bucket_key is a nested list).
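The failure can be reproduced with the standard library alone. The sketch below assumes, as a simplification, that each key is parsed with `urllib.parse.urlsplit`; `check_key` and `poke` are hypothetical stand-ins for the sensor's per-key check and its `all(...)` loop, and the real provider code may parse keys differently. A string key parses fine, while the nested list element is rejected. Depending on the Python version, `urlsplit` raises either `TypeError` or `AttributeError` for a list, so both are caught.

```python
from typing import Any, List
from urllib.parse import urlsplit


def check_key(key: Any) -> bool:
    """Hypothetical per-key check: split an s3:// URL into bucket and key."""
    parsed = urlsplit(key)  # only accepts str (or bytes) input
    bucket, path = parsed.netloc, parsed.path.lstrip("/")
    return bool(bucket and path)


def poke(bucket_key: List[Any]) -> bool:
    # Mirrors the all(...) loop: every element is expected to be a str.
    return all(check_key(key) for key in bucket_key)


flat = ["s3://test_bucket/test_key1", "s3://test_bucket/test_key2"]
nested = [flat]  # shape of self.bucket_key after rendering

print(poke(flat))  # True
try:
    poke(nested)
except (TypeError, AttributeError) as exc:
    print(f"poke failed: {type(exc).__name__}: {exc}")
```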

What you think should happen instead

Instead of wrapping the input value of bucket_key in a list, we should store the value as-is when the class is initialized, and check the type of the attribute only within the poke function.

In __init__:

    self.bucket_key = bucket_key

(which will store the input value correctly as a str or a list once the task instance is created and the template fields are rendered)

In poke:

    def poke(self, context: Context):
        # bucket_key has been rendered by now, so its type reflects the real value
        if isinstance(self.bucket_key, str):
            return self._check_key(self.bucket_key)
        else:
            return all(self._check_key(key) for key in self.bucket_key)
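A stdlib-only sketch of the proposed behaviour, under the same kind of assumptions as a mock: `FixedSensor`, `render_native`, `EXISTING_KEYS`, and `XCOMS` are hypothetical stand-ins, no S3 calls are made, `_check_key` only tests membership in an in-memory set, and `poke` drops the `context` argument for brevity. The value is stored untouched in `__init__`, and the str/list decision is deferred to `poke`, after rendering.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-ins: keys that "exist" and one resolvable template
EXISTING_KEYS = {"s3://test_bucket/test_key1", "s3://test_bucket/test_key2"}
TEMPLATE = "{{ ti.xcom_pull(task_ids='get_list_of_str') }}"
XCOMS: Dict[str, Any] = {TEMPLATE: sorted(EXISTING_KEYS)}


def render_native(value: Any) -> Any:
    """Replace a known template string with its native value."""
    if isinstance(value, str):
        return XCOMS.get(value, value)
    if isinstance(value, list):
        return [render_native(item) for item in value]
    return value


class FixedSensor:
    template_fields = ("bucket_key",)

    def __init__(self, bucket_key: Union[str, List[str]]) -> None:
        # Store the value untouched; it may still be an unrendered template.
        self.bucket_key = bucket_key

    def render_template_fields(self) -> None:
        for field in self.template_fields:
            setattr(self, field, render_native(getattr(self, field)))

    def _check_key(self, key: str) -> bool:
        # Hypothetical check: set membership instead of an S3 lookup.
        return key in EXISTING_KEYS

    def poke(self) -> bool:
        # Decide on the type only now, after templates have been rendered.
        if isinstance(self.bucket_key, str):
            return self._check_key(self.bucket_key)
        return all(self._check_key(key) for key in self.bucket_key)


templated = FixedSensor(bucket_key=TEMPLATE)
templated.render_template_fields()
print(templated.bucket_key)  # a flat list of two str keys
print(templated.poke())  # True

single = FixedSensor(bucket_key="s3://test_bucket/test_key1")
single.render_template_fields()
print(single.poke())  # True
```

Deferring the type check means a template that renders to a list and a plain string key both take the right branch, because poke sees the rendered value rather than the template string.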

How to reproduce

  1. Use a templated value for the bucket_key attribute in S3KeySensor.
  2. Have that template render to a list of strings in the S3KeySensor task (e.g. a value pulled from XCom or a Variable, with render_template_as_native_obj=True set on the DAG).

Example:

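from airflow import DAG
from airflow.decorators import task
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    ...,  # remaining DAG arguments omitted
    render_template_as_native_obj=True,  # render templates as native Python objects
) as dag:

    @task(task_id="get_list_of_str", do_xcom_push=True)
    def get_list_of_str():
        # The returned list is pushed to XCom
        return ["s3://test_bucket/test_key1", "s3://test_bucket/test_key2"]

    t = get_list_of_str()

    # bucket_key renders to a list of str when the task runs
    op = S3KeySensor(task_id="s3_key_sensor", bucket_key="{{ ti.xcom_pull(task_ids='get_list_of_str') }}")

    t >> op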

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

sungwy added the area:providers and kind:bug labels Dec 9, 2022

boring-cyborg bot commented Dec 9, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

Taragolis added the provider:amazon label Dec 10, 2022
@Taragolis
Contributor

Assigned you
