Create dataproc serverless spark batches operator #19248
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
:type timeout: float
:param metadata: Additional metadata that is provided to the method.
:type metadata: Sequence[Tuple[str, str]]
"""
Missing gcp_conn_id and impersonation_chain from the docstring.
@josh-fell I've added them.
airflow/providers/google/cloud/example_dags/example_dataproc.py
@MaksYermak General thoughts across the new operators:
I'm not saying these need to or should be implemented, but I thought they might be good (and possibly cheap) features.
@josh-fell answers to your questions
7d1ea91 to e9a956d
@turbaszek @josh-fell @mik-laj hi guys, could you look at this PR one more time?
e9a956d to 4065232
@MaksYermak Should LGTM 👍
4065232 to 0c1b5e5
@josh-fell Makes sense, I have added it to the code.
@turbaszek @josh-fell @mik-laj hi guys, could you take a look and approve this PR for merge if all is good?
@turbaszek @josh-fell @mik-laj Is there a chance to merge this?
@MaksYermak @lwyszomi There is a static check that is failing. Can you address this? FYI - I am not able to merge. A code owner will have to do that.
Awesome work, congrats on your first merged pull request!
(cherry picked from commit bf68b9a)
A question here: how can I specify the Docker image to run PySpark workloads in custom containers using DataprocCreateBatchOperator?
@aoelvp94 I'm not an expert on the Dataproc Serverless service, but the operator is only a wrapper for the SDK and we take the same parameters as we have there. So in the Refs:
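For later readers, here is a minimal sketch of what a batch config with a custom image might look like. It assumes the image is carried by the `runtime_config.container_image` field of the Dataproc v1 Batch resource, which may or may not be what the comment above referred to. The helper name `build_pyspark_batch`, the bucket, the project and the image path are all hypothetical, and the operator call is left as a comment so the sketch runs without Airflow installed.

```python
from typing import Any, Dict, List, Optional


def build_pyspark_batch(
    main_python_file_uri: str,
    container_image: str,
    args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Return a Batch config as a plain dict with a custom container image."""
    return {
        "pyspark_batch": {
            "main_python_file_uri": main_python_file_uri,
            "args": list(args or []),
        },
        # Assumption: runtime_config.container_image is the field of the
        # Dataproc v1 Batch resource that selects the custom image.
        "runtime_config": {"container_image": container_image},
    }


# Hypothetical values, for illustration only.
batch = build_pyspark_batch(
    main_python_file_uri="gs://example-bucket/jobs/main.py",
    container_image="gcr.io/example-project/custom-spark:latest",
    args=["--date", "2022-01-01"],
)

# The dict would then be handed to the operator, roughly like this
# (commented out so the sketch does not depend on Airflow):
#
# DataprocCreateBatchOperator(
#     task_id="create_batch",
#     project_id="example-project",
#     region="us-central1",
#     batch=batch,
#     batch_id="example-batch",
# )
```

Building the config as a plain dictionary rather than a `Batch()` object also keeps it compatible with templating, which comes up later in this thread.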
Yeah @lwyszomi, I am trying something like that now. If I have news, I will leave the code snippet here for future reference.
@lwyszomi No success when I tried to set Jinja params in the
@aoelvp94 which version of the provider are you using? The Batch object has been a templated field since 6.4.0.
I am using
Another question: why doesn't the operator generate a hash in
I will need to investigate further why the Jinja params don't work. I checked, and you are right that Composer 2.0.5 uses 6.4.0.
We created the operators based on the existing SDK; we didn't add any extra logic to add a hash to the
@aoelvp94 for which property do you want to use a Jinja template?
@aoelvp94 batch is templated, so it should work, but which property inside
Can you share the Batch config?
@aoelvp94 thanks, we will check why this does not work. I will get back to you with an update when I have any information.
@aoelvp94, have you tried passing
Also, please double-check the apache-airflow-providers-google version (
@aoelvp94 I have checked your configuration. For it to work correctly, you should use a dictionary instead of a Batch() object, because for some reason Airflow can't template an object's properties.
One more thing: in the latest Jinja version
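To illustrate why a dictionary renders while a `Batch()` object does not, here is a toy sketch. It assumes the renderer only recurses into strings and plain containers and returns any other object unchanged, which matches the behaviour described in the comment above. This is a simplified stand-in built on `str.replace`, not Airflow's or Jinja's actual implementation, and `OpaqueBatch`, `render` and the bucket name are hypothetical.

```python
from typing import Any, Dict


class OpaqueBatch:
    """Hypothetical stand-in for a Batch() object: values live in attributes."""

    def __init__(self, main_python_file_uri: str) -> None:
        self.main_python_file_uri = main_python_file_uri


def render(value: Any, context: Dict[str, str]) -> Any:
    """Toy renderer: recurse into str/dict/list, return anything else as-is."""
    if isinstance(value, str):
        for key, replacement in context.items():
            value = value.replace("{{ " + key + " }}", replacement)
        return value
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, context) for v in value]
    # Any other object is returned untouched; its attributes are never visited.
    return value


context = {"ds": "2022-01-01"}
uri = "gs://example-bucket/{{ ds }}/main.py"

as_dict = render({"pyspark_batch": {"main_python_file_uri": uri}}, context)
as_object = render(OpaqueBatch(uri), context)

print(as_dict["pyspark_batch"]["main_python_file_uri"])  # gs://example-bucket/2022-01-01/main.py
print(as_object.main_python_file_uri)  # gs://example-bucket/{{ ds }}/main.py
```

In this sketch the dictionary version has its placeholder replaced, while the object version keeps the raw `{{ ds }}` string. This is the symptom reported earlier in the thread.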
Create operator for working with Batches for Google Dataproc. Includes operators, hooks, example dags, tests and docs.
Co-authored-by: Wojciech Januszek <januszek@google.com>
Co-authored-by: Lukasz Wyszomirski <wyszomirski@google.com>
Co-authored-by: Maksim Yermakou <maksimy@google.com>