Investigate removing pandas dependency from Core Airflow #12500

Closed
ashb opened this issue Nov 20, 2020 · 7 comments
Labels
area:core kind:task A task that needs to be completed as part of a larger issue

Comments

@ashb
Member

ashb commented Nov 20, 2020

I want to take a look at removing numpy/pyarrow from core Airflow. We don't really need it, and it would make installing easier.

After 2.0

Originally posted by @ashb in #11950 (comment)

@turbaszek
Member

+1 for that; I have run into some dependency issues with pyarrow too.

@potiuk
Member

potiuk commented Nov 21, 2020

Oh yeah! That will help with #11950 as well!

@turbaszek
Member

Searching for pandas imports, it seems that only DbApiHook requires it explicitly, and only for one method: get_pandas_df.
Apart from that:

root@1572924b0714:/opt/airflow# pipdeptree -r -p pandas
pandas==1.1.0
  - apache-airflow==2.0.0b3 [requires: pandas>=0.17.1,<2.0]
  - nteract-scrapbook==0.4.1 [requires: pandas]
  - pandas-gbq==0.13.2 [requires: pandas>=0.19.0]

Here nteract-scrapbook is required by the papermill provider and pandas-gbq by the google provider. I removed pandas from setup.cfg and installed Airflow into a clean venv: no pandas was installed, and running a few airflow commands showed no errors.
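Since only get_pandas_df needs pandas, one common way to drop it as a hard dependency is to import it lazily inside that method and raise a clear error when it is missing. The sketch below assumes a hypothetical `ExampleDbHook` class backed by the stdlib `sqlite3` module so it runs anywhere; it is not Airflow's actual DbApiHook, only an illustration of the lazy-import pattern.

```python
import sqlite3


class ExampleDbHook:
    """Hypothetical stand-in for a DB-API hook; NOT Airflow's real DbApiHook."""

    def __init__(self, database: str = ":memory:") -> None:
        self.conn = sqlite3.connect(database)

    def get_records(self, sql: str) -> list:
        # Needs only the DB-API driver; pandas is never touched here.
        cur = self.conn.cursor()
        try:
            cur.execute(sql)
            return cur.fetchall()
        finally:
            cur.close()

    def get_pandas_df(self, sql: str):
        # Import pandas lazily: importing this module or calling the other
        # methods works even when pandas is not installed.
        try:
            import pandas as pd
        except ImportError as err:
            raise ImportError(
                "get_pandas_df requires pandas; install it separately, "
                "e.g. `pip install pandas`"
            ) from err
        # pandas accepts a sqlite3 connection directly in read_sql.
        return pd.read_sql(sql, con=self.conn)
```

With this layout, users who never call get_pandas_df never pay for pandas (or numpy), and those who do get an actionable message instead of an import failure at startup.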

@ashb
Member Author

ashb commented Nov 21, 2020

It was probably used more in the past, in the old chart view I think, but we've removed those uses now.

@ashb
Member Author

ashb commented Nov 21, 2020

And no pandas means no numpy, right?

@turbaszek
Member

turbaszek commented Nov 21, 2020

And no pandas means no numpy, right?

Yes, removing pandas will also remove numpy 👌
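Whether numpy really disappears from a fresh install depends on nothing else requiring it (pip does not uninstall orphaned dependencies from an existing environment). A rough, pipdeptree-style reverse lookup can be sketched with the stdlib `importlib.metadata` (Python 3.8+). The helper names `dependents_of` and `installed_dists` are hypothetical, and the sketch does not evaluate environment markers, so requirements gated behind extras are counted too.

```python
import re
from importlib import metadata
from typing import Iterable, List, Optional, Tuple

# Leading project name of a requirement such as "numpy>=1.16.5; python_version<'3.10'".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    # PEP 503 normalization so "Pandas_GBQ" and "pandas-gbq" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def dependents_of(
    target: str, dists: Iterable[Tuple[str, Optional[List[str]]]]
) -> List[str]:
    """Return sorted names of distributions that declare a requirement on target.

    Environment markers (e.g. "; extra == 'pandas'") are not evaluated, so
    optional requirements are reported as dependents as well.
    """
    wanted = _normalize(target)
    found = set()
    for name, requires in dists:
        for req in requires or []:
            match = _NAME_RE.match(req)
            if match and _normalize(match.group(1)) == wanted:
                found.add(name)
                break
    return sorted(found)


def installed_dists() -> List[Tuple[str, Optional[List[str]]]]:
    # Snapshot of (name, requirement strings) for the current environment.
    result = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            result.append((name, dist.requires))
    return result
```

For example, `dependents_of("numpy", installed_dists())` lists everything in the current environment that asks for numpy; if pandas is the only entry, dropping pandas from Airflow's requirements also keeps numpy out of a clean install.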

@jchacks

jchacks commented Feb 9, 2021

import numpy as np

That line in utils imports numpy; not sure if this is relevant.
It was causing my webserver to crash, since my Airflow version did not install numpy by default for some reason.
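A module-level `import numpy as np` turns numpy into a hard runtime dependency of everything that imports that module. If the import exists only to recognise numpy values (for example when serialising to JSON), one way to avoid it is to look numpy up in `sys.modules`: if numpy was never imported, an object cannot be a numpy type. The function name `json_default` below is hypothetical, and this is a sketch of the pattern, not the actual Airflow utils code.

```python
import json
import sys
from typing import Any


def json_default(obj: Any) -> Any:
    """Hypothetical `default=` hook for json.dumps that handles numpy values
    without importing numpy at module import time."""
    # If numpy was never imported, obj cannot be a numpy object, so there is
    # nothing to convert and no hard dependency on numpy is created.
    np = sys.modules.get("numpy")
    if np is not None:
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
```

Usage would look like `json.dumps(payload, default=json_default)`; a webserver that never imports numpy then starts cleanly without it installed.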

@vikramkoka vikramkoka added the kind:task A task that needs to be completed as part of a larger issue label Feb 9, 2021
@kaxil kaxil closed this as completed in 2c26b15 Aug 12, 2021