-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Proposal: reduce dependencies #396
Comments
@dovahcrow do we remove the requests dependency or not? @pallavibharadwaj is just about to do the implementation though. |
Let's remove requests and give http.client a try. |
Removing the tornado dependency if possible would be helpful. Currently dataprep and jupyter notebook don't play well together because dataprep is forcing tornado=5.0.2. The issue is on jupyter, because they've taken a code dependency on higher tornado versions but didn't update their dependency requirements accordingly. As a result, a conda install today with both jupyter and dataprep causes a jupyter failure with some notebooks when it tries to access a tornado method that doesn't exist. In the meantime, I saw the dataprep issue with kaggle that caused the pinning of the tornado version. Do you know if dataprep will run ok with the 6.X versions of tornado? This is the jupyter issue I opened, if you'd like more detail: jupyter/notebook#5920 |
I also see that dataprep is also forcing pandas=1.0, numpy=1.18 and scipy=1.4. Given that these are workhorse modules for the dataprep audience, it would also be valuable to loosen these up to support the latest versions unless there's a strong reason to limit them. |
@dhuntley1023 thanks for the suggestions! We can definitely loosen pandas numpy and scipy. However, this reason for pinning tornado is because Kaggle notebook pins it. See https://github.com/Kaggle/docker-python/blob/master/Dockerfile. @jnwang @jinglinpeng what do you think of dropping the Kaggle support since it seems like more users are using the newer version of Jupyter nowadays. |
Is it possible to detect which environment it is? If it is Kaggle, we import tornado. This is similar to how pandas handles the dependency on sqlalchemy for read_sql() https://github.com/pandas-dev/pandas/blob/v1.2.0/pandas/io/sql.py#L40 |
@jnwang Theoretically yes, however that requires us to write our own package loader, installer, and other facilities, i.e. we do not install the package when installing dataprep. At the first run time, we detect the platform through their IP and install the selected version of the packages. Our case is different than the pandas case where they just decide whether to load the sqlalchemy or not but we need to switch package versions. |
Summary
Reduce unnecessary dependencies for the whole package.
Design-level Explanation Actions
NA
Design-level Explanation
NA
Implementation-level Explanation
Removing
Pillow
Pillow is used only to load the ellipse image for the wordcloud. We can remove this dependency by storing the ellipse as an array and directly read it using NumPy.
Removing
bottleneck
Need to investigate if the bottleneck's ranking method is equivalent to series.rank.
Make
tqdm
optionaltqdm is used in the progress bar. We can make this library optional and only display the progress bar if tqdm is installed.
Removing
tornado
I already opened an issue here Upgrade to tornado 6? Kaggle/docker-python#890. Once it is resolved we can remove the tornado version restriction.
Removing
requests
Removing requests requires the
connect
function to become async. The generator ui might also meet difficulties since it needs to send out requests. Let's first try the following solution to see ifhttp.client
satisfies our needs.to:
Rational and Alternatives
NA
Prior Art
NA
Future Possibilities
NA
Implementation-level Actions
Pillow
bottleneck
tqdm
optionaltornado
requests
When vendoring the following packages, make sure the license is copied and followed.
Additional Tasks
The text was updated successfully, but these errors were encountered: