Feature Proposal: reduce dependencies #396

Open
9 of 15 tasks
dovahcrow opened this issue Oct 22, 2020 · 7 comments
Labels: type: enhancement (New feature or request)

@dovahcrow
Member

dovahcrow commented Oct 22, 2020

Summary

Reduce unnecessary dependencies for the whole package.

Design-level Explanation Actions

NA

Design-level Explanation

NA

Implementation-level Explanation

  • Removing Pillow
    Pillow is used only to load the ellipse image for the wordcloud. We can remove this dependency by storing the ellipse as an array and reading it directly with NumPy.

  • Removing bottleneck
    We need to investigate whether bottleneck's ranking method is equivalent to pandas' Series.rank.

  • Make tqdm optional
    tqdm is used for the progress bar. We can make this library optional and display the progress bar only if tqdm is installed.

  • Removing tornado
    I already opened an issue for this: "Upgrade to tornado 6?" (Kaggle/docker-python#890). Once it is resolved, we can remove the tornado version restriction.

  • Removing requests
    Removing requests requires the connect function to become async. The generator UI might also run into difficulties, since it needs to send out requests. Let's first try the following change to see whether http.client satisfies our needs.

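import requests
# The URL needs an explicit scheme; .json() only works if the endpoint returns JSON.
requests.get("https://www.python.org").json()

to:

import http.client
import json
conn = http.client.HTTPSConnection("www.python.org")
conn.request("GET", "/")
r1 = conn.getresponse()
# As above, json.loads only succeeds if the response body is JSON.
json.loads(r1.read())

  • Vendor list: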
    • jsonpath-ng
    • nltk
    • wordcloud
    • aiohttp
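
For the "make tqdm optional" item above, a minimal sketch of one common pattern: try the import once and fall back to the plain iterable when tqdm is missing. The helper name with_progress is hypothetical and not part of dataprep.

```python
try:
    from tqdm import tqdm  # optional dependency
except ImportError:
    tqdm = None  # progress bars are silently disabled


def with_progress(iterable, **kwargs):
    """Wrap iterable in a tqdm progress bar if tqdm is installed.

    Hypothetical helper: without tqdm the iterable is returned unchanged,
    so callers do not need to know whether the bar is available.
    """
    if tqdm is None:
        return iterable
    return tqdm(iterable, **kwargs)


# The loop body is identical with or without tqdm installed.
total = sum(with_progress(range(5), desc="demo"))
```

With this shape, tqdm could be declared as an optional extra rather than a hard requirement; how the extra is declared depends on the packaging tool in use.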

Rationale and Alternatives

NA

Prior Art

NA

Future Possibilities

NA

Implementation-level Actions

  • remove Pillow
  • remove bottleneck
  • make tqdm optional
  • remove tornado
  • remove requests
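
For the Pillow item, a rough sketch of what "storing the ellipse as an array" could look like. The function make_ellipse_mask, the 300x500 size, and the convention of 0 inside and 255 outside the ellipse are all assumptions for illustration; the real mask would come from converting the existing image once. An in-memory buffer stands in for the .npy file that would ship with the package.

```python
import io

import numpy as np


def make_ellipse_mask(height, width):
    """Return a uint8 array that is 0 inside an ellipse and 255 outside.

    Hypothetical stand-in for the one-off conversion of the existing
    ellipse image into an array.
    """
    ys, xs = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2, (width - 1) / 2
    inside = ((ys - cy) / (height / 2)) ** 2 + ((xs - cx) / (width / 2)) ** 2 <= 1
    return np.where(inside, 0, 255).astype(np.uint8)


# One-off step: serialize the array in NumPy's .npy format.
buffer = io.BytesIO()  # stands in for a data file bundled with the package
np.save(buffer, make_ellipse_mask(300, 500))

# At run time, NumPy alone is enough to read the mask back.
buffer.seek(0)
mask = np.load(buffer)
```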

When vendoring the following packages, make sure each package's license is copied and its terms are followed.

  • jsonpath-ng
  • nltk
  • wordcloud
  • aiohttp

Additional Tasks

  • This task is put into the correct pipeline (Development Backlog or In Progress).
  • The label of this task is set correctly.
  • The issue is assigned to the correct person.
  • The issue is linked to the related Epic.
  • The documentation is changed accordingly.
  • Tests are added accordingly.
@peiwangdb
Contributor

@dovahcrow do we remove the requests dependency or not? @pallavibharadwaj is just about to do the implementation though.

@dovahcrow
Member Author

> @dovahcrow do we remove the requests dependency or not? @pallavibharadwaj is just about to do the implementation though.

Let's remove requests and give http.client a try.
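
As a starting point for that experiment, here is a hedged sketch of a small JSON GET helper built only on the standard library. The name get_json, the timeout default, and the error handling are assumptions, not dataprep's actual code. Unlike requests, http.client does not follow redirects or pool connections, so this only covers the simplest case.

```python
import http.client
import json
from urllib.parse import urlsplit


def get_json(url, timeout=10.0):
    """GET url and decode the response body as JSON (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")

    # A port of None lets http.client pick the default for the scheme.
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"Accept": "application/json"})
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            # Redirects (3xx) also end up here; requests would follow them.
            raise RuntimeError(f"GET {url} returned HTTP {response.status}")
        return json.loads(body)
    finally:
        conn.close()
```

A call would look like get_json("https://example.org/data.json"), where the URL is a placeholder for any endpoint that returns JSON.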

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Nov 11, 2020
pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Nov 24, 2020
pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Dec 1, 2020
@dhuntley1023

Removing the tornado dependency if possible would be helpful. Currently dataprep and jupyter notebook don't play well together because dataprep is forcing tornado=5.0.2.

The issue is on Jupyter's side: they've taken a code dependency on higher tornado versions but didn't update their dependency requirements accordingly. As a result, a conda install today with both jupyter and dataprep causes a Jupyter failure with some notebooks when it tries to access a tornado method that doesn't exist.

In the meantime, I saw the dataprep issue with kaggle that caused the pinning of the tornado version. Do you know if dataprep will run ok with the 6.X versions of tornado?

This is the jupyter issue I opened, if you'd like more detail: jupyter/notebook#5920

@dhuntley1023

I also see that dataprep is forcing pandas=1.0, numpy=1.18 and scipy=1.4. Given that these are workhorse modules for the dataprep audience, it would be valuable to loosen these pins to support the latest versions unless there's a strong reason to limit them.

@dovahcrow
Member Author

@dhuntley1023 thanks for the suggestions! We can definitely loosen pandas, numpy, and scipy. However, the reason for pinning tornado is that the Kaggle notebook pins it. See https://github.com/Kaggle/docker-python/blob/master/Dockerfile. @jnwang @jinglinpeng what do you think of dropping Kaggle support, since it seems that more users are on newer versions of Jupyter nowadays?

@jnwang

jnwang commented Dec 30, 2020

Is it possible to detect which environment we are running in? If it is Kaggle, we import tornado. This is similar to how pandas handles its dependency on sqlalchemy for read_sql():

https://github.com/pandas-dev/pandas/blob/v1.2.0/pandas/io/sql.py#L40
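
A rough sketch of what such a check could look like. It assumes Kaggle kernels can be recognized by the KAGGLE_KERNEL_RUN_TYPE environment variable, which is an assumption that should be verified; the function names are hypothetical. It only decides whether to import a module and does nothing about which version is installed.

```python
import importlib
import os


def running_on_kaggle(environ=None):
    """Guess whether we are inside a Kaggle kernel.

    Assumption: Kaggle sets KAGGLE_KERNEL_RUN_TYPE in its kernels.
    """
    env = os.environ if environ is None else environ
    return "KAGGLE_KERNEL_RUN_TYPE" in env


def import_optional(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def load_tornado_if_kaggle(environ=None):
    """Import tornado only when the Kaggle check passes (hypothetical)."""
    if not running_on_kaggle(environ):
        return None
    return import_optional("tornado")
```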

dovahcrow pushed a commit to pallavibharadwaj/dataprep that referenced this issue Dec 31, 2020
@dovahcrow
Member Author

@jnwang Theoretically yes; however, that would require us to write our own package loader, installer, and other facilities. That is, we would not install the package when installing dataprep. Instead, at first run, we would detect the platform through its IP and install the selected versions of the packages.

Our case differs from the pandas case: they only decide whether to load sqlalchemy or not, whereas we need to switch package versions.

Status: In Progress