Feature Proposal: reduce dependencies #396

dovahcrow · 2020-10-22T02:10:57Z

Summary

Reduce unnecessary dependencies for the whole package.

Design-level Explanation Actions

NA

Design-level Explanation

NA

Implementation-level Explanation

Removing Pillow
Pillow is used only to load the ellipse image for the word cloud. We can remove this dependency by storing the ellipse as an array and reading it directly with NumPy.
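A minimal sketch of the idea, assuming the ellipse can be kept as a boolean mask in NumPy's `.npy` format. The helper name `ellipse_mask`, the 200x400 shape, and the file name are hypothetical and not taken from the dataprep code base.

```python
import os
import tempfile

import numpy as np


def ellipse_mask(height: int, width: int) -> np.ndarray:
    """Return a boolean array that is True inside the inscribed ellipse."""
    # Open grids of row/column coordinates, centered on the image.
    yy, xx = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2, (width - 1) / 2
    ry, rx = height / 2, width / 2
    return ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0


# One-off step: store the mask in NumPy's own .npy format.
mask = ellipse_mask(200, 400)
path = os.path.join(tempfile.mkdtemp(), "ellipse.npy")
np.save(path, mask)

# At run time only NumPy is needed to read it back; Pillow is not involved.
loaded = np.load(path)
```

Whether the stored array should be boolean or an 8-bit image depends on what the word cloud code expects as a mask, which would need to be checked.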
Removing bottleneck
We need to investigate whether bottleneck's ranking method is equivalent to pandas' Series.rank.
Make tqdm optional
tqdm is used for the progress bar. We can make this library optional and only display the progress bar if tqdm is installed.
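One way this could look is an import guard with a plain-iteration fallback. This is only a sketch: the wrapper name `progress` is hypothetical, and the real call sites in dataprep may need a different shape.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

try:
    # tqdm is treated as an optional extra here.
    from tqdm import tqdm as _tqdm
except ImportError:
    _tqdm = None


def progress(items: Iterable[T], **kwargs) -> Iterator[T]:
    """Iterate over items, showing a progress bar only if tqdm is installed."""
    if _tqdm is None:
        # Fall back to plain iteration; tqdm-specific kwargs are ignored.
        return iter(items)
    return iter(_tqdm(items, **kwargs))


total = sum(progress(range(5), desc="summing"))
```

Callers use `progress(...)` everywhere instead of importing tqdm directly, so the dependency check lives in one place.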
Removing tornado
I have already opened an issue for this: "Upgrade to tornado 6?" (Kaggle/docker-python#890). Once it is resolved, we can remove the tornado version restriction.
Removing requests
Removing requests requires the connect function to become async. The generator UI might also run into difficulties, since it needs to send requests. Let's first try changing the following code to see whether http.client satisfies our needs:

import requests
# The URL needs an explicit scheme; .json() only succeeds if the endpoint returns JSON.
requests.get("https://www.python.org").json()

to:

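import http.client
import json

# HTTPSConnection takes a bare host name (no scheme); the path goes to request().
conn = http.client.HTTPSConnection("www.python.org")
conn.request("GET", "/")
r1 = conn.getresponse()
# json.loads only succeeds if the response body is JSON.
data = json.loads(r1.read())
conn.close()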
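The http.client calls above could be wrapped in a small helper so that callers keep passing full URLs as they did with requests. This is a sketch under stated assumptions: `split_url` and `get_json` are hypothetical names, the status check and timeout are illustrative choices, and URLs with embedded credentials are not handled.

```python
import http.client
import json
from typing import Any, Tuple
from urllib.parse import urlsplit


def split_url(url: str) -> Tuple[str, str, str]:
    """Split a URL into (scheme, host[:port], path-with-query)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def get_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL with http.client and decode the body as JSON."""
    scheme, host, path = split_url(url)
    conn_cls = (
        http.client.HTTPSConnection if scheme == "https" else http.client.HTTPConnection
    )
    conn = conn_cls(host, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Accept": "application/json"})
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status} {resp.reason}")
        return json.loads(body)
    finally:
        conn.close()
```

Unlike requests, http.client does not follow redirects on its own, so an endpoint that answers with a 3xx status would raise here and need explicit handling.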

Vendor list:
- jsonpath-ng
- nltk
- wordcloud
- aiohttp

Rationale and Alternatives

NA

Prior Art

NA

Future Possibilities

NA

Implementation-level Actions

When vendoring the following packages, make sure the license is copied and followed.

- jsonpath-ng
- nltk
- wordcloud
- aiohttp

Additional Tasks

This task is put into the correct pipeline (Development Backlog or In Progress).
The label of this task is set correctly.
The issue is assigned to the correct person.
The issue is linked to the related Epic.

The documentation is changed accordingly.
Tests are added accordingly.

peiwangdb · 2020-10-27T15:21:09Z

@dovahcrow should we remove the requests dependency or not? @pallavibharadwaj is about to start the implementation.

dovahcrow · 2020-10-28T21:04:02Z

Let's remove requests and give http.client a try.

dhuntley1023 · 2020-12-29T09:02:54Z

Removing the tornado dependency if possible would be helpful. Currently dataprep and jupyter notebook don't play well together because dataprep is forcing tornado=5.0.2.

The issue is on the jupyter side: they've taken a code dependency on higher tornado versions but didn't update their dependency requirements accordingly. As a result, a conda install today with both jupyter and dataprep causes a jupyter failure with some notebooks when it tries to access a tornado method that doesn't exist.

In the meantime, I saw the dataprep issue with kaggle that caused the pinning of the tornado version. Do you know if dataprep will run ok with the 6.X versions of tornado?

This is the jupyter issue I opened, if you'd like more detail: jupyter/notebook#5920

dhuntley1023 · 2020-12-29T09:09:59Z

I also see that dataprep is forcing pandas=1.0, numpy=1.18 and scipy=1.4. Given that these are workhorse modules for the dataprep audience, it would be valuable to loosen these up to support the latest versions unless there's a strong reason to limit them.

dovahcrow · 2020-12-30T01:23:52Z

@dhuntley1023 thanks for the suggestions! We can definitely loosen pandas, numpy and scipy. However, the reason for pinning tornado is that Kaggle notebooks pin it; see https://github.com/Kaggle/docker-python/blob/master/Dockerfile. @jnwang @jinglinpeng what do you think of dropping Kaggle support, since it seems that more users are on newer versions of Jupyter nowadays?

jnwang · 2020-12-30T02:15:28Z

Is it possible to detect which environment we are running in? If it is Kaggle, we import tornado. This is similar to how pandas handles the dependency on sqlalchemy for read_sql():

https://github.com/pandas-dev/pandas/blob/v1.2.0/pandas/io/sql.py#L40
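A rough sketch of what such a check plus a guarded import could look like. The environment variable `KAGGLE_KERNEL_RUN_TYPE` is an assumption about how Kaggle kernels can be recognised, and both helper names are hypothetical.

```python
import importlib
import os
from types import ModuleType
from typing import Optional


def running_on_kaggle() -> bool:
    """Best-effort check; the environment variable name is an assumption."""
    return "KAGGLE_KERNEL_RUN_TYPE" in os.environ


def import_optional(name: str) -> Optional[ModuleType]:
    """Import a module if it is installed, otherwise return None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Only touch tornado when the platform check says it is needed.
tornado = import_optional("tornado") if running_on_kaggle() else None
```

This pattern only decides whether a module is imported at run time; it does not change which version of the package is installed.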

dovahcrow · 2021-01-04T18:12:02Z

@jnwang Theoretically yes. However, that would require us to write our own package loader, installer, and other facilities; i.e., we would not install the package when installing dataprep. Instead, at first run, we would detect the platform through its IP and install the selected version of the packages.

Our case differs from the pandas case: they only decide whether to load sqlalchemy, whereas we would need to switch package versions.

dovahcrow added the type: enhancement (New feature or request) label Oct 22, 2020

dovahcrow self-assigned this Oct 22, 2020

dovahcrow mentioned this issue Oct 23, 2020: Connector: remove dependency on requests library #397 (Closed)

dovahcrow assigned pallavibharadwaj Oct 23, 2020

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Nov 11, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (a8d1241)

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Nov 11, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (4cbf59b)

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Nov 24, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (01a149c)

pallavibharadwaj mentioned this issue Nov 24, 2020: feat(connector):from_key parameter validation #407 (Merged, 10 tasks)

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Dec 1, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (04d327f)

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Dec 1, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (255786f)

pallavibharadwaj added a commit to pallavibharadwaj/dataprep that referenced this issue Dec 1, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (bd63dc5)

dhuntley1023 mentioned this issue Dec 30, 2020: Jupyter kernel abort when plotting a column with pandas type "category" #463 (Open)

dovahcrow pushed a commit to pallavibharadwaj/dataprep that referenced this issue Dec 31, 2020: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (a799a6b)

dovahcrow mentioned this issue Jan 5, 2021: chore: relax dependencies #467 (Merged, 10 tasks)

dhuntley1023 referenced this issue Jan 6, 2021: chore: relax dependencies (496f0bc)

devinllu pushed a commit to devinllu/dataprep that referenced this issue Nov 9, 2021: refactor(connector): [sfu-db#396] Removing dependency on the Requests library (b097932)

dovahcrow added this to DataPrep Feb 4, 2022

dovahcrow unassigned pallavibharadwaj Feb 4, 2022

dovahcrow moved this to In Progress in DataPrep Feb 4, 2022
