
Pandas DataFrame in variable explorer can crash the app if it gets out of memory #1973

Closed
spyder-bot opened this issue Feb 17, 2015 · 12 comments

Comments

@spyder-bot
Collaborator

From TheSa...@gmail.com on 2014-09-18T13:17:17Z

First of all thank you for this fantastic IDE.

I was waiting for this release (2.3.1) because of the new DataFrame explorer feature.
I first tried it with a large DataFrame (50k rows * 10 cols) that I am working on.
The DataFrame was too large and the Variable Explorer hung forever. I tried to close it, but that crashed the whole IDE. I tried a couple of times, and every time the result was the same.
I also tried loading only part of the DataFrame; in that case it worked perfectly, and it is a really cool new feature.
Hence I suggest either loading at most a maximum-size DataFrame by default, or at least running the DataFrame explorer in a separate thread (so it can be killed if it stops responding, without crashing the app).

Thank you in advance.
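The reporter's second suggestion (do the heavy work off the GUI thread so a hang can be abandoned) can be sketched with the standard library. This is only illustrative: real Spyder code would need a QThread, and Python threads cannot actually be killed, only abandoned; `render_frame` is a hypothetical stand-in for the viewer's expensive setup.

```python
import concurrent.futures

import pandas as pd

def render_frame(df: pd.DataFrame) -> str:
    # Hypothetical stand-in for the expensive model/viewer construction.
    return df.to_string()

df = pd.DataFrame({"a": range(100), "b": range(100)})

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(render_frame, df)
try:
    text = future.result(timeout=5.0)   # don't block the GUI forever
except concurrent.futures.TimeoutError:
    text = None                         # bail out instead of freezing the IDE
executor.shutdown(wait=False)           # abandon the worker; it cannot be killed
```

Note the caveat in the last line: the worker thread keeps running in the background; the timeout only frees the caller.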

Spyder Version: 2.3.1
Python Version: 3.4.1
Qt Version : 5.3.1, PyQt4 (API v2) 4.11.1 on Windows
pyflakes >=0.6.0: 0.8.1 (OK)
pep8 >=0.6 : 1.5.7 (OK)
IPython >=0.13 : 2.1.0 (OK)
pygments >=1.6 : 1.6 (OK)
pandas >=0.13.1 : 0.14.0 (OK)
sphinx >=0.6.6 : 1.2.2 (OK)
rope >=0.9.2 : 0.9.4-1 (OK)
matplotlib >=1.0: 1.3.1 (OK)
sympy >=0.7.0 : 0.7.5 (OK)
pylint >=0.25 : 1.2.1 (OK)


Original issue: http://code.google.com/p/spyderlib/issues/detail?id=1973

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-09-18T15:17:08Z

Hi, thanks for your kind words. There is a warning that tells users Spyder could freeze if it tries to open a DataFrame with more than 1e5 elements. Didn't you see it?

@-dhoeghh91: Is there a way to run dataframeeditor (and of course the other editors) in a QThread?

Cc: dhoeg...@gmail.com
Labels: MS-v2.3.2 Cat-VariableExplorer

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-09-18T15:18:28Z

@-TheSabry: Sorry for my wording. What I tried to say is if that warning did or didn't show up for you :)

@spyder-bot
Collaborator Author

From dhoeg...@gmail.com on 2014-09-18T22:13:11Z

Hi, I haven't investigated it, but it would be nice to fix, since all the variable editors have that tendency. I have bumped into the same thing with the array editor a lot of times.

@spyder-bot
Collaborator Author

From TheSa...@gmail.com on 2014-09-19T00:08:54Z

@-ccordoba12
No, I am pretty sure the warning did not show up. I can't try it right now because at work I am not using the latest version.

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-09-21T10:49:20Z

I think a short term solution to this problem is the following (until we find time to put every array editor in a thread):

  1. Check the array size (and/or memory?)
  2. If the size is bigger than, say, 1e5 or 5e4 elements, show a warning that the array is too big and that Spyder will most probably freeze if it tries to load it.
  3. Show the first 1e4 elements as a fallback, and tell the user in the same warning that this is the best Spyder can do.
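The steps above could be sketched roughly like this. The thresholds mirror the figures in the comment, but `show_warning` and `load_for_editor` are hypothetical names, not Spyder's actual API (where the warning would be a Qt message box):

```python
import pandas as pd

LARGE_NELEMENTS = 100_000   # step 2 threshold (1e5 elements)
ELEMENTS_TO_SHOW = 10_000   # step 3 fallback (first 1e4 elements)

def show_warning(message: str) -> None:
    # Placeholder for a QMessageBox in the real editor.
    print("Warning:", message)

def load_for_editor(df: pd.DataFrame) -> pd.DataFrame:
    # Step 1: check the size.
    if df.size > LARGE_NELEMENTS:
        # Step 2: warn that loading the full DataFrame would freeze Spyder.
        show_warning("This DataFrame is too big: only the first "
                     f"{ELEMENTS_TO_SHOW} elements are shown.")
        # Step 3: fall back to a partial view.
        n_rows = max(1, ELEMENTS_TO_SHOW // max(1, len(df.columns)))
        return df.iloc[:n_rows]
    return df
```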

What do you think @TheSabry and @-dhoegh91? I think this is achievable for 2.3.2, and it'll improve the situation considerably.

Summary: Pandas DataFrame in variable explorer can crash the app if it gets out of memory (was: pandas DataFrame in variable explorer can crash the app if it gets out of memory)
Status: Accepted
Labels: Easy

@spyder-bot
Collaborator Author

From TheSa...@gmail.com on 2014-09-22T04:09:52Z

Frankly, I cannot see a use case for displaying 10e5 elements. Anyone who really wants to work directly on the data will have a better time exporting it as CSV and using another tool.
I work a lot with DataFrames, and my guess is that 99% of the time it is more than enough to display the first 10 and last 10 rows (the default behaviour of pandas and IPython, I reckon). But as a future development, a really nice add-on would be the possibility to Ctrl+F within the variable editor: e.g. the possibility to display data for a specific date, ....
What do you think?
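The head-and-tail display described above is essentially what pandas' own truncated repr does (controlled by `pd.set_option("display.max_rows", 20)`); a minimal standalone version, assuming nothing about Spyder's internals:

```python
import pandas as pd

def head_tail(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """First and last n rows, like pandas' truncated repr."""
    if len(df) <= 2 * n:
        return df  # small enough to show whole
    return pd.concat([df.head(n), df.tail(n)])

df = pd.DataFrame({"x": range(1000)})
print(head_tail(df))  # rows 0-9 followed by rows 990-999
```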

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-11-01T16:46:28Z

Labels: -MS-v2.3.2 MS-v2.3.3

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-11-15T10:08:19Z

I fixed this one by instructing the viewer to load only 500 rows at a time, if the DataFrame has more than 1e5 rows.

I also fixed a huge bottleneck in the viewer that was consuming lots of CPU cycles for big DataFrames.

@-TheSabry: Thanks for your help with this one, and please remember that to see the tail of the DataFrame, you can sort it in descending order by clicking on the "Index" column header.

Labels: -MS-v2.3.3 MS-v2.3.2
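Loading 500 rows at a time matches Qt's incremental-fetch pattern (`canFetchMore`/`fetchMore` on `QAbstractItemModel`). A Qt-free sketch of the same idea, with illustrative names rather than Spyder's real model class:

```python
import pandas as pd

ROWS_TO_LOAD = 500  # chunk size mentioned in the fix

class LazyFrameModel:
    """Expose a DataFrame to a view 500 rows at a time,
    mimicking QAbstractItemModel.canFetchMore/fetchMore."""

    def __init__(self, df: pd.DataFrame):
        self.df = df
        self.rows_loaded = min(ROWS_TO_LOAD, len(df))

    def can_fetch_more(self) -> bool:
        return self.rows_loaded < len(self.df)

    def fetch_more(self) -> None:
        # Called by the view when the user scrolls near the end.
        self.rows_loaded = min(self.rows_loaded + ROWS_TO_LOAD, len(self.df))

model = LazyFrameModel(pd.DataFrame({"a": range(1_200)}))
model.fetch_more()
print(model.rows_loaded)  # 1000
```

The view only ever asks for rows below `rows_loaded`, so a huge DataFrame costs nothing until the user scrolls.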

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-11-15T10:22:25Z

This issue was updated by revision 8e7b70b8e3ba .

  • Now we initially load a small number of rows and load the rest on demand

Status: Started

@spyder-bot
Collaborator Author

From ccordoba12 on 2014-11-15T10:22:26Z

This issue was updated by revision 5f81557aed15 .

  • The problem was that the DataFrame index was not cached and updated only when
    needed, which caused terribly high CPU usage and delays when browsing the
    contents of large DataFrames

Status: Fixed
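The bottleneck fix amounts to caching the axes once and refreshing them only when the DataFrame changes, instead of touching `df.index` on every cell or header access. A sketch of the pattern, with hypothetical names (not the actual revision's code):

```python
import pandas as pd

class FrameView:
    """Cache the DataFrame's axes once instead of asking pandas
    for df.index/df.columns on every header lookup."""

    def __init__(self, df: pd.DataFrame):
        self.df = df
        self._update_cache()

    def _update_cache(self) -> None:
        # Refresh only when the underlying DataFrame changes.
        self._index = self.df.index.tolist()
        self._columns = self.df.columns.tolist()

    def row_label(self, i: int):
        return self._index[i]  # cheap list access, no pandas call

view = FrameView(pd.DataFrame({"a": [10, 20, 30]}))
print(view.row_label(2))  # 2 (the default RangeIndex label)
```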

@spyder-bot
Collaborator Author

From dhoeg...@gmail.com on 2014-11-15T10:29:40Z

It is awesome you got rid of that bottleneck in both array and dataframe views.

@spyder-bot
Collaborator Author

From TheSa...@gmail.com on 2014-11-17T00:22:16Z

@-ccordoba12: really nice to have this fixed ... well done!
