Pandas 0.19 read_csv with header=[0, 1] on an empty df throws error #14515
Comments
@kaloramik So the change is not in In versions < 0.19.0, the file looks like:
while in 0.19.0 it looks like (what you showed above):
So previously there was an extra line with empty values. Reading this in with 0.19.0 still gives your desired result of an empty frame:
(although one could argue that this should actually give you one row of NaNs) So the change is in
while in 0.18.0 there was an extra line with commas:
This was a bug (since you don't have any data, there should not be a line of missing values), and this bug was fixed in 0.19.0, see #6618 |
@jorisvandenbossche hmm, really? That's not what I'm seeing at all. Is it possible I have a package that's screwing something up? Can you post your pd.show_versions()? But looking at the behavior, shouldn't the expected behavior be what I posted? That is, if you read in a file that is 2 lines long and your header takes up both lines, then it should return an empty df with those columns. I believe the same behavior applies for a single header. The error message doesn't seem to make sense
it DOES have 2 lines in the file, so it should be able to construct the header. In addition, the source code has the following comment
According to the comment, the function should fail if the file has fewer than len(header) lines, implying that the function should succeed if len(header) == len(lines). Does that sound right? |
Oh actually, scratch that, you are right about 0.18.1 writing an extra line of commas (and so the read_csv succeeds, I guess). But this breaks existing behavior: in my data pipelines, I am no longer able to write and then read empty dataframes as before. I think the behavior I described above is still the desired one? Unless you have better workarounds? (I don't think replicating the old behavior by forcibly adding a row of commas would be a good idea) |
Possibly. But I am just pointing out that it is not a change in Apart from that, it is worth discussing if we should allow this. IMO returning an empty frame is indeed more logical to do. |
The bug fix in
Note that also for a single header, once you pass the
|
Got it. Thanks for the clarification! Actually as a temporary workaround I guess forcing a write of an empty row on empty data frames should be ok. Do you know if there are any other workarounds, perhaps from the read side? |
Hmm, I don't directly see a workaround on the read side. If you want to end up with the multi-index, I don't think there is an easy solution. Probably easier to temporarily fix on the write side as you point out. |
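One possible read-side approach, not proposed in the thread itself, is to inspect the file before handing it to pandas. The sketch below uses only the standard library. The helper name `read_header_only` and the sample text are hypothetical, and it assumes the file was written without an index column, so every field in the header lines is a column label.

```python
import csv
import io


def read_header_only(text, n_header=2):
    """Split CSV text into column tuples and a flag saying whether data follows.

    Hypothetical helper: assumes the first `n_header` rows are header rows
    and that no index column was written.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < n_header:
        raise ValueError(
            f"expected {n_header} header rows, found {len(rows)}"
        )
    header_rows, data_rows = rows[:n_header], rows[n_header:]
    # Transpose the header rows so each column becomes one tuple of labels.
    columns = list(zip(*header_rows))
    return columns, bool(data_rows)


# A file that contains the two header lines and nothing else.
columns, has_data = read_header_only("a,a\nb,c\n")
print(columns, has_data)
```

When `has_data` is false, the tuples could be passed to `pandas.MultiIndex.from_tuples` to build an empty frame by hand; otherwise the file can go through `read_csv` with `header=[0, 1]` as usual.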
closes pandas-dev#14515
This commit fixes a bug where `read_csv` failed when given a file with a multiindex header and empty content. Because pandas reads index names as a separate line following the header lines, the reader looks for the line with index names in it. If the content of the dataframe is empty, the reader will choke. This bug surfaced after pandas-dev#6618 stopped writing an extra line after multiindex columns, which led to a situation where pandas could write CSV's that it couldn't then read. This commit changes that behavior by explicitly checking if the index name row exists, and processing it correctly if it doesn't.
Author: Ben Kandel <ben.kandel@gmail.com>
Closes pandas-dev#14596 from bkandel/fix-parse-empty-df and squashes the following commits:
32e3b0a [Ben Kandel] lint
e6b1237 [Ben Kandel] lint
fedfff8 [Ben Kandel] fix multiindex column parsing
518982d [Ben Kandel] move to 0.19.2
fc23e5c [Ben Kandel] fix errant this_columns
3d9bbdd [Ben Kandel] whatsnew
68eadf3 [Ben Kandel] Modify test.
17e44dd [Ben Kandel] fix python parser too
72adaf2 [Ben Kandel] remove unnecessary test
bfe0423 [Ben Kandel] typo
2f64d57 [Ben Kandel] pep8
b8200e4 [Ben Kandel] BUG: read_csv with empty df
(cherry picked from commit f862b52)
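The write-then-read round trip that this commit is meant to restore can be checked with a short script. This is a minimal sketch, assuming a pandas release that already includes the fix (the commit list suggests 0.19.2 or later) and a frame written with `index=False`. The column labels are made up for illustration and are not taken from the issue.

```python
import io

import pandas as pd

# Hypothetical two-level column labels, chosen only for illustration.
columns = pd.MultiIndex.from_tuples([("a", "c"), ("b", "d")])
empty = pd.DataFrame(columns=columns)

# Write the empty frame; index=False keeps the index column out of the file,
# so the text consists of the two header lines only.
csv_text = empty.to_csv(index=False)

# Read the header-only text back, using both lines as the column header.
round_trip = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(csv_text)
print(round_trip.columns.tolist())
```

If the reader handles the missing index-name row as the commit describes, `round_trip` should be an empty frame whose columns are a two-level MultiIndex.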
Pandas 0.19 incorrectly handles empty dataframe files with multi index columns
What the file looks like
Expected Output
yields what we expect, an empty MultiIndex data frame
Throws
Expected Output
Output of
pd.show_versions()
For pandas 0.18.1
For pandas 0.19