BUG: read_excel return empty dataframe when using usecols #20480
Codecov Report
@@ Coverage Diff @@
## master #20480 +/- ##
=======================================
Coverage 91.89% 91.89%
=======================================
Files 153 153
Lines 49596 49596
=======================================
Hits 45576 45576
Misses 4020 4020
Continue to review full report at Codecov.
|
pandas/io/excel.py
Outdated
@@ -479,6 +482,9 @@ def _excel2num(x):
    return i <= usecols
elif isinstance(usecols, compat.string_types):
    return i in _range2cols(usecols)
elif all(isinstance(x, compat.string_types) for x in usecols) is True:
you don't need the is True
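A small illustration of the point above, as a sketch rather than the PR's code: `all()` already returns a `bool`, so the identity check adds nothing. The sample `usecols` value and the use of `str` in place of `compat.string_types` are assumptions made for the example.

```python
# Hypothetical input: a list of Excel column letters.
usecols = ["A", "C", "E"]

# all() returns a plain bool, so it can be used directly as a condition.
is_str_list = all(isinstance(x, str) for x in usecols)

# Adding `is True` produces exactly the same value.
same_result = (all(isinstance(x, str) for x in usecols) is True) == is_str_list

print(type(is_str_list).__name__, is_str_list, same_result)
```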
pandas/io/excel.py
Outdated
@@ -479,6 +482,9 @@ def _excel2num(x):
    return i <= usecols
elif isinstance(usecols, compat.string_types):
    return i in _range2cols(usecols)
elif all(isinstance(x, compat.string_types) for x in usecols) is True:
    usecols_str = ",".join(usecols)
can you add a 1-line comment on this case (and the case above to differentiate)
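To make the two cases easier to tell apart, here is a minimal stand-alone sketch of the idea in the diff: a single string of comma-separated columns and ranges, versus a list of such strings that is joined and sent through the same parser. The helper names (`excel_col_to_index`, `range_to_indices`, `normalize_usecols`) are hypothetical and are not the functions in `pandas/io/excel.py`.

```python
def excel_col_to_index(col):
    """Convert an Excel column label such as 'A' or 'AB' to a 0-based index."""
    index = 0
    for ch in col.strip().upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def range_to_indices(spec):
    """Expand a spec like 'A,C,E:F' into a list of 0-based column indices."""
    indices = []
    for part in spec.split(","):
        if ":" in part:
            start, end = part.split(":")
            # Excel ranges are inclusive on both sides.
            indices.extend(
                range(excel_col_to_index(start), excel_col_to_index(end) + 1)
            )
        else:
            indices.append(excel_col_to_index(part))
    return indices


def normalize_usecols(usecols):
    # Case 1: one string holding comma-separated columns and ranges.
    if isinstance(usecols, str):
        return range_to_indices(usecols)
    # Case 2: a non-empty list of strings, each a column or a range; join
    # them so they go through the same parser as case 1.
    if usecols and all(isinstance(x, str) for x in usecols):
        return range_to_indices(",".join(usecols))
    # Otherwise assume a list of integer positions.
    return list(usecols)
```

With these helpers, `normalize_usecols("A,C,E:F")` and `normalize_usecols(["A", "C", "E:F"])` resolve to the same positions.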
pandas/tests/io/test_excel.py
Outdated
@@ -179,6 +179,42 @@ def test_usecols_str(self, ext):
    tm.assert_frame_equal(df2, df1, check_names=False)
    tm.assert_frame_equal(df3, df1, check_names=False)

def test_usecols_str_list(self, ext):
can you add the gh issue as a comment
pandas/tests/io/test_excel.py
Outdated
def test_usecols_str_list(self, ext):

    dfref = self.get_csv_refdf('test1')
can you parameterize this test
Hi, thank you for your feedback! =)
I'm not sure I've understood correctly. Do you want me to parametrize the values passed to the parse_cols argument inside test_usecols_str_list, since its cases are a copy of those in test_usecols_str, or do you want me to parametrize something else?
you are testing multiple cases that are pretty similar. the test becomes much simpler if you can parameterize over the possible cases.
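As a rough illustration of parameterizing over similar cases: the pandas test suite uses pytest, where the `pytest.mark.parametrize` decorator plays this role. The sketch below uses only the standard library (`unittest` with `subTest`) so that it stays self-contained. The function under test, `split_columns`, and the case table are hypothetical.

```python
import unittest


def split_columns(usecols):
    """Hypothetical function under test: normalise a column spec to labels."""
    if isinstance(usecols, str):
        return [part.strip() for part in usecols.split(",")]
    return [str(part).strip() for part in usecols]


# Each case is (input, expected); adding a case means adding one row.
CASES = [
    ("A,B,C", ["A", "B", "C"]),
    ("A, B ,C", ["A", "B", "C"]),
    (["A", "B", "C"], ["A", "B", "C"]),
]


class TestSplitColumns(unittest.TestCase):
    def test_cases(self):
        for usecols, expected in CASES:
            # subTest reports each failing case separately.
            with self.subTest(usecols=usecols):
                self.assertEqual(split_columns(usecols), expected)


def run_suite():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSplitColumns)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The test body is written once and the near-identical cases live in a single table, which is the simplification the review comment asks for.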
Ok, I got it. Do you want me to do the same to the test above (test_usecols_str)?
Hi @jreback. I implemented all requested changes in the code.
I would like to know your opinion about a suggestion made by @chris-b1 at #20480 (comment).
171c9b6
to
fca0ae0
Compare
This looks OK, but re-reading it, #18273 is actually two somewhat separate problems and we have a bit of an API tangle here
Number 2 wasn't part of the Line 529 in 1915ffc
Not sure what the solution is, there could be ambiguous cases (column titled Not saying you need to address item 2 in this PR, just need a new issue if not. |
Hi @chris-b1, thank you for your comment. I tried to test what you said, but I don't get it.
| Day | Number | Animal |
|----------|--------|--------|
| 10/12/18 | 5 | Tatu |
| 10/13/18 | 4 | Paca |
I tried the following:
If someone creates a feature request or issue, I can work on making it possible to pass column names as arguments so that only those columns are read. |
Right, that doesn't work on
print(pd.__version__)
# 0.20.3
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6]})
df.to_excel('tmp.xlsx', index=False)
pd.read_excel('tmp.xlsx', usecols=['bar'])
bar
0 4
1 5
2 6 |
4aa25c2
to
445d94a
Compare
Hello @jacksonjos! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 09, 2018 at 22:45 Hours UTC |
02c4867
to
beb5b2c
Compare
Ok, I got it, @chris-b1. Wouldn't it be better to create another issue for choosing which columns to load by column name through a separate named argument? I think that might be confusing use |
Is there anything wrong with the requested changes I implemented, or anything still missing for this pull request to be accepted? I'm asking because I don't understand why it hasn't been accepted yet. |
waiting for @chris-b1 to have a look for this. I am concerned that usecols is now ambiguous, yes?
pandas/io/excel.py
Outdated
column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
column ranges (e.g. "A:E" or "A,C,E:F") to be parsed. Ranges are
inclusive of both sides.
* If list of strings each string shall be a Excel column letter or column
a -> an
Yeah, I've been a bit stuck on what to do here. Current (0.22) behavior where I suppose the only reasonable solution is two keywords.
|
|
OK, @jacksonjos here's what I propose. Most painful part will be backwards compat & testing. If you don't want to take it all the way, just let me know and I'll finish, would like to get this in for 0.23.
|
Hi @chris-b1 and @jreback.
Thank you for your answers. I'm already working on this issue and I'll update the pull request in the coming days following your suggestions.
If I run into trouble, I'll ask for help.
Best regards
2018-04-16 16:56 GMT-03:00 chris-b1 <notifications@github.com>:
… OK, @jacksonjos here's what I propose. Most painful part will be backwards compat & testing. If you don't want to take it all the way, just let me know and I'll finish, would like to get this in for 0.23.
1. Add a new kwarg usecols_excel which replicates the current usecols + your fix here
2. Since we're breaking compat, inspect usecols and if it is string AND parses to an excel region, issue a warning that the arg was renamed. This could raise a false positive for a single letter (e.g. 'A') but don't see a way around it.
3. usecols should get passed to the TextParser object, which will use it in the manner @jreback suggests.
https://github.com/pandas-dev/pandas/blob/1915ffc53ea60494f24d83844bbff00efa392c82/pandas/io/excel.py#L529
|
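Step 2 of the proposal (inspect `usecols` and warn if it is a string that parses to an Excel region) could look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names, the regular expression, the warning text and the choice of `FutureWarning` do not come from the PR.

```python
import re
import warnings

# One or more comma-separated items, each a column ("A") or a range ("A:E").
# Excel column labels are at most three letters long.
_ITEM = r"[A-Za-z]{1,3}(:[A-Za-z]{1,3})?"
_EXCEL_REGION = re.compile(r"^" + _ITEM + r"(," + _ITEM + r")*$")


def looks_like_excel_region(value):
    """Return True if `value` is a string that parses as Excel columns/ranges."""
    if not isinstance(value, str):
        return False
    return bool(_EXCEL_REGION.match(value.replace(" ", "")))


def check_usecols(usecols):
    """Warn when `usecols` looks like the old Excel-style specification."""
    if looks_like_excel_region(usecols):
        warnings.warn(
            "Excel-style column ranges in usecols were renamed to usecols_excel",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False
```

As the proposal notes, a check like this cannot avoid false positives: a column label such as 'A' or 'Day' is indistinguishable from an Excel column reference.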
Hi @jreback and @chris-b1, I'm about to finish the new version of this pull request and I would like to know your opinion: what should happen when someone passes both arguments, usecols_excel and usecols, to the read_excel function? Should I raise an exception, print a warning, or do nothing? Thanks in advance. |
I would raise.
…On Sun, Apr 22, 2018, 8:16 PM Jackson Souza ***@***.***> wrote:
Hi @jreback and @chris-b1, I'm about to finish the new version of this pull request and I would like to know your opinion: what should happen when someone passes both arguments, usecols_excel and usecols, to the read_excel function? Should I raise an exception, print a warning, or do nothing? Thanks in advance.
|
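A minimal sketch of the "raise" option, assuming the two keywords are meant to be mutually exclusive. The function name, the return shape and the error message are hypothetical.

```python
def resolve_column_selection(usecols=None, usecols_excel=None):
    """Decide which column selector applies; refuse ambiguous input."""
    if usecols is not None and usecols_excel is not None:
        # Both selectors given: there is no sensible way to combine them.
        raise ValueError("Cannot specify both usecols and usecols_excel")
    if usecols_excel is not None:
        return ("excel", usecols_excel)
    if usecols is not None:
        return ("labels_or_positions", usecols)
    # Neither given: read every column.
    return ("all", None)
```

Raising early keeps the two keywords orthogonal, so callers never have to guess which one took precedence.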
a747961
to
6143a0f
Compare
Hey, guys! I pushed a new version. All requested changes are in the commit. I don't understand why one of the CI tests failed, because I can't see any relationship between the failing tests and the feature I implemented/corrected. Do you have any idea? Thanks! |
looks pretty good will try it out tonight, can you also add a note to the excel section in io.rst
pandas/io/excel.py
Outdated
example of a valid callable argument would be ``lambda x: x.upper() in
['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
parsing time and lower memory usage.
usecols_excel : int or list, default None
string, int, or list
I'd also add a bit of narrative before the bullets, something like:
Columns to parse from the spreadsheet, specified as Excel location references
pandas/io/excel.py
Outdated
row.append(_parse_cell(value, typ))
data.append(row)

# Check if some string in usecols may be interpreted as a Excel
# positional column
if (usecols is not None) and (not callable(usecols)) and \
I'll try it out later, but I think this warning is over-specified. Since usecols only used to work with Excel ranges as strings, not lists of strings, I would limit the warning to that case.
pandas/tests/io/test_excel.py
Outdated
df1 = self.get_csv_refdf('test1')[['B', 'D']]

with tm.assert_produces_warning(UserWarning):
    df2 = self.get_exceldf('test1', ext, 'Sheet1', usecols=['B', 'D'])
As stated above, I'm not sure this should warn, since it didn't previously work. But I would like a test for the warning, something like usecols='B,D'
I can't pass a string to usecols because it throws an error; I have already tested it.
So it doesn't make sense to keep the warning only for a string-typed parameter, unless you don't want the error to be thrown.
With this in mind, I wrote a test to check this behavior.
Should I remove the warning, then?
I can enable backward compatibility for usecols if you want.
I could check whether usecols is a string containing Excel column indexes and ranges and, if it is, do something like usecols_excel=usecols, usecols=None, emit the warning I've already written, and add a comment that this will be removed in pandas 0.24.
What do you think?
The other changes you asked for are done. I'm just waiting to know what to do about this warning before I commit again.
Ah, I mistakenly thought we needed to worry about the case like usecols='B'.
Yeah, I agree with what you proposed, handle the back compat case with a warning, we can eventually deprecate that path.
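The back-compat path agreed on here (treat a string passed to `usecols` as if it had been passed to `usecols_excel`, and warn) might be sketched as follows. The function name, the warning category and the message are assumptions. For brevity the sketch reroutes any string, whereas the discussion above limits this to strings that contain Excel columns and ranges.

```python
import warnings


def apply_back_compat(usecols=None, usecols_excel=None):
    """Move an Excel-style string from usecols to usecols_excel, with a warning.

    Returns the (usecols, usecols_excel) pair to use from here on.
    """
    if isinstance(usecols, str) and usecols_excel is None:
        warnings.warn(
            "Passing Excel ranges to usecols is deprecated; "
            "use usecols_excel instead",
            FutureWarning,
            stacklevel=2,
        )
        # Equivalent to calling with usecols_excel=usecols, usecols=None.
        return None, usecols
    return usecols, usecols_excel
```

Once the deprecation period ends, the `isinstance` branch can be removed and string input to `usecols` can fail as it would for any other unsupported type.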
I am confused here. IIUC usecols_excel was only going to accept the column ranges, and usecols everything else? These HAVE to be orthogonal parameters, otherwise this is very confusing.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -856,6 +856,7 @@ Other API Changes
- Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (:issue:`19714`, :issue:`20391`).
- :func:`DataFrame.to_dict` with ``orient='index'`` no longer casts int columns to float for a DataFrame with only int and float columns (:issue:`18580`)
- A user-defined-function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than a ``np.array``; ``.apply()`` only has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`)
- Changed the named argument `usecols` at :func:`read_excel` to `usecols_excel` that receives a list of index numbers or A1 index to select the columns that must be in the DataFrame, so the `usecols` argument can serve its purpose to select the columns that must be in the DataFrame using column labels (:issue:`18273`)
Please simplify this to make it more readable.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -1166,6 +1167,7 @@ I/O
- Bug in :func:`DataFrame.to_latex()` where a ``MultiIndex`` with an empty string as its name would result in incorrect output (:issue:`18669`)
- Bug in :func:`read_json` where large numeric values were causing an ``OverflowError`` (:issue:`18842`)
- Bug in :func:`DataFrame.to_parquet` where an exception was raised if the write destination is S3 (:issue:`19134`)
- Bug in :func:`read_excel` where `usecols_excel` named argument as a list of strings were returning a empty DataFrame (:issue:`18273`)
how is this a bug? usecols_excel doesn't exist yet
pandas/io/excel.py
Outdated
`usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element
order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]`` and
``usecols=['foo', 'bar']`` is the same as ``['bar', 'foo']``.
To instantiate a DataFrame from ``data`` with element order preserved use
this is not needed here, starting with keeping ordering (or you can put in the Notes section if you really want)
pandas/io/excel.py
Outdated
example of a valid callable argument would be ``lambda x: x.upper() in
['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
parsing time and lower memory usage.
usecols_excel : int or list, default None
* If None then parse all columns,
* If int then indicates last column to be parsed
should usecols_excel only accept column ranges?
Yeah, I'd be in favor of letting usecols handle the integer location case and making usecols_excel only handle column ranges
capability of passing column labels for columns to be read
- [x] closes pandas-dev#18273
- [x] tests added / passed
- [x] passes git diff master --name-only -- "*.py" | grep "pandas/" | xargs -r flake8
- [x] whatsnew entry
Created 'usecols_excel', which receives a string containing comma-separated Excel ranges and columns. Changed the 'usecols' named argument of the 'read_excel' function: it now receives a list of strings containing column labels, a list of integers representing column indexes, or a callable. Created and altered tests to reflect the new usage of these named arguments.
The 'index_col' keyword used to indicate which columns, within the subset selected by 'usecols' or 'usecols_excel', should be the index of the DataFrame read. Now 'index_col' indicates which columns of the DataFrame will be the index even if that column is not in the subset of selected columns.
6c6eede
to
e257100
Compare
I re-read your explanation but I'm honestly still not following why the change to
In [6]: pd.read_excel('pandas/pandas/tests/io/data/test1.xlsx').to_csv('tmp.csv')
In [7]: !head tmp.csv
,A,B,C,D
2000-01-03,0.980268513777,3.68573087906,-0.364216805298,-1.15973806169
2000-01-04,1.04791624281,-0.0412318367011,-0.16181208307,0.212549316967
2000-01-05,0.498580885705,0.731167677815,-0.537677223318,1.34627041952
2000-01-06,1.12020151869,1.56762092543,0.00364077397681,0.67525259227
2000-01-07,-0.487094399463,0.571454623474,-1.6116394093,0.103468562917
2000-01-10,0.836648671666,0.246461918642,0.588542635376,1.0627820613
2000-01-11,-0.157160753327,1.34030689438,1.19577795622,-1.09700699751
In [8]: pd.read_csv('tmp.csv', usecols=[0, 2, 3], index_col=0)
Out[8]:
B C
2000-01-03 3.685731 -0.364217
2000-01-04 -0.041232 -0.161812
2000-01-05 0.731168 -0.537677
2000-01-06 1.567621 0.003641
2000-01-07 0.571455 -1.611639
2000-01-10 0.246462 0.588543
2000-01-11 1.340307 1.195778
In [9]: pd.read_csv('tmp.csv', usecols=[0, 2, 3], index_col=0, engine='python')
Out[9]:
B C
2000-01-03 3.685731 -0.364217
2000-01-04 -0.041232 -0.161812
2000-01-05 0.731168 -0.537677
2000-01-06 1.567621 0.003641
2000-01-07 0.571455 -1.611639
2000-01-10 0.246462 0.588543
2000-01-11 1.340307 1.195778 |
Take a look at the examples below comparing the current pandas behavior to the new one,
|
@jacksonjos can you show the proposed behavior right next to the existing one, and use separators between them so it's clear. |
can you rebase / update |
Hi @jreback,
Unfortunately, I don't have time to work on this project this semester. I tried hard to contribute to it last semester, but my efforts weren't enough to finish the job.
So I suggest asking someone else to continue this work.
Best regards
On Tue, Sep 25, 2018 at 13:41, Jeff Reback <notifications@github.com> wrote:
… can you rebase / update
|
@gfyoung if you wouldn't mind rebasing this |
The idea is that we read the Excel file, get the data, and then let the TextParser handle the reading and parsing. We shouldn't be doing a lot of work that is already defined in parsers.py.
In doing so, we identified two major bugs:
* index_col=None was not being respected
* usecols behavior was inconsistent with that of read_csv for list of strings and callable inputs
Closes gh-18273. Closes gh-20480.
The idea is that we read the Excel file, get the data, and then let the TextParser handle the reading and parsing. We shouldn't be doing a lot of work that is already defined in parsers.py.
In doing so, we identified several bugs:
* index_col=None was not being respected
* usecols behavior was inconsistent with that of read_csv for list of strings and callable inputs
* usecols was not being validated as proper Excel column names when passed as a string.
Closes gh-18273. Closes gh-20480.
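To illustrate what read_csv-style `usecols` handling means for lists of integers, lists of strings and callables, here is a stand-alone sketch that works on plain Python lists. It is not the TextParser implementation, and `select_columns` is a hypothetical name. Element order in `usecols` is ignored, matching the docstring quoted earlier in this thread.

```python
def select_columns(header, rows, usecols=None):
    """Keep only the columns selected by `usecols`.

    usecols may be None (keep everything), a list of integer positions,
    a list of column labels, or a callable applied to each label.
    """
    if usecols is None:
        keep = list(range(len(header)))
    elif callable(usecols):
        keep = [i for i, name in enumerate(header) if usecols(name)]
    elif all(isinstance(c, int) for c in usecols):
        wanted = set(usecols)
        keep = [i for i in range(len(header)) if i in wanted]
    elif all(isinstance(c, str) for c in usecols):
        wanted = set(usecols)
        missing = wanted - set(header)
        if missing:
            raise ValueError("usecols not found in header: %s" % sorted(missing))
        keep = [i for i, name in enumerate(header) if name in wanted]
    else:
        raise ValueError("usecols must be all ints, all strings, or a callable")

    # Columns come back in file order, whatever the order given in usecols.
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows
```

For a header `['foo', 'bar', 'baz']`, passing `usecols=['bar']` keeps only that column instead of returning an empty result, which is the behavior the original issue asked for.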
closing in favor of #23544 |
As mentioned, read_excel returns an empty DataFrame when the usecols argument is a list of strings.
Lists of strings are now correctly interpreted by the read_excel function.