ENH: Better error message if usecols doesn't match columns #17310
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1662,14 +1662,22 @@ def __init__(self, src, **kwds): | |
# GH 14671 | ||
if (self.usecols_dtype == 'string' and | ||
not set(usecols).issubset(self.orig_names)): | ||
- raise ValueError("Usecols do not match names.") | ||
+ missing = [c for c in usecols if c not in self.orig_names] | ||
+ raise ValueError( | ||
+     "Usecols do not match columns, " | ||
+     "columns expected but not found: {}".format(missing) | ||
+ ) | ||
|
||
if len(self.names) > len(usecols): | ||
self.names = [n for i, n in enumerate(self.names) | ||
if (i in usecols or n in usecols)] | ||
|
||
if len(self.names) < len(usecols): | ||
- raise ValueError("Usecols do not match names.") | ||
+ missing = [c for c in usecols if c not in self.names] | ||
+ raise ValueError( | ||
+     "Usecols do not match columns, " | ||
+     "columns expected but not found: {}".format(missing) | ||
+ ) | ||
Review comment: same |
||
|
||
self._set_noconvert_columns() | ||
|
||
|
@@ -2442,6 +2450,14 @@ def _handle_usecols(self, columns, usecols_key): | |
raise ValueError("If using multiple headers, usecols must " | ||
"be integers.") | ||
col_indices = [] | ||
|
||
+ missing = [c for c in self.usecols if c not in usecols_key] | ||
Review comment: I'm unsure what design patterns Pandas follows for this kind of thing, but would you rather I drop this down into a try/catch for [...]? I worry this initial approach adds needless computation time, but I'm unsure whether it's more readable. Let me know 😄
Review comment: Could you elaborate further on that logic you provided: [...] Secondly, don't worry about computation time. Get a working implementation first. Then we'll worry about optimizing, if need be. Chances are that won't be an issue.
Review comment: Sure, at the moment the error is being raised a few lines further down when we're looking for the index of the column in the [...] The alternate proposal would be something like: [...]
Review comment: Ah, understood. Yes, absolutely, let's do [...] |
||
+ if len(missing) > 0: | ||
+     raise ValueError( | ||
+         "Usecols do not match columns, " | ||
+         "columns expected but not found: {}".format(missing) | ||
+     ) | ||
|
||
for col in self.usecols: | ||
if isinstance(col, string_types): | ||
col_indices.append(usecols_key.index(col)) | ||
|
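The snippet for the try/catch alternative raised in the review thread did not survive on this page, so the following is only an illustrative sketch of the general idea, not the author's proposal. It assumes the lookup is a plain `list.index` call, which raises `ValueError` when the item is absent, and that non-string entries are already positional indices. The function name `resolve_usecols_indices` is hypothetical, and `str` stands in for `string_types`.

```python
def resolve_usecols_indices(usecols, usecols_key):
    """Map each usecols entry to its position in usecols_key.

    Hypothetical stand-in for the lookup loop shown in the diff.
    """
    col_indices = []
    for col in usecols:
        if isinstance(col, str):
            try:
                col_indices.append(usecols_key.index(col))
            except ValueError:
                # list.index raises ValueError for an absent item. Only on
                # this failure path do we collect every missing name.
                missing = [c for c in usecols
                           if isinstance(c, str) and c not in usecols_key]
                raise ValueError(
                    "Usecols do not match columns, "
                    "columns expected but not found: {missing}".format(
                        missing=missing)
                )
        else:
            # Assumption: non-string entries are already positional indices.
            col_indices.append(col)
    return col_indices
```

With this shape, the extra pass over `usecols` runs only when a lookup fails, which addresses the computation-time concern while still reporting every missing column.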
Review comment: Let's use keyword arguments in the string formatting. Same for your other error messages.
Review comment: Also, I see a lot of duplicate code here. Let's abstract into a method (you can just create a private method outside of the class). That will make your life easier.