Sanitize whitespace in column names from statcast calls #280

Merged
merged 2 commits into from
Aug 3, 2022

Collaborator

@tjburch tjburch commented Jul 27, 2022

This addresses #279, in which the first_name column name contains a leading space. The issue was flagged for statcast_batter_exitvelo_barrels, but it affects most of the statcast calls. I implemented an effectively one-line solution in utils that strips whitespace from the column names returned by the statcast calls, then imported the new function and used it where needed.
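For readers following along, here is a minimal sketch of what stripping whitespace from column names can look like in pandas. The helper name `strip_column_whitespace` and the sample frame are assumptions for illustration only; this is not necessarily the function added in utils by this PR.

```python
import pandas as pd


def strip_column_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with leading/trailing whitespace removed from column names.

    Hypothetical helper for illustration; the name is not taken from the PR.
    """
    # Non-string labels (e.g. integers) are passed through unchanged.
    return df.rename(
        columns=lambda name: name.strip() if isinstance(name, str) else name
    )


# Made-up sample data that reproduces the ' first_name' problem from #279.
raw = pd.DataFrame({"last_name": ["Doe"], " first_name": ["Jane"], "player_id": [1]})
clean = strip_column_whitespace(raw)
print(list(clean.columns))  # ['last_name', 'first_name', 'player_id']
```

Returning a renamed copy leaves the caller's frame untouched; assigning to `df.columns` in place would be an equally reasonable choice.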

Collaborator Author

tjburch commented Jul 27, 2022

Hm. Seems to be failing the test test_statcast_pitcher_exitvelo_barrels (and similar tests) on:

assert len(result.columns) == 19

Seems to be getting 18 columns now.

I just tried commenting out my changes to see if I could get back the 19 and still managed to get 18. Can do some more digging into this later but if anyone has an idea where the 19 came from, that'd be helpful. For reference I see:

['last_name', 'first_name', 'player_id', 'attempts', 'avg_hit_angle',
       'anglesweetspotpercent', 'max_hit_speed', 'avg_hit_speed', 'fbld', 'gb',
       'max_distance', 'avg_distance', 'avg_hr_distance', 'ev95plus',
       'ev95percent', 'barrels', 'brl_percent', 'brl_pa']

Collaborator Author

tjburch commented Jul 31, 2022

OK, so it seems that even in the last released version the Statcast calls only return 18 columns:

>>> from pybaseball.statcast_pitcher import statcast_pitcher_exitvelo_barrels
>>> from importlib.metadata import version
>>> version('pybaseball')
'2.2.1'
>>> df = statcast_pitcher_exitvelo_barrels('2020')
>>> len(df.columns)
18
>>> df.columns
Index(['last_name', ' first_name', 'player_id', 'attempts', 'avg_hit_angle',
       'anglesweetspotpercent', 'max_hit_speed', 'avg_hit_speed', 'fbld', 'gb',
       'max_distance', 'avg_distance', 'avg_hr_distance', 'ev95plus',
       'ev95percent', 'barrels', 'brl_percent', 'brl_pa'],
      dtype='object')

So I'm just going to update the tests to expect 18 columns and move on with life.
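As a rough sketch of the kind of check the updated tests could make, the snippet below uses the column names copied from the 2.2.1 session above. The `check_columns` helper is hypothetical and is not the project's actual test code; it only shows that stripping whitespace leaves the column count at 18.

```python
# Column names copied from the 2.2.1 session above, including the padded ' first_name'.
columns = [
    "last_name", " first_name", "player_id", "attempts", "avg_hit_angle",
    "anglesweetspotpercent", "max_hit_speed", "avg_hit_speed", "fbld", "gb",
    "max_distance", "avg_distance", "avg_hr_distance", "ev95plus",
    "ev95percent", "barrels", "brl_percent", "brl_pa",
]

# Names that carry leading or trailing whitespace.
padded = [name for name in columns if name != name.strip()]

# What the sanitized column names would look like.
stripped = [name.strip() for name in columns]


def check_columns(names):
    """Hypothetical test helper: expect 18 columns, none with stray whitespace."""
    assert len(names) == 18
    assert all(name == name.strip() for name in names)


check_columns(stripped)
print(padded)  # [' first_name']
```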
