Fangraphs missing pitching data #214

Closed
michaelmdresser opened this issue Jun 7, 2021 · 3 comments

Comments

@michaelmdresser

Fangraphs pitching data appears to be missing many players. Here's a code snippet:

import pybaseball

pstats = pybaseball.pitching_stats(2021)
print(f"Rows for fangraphs pitching stats for 2021: {len(pstats)}")

pstats_bref = pybaseball.pitching_stats_bref(2021)
print(f"Rows for Baseball Reference pitching stats for 2021: {len(pstats_bref)}")

player_ids = pybaseball.playerid_reverse_lookup([593334], key_type="mlbam").iloc[0]
print(player_ids)

fg_stats = pstats.loc[pstats["IDfg"] == player_ids["key_fangraphs"]]
print(f"Fangraphs rows for pitcher Domingo Germán: {len(fg_stats)}")

bref_stats = pstats_bref.loc[pstats_bref["Name"].str.contains("Domingo")]
print(f"Baseball Reference rows for pitcher Domingo Germán: {len(bref_stats)}")

And the output I get:

Rows for fangraphs pitching stats for 2021: 66
Rows for Baseball Reference pitching stats for 2021: 646
Gathering player lookup table. This may take a moment.
name_last              german
name_first            domingo
key_mlbam              593334
key_retro            germd001
key_bbref           germado01
key_fangraphs           17149
mlb_played_first       2017.0
mlb_played_last        2021.0
Name: 0, dtype: object
Fangraphs rows for pitcher Domingo Germán: 0
Baseball Reference rows for pitcher Domingo Germán: 2

66 pitchers for the whole 2021 season is way too low, and Fangraphs does have Domingo's data: https://www.fangraphs.com/players/domingo-german/17149/stats?position=P

It looks like the code only queries the "Leaders" section of Fangraphs:

QUERY_ENDPOINT: str = _FG_LEADERS_URL

If I visit https://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=y&type=8&season=2021&month=0&season1=2021&ind=0&page=3_30, that page lists 66 players, which matches the row count I get from the code.
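The leaders URL quoted above carries `qual=y` in its query string, which is presumably what limits the page to qualified pitchers. As a rough sketch using only the standard library, the parameter can be inspected and swapped like this. The helper names `get_qual` and `with_qual` are hypothetical, and whether Fangraphs treats `qual=1` in the URL the same way pybaseball treats `qual=1` is an assumption, not something confirmed in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# URL copied verbatim from the issue text.
LEADERS_URL = (
    "https://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=y"
    "&type=8&season=2021&month=0&season1=2021&ind=0&page=3_30"
)


def get_qual(url: str) -> str:
    """Return the value of the `qual` query parameter, or '' if it is absent."""
    return parse_qs(urlsplit(url).query).get("qual", [""])[0]


def with_qual(url: str, qual: str) -> str:
    """Return a copy of `url` with its `qual` query parameter replaced."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    params["qual"] = [qual]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


# Show the current setting, then the same URL with a (hypothetical) lower threshold.
print(get_qual(LEADERS_URL))
print(with_qual(LEADERS_URL, "1"))
```

Opening both URLs in a browser would be one way to check whether the 66-row cap comes from the `qual` setting rather than from pybaseball itself.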

Based on the documentation, I'd expect pitching_stats to have data for all players from the season. Is that wrong?

@schorrm
Collaborator

schorrm commented Jun 7, 2021

Whoops! Out of date documentation, but see #213 which fixes this.

@schorrm schorrm closed this as completed Jun 7, 2021
@johnclary
Contributor

So, @michaelmdresser, here's your fix:

pstats = pybaseball.pitching_stats(2021, qual=1)
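To see what the `qual` argument changes, one option is to compare row counts with and without it. The sketch below takes the fetch function as a parameter so it can be exercised offline; `row_counts` is a hypothetical helper, and the only pybaseball call it assumes is the `pitching_stats(season, qual=1)` form shown above.

```python
from typing import Callable, Dict


def row_counts(fetch: Callable, season: int) -> Dict[str, int]:
    """Compare how many rows `fetch` returns with its default settings vs. qual=1.

    `fetch` is expected to behave like pybaseball.pitching_stats: it takes a
    season and an optional `qual` keyword, and returns something with a length
    (for example a DataFrame).
    """
    return {
        "default": len(fetch(season)),
        "qual=1": len(fetch(season, qual=1)),
    }
```

With pybaseball installed, `row_counts(pybaseball.pitching_stats, 2021)` would make the two requests and report both counts. The default behaviour may differ between pybaseball versions, so the two numbers are not guaranteed to differ.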

@michaelmdresser
Author

Thank you!
