Adapting Daniel's code for contributor graphs #944
OK! This can now create two figures and the stats file about the commits. I was trying to get a third figure to show contribution by career stage, but I'm going to give up on that for now because it's non-trivial and doesn't seem like something we strictly need. One of the date packages is causing a lot of problems and makes that third graph unusable regardless. (I was hoping that switching to career stage would make it more readable, but I don't think I have time for the data manipulation that would require!)
Very cool! I think the colors in figure 1 may be a bit hard to interpret. It looks like most people only deleted code, when in reality I suppose the number of additions and deletions is equal and red is just plotted last? Replacing each circle with a small bar plot might work, although I have no idea how hard that is to implement. If this is either/or between the two plots, I think the first one is more expressive. One small nit: maybe "Ashwin N. Skelly" (dot after N) for consistency.
Yes, that's what's happening. If contributors primarily edit existing lines, those edited lines count as both added and deleted in the commit. One solution would be to plot additions only and caption the figure appropriately to describe that as edited and added lines. I liked the ridge plot in the original deep review summary, but I like the dot plot more here. Even with the log scaling, the ridge plot has a lot of lines that look mostly horizontal. I think the dot plot more accurately reflects the activity on the manuscript. There was a big burst in spring 2020 that established the foundation. Then, the number of active contributors diminished over time, but several contributors remained active editing the text. I'm curious what my massive deletion was in August 2020 😄
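To make the double-counting concrete, here is a minimal Python sketch, not the notebook's actual code. It assumes the per-commit line counts look like `git log --numstat` rows (tab-separated added count, deleted count, path). The function name `sum_numstat`, the file paths, and the sample numbers are hypothetical and only for illustration.

```python
def sum_numstat(numstat_text):
    """Sum added and deleted line counts from numstat-style rows.

    Each row holds three tab-separated fields: added, deleted, path.
    Binary files report "-" for both counts and are skipped here.
    """
    added = deleted = 0
    for row in numstat_text.splitlines():
        parts = row.split("\t")
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip blank lines, commit headers, and binary-file rows
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


# Hypothetical commit: 40 existing lines reworded plus 5 brand-new lines.
# The 40 edited lines appear in BOTH columns, so deletions look large even
# though nothing was actually removed from the manuscript.
sample = "45\t40\tcontent/example.md\n-\t-\tcontent/images/example.png\n"
added, deleted = sum_numstat(sample)
print(added, deleted)
```

Under these assumptions, plotting only the first value (with a caption saying it covers edited and added lines) avoids the overlapping red and green markers.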
@cgreene @cbrueffer @agitter Here is a plot of just additions!
I like it!
Nice! I agree that the number of deletions is probably not so interesting here, so leaving them out is a good option.
@agitter I think this code would now work to be integrated with our workflow. I added a The one trick here is that it won't work in the base covid19-review environment, so I think I will need to also modify
Really cool analysis and visualization! I'm not sure I can follow all of the scraping calculations performed, but everything seems reasonable.
I know you log-scaled the values; did you also log-scale the size of the circle markers?
Just to clarify, is this a visual meant to help inform authorship order? If so, I wonder whether additional metrics, like the number of reviews/comments, would be useful in addition to text edits.
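For reference, one way to read the marker-size question is as a mapping from line counts to circle area. The following is a minimal sketch only, assuming area proportional to log10(1 + lines); the function name `marker_area`, the `scale` factor, and the sample counts are hypothetical and are not taken from the notebook.

```python
import math


def marker_area(lines_changed, scale=20.0):
    """Return a marker area proportional to log10(1 + lines_changed).

    Adding 1 keeps a zero-line entry at area 0 instead of hitting log10(0).
    """
    if lines_changed < 0:
        raise ValueError("lines_changed must be non-negative")
    return scale * math.log10(1 + lines_changed)


# Each tenfold increase in lines adds a constant amount of area,
# rather than multiplying the area by ten.
for count in (0, 9, 99, 999):
    print(count, round(marker_area(count), 1))
```

With a linear mapping, a single very large commit would dwarf every other marker; the log mapping trades that for compressed differences among large contributions.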
This is looking good. I have a few small suggestions, but it is nearly ready to merge. I reviewed the main analysis code lightly because it is derived from Daniel's code that we used successfully in another project.
contrib-viz/03.contrib-stats.ipynb
Do we need this block? We already have different ways to track the number of authors that might conflict with this count.
Removed!
@agitter here is the updated figure:
Looks great! Let's merge it and add it.
I will also work on adding this to a workflow at some point post-submission.
We could probably remove "username" from the y-axis label on the dot plot now that we have full names. Let's leave it for now. I can manually crop it out if we want to remove it for the DISCO submission.
I merged this so I can test it within the manuscript.
Description of the proposed additions or changes
This still needs some work, but it generates something resembling the graphs that Daniel made for the Deep Review. There are a lot of compatibility issues that I'm whack-a-mole-ing, but hopefully I'm in the homestretch now. One figure worked; the other two hit a ggplot error that I'll work on tomorrow.
There is also some silly stuff that needs to be cleaned up: I didn't update the output file names, and I can't remember a lot of people's GitHub username to real name mappings, so some of those are very, very wrong (like just people's usernames a second time!).