Adapting Daniel's code for contributor graphs #944

Merged
merged 27 commits into from
Sep 14, 2021

Conversation

rando2
Contributor

@rando2 rando2 commented Apr 28, 2021

Description of the proposed additions or changes

This still needs some work, but it generates something resembling the graphs that Daniel made for the Deep Review. There are a lot of compatibility issues that I'm whack-a-mole-ing, but hopefully I'm in the homestretch now. One figure worked; the other two hit a ggplot error that I'll work on tomorrow.

Also, there is some silly stuff that needs to be cleaned up: I didn't update the output file names, and I can't remember a lot of people's GitHub username to real name mappings, so some of those are very, very wrong (like just people's usernames a second time!)

Related issues

Suggested reviewers (optional)

Checklist

  • Text is formatted so that each sentence is on its own line.
  • Pre-prints cited in this pull request have a GitHub issue opened so that they can be reviewed.

@rando2 rando2 marked this pull request as draft April 28, 2021 00:47
@rando2 rando2 added the Methods Strategies for review label Apr 28, 2021
@rando2 rando2 marked this pull request as ready for review April 28, 2021 16:49
@rando2
Contributor Author

rando2 commented Apr 28, 2021

OK! This can now create two figures and the stats file about the commits. I was trying to get a third figure to show contribution by career stage, but I'm going to give up on that for now because it's non-trivial and doesn't seem like something we completely need. A problem with one of the date packages is causing a lot of trouble and makes that third graph unusable regardless. (I was hoping that switching to career stage would make it more readable, but I don't think I have time for the data manipulation required to do that!)

Here are the figs that worked for easy viewing:
covid19-review-contribution-dot
covid19-review-contribution-ridge

@rando2 rando2 requested review from agitter and mprobson April 28, 2021 16:51
@cbrueffer
Collaborator

Very cool! I think the colors in figure 1 may be a bit hard to interpret. It looks like most people only deleted code, when in reality I suppose the number of additions/deletions is equal, and red is just plotted last? Maybe replacing each circle with a small bar plot would work, although I have no idea how hard that would be to implement.

If this is either/or between the two plots, I think the first one is more expressive.

One small nit, maybe Ashwin N. Skelly (dot after N) for consistency.

@rando2
Contributor Author

rando2 commented Apr 28, 2021

Quick update, based on convo with Casey I changed to log scale -- this does look better but we would need to figure out how to toggle the y-axis scaling to get rid of the overlap
covid19-review-contribution-ridge
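
As a rough illustration of the log scaling mentioned above, here is a minimal sketch. The thread does not say which transform was used, so `log10(1 + count)` is an assumption, and the name `log_scale` and the sample values are hypothetical.

```python
import math


def log_scale(count):
    """Map a non-negative count to log10(1 + count) so that zero stays defined."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return math.log10(1 + count)


# Hypothetical per-period line counts for one contributor.
lines_changed = [0, 9, 99, 9999]
scaled = [log_scale(n) for n in lines_changed]
print(scaled)
```

Adding 1 before taking the logarithm keeps periods with no activity at zero instead of producing an undefined value, while compressing the large bursts that otherwise flatten everything else.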

@agitter
Collaborator

agitter commented Apr 29, 2021

It looks like most people only deleted code, when in reality I suppose the number of additions/deletions is equal, and red is just plotted last?

Yes, that's what's happening. If contributors primarily edit existing lines, those edited lines count as both added and deleted in the commit. One solution would be to plot additions only and caption the figure appropriately to describe that as edited and added lines.
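
To make the double counting concrete, here is a minimal sketch that tallies added and deleted lines per author. It assumes the text comes from `git log --numstat --format=@%an`; this is not the notebook code from this pull request, and `tally_numstat` and the sample log are hypothetical.

```python
from collections import defaultdict


def tally_numstat(log_text):
    """Sum added and deleted lines per author.

    Expects the text produced by `git log --numstat --format=@%an`:
    a line "@<author name>" per commit, followed by tab-separated
    "<added>\t<deleted>\t<path>" lines.
    """
    totals = defaultdict(lambda: {"added": 0, "deleted": 0})
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            author = line[1:].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separators or anything that is not a numstat row
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # git prints "-" for binary files, which have no line counts
        totals[author]["added"] += int(added)
        totals[author]["deleted"] += int(deleted)
    return dict(totals)


# Hypothetical log: alice writes new text, bob only edits existing lines.
SAMPLE_LOG = (
    "@alice\n\n10\t0\tcontent/intro.md\n"
    "@bob\n\n3\t3\tcontent/intro.md\n1\t1\tcontent/methods.md\n"
)

totals = tally_numstat(SAMPLE_LOG)
additions_only = {name: t["added"] for name, t in totals.items()}
print(totals)
print(additions_only)
```

In the sample, bob's edits show up as equal numbers of added and deleted lines, which is why plotting additions only still captures edited text.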

I liked the ridge plot in the original deep review summary, but I like the dot plot more here. Even with the log scaling, the ridge plot has a lot of lines that look mostly horizontal. I think the dot plot more accurately reflects the activity on the manuscript. There was a big burst in spring 2020 that established the foundation. Then, the number of active contributors diminished over time, but several contributors remained active editing the text.

I'm curious what my massive deletion was in August 2020 😄

@rando2
Contributor Author

rando2 commented Apr 30, 2021

@cgreene @cbrueffer @agitter Here is a plot of just additions!
covid19-review-contribution-dot

@agitter
Collaborator

agitter commented Apr 30, 2021

I like it!

@cbrueffer
Collaborator

Nice! I agree that the number of deletions is probably not so interesting here, so leaving them out is a good option.

@agitter agitter mentioned this pull request Aug 27, 2021
28 tasks
@rando2
Contributor Author

rando2 commented Aug 30, 2021

@agitter I think this code is now ready to be integrated into our workflow. I added a .sh script to run the notebooks (adapted from Daniel's Deep Review notebooks) and also added these figures to the versioning. However, I have generally only used Jupyter notebooks for teaching, so I had to look up a lot of these solutions -- I'm happy to change anything that could be handled better.

The one trick here is that it won't work in the base covid19-review environment, so I think I will need to also modify .github/workflows/update-external-resources.yaml to run these scripts first in a separate environment, then switch to the normal external-resources environment to run the rest of the figure scripts and then version everything. This page does seem to outline a solution to a similar problem, but it seems like more of an overhaul than we probably want since this is a one-off at present.
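
One way to picture the two-environment approach is the sketch below, which builds the commands to execute notebooks inside a separate conda environment using `conda run -n` and `jupyter nbconvert --execute`. It is not the .sh script from this pull request; the environment name and notebook file names are hypothetical placeholders.

```python
import shlex


def notebook_command(env_name, notebook_path):
    """Build a command that executes one notebook in place inside a named conda environment."""
    return [
        "conda", "run", "-n", env_name,
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook_path,
    ]


# Hypothetical names: the real environment and notebook files are not given in this thread.
CONTRIB_ENV = "contrib-graphs"
NOTEBOOKS = ["01.example-stats.ipynb", "02.example-figures.ipynb"]

commands = [notebook_command(CONTRIB_ENV, nb) for nb in NOTEBOOKS]
for cmd in commands:
    # Print rather than run, so the sketch has no side effects.
    # Each list could be passed to subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Running the notebooks through `conda run` leaves the calling shell in its original environment, so the remaining figure scripts could then run in the usual environment without an explicit switch.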

@rando2 rando2 requested a review from ajlee21 August 30, 2021 20:00
Contributor

@ajlee21 ajlee21 left a comment


Really cool analysis and visualization! I'm not sure if I can follow all the scaping calculations performed, but everything seems reasonable.

I know you log-scaled the values; did you also log-scale the size of the circle marker?

Just to clarify, is this a visual meant to help inform authorship order? If so, I wonder if there are additional metrics that would be useful like "number of reviews/comments" in addition to text edits?

@agitter agitter mentioned this pull request Sep 2, 2021
2 tasks
Collaborator

@agitter agitter left a comment


This is looking good. I have a few small suggestions, but it is nearly ready to merge. I reviewed the main analysis code lightly because it is derived from Daniel's code that we used successfully in another project.

]
}
],
"source": [
Collaborator


Do we need this block? We already have different ways to track the number of authors that might conflict with this count.

Contributor Author


Removed!

@rando2
Contributor Author

rando2 commented Sep 13, 2021

@agitter here is the updated figure:
image

Collaborator

@agitter agitter left a comment


Looks great! Let's merge it and add it.

I will also work on adding this to a workflow at some point post-submission.

We could probably remove "username" from the y-axis label on the dot plot now that we have full names. Let's leave it for now. I can manually crop it out if we want to remove it for the DISCO submission.

@agitter agitter merged commit e12aa31 into greenelab:external-resources Sep 14, 2021
@agitter
Collaborator

agitter commented Sep 14, 2021

I merged this so I can test it within the manuscript.

Labels
Methods Strategies for review

4 participants