CitationProfileR

About

CitationProfileR is an R package and Shiny web app that lets users upload a PDF or citation file and get statistics on the gender and geographic distribution of the authors they cite. The results are summarized and visualized in a publication-ready form and can be downloaded. The package uses data from several web services, including the Crossref API, the GROBID API, Gender-API, and OpenStreetMap, as well as data extracted from the uploaded files.

Contributors

Contributions	Name
🔢 💻 🤔	Adriana Beltran Andrade
🔢 💻 🤔	Lika Mikhelashvili
🔢 💻 🤔	Mackie Zhou
🔢 💻 🤔	Rithika Devarakonda
🔢 🧑‍🏫	Lukas Wallrich

Definitions

Citation - A reference to a source of information in an academic paper. Citations typically include information such as author names, article title, DOI, and date of publication.

Diversity Statement - A statement in an academic journal article that acknowledges the gender and/or racial imbalance within a scientific field. It motivates researchers to pay particular attention to the gender and racial breakdown of the authors cited in their work, recognizes existing biases, and aims for greater inclusivity in the field.

How to Access

The CitationProfileR Shiny dashboard can be accessed by installing the package and running the app locally, alongside an externally hosted version on a website that will be accessible through search engines.

The link for the dashboard is: http://127.0.0.1:4955. This is a local (loopback) address, so it works only on a machine where the app is currently running, and the port number may differ between sessions.

To launch the Shiny dashboard, open the app.R script located at citationProfileR/inst/CitationProfileR/app.R in RStudio and click the Run App button at the top of the editor.
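The dashboard can likely also be started from the R console instead of the Run App button. This is a minimal sketch, assuming the package is installed and that the app directory under inst/CitationProfileR ends up at the top level of the installed package (the usual behavior for files under inst/); the shiny package must be installed as well.

```r
# Locate the installed app directory; system.file() returns "" if the
# package or the directory cannot be found.
app_dir <- system.file("CitationProfileR", package = "CitationProfileR")

if (nzchar(app_dir) && requireNamespace("shiny", quietly = TRUE) && interactive()) {
  # Launch the dashboard in the default browser
  shiny::runApp(app_dir)
} else {
  message("App directory not found, shiny not installed, or session not interactive.")
}
```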

Dependencies

There are no special dependencies. All you need is a recent version of R and RStudio installed.

How to Install

You can install the development version of CitationProfileR from GitHub with:

# install.packages("devtools")
devtools::install_github("LukasWallrich/citationProfileR")

Functions/Datasets Included

Our package includes the following functions, which allow the user to extract information about all authors cited in the uploaded paper and to obtain a gender prediction for each name. In the web app, users can also retrieve a diversity statement and see a bar plot of the citation count per gender.

first_name_check takes a data frame of extracted citations returned by the GROBID API and returns the first name of every author
get_author_info takes a data frame that contains every cited author's name, paper title, and publication date, and returns the first and last names of all cited authors from the Crossref API
guess_gender takes a cited author's name, a geographic location given as a country code, and a flag for whether to use the cache (which remembers predictions for names seen in earlier calls), and returns a data frame containing the author's name, location, and the associated gender prediction and accuracy measure from Gender-API
parse_pdf_refs takes a PDF uploaded by the user that contains a works-cited section and returns the isolated references, with every cited author and their respective work, from GROBID
get_location takes a data frame of cited authors' affiliations and uses the Crossref API to return a data frame with the associated country and ISO 3166 country code for every author

Examples

These are some basic examples for every function in our package.

First, load the CitationProfileR package:

library(CitationProfileR)

To use the first_name_check function, you need a CSV file of citations saved locally so that it can be read into R as a data frame. The package ships with example CSV files in the test-data subfolder of the inst folder.

file_path <- system.file("test-data", "test_citations_table2.csv", package = "CitationProfileR")
sample_data_frame <- read.csv(file_path)
first_name_check(sample_data_frame)

The get_author_info function is used in the same way as first_name_check. The example CSV files in our package also work with this function.

file_path <- system.file("test-data", "test_citations_table2.csv", package = "CitationProfileR")
sample_data_frame <- read.csv(file_path)
get_author_info(sample_data_frame)

For the guess_gender function, pass a name of your choice and a country code of your choice, each as a quoted string.

# General form of the call
# guess_gender(name, countrycode)

# Example call with a specific name and country:
# the name is "Rithika" and the country is the United States, whose code is "US".
guess_gender("Rithika", "US")

The parse_pdf_refs function takes the path to a PDF file. An example PDF is included in the package so that you can try the function:

file_path <- system.file("test-data", "Wallrich_et_al_2020.pdf", package = "CitationProfileR")
parse_pdf_refs(file_path)

The get_location() function takes a data frame with affiliations and returns the country names and country codes where the affiliations are located. By default, the function looks for affiliations in a column named "affiliation.name", but the user can specify a different column name. The sample_data_frame below is built from example data shipped with the package and can be used to try the function.

file_path <- system.file("test-data", "test_citations_table2.csv", package = "CitationProfileR")
sample_data_frame <- read.csv(file_path)
get_location(sample_data_frame)
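If your own data stores affiliations under a different column name, one option that does not depend on the function's argument names is to rename the column to the default "affiliation.name" before calling get_location(). This is a minimal sketch; the data frame and its institution column are hypothetical illustration data, and the final call is commented out because it needs the package and network access.

```r
# Hypothetical input: affiliations stored in a column called `institution`
affiliations <- data.frame(
  institution = c("Smith College", "Birkbeck, University of London"),
  stringsAsFactors = FALSE
)

# Rename the column to the default name that get_location() looks for
names(affiliations)[names(affiliations) == "institution"] <- "affiliation.name"

# With the package loaded, the data frame can then be passed on unchanged:
# get_location(affiliations)
```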

Data Sources

The data source for CitationProfileR is any academic article in PDF format that a user uploads to the Shiny UI. After the PDF is uploaded, the parse_pdf_refs() function parses the contents of the file and outputs a data frame with all the cited authors, along with their affiliations and DOI where available. The guess_gender() function is then used on the names in this data frame to add the predicted gender and the accuracy of the prediction for each name, using Gender-API.
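To illustrate the kind of summary that can be built from such predictions, here is a minimal sketch in base R that counts citations per predicted gender and draws a bar plot. The data frame is made-up illustration data, and the column names name, gender, and accuracy are assumptions rather than the documented output of guess_gender().

```r
# Hypothetical predictions table (placeholder values, not real results)
predictions <- data.frame(
  name = c("Author A", "Author B", "Author C", "Author D"),
  gender = c("female", "male", "female", NA),
  accuracy = c(98, 91, 75, NA),
  stringsAsFactors = FALSE
)

# Label names without a prediction explicitly so they stay visible
predictions$gender[is.na(predictions$gender)] <- "unknown"

# Count and percentage share of citations per predicted gender
gender_counts <- table(predictions$gender)
gender_share <- round(100 * prop.table(gender_counts), 1)

# Simple bar plot of the counts
barplot(gender_counts,
        main = "Cited authors by predicted gender",
        ylab = "Number of citations")
```

Keeping an explicit "unknown" category avoids silently dropping names for which no prediction is available.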

Data Collection and Update Process

The data do not need to be updated, either manually or automatically, because users supply the academic article themselves.

Repo Architecture

This repository follows the standard R package structure. The R folder contains the code for the functions available in CitationProfileR, separated into different R scripts. The code for the Shiny UI dashboard is in the inst folder. A user can access the dashboard through the link provided above or by running the app from a local clone of the repository.

License

How to Provide Feedback

Questions, bug reports, and feature requests can be submitted to this repo's issue queue.

Have Questions?

Contact us at l.wallrich@bbk.ac.uk or lmikhelashvili@smith.edu.
