All the p-values with tidypvals

The p-value is the most widely known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he were cited every time a p-value was reported, his paper would have, at the very least, 3 million citations*, making it the most highly cited paper of all time.

The tidypvals package organizes a large subset of these published p-values. They have been collected and synthesized from thousands of studies across multiple fields. The resulting data sets can be easily merged, combined, and analyzed.

install

This package will (hopefully) end up on Bioconductor soon, but for now you can install it with the devtools package:

install.packages('devtools')
library(devtools)
devtools::install_github('jtleek/tidypvals')

description

The currently available p-value data sets in this package are:

jager2014 - This data set comes from the paper "An estimate of the science-wise false discovery rate and application to the top medical literature", which first proposed scraping p-values from the medical literature for re-analysis.
brodeur2016 - This data set comes from the paper "Star Wars: The empirics strike back", which collected p-values from the economics literature.
head2015 - This data set comes from the paper "The Extent and Consequences of P-Hacking in Science" and extends the jager2014 idea to a much larger collection of biological papers.
chavalarias2016 - This data set comes from the paper "Evolution of Reporting P Values in the Biomedical Literature, 1990-2015" and extends the jager2014 idea to a much larger collection of medical papers.
allp - This data set merges head2015, chavalarias2016, and brodeur2016 while removing duplicates. To see how it is created, view the merging vignette.

Each data set is a "tidy" data frame and has the following columns:

pvalue - The reported p-value
year - The year of the publication where the p-value appeared
journal - The journal where the publication appeared
field - The field of the paper, using the categorization in Head et al. 2015.
abstract - Whether the p-value was in the abstract of the paper
operator - Whether the p-value was reported as "lessthan", "greaterthan", or "equals".
doi - The digital object identifier (DOI) for the paper, when available
pmid - The PubMed ID for the paper, when available
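
As a rough illustration of how these columns might be used, the sketch below builds a small toy data frame with the same column names and filters it in base R. The values are made up, and the column types (for example, abstract as a logical) are assumptions for illustration, not a description of the actual data sets.

```r
# Toy data frame with the documented column names.
# All values are invented; the column types are assumptions.
toy <- data.frame(
  pvalue   = c(0.03, 0.05, 0.20, 0.001),
  year     = c(2005, 2010, 2010, 2014),
  journal  = c("Journal A", "Journal B", "Journal B", "Journal C"),
  field    = c("Medicine", "Economics", "Economics", "Biology"),
  abstract = c(TRUE, FALSE, TRUE, TRUE),
  operator = c("equals", "lessthan", "equals", "lessthan"),
  doi      = NA_character_,
  pmid     = c("1", NA, NA, "2"),
  stringsAsFactors = FALSE
)

# Keep only p-values reported exactly ("equals") that appeared in an abstract.
exact_abs <- toy[toy$operator == "equals" & toy$abstract, ]

# Fraction of those p-values at or below the conventional 0.05 threshold.
frac_sig <- mean(exact_abs$pvalue <= 0.05)
print(frac_sig)
```

The same subsetting pattern should carry over to the packaged data sets once the real column types are confirmed, for example with str(jager2014).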

use

Load the library and then access each data set by name.

library(tidypvals)
jager2014

Data sets can be easily merged, but be careful to avoid duplicated p-values across different data sets. You can see how each data set was obtained and tidied by viewing the corresponding vignette.

vignette("jager-2014",package="tidypvals")
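
The caution about duplicated p-values can be sketched with two toy data frames. This is only one possible approach, assuming that a record is a duplicate when pmid, pvalue, and operator all match; it is not necessarily how allp was built, so consult the merging vignette for the actual procedure.

```r
# Two toy data sets that share one record (pmid "101"); values are invented.
a <- data.frame(
  pvalue   = c(0.01, 0.04),
  operator = c("equals", "lessthan"),
  pmid     = c("100", "101"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  pvalue   = c(0.04, 0.30),
  operator = c("lessthan", "equals"),
  pmid     = c("101", "102"),
  stringsAsFactors = FALSE
)

# Stack the data sets, then drop rows whose key columns repeat.
# The choice of key columns here is an assumption for illustration.
combined <- rbind(a, b)
key <- c("pmid", "pvalue", "operator")
merged <- combined[!duplicated(combined[, key]), ]

print(nrow(merged))
```

Records without a pmid or doi would need a different key, which is one reason to check the vignettes before combining the real data sets.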
