pubMunch-docker

This repository contains a Docker container for the Literature Searching pipeline for BRCA Exchange. The pipeline does the following:

1. Download the data files.
2. Get a list of PMIDs from PubMed for papers that mention "BRCA" in the title or abstract.
3. Run pubMunch on the PMIDs that have not previously been processed:
   - The crawler downloads HTML and PDF files for each PMID via publication APIs.
   - The converter extracts raw text from the HTML and PDF files.
   - The mutation finder uses regular expressions to find gene names (BRCA1 and BRCA2 in this case) and variant descriptions.
4. Get a current list of BRCA variants from BRCA Exchange via its GA4GH instance.
5. Match the variants found by pubMunch to the variants in BRCA Exchange.
6. Output a JSON file listing, for each variant, the papers that mention it, with their PMID, title, abstract, and other information.
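The PubMed query and the "not previously processed" filter in the list above can be sketched with NCBI's public E-utilities esearch endpoint. This is a minimal sketch, not the code in getpubs.py: the search term BRCA[tiab] (PubMed's title/abstract field tag), the helper names, and tracking processed PMIDs as a plain list are assumptions for illustration. It builds the query URL and parses a canned JSON response, so it runs without network access.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (public API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term="BRCA[tiab]", retmax=10000):
    """Build a PubMed search URL; [tiab] restricts matching to title/abstract.

    E-utilities caps retmax at 10,000 per request, so larger result sets
    would need paging (not shown here).
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_pmids(response_text):
    """Extract the list of PMIDs from an esearch JSON response."""
    return json.loads(response_text)["esearchresult"]["idlist"]


def unprocessed(pmids, already_done):
    """Keep only PMIDs not handled by a previous run (hypothetical helper), preserving order."""
    done = set(already_done)
    return [pmid for pmid in pmids if pmid not in done]
```

For example, with a canned response listing PMIDs 111, 222 and 333, and 222 already processed, only 111 and 333 are passed on to pubMunch.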
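The converter's HTML-to-text step can be approximated with Python's standard html.parser module. This is a hypothetical stand-in for pubMunch's converter, which is not reproduced here, and PDF conversion (which needs an external tool) is omitted: the sketch keeps visible text and drops the contents of <script> and <style> elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a script or style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside script/style.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Joining chunks with spaces is a simplification: text split across inline tags inside a single word would gain a stray space.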
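To show how a regular-expression mutation finder works in principle, here is a deliberately simplified sketch. The patterns are assumptions: they only cover the BRCA1/BRCA2 gene symbols and a few cDNA (c.) notations (substitutions, deletions, duplications and insertions), whereas pubMunch's own patterns are more extensive and are not reproduced here.

```python
import re

# Gene symbols of interest (BRCA1 and BRCA2 only).
GENE_RE = re.compile(r"\bBRCA[12]\b")

# Simplified cDNA notation; not pubMunch's actual patterns.
VARIANT_RE = re.compile(
    r"\bc\.\d+(?:[+-]\d+)?"                           # c. position, optional intronic offset
    r"(?:[ACGT]>[ACGT]"                               # substitution, e.g. c.5266C>T
    r"|(?:_\d+(?:[+-]\d+)?)?(?:del|dup|ins)[ACGT]*)"  # del/dup/ins with optional range end
)


def find_mentions(text):
    """Return (genes, variants) found in text, de-duplicated in order of appearance."""
    genes = list(dict.fromkeys(GENE_RE.findall(text)))
    variants = list(dict.fromkeys(m.group(0) for m in VARIANT_RE.finditer(text)))
    return genes, variants
```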
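Matching pubMunch hits against BRCA Exchange and writing the JSON output can be sketched as a set lookup followed by grouping papers per variant. Everything here is hypothetical: the record layout (keys pmid, title, gene and cdna), the gene:c. key format, and the in-memory list of known variants. The real pipeline fetches variants from BRCA Exchange's GA4GH instance and includes more paper fields, such as the abstract.

```python
import json
from collections import defaultdict


def variant_key(gene, cdna):
    """Normalize a (gene, c. notation) pair into a lookup key (hypothetical format)."""
    return f"{gene.upper()}:{cdna.strip()}"


def match_variants(hits, known_variants):
    """Group paper hits by variant, keeping only variants present in BRCA Exchange.

    hits: dicts with the hypothetical keys pmid, title, gene, cdna.
    known_variants: iterable of (gene, cdna) pairs.
    """
    known = {variant_key(gene, cdna) for gene, cdna in known_variants}
    papers_by_variant = defaultdict(list)
    for hit in hits:
        key = variant_key(hit["gene"], hit["cdna"])
        if key in known:
            paper = {"pmid": hit["pmid"], "title": hit["title"]}
            if paper not in papers_by_variant[key]:  # skip duplicate papers
                papers_by_variant[key].append(paper)
    return dict(papers_by_variant)


def to_json(papers_by_variant):
    """Serialize the variant-to-papers mapping as pretty-printed JSON."""
    return json.dumps(papers_by_variant, indent=2, sort_keys=True)
```

Real HGVS strings can often be written in several equivalent ways, so a production matcher would likely need more normalization than upper-casing the gene symbol.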

To run the pipeline, Docker must be installed on your system. You can build the image by running:

make

The container can be invoked with the following command:

docker run quay.io/almussel/pubmunch-docker

For full functionality, the pipeline should be run on a server with institutional credentials that allow access to publications. A proxy can also be passed to docker run with the --env option, which, like all docker run options, goes before the image name:

docker run --env http_proxy="http://user:password@host:port" quay.io/almussel/pubmunch-docker
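The --env option only sets an environment variable inside the container; whether downloads actually use it depends on the HTTP client. Assuming the Python parts of the pipeline fetch pages through the standard library (not confirmed here), urllib picks up the proxy from http_proxy automatically, as this sketch shows. The proxy host is a placeholder.

```python
import os
import urllib.request

# Simulate `docker run --env http_proxy=...`: the option just sets this
# environment variable for processes inside the container.
os.environ["http_proxy"] = "http://user:password@proxy.example.org:3128"


def proxy_for(scheme="http"):
    """Return the proxy URL urllib would use for `scheme`, or None if unset."""
    return urllib.request.getproxies().get(scheme)


print(proxy_for("http"))
```

urllib selects proxies by URL scheme, so requests to https:// URLs consult https_proxy rather than http_proxy.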

To upload the results to the BRCA Exchange Literature Searching folder on Synapse, run the container with the credentials of a Synapse account that has write access to that folder in place of <username> and <password>:

docker run quay.io/almussel/pubmunch-docker -u <username> -p <password>

Without these credentials, the output of the pipeline will be stored locally.

To run the pipeline on a small sample of 20 PMIDs, use the -t option:

docker run quay.io/almussel/pubmunch-docker -t
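The options described above (-u and -p for uploading to Synapse, -t for a 20-PMID sample, local output otherwise) can be modeled with argparse. This is a hypothetical Python mirror: the container's real entry point (presumably wrapper.sh) may parse them differently, and taking the first 20 PMIDs is an assumption about how the sample is chosen.

```python
import argparse

SAMPLE_SIZE = 20  # number of PMIDs processed with -t


def parse_args(argv=None):
    """Parse the container's options (hypothetical mirror of the real wrapper)."""
    parser = argparse.ArgumentParser(description="BRCA Exchange literature search")
    parser.add_argument("-u", dest="username", help="Synapse username")
    parser.add_argument("-p", dest="password", help="Synapse password")
    parser.add_argument("-t", dest="test", action="store_true",
                        help="run on a small sample of PMIDs")
    return parser.parse_args(argv)


def plan(args, pmids):
    """Decide which PMIDs to process and where the results go."""
    # Assumption: the sample is simply the first SAMPLE_SIZE PMIDs.
    selected = pmids[:SAMPLE_SIZE] if args.test else pmids
    # Upload to Synapse only when both credentials are given; otherwise keep output local.
    destination = "synapse" if args.username and args.password else "local"
    return selected, destination
```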
