diff --git a/BibliographicStudies.Rmd b/BibliographicStudies.Rmd index 30fa7f9..eac1b2a 100644 --- a/BibliographicStudies.Rmd +++ b/BibliographicStudies.Rmd @@ -1,5 +1,5 @@ --- -title: "Bibliographic Studies" +title: "Bibliometric Studies" subtitle: "Reproducible Bibliometric Analysis of Pathology Articles Using PubMed, E-direct, WoS, Google Scholar" author: "Serdar Balcı, MD, Pathologist" date: '`r format(Sys.Date())`' @@ -25,6 +25,14 @@ output: toc_float: yes --- +Follow @serdarbalci +[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/sbalci/PubMed/issues) +[![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/sbalci) +[![HitCount](http://hits.dwyl.io/sbalci/PubMed.svg)](http://hits.dwyl.io/sbalci/PubMed) + + + # Introduction It is a very common bibliometric study type to retrospectively analyse the number of peer reviewed articles written from a country to view the amount of contribution made in a specific scientific discipline. diff --git a/BibliographicStudies.nb.html b/BibliographicStudies.nb.html index ab0371a..cbb6ef9 100644 --- a/BibliographicStudies.nb.html +++ b/BibliographicStudies.nb.html @@ -11,9 +11,9 @@ - + -Bibliographic Studies +Bibliometric Studies +

contributions welcome Say Thanks! HitCount

1 Introduction

It is a very common bibliometric study type to retrospectively analyse the number of peer reviewed articles written from a country to view the amount of contribution made in a specific scientific discipline.

@@ -2916,7 +2927,7 @@

1 Introduction


If you want to see the code used in the analysis please click the code button on the right upper corner or throughout the page.

I would like to hear your feedback: https://goo.gl/forms/YjGZ5DHgtPlR1RnB3

-

This document will be continuously updated and the last update was on 2019-04-13.

+

This document will be continuously updated and the last update was on 2019-06-02.


@@ -3012,7 +3023,7 @@

3 Sources Used For Analysis

4 Feedback

Serdar Balcı, MD, Pathologist would like to hear your feedback: https://goo.gl/forms/YjGZ5DHgtPlR1RnB3

-

This document will be continuously updated and the last update was on 2019-04-13.

+

This document will be continuously updated and the last update was on 2019-06-02.


+[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/sbalci/PubMed/issues) +[![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/sbalci) +[![HitCount](http://hits.dwyl.io/sbalci/PubMed.svg)](http://hits.dwyl.io/sbalci/PubMed) + diff --git a/README.md b/README.md index 16fc1fd..8d67415 100644 --- a/README.md +++ b/README.md @@ -1,28 +1,59 @@ -# Reproducible Bibliometric Analysis of Pathology Articles -### PubMed Indexed Peer Reviewed Articles in Pathology Journals: A country based comparison -It is a very common bibliometric study to retrospectively analyse the number of peer reviewed articles written from a country to view the amount of contribution made in a specific scientific discipline. + +# Reproducible Bibliometric Analysis of Pathology Articles -These studies require too much effort, since the data is generally behind paywalls and restrictions. +### PubMed Indexed Peer Reviewed Articles in Pathology Journals: A country based comparison +It is a very common bibliometric study to retrospectively analyse the +number of peer reviewed articles written from a country to view the +amount of contribution made in a specific scientific discipline. -I have previously contributed to a research to identify the Articles from Turkey Published in Pathology Journals Indexed in International Indexes; which is published here: http://www.turkjpath.org/summary_en.php3?id=1423 DOI: 10.5146/tjpath.2010.01006 +These studies require too much effort, since the data is generally +behind paywalls and restrictions. 
+I have previously contributed to a research to identify the Articles +from Turkey Published in Pathology Journals Indexed in International +Indexes; which is published here: +<http://www.turkjpath.org/summary_en.php3?id=1423> DOI: +10.5146/tjpath.2010.01006 -This study required manually investigating many excel files, which was time consuming and redoing and updating the data and results also require a similar amount of effort. +This study required manually investigating many excel files, which was +time consuming and redoing and updating the data and results also +require a similar amount of effort. +In order to automatize this analysis, I have used PubMed data from +National Library of Medicine (<https://www.ncbi.nlm.nih.gov/pubmed/>). +This collection has the most comprehensive information about peer +reviewed articles in medicine. It also has an API +(<https://dataguide.nlm.nih.gov/>), and R packages are available for +getting and fetching data from the server. -In order to automatize this analysis, I have used PubMed data from National Library of Medicine (https://www.ncbi.nlm.nih.gov/pubmed/). This collection has the most comprehensive information about peer reviewed articles in medicine. It also has an API (https://dataguide.nlm.nih.gov/), and R packages are available for getting and fetching data from the server. +Pathology Journal ISSN List data was retrieved from “in cites +Clarivate”, and Journal Data Filtered as follows: JCR Year: 2016 +Selected Editions: SCIE,SSCI Selected Categories: ‘PATHOLOGY’ Selected +Category Scheme: WoS +Using these data I would like to make reproducible reports and shiny +apps, not only on pathology field but also in other areas of medicine. +This will be very useful to compare disciplines and different nations. 
-Pathology Journal ISSN List data was retrieved from "in cites Clarivate", and Journal Data Filtered as follows: JCR Year: 2016 Selected Editions: SCIE,SSCI Selected Categories: 'PATHOLOGY' Selected Category Scheme: WoS +----- +For updated analysis see: +<https://sbalci.github.io/pubmed/BibliographicStudies.html> -Using these data I would like to make reproducible reports and shiny apps, not only on pathology field but also in other areas of medicine. This will be very useful to compare disciplines and different nations. +I would like to hear your feedback: +<https://goo.gl/forms/YjGZ5DHgtPlR1RnB3> ---- + -For updated analysis see: https://sbalci.github.io/pubmed/BibliographicStudies.html + -I would like to hear your feedback: https://goo.gl/forms/YjGZ5DHgtPlR1RnB3 \ No newline at end of file +[![contributions +welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/sbalci/PubMed/issues) +[![Say +Thanks\!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/sbalci) +[![HitCount](http://hits.dwyl.io/sbalci/PubMed.svg)](http://hits.dwyl.io/sbalci/PubMed) diff --git a/Sources.Rmd b/Sources.Rmd index 5896046..b01285b 100644 --- a/Sources.Rmd +++ b/Sources.Rmd @@ -183,6 +183,10 @@ https://www.sciencemag.org/news/2017/05/vast-set-public-cvs-reveals-world-s-most ## Semantic Scholar +https://www.semanticscholar.org/ + + +### Semantic Scholar Open Research Corpus ``` Semantic Scholar Open Research Corpus @@ -197,6 +201,26 @@ caffeinate ``` +### AllenNLP + +https://allennlp.org/ + + +### An open-source NLP research library, built on PyTorch. 
+ +http://www.allennlp.org/ + +https://github.com/allenai/allennlp + + +### citeomatic + +https://allenai.org/semantic-scholar/citeomatic/ + + + + + ## Microsoft Academic https://labs.cognitive.microsoft.com/en-us/project-academic-knowledge @@ -292,6 +316,27 @@ https://www.lens.org/lens/scholar/search/results?q=(author.affiliation.name:%20% https://www.lens.org/lens/scholar/search/analysis?q=(author.affiliation.name:%20%22neurosurgery%22%20%20OR%20author.affiliation.name:%20%22sport%22%20)%20AND%20(abstract:%20%22sport%22%20%20OR%20title:%20%22sport%22)&page=0&limit=10&orderBy=%2Bscore&dateFilterField=year_published&preview=false&regexEnabled=false +``` +my_data_frame <- readr::read_delim("~/downloads/pubmed_result.txt", delim = "\t", col_names = FALSE) + +chunk <- 5000 +mylist <- split(my_data_frame, rep(1:ceiling(nrow(my_data_frame)/chunk), each=chunk, length.out=nrow(my_data_frame))) + +X1 <- mylist$`1` +X2 <- mylist$`2` +X3 <- mylist$`3` +X4 <- mylist$`4` + +readr::write_csv(X1, "~/downloads/1.txt") +readr::write_csv(X2, "~/downloads/2.txt") +readr::write_csv(X3, "~/downloads/3.txt") +readr::write_csv(X4, "~/downloads/4.txt") + +``` + + + + #### PatCite @@ -346,10 +391,62 @@ https://europepmc.org/downloads https://elixir-europe.org/platforms/data/core-data-resources +--- + +## scite + + +https://scite.ai/ + + +--- + +## TÜBİTAK Destekli Projeler Veri Tabanı + +To contribute to Turkey's research infrastructure, the full texts of the final reports of 17,808 projects completed from 1965 to the present under the Research Support Programs Directorate (ARDEB) are published in the “TÜBİTAK Destekli Projeler Veri Tabanı” (TÜBİTAK Supported Projects Database) of the TÜBİTAK Turkish Academic Network and Information Center (ULAKBİM). + + +The database can be accessed at https://trdizin.gov.tr/search/projectSearch.xhtml, and the final reports can be searched by project number, title, name of the principal investigator/researcher/advisor, year, and keyword. 
+ + +## TRDizin + +https://trdizin.gov.tr/ + + --- # Software + +## R-project + +https://github.com/schochastics/graphlayouts + + +### rentrez + +``` +# Source: https://github.com/ropensci/rentrez/issues/134#event-2313355730 + +library(rentrez) +library(XML) + +MeSH_from_pmid <- function(pmid){ + rec <- entrez_fetch(db="pubmed", id=pmid, rettype = "xml", parsed=TRUE) + m_names <- xpathSApply(rec, "//MeshHeadingList/MeshHeading/DescriptorName", xmlValue) + m_ui <- xpathSApply(rec, "//MeshHeadingList/MeshHeading/DescriptorName", xmlAttrs)[1,] + data.frame(mesh_ui = m_ui, descriptor = m_names) +} + +MeSH_from_pmid(27591765) + + + +``` + + + ## CiteSpace - CiteSpace Tutorial diff --git a/Sources.nb.html b/Sources.nb.html index a835cb3..9befe8c 100644 --- a/Sources.nb.html +++ b/Sources.nb.html @@ -11,7 +11,7 @@ - + Bibliographic Studies @@ -1340,8 +1340,6 @@ margin-left: 2%; position: fixed; border: 1px solid #ccc; -webkit-border-radius: 6px; -moz-border-radius: 6px; border-radius: 6px; } @@ -1369,10 +1367,15 @@ .tocify-subheader .tocify-subheader { text-indent: 30px; } - .tocify-subheader .tocify-subheader .tocify-subheader { text-indent: 40px; } +.tocify-subheader .tocify-subheader .tocify-subheader .tocify-subheader { +text-indent: 50px; +} +.tocify-subheader .tocify-subheader .tocify-subheader .tocify-subheader .tocify-subheader { +text-indent: 60px; +} .tocify .tocify-item > a, .tocify .nav-list .nav-header { margin: 0px; @@ -1775,13 +1778,13 @@ item.append($("", { - "text": self.text() + "html": self.html() })); } else { - item.text(self.text()); + item.html(self.html()); } @@ -2851,8 +2854,6 @@ .tocify-subheader .tocify-item { font-size: 0.90em; - padding-left: 25px; - text-indent: 0; } .tocify .list-group-item { @@ -2901,7 +2902,7 @@

Bibliographic Studies

Sources Used For Analysis

Serdar Balcı, MD, Pathologist

-

2019-04-13

+

2019-05-27

@@ -2998,6 +2999,9 @@

2.2 ORCID

2.3 Semantic Scholar

+

https://www.semanticscholar.org/

+
+

2.3.1 Semantic Scholar Open Research Corpus

Semantic Scholar Open Research Corpus
 
 https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/index.html
@@ -3009,6 +3013,20 @@ 

2.3 Semantic Scholar

caffeinate
+
+

2.3.2 AllenNLP

+

https://allennlp.org/

+
+
+

2.3.3 An open-source NLP research library, built on PyTorch.

+

http://www.allennlp.org/

+

https://github.com/allenai/allennlp

+
+
+

2.3.4 citeomatic

+

https://allenai.org/semantic-scholar/citeomatic/

+
+

2.4 Microsoft Academic

https://labs.cognitive.microsoft.com/en-us/project-academic-knowledge

@@ -3080,6 +3098,21 @@

2.14 lens.org

https://www.lens.org/lens/scholar/search/results?dateFilterField=year_published&filterMap=%7B%7D&orcids=0000-0002-7852-3851&orderBy=%2Bpublished&preview=true&previewType=SCHOLAR_ANALYSIS&regexEnabled=false

https://www.lens.org/lens/scholar/search/results?q=(author.affiliation.name:%20%22pathology%22%20%20OR%20author.affiliation.name:%20%22patoloji%22)%20%20AND%20(author.affiliation.name:%20%22Turkey%22%20%20OR%20author.affiliation.name:%20%22T%C3%BCrkiye%22)

https://www.lens.org/lens/scholar/search/analysis?q=(author.affiliation.name:%20%22neurosurgery%22%20%20OR%20author.affiliation.name:%20%22sport%22%20)%20AND%20(abstract:%20%22sport%22%20%20OR%20title:%20%22sport%22)&page=0&limit=10&orderBy=%2Bscore&dateFilterField=year_published&preview=false&regexEnabled=false

+
my_data_frame <- readr::read_delim("~/downloads/pubmed_result.txt", delim = "\t", col_names = FALSE)
+
+chunk <- 5000
+mylist <-  split(my_data_frame, rep(1:ceiling(nrow(my_data_frame)/chunk), each=chunk, length.out=nrow(my_data_frame)))
+
+X1 <- mylist$`1`
+X2 <- mylist$`2`
+X3 <- mylist$`3`
+X4 <- mylist$`4`
+
+readr::write_csv(X1, "~/downloads/1.txt")
+readr::write_csv(X2, "~/downloads/2.txt")
+readr::write_csv(X3, "~/downloads/3.txt")
+readr::write_csv(X4, "~/downloads/4.txt")
+
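The four hard-coded pieces above can be generalized to any number of chunks. The following is a minimal base-R sketch, assuming a toy data frame in place of `pubmed_result.txt` and a temporary directory in place of `~/downloads`; both are illustrative stand-ins, not part of the original analysis.

```r
# Toy stand-in for the PubMed export (hypothetical content)
my_data_frame <- data.frame(X1 = sprintf("record-%05d", 1:12001),
                            stringsAsFactors = FALSE)

chunk <- 5000
# Rows 1..5000 get id 1, rows 5001..10000 get id 2, and so on
chunk_id <- ceiling(seq_len(nrow(my_data_frame)) / chunk)
mylist <- split(my_data_frame, chunk_id)

# Write every chunk in a loop instead of naming X1, X2, ... by hand
out_dir <- tempdir()  # hypothetical output location
out_files <- file.path(out_dir, paste0(names(mylist), ".txt"))
for (i in seq_along(mylist)) {
  write.csv(mylist[[i]], out_files[i], row.names = FALSE)
}

length(mylist)        # number of files written
sapply(mylist, nrow)  # rows per chunk
```

Looping over `names(mylist)` keeps the file names aligned with the chunk ids, so the script does not need editing when the export grows beyond four chunks.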

2.14.0.1 PatCite

@@ -3131,11 +3164,49 @@

2.17 ELIXIR Core Data Resources

https://elixir-europe.org/platforms/data/core-data-resources


+
+

2.18 scite

+

https://scite.ai/

+
+
+
+

2.19 TÜBİTAK Destekli Projeler Veri Tabanı

+

To contribute to Turkey's research infrastructure, the full texts of the final reports of 17,808 projects completed from 1965 to the present under the Research Support Programs Directorate (ARDEB) are published in the “TÜBİTAK Destekli Projeler Veri Tabanı” (TÜBİTAK Supported Projects Database) of the TÜBİTAK Turkish Academic Network and Information Center (ULAKBİM).

+

The database can be accessed at https://trdizin.gov.tr/search/projectSearch.xhtml, and the final reports can be searched by project number, title, name of the principal investigator/researcher/advisor, year, and keyword.

+
+
+

2.20 TRDizin

+

https://trdizin.gov.tr/

+
+

3 Software

+
+

3.1 R-project

+

https://github.com/schochastics/graphlayouts

+
+

3.1.1 rentrez

+
# Source: https://github.com/ropensci/rentrez/issues/134#event-2313355730
+
+library(rentrez)
+library(XML)
+
+MeSH_from_pmid <- function(pmid){
+   rec <- entrez_fetch(db="pubmed", id=pmid, rettype = "xml", parsed=TRUE)
+   m_names <- xpathSApply(rec, "//MeshHeadingList/MeshHeading/DescriptorName", xmlValue)
+   m_ui <- xpathSApply(rec, "//MeshHeadingList/MeshHeading/DescriptorName", xmlAttrs)[1,]
+   data.frame(mesh_ui = m_ui, descriptor = m_names)
+}
+
+MeSH_from_pmid(27591765)
+
+
+
+
+
-

3.1 CiteSpace

+

3.2 CiteSpace

@@ -3207,7 +3278,7 @@

6 Organisations of Bibliometrics

7 Feedback

Serdar Balcı, MD, Pathologist would like to hear your feedback: https://goo.gl/forms/YjGZ5DHgtPlR1RnB3

-

This document will be continuously updated and the last update was on 2019-04-13.

+

This document will be continuously updated and the last update was on 2019-05-27.


-
- + +
endometriosis_articlesPerTotalArticles <- europepmc::epmc_hits_trend(query = "endometriosis", period = 1980:2018)
+
+endometriosis_articlesPerTotalArticles
+
+# View(endometriosis_articlesPerTotalArticles)
+
+xlsx::write.xlsx(endometriosis_articlesPerTotalArticles, here::here("data/endometriosis_articlesPerTotalArticles.xlsx")
+)
+
+ + + + + + +
library(ggplot2)
+ggplot(endometriosis_articlesPerTotalArticles, aes(year, query_hits / all_hits)) + 
+  geom_point() + 
+  geom_line() +
+  xlab("Year published") + 
+  ylab("Proportion of Endometriosis \n articles in Europe PMC")
+ + + +
+
("endometriosis" AND "inflammation") AND (SRC:"MED")
+ + + +
dvcs <- c('("endometriosis" AND "inflammation") AND (SRC:"MED")', '("endometriosis" AND "infertility") AND (SRC:"MED")', '("endometriosis" AND "fertility") AND (SRC:"MED")' , '("endometriosis") AND (SRC:"MED")'
+          )
+ + + + + + +
my_df <- purrr::map_df(dvcs, function(x) {
+  # one request per query: yearly hits for the query (query_hits) next to
+  # the yearly total of Europe PMC records (all_hits)
+  trend <- europepmc::epmc_hits_trend(x, period = 1980:2018, synonym = FALSE)
+  # refs_hits duplicates query_hits; it is kept because a later plot uses it
+  trend <- dplyr::mutate(trend, query_id = x, refs_hits = query_hits)
+  dplyr::select(trend, year, all_hits, refs_hits, query_hits, query_id)
+}) 
+my_df
+ + + + + + +
## Recoding my_df$query_id into my_df$Query
+my_df$Query <- dplyr::recode(my_df$query_id,
+               "(\"endometriosis\" AND \"inflammation\") AND (SRC:\"MED\")" = "endometriosis AND inflammation",
+               "(\"endometriosis\" AND \"infertility\") AND (SRC:\"MED\")" = "endometriosis AND infertility",
+               "(\"endometriosis\" AND \"fertility\") AND (SRC:\"MED\")" = "endometriosis AND fertility",
+               "(\"endometriosis\") AND (SRC:\"MED\")" = "endometriosis")
+my_df$Query <- factor(my_df$Query)
+
+ + + + + + +
library(ggplot2)
+ggplot(my_df, aes(x = year, 
+                  y = query_hits / all_hits,
+                  group = Query, 
+                  color = Query)) + 
+  geom_point() + 
+  geom_line() +
+  xlab("Year published") + 
+  ylab("Proportion of articles in PubMed \n Data from: Europe PMC") +
+  theme(legend.position = "bottom",
+        legend.direction = "vertical")
+ + + + + + +
library(ggplot2)
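+ggplot(my_df, aes(x = year, 
+                  y = query_hits / all_hits,
+                  group = Query, 
+                  color = Query)) + 
+  geom_point() + 
+  geom_line() +
+  # keep y numeric and format only the axis labels as percentages
+  scale_y_continuous(labels = scales::percent_format(accuracy = 0.02)) +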
+  xlab("Year published") + 
+  ylab("Proportion of articles in PubMed \n Data from: Europe PMC") +
+  theme(legend.position = "bottom",
+        legend.direction = "vertical")
+ + + + + + +
library(ggplot2)
+ggplot(my_df, aes(factor(year), query_hits / refs_hits, group = query_id, 
+                  color = query_id)) +
+  geom_line(size = 1, alpha = 0.8) +
+  geom_point(size = 2) +
+  scale_color_brewer(name = "Query", palette = "Set1")+
+  xlab("Year published") +
+  ylab("Proportion of articles in PubMed \n Data from: Europe PMC")
+ + + +
+ + + +
library("handlr")
+deneme <- handlr::bibtex_reader("data/europepmc_endometriosisinflammation.bib")
+
+# handlr::citeproc_writer(deneme)
+
+# handlr::codemeta_writer(deneme)
+
+
+jsonlite::write_json(handlr::codemeta_writer(deneme, pretty = FALSE), path = "data/europepmc_endometriosisinflammation.json")
+
+ + + + + + +
z <- "data/europepmc_endometriosisinflammation.bib"
+x <- HandlrClient$new(x = z)
+x$read("bibtex")
+x$write("citeproc")
+ + + +
+ + + +
endometriosis_articles1 <- europepmc::epmc_hits_trend(query = "endometriosis AND fertility", period = 1980:2018)
+
+endometriosis_articles1
+
+# View(endometriosis_articlesPerTotalArticles)
+
+xlsx::write.xlsx(endometriosis_articlesPerTotalArticles, here::here("data/endometriosis_articlesPerTotalArticles.xlsx")
+)
+
+ @@ -1790,26 +1938,261 @@

endometriosis

xlab("Year published") + ylab("Proportion of Endometriosis \n articles in Europe PMC") - -

- +

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5031306/

knitcitations::citep("10.1186/s12905-016-0336-0")
- -
[1] "(Brüggmann, Elizabeth-Martinez, Klingelhöfer, Quarcoo, Jaque, and Groneberg, 2016)"
-

vosviewer

+
+
+

Bibliometrix Package Analysis Last 10 year Endometriosis

+
+

Search PubMed

+
"endometriosis"[MeSH Major Topic] AND "english and humans"[Filter] AND ("2009/03/10"[PDat] : "2019/03/07"[PDat])
+ + + + + +
myTerm <- rstudioapi::terminalCreate(show = FALSE)
+rstudioapi::terminalSend(
+myTerm,
+"xtract -input data/Last10YearEndometriosis.xml -pattern PubmedArticle -tab \"|\" -sep \";\" -def \"NA\" -element MedlineCitation/PMID -block ArticleId -if ArticleId@IdType -equals doi -element ArticleId > data/Last10YearEndometriosis.csv \n"
+)
+Sys.sleep(1)
+repeat {
+Sys.sleep(0.1)
+if (rstudioapi::terminalBusy(myTerm) == FALSE) {
+print("Code Executed")
+break
+}
+}
+ + +
[1] "Code Executed"
+ + + + + + +
library(readr)
+Last10YearEndometriosis <- read_delim(here::here("data/Last10YearEndometriosis.csv"), 
+    "|",
+    escape_double = FALSE,
+    col_names = FALSE, 
+    trim_ws = TRUE)
+ + +
Parsed with column specification:
+cols(
+  X1 = col_double(),
+  X2 = col_character()
+)
+538 parsing failures.
+row col  expected    actual                                                                         file
+ 19  -- 2 columns 1 columns '/Users/serdarbalciold/RepTemplates/pubmed/data/Last10YearEndometriosis.csv'
+165  -- 2 columns 1 columns '/Users/serdarbalciold/RepTemplates/pubmed/data/Last10YearEndometriosis.csv'
+218  -- 2 columns 1 columns '/Users/serdarbalciold/RepTemplates/pubmed/data/Last10YearEndometriosis.csv'
+262  -- 2 columns 1 columns '/Users/serdarbalciold/RepTemplates/pubmed/data/Last10YearEndometriosis.csv'
+320  -- 2 columns 1 columns '/Users/serdarbalciold/RepTemplates/pubmed/data/Last10YearEndometriosis.csv'
+... ... ......... ......... ............................................................................
+See problems(...) for more details.
+ + +
# View(Last10YearEndometriosis)
+
+names(Last10YearEndometriosis) <- c("PMID", "DOI")
+
+ + + + + + +

+PMID_List <- paste0("PMID=(", Last10YearEndometriosis$PMID[!is.na(Last10YearEndometriosis$PMID)], ") OR")
+# DOI_List <- paste0("DO=(", Last10YearEndometriosis$DOI[!is.na(Last10YearEndometriosis$DOI)], ") OR")
+
+
+write(PMID_List,
+      here::here("data/endometriosis/Last10YearEndometriosis_pmid_ListforWOS.txt")
+)
+
+# write(DOI_List,
+#       here::here("data/NeurosurgeryFromTurkey_doi_ListforWOS.txt")
+# )
+
+ + + + + + +
library(tidyverse)
+ + +
── Attaching packages ──────────────────────────────────────── tidyverse 1.2.1 ──
+✔ ggplot2 3.1.0       ✔ purrr   0.3.1  
+✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
+✔ tidyr   0.8.3       ✔ stringr 1.4.0  
+✔ readr   1.3.1       ✔ forcats 0.4.0  
+── Conflicts ─────────────────────────────────────────── tidyverse_conflicts() ──
+✖ dplyr::filter() masks stats::filter()
+✖ dplyr::lag()    masks stats::lag()
+ + +
library(bibliometrix)
+ + +
To cite bibliometrix in publications, please use:
+
+Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, Journal of Informetrics, 11(4), pp 959-975, Elsevier.
+                        
+
+http:\\www.bibliometrix.org
+
+                        
+To start with the shiny web-interface, please digit:
+biblioshiny()
+ + +
bibliometrix::biblioshiny()
+ + +
Loading required package: shiny
+
+Listening on http://127.0.0.1:7866
+Loading required package: rio
+Loading required package: DT
+
+Attaching package: ‘DT’
+
+The following objects are masked from ‘package:shiny’:
+
+    dataTableOutput, renderDataTable
+
+Loading required package: shinycssloaders
+Loading required package: shinythemes
+Loading required package: wordcloud2
+Loading required package: colourpicker
+
+Attaching package: ‘colourpicker’
+
+The following object is masked from ‘package:shiny’:
+
+    runExample
+
+Loading required package: treemap
+Loading required package: ggmap
+Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
+Please cite ggmap if you use it! See citation("ggmap") for details.
+Loading required package: visNetwork
+Loading required package: plotly
+
+Attaching package: ‘plotly’
+
+The following object is masked from ‘package:ggmap’:
+
+    wind
+
+The following object is masked from ‘package:rio’:
+
+    export
+
+The following object is masked from ‘package:ggplot2’:
+
+    last_plot
+
+The following object is masked from ‘package:stats’:
+
+    filter
+
+The following object is masked from ‘package:graphics’:
+
+    layout
+
+
+Attaching package: ‘Matrix’
+
+The following object is masked from ‘package:tidyr’:
+
+    expand
+ + +

+Converting your isi collection into a bibliographic dataframe
+
+Articles extracted   100 
+Articles extracted   200 
+Articles extracted   300 
+Articles extracted   400 
+Articles extracted   500 
+Articles extracted   600 
+Articles extracted   700 
+Articles extracted   800 
+Articles extracted   900 
+Articles extracted   1000 
+Articles extracted   1100 
+Articles extracted   1200 
+Articles extracted   1300 
+Articles extracted   1400 
+Articles extracted   1500 
+Articles extracted   1600 
+Articles extracted   1700 
+Articles extracted   1800 
+Articles extracted   1900 
+Articles extracted   2000 
+Articles extracted   2100 
+Articles extracted   2200 
+Articles extracted   2300 
+Articles extracted   2400 
+Articles extracted   2500 
+Articles extracted   2600 
+Articles extracted   2700 
+Articles extracted   2800 
+Articles extracted   2900 
+Articles extracted   3000 
+Articles extracted   3100 
+Articles extracted   3200 
+Articles extracted   3300 
+Articles extracted   3400 
+Articles extracted   3500 
+Articles extracted   3600 
+Articles extracted   3700 
+Articles extracted   3800 
+Articles extracted   3900 
+Articles extracted   4000 
+Articles extracted   4100 
+Articles extracted   4200 
+Articles extracted   4300 
+Articles extracted   4400 
+Articles extracted   4500 
+Articles extracted   4600 
+Articles extracted   4700 
+Articles extracted   4800 
+Articles extracted   4900 
+Articles extracted   5000 
+Articles extracted   5061 
+Done!
+
+
+Generating affiliation field tag AU_UN from C1:  Done!
+ + + +
+
+
---
title: "endometriosis"
output: html_notebook
---

```{r}
library(europepmc)
```


```{r}
endometriosis_articlesPerTotalArticles <- europepmc::epmc_hits_trend(query = "endometriosis", period = 1980:2018)

endometriosis_articlesPerTotalArticles

# View(endometriosis_articlesPerTotalArticles)

xlsx::write.xlsx(endometriosis_articlesPerTotalArticles, here::here("data/endometriosis_articlesPerTotalArticles.xlsx")
)

```


```{r}
library(ggplot2)
ggplot(endometriosis_articlesPerTotalArticles, aes(year, query_hits / all_hits)) + 
  geom_point() + 
  geom_line() +
  xlab("Year published") + 
  ylab("Proportion of Endometriosis \n articles in Europe PMC")
```
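
The quantity plotted above is the yearly share `query_hits / all_hits`. A small base-R sketch of that calculation follows, assuming a toy table with the same column names as the `epmc_hits_trend()` result; the counts are invented for illustration and are not real Europe PMC figures.

```r
# Hypothetical counts shaped like the trend table used above
# (columns year, all_hits, query_hits); the numbers are invented
trend <- data.frame(
  year       = 2016:2018,
  all_hits   = c(1000000, 1100000, 1200000),
  query_hits = c(1500, 1760, 2040)
)

# Share of all records in a year that match the query
trend$share <- trend$query_hits / trend$all_hits

# Human-readable percentage labels for tables or annotations
trend$share_label <- sprintf("%.2f%%", 100 * trend$share)
trend
```

Dividing by `all_hits` matters because the database itself grows every year; raw hit counts alone would overstate how much interest in a topic has increased.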

---


```
("endometriosis" AND "inflammation") AND (SRC:"MED")
```


```{r}
dvcs <- c('("endometriosis" AND "inflammation") AND (SRC:"MED")', '("endometriosis" AND "infertility") AND (SRC:"MED")', '("endometriosis" AND "fertility") AND (SRC:"MED")' , '("endometriosis") AND (SRC:"MED")'
          )
```


```{r}
my_df <- purrr::map_df(dvcs, function(x) {
  # one request per query: yearly hits for the query (query_hits) next to
  # the yearly total of Europe PMC records (all_hits)
  trend <- europepmc::epmc_hits_trend(x, period = 1980:2018, synonym = FALSE)
  # refs_hits duplicates query_hits; it is kept because a later plot uses it
  trend <- dplyr::mutate(trend, query_id = x, refs_hits = query_hits)
  dplyr::select(trend, year, all_hits, refs_hits, query_hits, query_id)
}) 
my_df
```
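
The pattern in this chunk is "one trend table per query, stacked into one long table". The sketch below shows the same shape in base R, assuming a hypothetical `fake_hits_trend()` helper that stands in for the Europe PMC request so the example runs offline; its counts are invented.

```r
# Hypothetical stand-in for the Europe PMC request; it returns invented,
# deterministic counts so the sketch runs without network access
fake_hits_trend <- function(query, period) {
  data.frame(
    year       = period,
    all_hits   = 1000L * seq_along(period),
    query_hits = nchar(query) * seq_along(period)
  )
}

queries <- c('("endometriosis") AND (SRC:"MED")',
             '("endometriosis" AND "fertility") AND (SRC:"MED")')

# One call per query; tag each result with the query that produced it
per_query <- lapply(queries, function(q) {
  trend <- fake_hits_trend(q, period = 2016:2018)
  trend$query_id <- q
  trend
})

# Stack the per-query tables into one long table, ready for grouped plotting
long_df <- do.call(rbind, per_query)
table(long_df$query_id)
```

Keeping the query string in its own column is what lets a single `ggplot()` call draw one line per query through `group` and `color`.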


```{r}
## Recoding my_df$query_id into my_df$Query
my_df$Query <- dplyr::recode(my_df$query_id,
               "(\"endometriosis\" AND \"inflammation\") AND (SRC:\"MED\")" = "endometriosis AND inflammation",
               "(\"endometriosis\" AND \"infertility\") AND (SRC:\"MED\")" = "endometriosis AND infertility",
               "(\"endometriosis\" AND \"fertility\") AND (SRC:\"MED\")" = "endometriosis AND fertility",
               "(\"endometriosis\") AND (SRC:\"MED\")" = "endometriosis")
my_df$Query <- factor(my_df$Query)

```
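
The same relabelling can be done without extra packages through a named lookup vector. This is a minimal sketch on a short, hand-written `query_id` vector, which is an assumption made for illustration, not the data returned by the analysis.

```r
# Hand-written example ids (illustrative only)
query_id <- c('("endometriosis") AND (SRC:"MED")',
              '("endometriosis" AND "fertility") AND (SRC:"MED")',
              '("endometriosis") AND (SRC:"MED")')

# Names are the raw query strings, values are the display labels
lookup <- setNames(
  c("endometriosis", "endometriosis AND fertility"),
  c('("endometriosis") AND (SRC:"MED")',
    '("endometriosis" AND "fertility") AND (SRC:"MED")')
)

# Index the lookup by name, then drop the names and convert to a factor
Query <- factor(unname(lookup[query_id]))
levels(Query)
```

A query string that is missing from `lookup` yields `NA` here, which makes unmapped queries easy to spot before plotting.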

```{r}
library(ggplot2)
ggplot(my_df, aes(x = year, 
                  y = query_hits / all_hits,
                  group = Query, 
                  color = Query)) + 
  geom_point() + 
  geom_line() +
  xlab("Year published") + 
  ylab("Proportion of articles in PubMed \n Data from: Europe PMC") +
  theme(legend.position = "bottom",
        legend.direction = "vertical")
```

```{r}
library(ggplot2)
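ggplot(my_df, aes(x = year, 
                  y = query_hits / all_hits,
                  group = Query, 
                  color = Query)) + 
  geom_point() + 
  geom_line() +
  # keep y numeric and format only the axis labels as percentages
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.02)) +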
  xlab("Year published") + 
  ylab("Proportion of articles in PubMed \n Data from: Europe PMC") +
  theme(legend.position = "bottom",
        legend.direction = "vertical")
```



```{r}
library(ggplot2)
ggplot(my_df, aes(factor(year), query_hits / refs_hits, group = query_id, 
                  color = query_id)) +
  geom_line(size = 1, alpha = 0.8) +
  geom_point(size = 2) +
  scale_color_brewer(name = "Query", palette = "Set1")+
  xlab("Year published") +
  ylab("Proportion of articles in PubMed \n Data from: Europe PMC")
```

























---



```{r}
library("handlr")
deneme <- handlr::bibtex_reader("data/europepmc_endometriosisinflammation.bib")

# handlr::citeproc_writer(deneme)

# handlr::codemeta_writer(deneme)


jsonlite::write_json(handlr::codemeta_writer(deneme, pretty = FALSE), path = "data/europepmc_endometriosisinflammation.json")

```


```{r}
z <- "data/europepmc_endometriosisinflammation.bib"
x <- HandlrClient$new(x = z)
x$read("bibtex")
x$write("citeproc")
```











---

```{r}
endometriosis_articles1 <- europepmc::epmc_hits_trend(query = "endometriosis AND fertility", period = 1980:2018)

endometriosis_articles1

# View(endometriosis_articlesPerTotalArticles)

xlsx::write.xlsx(endometriosis_articlesPerTotalArticles, here::here("data/endometriosis_articlesPerTotalArticles.xlsx")
)

```







```{r}
library(ggplot2)
ggplot(endometriosis_articlesPerTotalArticles, aes(year, query_hits / all_hits)) + 
  geom_point() + 
  geom_line() +
  xlab("Year published") + 
  ylab("Proportion of Endometriosis \n articles in Europe PMC")
```





---

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5031306/


```{r}
knitcitations::citep("10.1186/s12905-016-0336-0")
```



vosviewer

---

# Bibliometrix Package Analysis Last 10 year Endometriosis

## Search PubMed

```
"endometriosis"[MeSH Major Topic] AND "english and humans"[Filter] AND ("2009/03/10"[PDat] : "2019/03/07"[PDat])
```


```{r Search PubMed download xml Last 10 year Endometriosis, eval=FALSE, include=FALSE}
myTerm <- rstudioapi::terminalCreate(show = FALSE)
rstudioapi::terminalSend(
    myTerm,
    "esearch -db pubmed -query \"endometriosis[MeSH Major Topic] AND english and humans[Filter] AND (2009/03/10[PDat] : 2019/03/07[PDat]) \" -datetype PDAT -mindate 1800 -maxdate 3000 | efetch -format xml > data/endometriosis/Last10YearEndometriosis.xml \n"
)
Sys.sleep(1)
repeat {
    Sys.sleep(0.1)
    if (rstudioapi::terminalBusy(myTerm) == FALSE) {
        print("Code Executed")
        break
    }
}

file.info(here::here("data/endometriosis/Last10YearEndometriosis.xml"))$ctime
```



```{r extract pmid doi from xml Last 10 year Endometriosis, message=FALSE, warning=FALSE}
myTerm <- rstudioapi::terminalCreate(show = FALSE)
rstudioapi::terminalSend(
myTerm,
"xtract -input data/endometriosis/Last10YearEndometriosis.xml -pattern PubmedArticle -tab \"|\" -sep \";\" -def \"NA\" -element MedlineCitation/PMID -block ArticleId -if ArticleId@IdType -equals doi -element ArticleId > data/endometriosis/Last10YearEndometriosis.csv \n"
)
Sys.sleep(1)
repeat {
Sys.sleep(0.1)
if (rstudioapi::terminalBusy(myTerm) == FALSE) {
print("Code Executed")
break
}
}
```


```{r read extracted data Last 10 year Endometriosis}
library(readr)
Last10YearEndometriosis <- read_delim(here::here("data/endometriosis/Last10YearEndometriosis.csv"), 
    "|",
    escape_double = FALSE,
    col_names = FALSE, 
    trim_ws = TRUE)
# View(Last10YearEndometriosis)

names(Last10YearEndometriosis) <- c("PMID", "DOI")

```
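To make the expected file layout concrete, here is a minimal sketch that parses two made-up records in the `PMID|DOI` format written by the `xtract` call, using base R only. The PMIDs and DOI are invented for illustration. Reading both columns as character is an assumption on my part: it keeps identifiers from being converted to numbers, which the type guessing in `read_delim()` may otherwise do.

```r
# Two invented records in the PMID|DOI layout; "NA" is the placeholder
# that xtract writes (via -def "NA") when an article has no DOI.
example_lines <- c(
  "30000001|10.1000/example.001",
  "30000002|NA"
)

parsed <- read.table(
  text = example_lines,
  sep = "|",
  col.names = c("PMID", "DOI"),
  colClasses = "character", # keep identifiers as text
  na.strings = "NA",        # turn the placeholder into a real missing value
  quote = "",
  comment.char = ""
)

# Number of records without a DOI
sum(is.na(parsed$DOI))
```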


```{r WOS search file write PMID DOI with OR Last 10 year Endometriosis}

PMID_List <- paste0("PMID=(", Last10YearEndometriosis$PMID[!is.na(Last10YearEndometriosis$PMID)], ") OR")
# DOI_List <- paste0("DO=(", Last10YearEndometriosis$DOI[!is.na(Last10YearEndometriosis$DOI)], ") OR")


write(PMID_List,
      here::here("data/endometriosis/Last10YearEndometriosis_pmid_ListforWOS.txt")
)

# write(DOI_List,
#       here::here("data/NeurosurgeryFromTurkey_doi_ListforWOS.txt")
# )

```
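The chunk above appends ` OR` to every entry, so the last line of the file also ends with a dangling `OR` that has to be removed by hand before pasting into Web of Science. A small sketch of an alternative, using invented PMIDs, joins the terms with `collapse` so no trailing operator is produced. Splitting into batches is optional; the batch size of 2 is purely illustrative and not a documented Web of Science limit.

```r
# Invented PMIDs for illustration, including one missing value.
pmids <- c("30000001", NA, "30000002", "30000003")
pmids <- pmids[!is.na(pmids)]

# One query string, terms joined by " OR " with no trailing operator.
wos_query <- paste0("PMID=(", pmids, ")", collapse = " OR ")

# Optional: split into batches if the full query is too long to paste.
batch_size <- 2 # illustrative value only
batches <- split(pmids, ceiling(seq_along(pmids) / batch_size))
wos_queries <- vapply(
  batches,
  function(x) paste0("PMID=(", x, ")", collapse = " OR "),
  character(1)
)

wos_query
```

Each element of `wos_queries` can then be written to its own line with `write()` and searched separately.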




```{r, eval=FALSE}
library(tidyverse)
library(bibliometrix)
bibliometrix::biblioshiny()
```









diff --git a/pubmed_authors_to_csv.R b/pubmed_authors_to_csv.R
new file mode 100644
index 0000000..b404437
--- /dev/null
+++ b/pubmed_authors_to_csv.R
@@ -0,0 +1,37 @@
+# https://github.com/SurgicalInformatics/pubmed_xml_to_csv
+
+library(easyPubMed)
+library(dplyr)
+library(stringr)
+library(xml2)
+
+# these 3 lines from the easyPubMed vignette
+myid = get_pubmed_ids("27145169")
+fetch_pubmed_data(myid, format = "xml") %>%
+write_xml("data/gs1_pubmed.xml")
+investigators = unlist(xpathApply(gs1_pubmed, "//Investigator", saveXML))
+
+mydata = data_frame(pubmed = investigators)
+
+# and now some library(stringr) magic
+# str_match returns 2 columns - the complete match and the capture group
+# we only want the capturing group, hence the [, 2]
+
+authors = mydata %>%
+  mutate(
+    lastname = str_match(pubmed, "<LastName>(.*?)</LastName>")[,2],
+    initials = str_match(pubmed, "<Initials>(.*?)</Initials>")[,2]
+  ) %>%
+  select(-pubmed)
+
+# Excel has a bug when displaying utf8 coded CSVs so need to use openxlsx instead
+# reader::write_csv(authors, "gs1_pubmed_authors.csv")
+
+openxlsx::write.xlsx(authors, file = "gs1_pubmed_authors.xlsx")
+
+# dups = authors %>%
+#   filter(duplicated(authors))
+#
+# openxlsx::write.xlsx(dups, file = "gs1_pubmed_duplicated.xlsx")
+
+
diff --git a/regex-string-similarity.R b/regex-string-similarity.R
new file mode 100644
index 0000000..3cb6ea5
--- /dev/null
+++ b/regex-string-similarity.R
@@ -0,0 +1,50 @@
+PMID_26832882 <- RefManageR::ReadPubMed('26832882', database = 'PubMed')
+
+PMID_26362048 <- RefManageR::ReadPubMed('26362048', database = 'PubMed')
+
+
+PMID_26832882$abstract
+PMID_26832882$title
+PMID_26832882$author
+
+PMID_26362048$abstract
+PMID_26362048$title
+PMID_26362048$author
+
+
+stringdist::stringsim(PMID_26832882$abstract, PMID_26362048$abstract)
+
+stringdist::stringsim(PMID_26832882$title, PMID_26362048$title)
+
+
+a <- as.vector(unlist(PMID_26832882$author))
+
+a <- paste0(
+  gsub(pattern = "\\W*\\b\\w\\b\\W*", replacement = "", x = a), collapse = " "
+)
+
+
+b <- as.vector(unlist(PMID_26362048$author))
+
+b <- paste0(
+  gsub(pattern = "\\W*\\b\\w\\b\\W*", replacement = "", x = b), collapse = " "
+)
+
+stringdist::stringsim(a, b)
+
+c <- "Calculation of the Ki67 index in pancreatic neuroendocrine tumors: a comparative analysis of four counting methodologies"
+
+d <- "a comparative analysis of four counting Calculation of the in pancreatic neuroendocrine tumors"
+
+
+stringmethods <- c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex")
+
+for (i in stringmethods) {
+  z <- stringdist::stringsim(c, d, method = i)
+  print(z)
+}
+
+
+
+
+
diff --git a/scite_ai.Rmd b/scite_ai.Rmd
new file mode 100644
index 0000000..4001d4a
--- /dev/null
+++ b/scite_ai.Rmd
@@ -0,0 +1,68 @@
+---
+title: "scite.ai"
+output:
+  html_document:
+    df_print: paged
+---
+
+
+```{r load library}
+library("tidyverse")
+library("rjson")
+library("magicfor")
+```
+
+
+```{r read DOI}
+myDOI <- readr::read_csv(here::here("data/BalciSdoi.txt"), col_names = "DOI", col_types = "c")
+```
+
+
+```{r add api code}
+myDOI <- myDOI %>%
+    mutate(apitallies = paste0("https://api.scite.ai/tallies/", DOI)) %>%
+    mutate(apipapers = paste0("https://api.scite.ai/papers/", DOI)) %>%
+    mutate(reportpages = paste0("https://scite.ai/reports/", DOI)) %>%
+    rownames_to_column()
+```
+
+
+```{r get json data}
+magicfor::magic_for(silent = TRUE)
+json_data <- for (i in 1:(dim(myDOI)[1]-1)) {
+    json_name <- paste0("Article", myDOI$rowname[i])
+    json_data <- rjson::fromJSON(file = myDOI$apitallies[i])
+    Sys.sleep(1)
+    put(json_name, json_data)
+}
+jsonDF <- magicfor::magic_result_as_dataframe()
+magicfor::magic_free()
+
+jsonDF <- dplyr::bind_rows(jsonDF$json_data, .id = "meta_information")
+
+```
+
+
+
+```{r ggplot}
+df <- jsonDF %>%
+    filter(total > 0) %>%
+    select(doi,
+           contradicting,
+           mentioning,
+           supporting,
+           unclassified
+           ) %>%
+    gather(key = feature, value = number, -doi)
+
+library(ggplot2)
+
+ggplot(data = df) +
+  aes(x = doi, fill = feature, color = feature, weight = number) +
+  geom_bar(position = 'fill') +
+  labs(x = 'DOI',
+    y = 'Percentage Of Article Citation Features') +
+  theme_minimal() +
+  coord_flip()
+```
+
diff --git a/scite_ai.nb.html b/scite_ai.nb.html
new file mode 100644
index 0000000..5d6c8dd
--- /dev/null
+++ b/scite_ai.nb.html
@@ -0,0 +1,1864 @@