diff --git a/docs/CONTRIBUTING.html b/docs/CONTRIBUTING.html
index 4d6d8d5..a4deeb1 100644
--- a/docs/CONTRIBUTING.html
+++ b/docs/CONTRIBUTING.html
@@ -58,7 +58,7 @@
@@ -92,6 +92,9 @@
vignettes/BacDive-ing-in.Rmd
BacDive-ing-in.Rmd
vignettes/Semi-automatic-approach.Rmd
Semi-automatic-approach.Rmd
vignettes/adr-001-JSON-not-XML.Rmd
adr-001-JSON-not-XML.Rmd
vignettes/adr-002-two-download-functions-returning-datasets.Rmd
adr-002-two-download-functions-returning-datasets.Rmd
vignettes/logic-checking-and-unit-testing-datasets.Rmd
+ logic-checking-and-unit-testing-datasets.Rmd
Just as the correctness of data analysis code should be tested automatically, the consistency of data should be evaluated and monitored as well. Using BacDive’s advanced search and BacDiveR’s retrieve_search_results()
several examples of geographic inconsistencies have been found. Presumably due to an overly strict location-to-country-to-continent mapping, several samples collected from seas neighbouring Russia (like the Sea of Japan) were assigned to Europe.
While one may debate where exactly the border between Asia and Europe runs through Russia, it is clear that its eastern shoreline is located well within Asia. These and other datasets with East Russian locations have been reported to the BacDive team, and some of them were corrected in BacDive’s 04.07.2018 release.
+If a BacDive user finds an inconsistency within the datasets they use, BacDiveR’s retrieve_search_results()
can be used to construct a test case for such a problem. In the following example, the test fails as long as BacDive contains datasets with the above-described discrepancy between the geo_loc_name
and continent
fields.
library(BacDiveR)
+library(testthat)
+
+test_that("Inconsistent datasets get downloaded as list", {
+
+ inconsistent_data <- retrieve_search_results(
+ "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe"
+ )
+ expect_null(inconsistent_data)
+})
+#> Error: Test failed: 'Inconsistent datasets get downloaded as list'
+#> * `inconsistent_data` is not null.
Once the inconsistency is corrected in BacDive, the advanced search no longer returns any results, and the above test passes. It can thus be used to monitor the resolution of such a problem after reporting it. Furthermore, the user is alerted (by the test failing again) in case new datasets with the same inconsistency appear in BacDive.
+See testthat.R-lib.org and the related “R Packages” chapter to learn more about testing in R (Wickham 2011; Wickham 2015).
+Wickham, Hadley. 2011. “Testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
+———. 2015. R Packages: Organize, Test, Document, and Share Your Code. 1st edition. Sebastopol, CA: O’Reilly Media. http://r-pkgs.had.co.nz/.
+vignettes/logic-checking-bacdive-datasets.Rmd
+ logic-checking-bacdive-datasets.Rmd
Just as the correctness of data analysis code should be tested automatically, the consistency of data should be evaluated and monitored as well. Using BacDive’s advanced search and BacDiveR’s retrieve_search_results()
several examples of geographic inconsistencies have been found. Presumably due to an overly strict location-to-country-to-continent mapping, several samples collected from seas neighbouring Russia (like the Sea of Japan) were assigned to Europe.
While one may debate where exactly the border between Asia and Europe runs through Russia, it is clear that its eastern shoreline is located well within Asia. These and other datasets with East Russian locations have been reported to the BacDive team, and some of them were corrected in BacDive’s 04.07.2018 release.
+library(BacDiveR)
+
+inconsistent_data <- retrieve_search_results(
+ "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe"
+ )
+#> Data download in progress for BacDive-IDs: 131115 139987
As long as this specific inconsistency is not fixed, the above should display: Data download in progress for BacDive-IDs: 131115 139987.
If a BacDive user finds an inconsistency within the datasets they use, BacDiveR’s retrieve_search_results()
can be used to construct a test case for such a problem. In the following example, the test fails as long as BacDive contains datasets with the above-described discrepancy between the geo_loc_name
and continent
fields.
library(testthat)
+
+test_that("No inconsistent datasets exist", {
+ expect_null(inconsistent_data)
+})
+#> Error: Test failed: 'No inconsistent datasets exist'
+#> * `inconsistent_data` is not null.
Once the inconsistency is corrected in BacDive, the advanced search no longer returns any results, and the above test passes. It can thus be used to monitor the resolution of such a problem after reporting it. Furthermore, the user is alerted (by the test failing again) in case new datasets with the same inconsistency appear in BacDive.
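When several reported inconsistencies are monitored at once, the same idea can be wrapped in a small helper. The following is only a sketch: `count_inconsistent()`, `stub_retrieve()` and the example.org URLs are hypothetical and not part of BacDiveR. The sketch relies solely on `retrieve_search_results()` returning `NULL` when nothing is found, and it injects the download function so the logic can be tried offline.

```r
# Hypothetical helper (not part of BacDiveR): count the datasets that each
# monitored advanced-search URL still returns. `retrieve` is injectable so
# that the logic can be exercised offline with a stub.
count_inconsistent <- function(query_urls,
                               retrieve = BacDiveR::retrieve_search_results) {
  vapply(
    query_urls,
    function(url) length(retrieve(url)),  # length(NULL) is 0, i.e. resolved
    integer(1)
  )
}

# Offline stub standing in for the real download (assumption for illustration):
# one query still matches two datasets, the other has been resolved (NULL).
stub_retrieve <- function(url) {
  if (grepl("Sea+of+Japan", url, fixed = TRUE)) list(a = 1, b = 2) else NULL
}

counts <- count_inconsistent(
  c(sea_of_japan = "https://example.org/advsearch?searchterm=Sea+of+Japan",
    fixed_case   = "https://example.org/advsearch?searchterm=Fixed"),
  retrieve = stub_retrieve
)

# Names of the monitored queries that still return datasets
still_open <- names(counts)[counts > 0]
```

Leaving out the `retrieve` argument would make the helper query BacDive itself; a test could then assert that `still_open` is empty.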
+See testthat.R-lib.org and the related “R Packages” chapter to learn more about testing in R (Wickham 2011; Wickham 2015).
+Wickham, Hadley. 2011. “Testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
+———. 2015. R Packages: Organize, Test, Document, and Share Your Code. 1st edition. Sebastopol, CA: O’Reilly Media. http://r-pkgs.had.co.nz/.
+NEWS.md
- retrieve_search_results() now returns NULL when no results are found, in order to ease integration of datasets into testthat tests.
\r, \n and \t are repaired to \\r, \\n and \\t, which jsonlite expects. This leads to different output (newlines & tabs, where previously only spaces occurred)! Thus, if you are parsing BacDiveR output in any way, you may need to adjust that. Because I consider this unlikely given the “maturing” status, and because no API surface was changed, I don’t consider this a major change in the SemVer.org sense.
retrieve_data()
now downloads the dataset(s) by default, not only the ID(s), see #54 & #59
+dataset_717 <- retrieve_data(searchTerm = 717, searchType = "bacdive_id")
#>
dataset_DSM_319 <- retrieve_data(searchTerm = "DSM 319", searchType = "culturecollectionno")
#>
dataset_AJ000733 <- retrieve_data(searchTerm = "AJ000733", searchType = "sequence")
#>
datasets_Bh <- retrieve_data(searchTerm = "Bacillus halotolerans")
#>
#>
#>
#>
dataset_717 <- retrieve_data(searchTerm = 717, searchType = "bacdive_id")
#>
#> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
dataset_DSM_319 <- retrieve_data(searchTerm = "DSM 319", searchType = "culturecollectionno")
#>
#> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
dataset_AJ000733 <- retrieve_data(searchTerm = "AJ000733", searchType = "sequence")
#>
#> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
datasets_Bh <- retrieve_data(searchTerm = "Bacillus halotolerans")
#>
#> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
data_miller <- retrieve_search_results(queryURL = "https://bacdive.dsmz.de/advsearch?site=advsearch&searchparams[78][contenttype]=text&searchparams[78][typecontent]=contains&searchparams[78][searchterm]=Miller&advsearch=search")
#>
#>
#>
#>
+data_miller <- retrieve_search_results(queryURL = "https://bacdive.dsmz.de/advsearch?site=advsearch&searchparams[78][contenttype]=text&searchparams[78][typecontent]=contains&searchparams[78][searchterm]=Miller&advsearch=search")
#>
#>
#>
#>