diff --git a/docs/CONTRIBUTING.html b/docs/CONTRIBUTING.html index 4d6d8d5..a4deeb1 100644 --- a/docs/CONTRIBUTING.html +++ b/docs/CONTRIBUTING.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/ISSUE_TEMPLATE.html b/docs/ISSUE_TEMPLATE.html index e5de56b..1b4cea6 100644 --- a/docs/ISSUE_TEMPLATE.html +++ b/docs/ISSUE_TEMPLATE.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index f326c63..4ed8617 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/PULL_REQUEST_TEMPLATE.html b/docs/PULL_REQUEST_TEMPLATE.html index aaee4bf..11b7a62 100644 --- a/docs/PULL_REQUEST_TEMPLATE.html +++ b/docs/PULL_REQUEST_TEMPLATE.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/articles/BacDive-geo-logic-fault.png b/docs/articles/BacDive-geo-logic-fault.png new file mode 100644 index 0000000..4ec289c Binary files /dev/null and b/docs/articles/BacDive-geo-logic-fault.png differ diff --git a/docs/articles/BacDive-ing-in.html b/docs/articles/BacDive-ing-in.html index 32bc43d..908761a 100644 --- a/docs/articles/BacDive-ing-in.html +++ b/docs/articles/BacDive-ing-in.html @@ -29,7 +29,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -63,6 +63,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -92,7 +95,7 @@

    BacDive-ing in

    Retrieving data(sets) from BacDive

    Katrin Leinweber

    -

    2018-08-30

    +

    2018-09-03

    Source: vignettes/BacDive-ing-in.Rmd diff --git a/docs/articles/BacDive-ing-in_files/figure-html/ggplot-1.png b/docs/articles/BacDive-ing-in_files/figure-html/ggplot-1.png index 088bd0e..860e83c 100644 Binary files a/docs/articles/BacDive-ing-in_files/figure-html/ggplot-1.png and b/docs/articles/BacDive-ing-in_files/figure-html/ggplot-1.png differ diff --git a/docs/articles/Semi-automatic-approach.html b/docs/articles/Semi-automatic-approach.html index b459fc9..013b41b 100644 --- a/docs/articles/Semi-automatic-approach.html +++ b/docs/articles/Semi-automatic-approach.html @@ -29,7 +29,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -63,6 +63,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -92,7 +95,7 @@

    The Semi-Automatic Approach

    Pre-Configuring an Advanced Search and Retrieving the Results

    Katrin Leinweber

    -

    2018-08-30

    +

    2018-09-03

    Source: vignettes/Semi-automatic-approach.Rmd diff --git a/docs/articles/adr-001-JSON-not-XML.html b/docs/articles/adr-001-JSON-not-XML.html index b2eb82c..27fdd73 100644 --- a/docs/articles/adr-001-JSON-not-XML.html +++ b/docs/articles/adr-001-JSON-not-XML.html @@ -29,7 +29,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -63,6 +63,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -91,7 +94,7 @@

    ADR 1: Using JSON internally, instead of XML

    Katrin Leinweber

    -

    2018-08-30

    +

    2018-09-03

    Source: vignettes/adr-001-JSON-not-XML.Rmd diff --git a/docs/articles/adr-002-two-download-functions-returning-datasets.html b/docs/articles/adr-002-two-download-functions-returning-datasets.html index 9450653..3de9c8f 100644 --- a/docs/articles/adr-002-two-download-functions-returning-datasets.html +++ b/docs/articles/adr-002-two-download-functions-returning-datasets.html @@ -29,7 +29,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -63,6 +63,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -91,7 +94,7 @@

    ADR 2: Two download functions that return full datasets by default

    Katrin Leinweber

    -

    2018-08-30

    +

    2018-09-03

    Source: vignettes/adr-002-two-download-functions-returning-datasets.Rmd diff --git a/docs/articles/index.html b/docs/articles/index.html index e607a94..f3f669a 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -130,6 +133,7 @@

    All vignettes

  • The Semi-Automatic Approach
  • ADR 1: Using JSON internally, instead of XML
  • ADR 2: Two download functions that return full datasets by default
  • +
  • Logic-Checking BacDive Datasets
  • diff --git a/docs/articles/logic-checking-and-unit-testing-datasets.html b/docs/articles/logic-checking-and-unit-testing-datasets.html new file mode 100644 index 0000000..d9f8211 --- /dev/null +++ b/docs/articles/logic-checking-and-unit-testing-datasets.html @@ -0,0 +1,168 @@ + + + + + + + +Logic-Checking BacDive Datasets • BacDiveR + + + + + + + + + +
    +
    + + + +
    +
    + + + + +
    +

    +Example of a data inconsistency

    +

    Just as the correctness of data analysis code should be tested automatically, the consistency of data should be evaluated and monitored as well. Using BacDive’s advanced search and BacDiveR’s retrieve_search_results(), several examples of geographic inconsistencies have been found. Presumably due to an overly strict location-to-country-to-continent mapping, several samples collected from seas neighbouring Russia (like the Sea of Japan) were assigned to Europe.

    +
    +Two datasets with a geo-logic fault (pun intended)

    Two datasets with a geo-logic fault (pun intended)

    +
    +

    While one may debate where exactly the border between Asia and Europe runs through Russia, it is clear that Russia’s eastern shoreline lies well within Asia. These and other datasets with East Russian locations have been reported to the BacDive team, and some of them were corrected in BacDive’s 04.07.2018 release.

    +
    +
    +

    +How to test datasets

    +

    If a BacDive user finds an inconsistency within the datasets they use, BacDiveR’s retrieve_search_results() can be used to construct a test-case for such a problem. In the following example, the test fails as long as BacDive contains datasets with the above-described discrepancy between the geo_loc_name and continent fields.

    +
    library(BacDiveR)
    +library(testthat)
    +
    +test_that("Inconsistent datasets get downloaded as list", {
    +  
    + inconsistent_data <- retrieve_search_results(
    +      "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe"
    +  )
    +  expect_null(inconsistent_data)
    +})
    +#> Error: Test failed: 'Inconsistent datasets get downloaded as list'
    +#> * `inconsistent_data` is not null.
    +

    Once the inconsistency is corrected in BacDive, the advanced search no longer returns any results, and the above test passes. The test can thus be used to monitor the resolution of such a problem after reporting it. Furthermore, the user is alerted (by the test failing again) if new datasets with the same inconsistency appear in BacDive.

    +
    +
    +

    +References

    +

    See testthat.R-lib.org and the related “R Packages” chapter to learn more about testing in R (Wickham 2011; Wickham 2015).

    +
    +
    +

    Wickham, Hadley. 2011. “Testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

    +
    +
    +

    ———. 2015. R Packages: Organize, Test, Document, and Share Your Code. 1st edition. Sebastopol, CA: O’Reilly Media. http://r-pkgs.had.co.nz/.

    +
    +
    +
    +
    + + + +
    + + + +
    + + + + + diff --git a/docs/articles/logic-checking-bacdive-datasets.html b/docs/articles/logic-checking-bacdive-datasets.html new file mode 100644 index 0000000..801f098 --- /dev/null +++ b/docs/articles/logic-checking-bacdive-datasets.html @@ -0,0 +1,170 @@ + + + + + + + +Logic-Checking BacDive Datasets • BacDiveR + + + + + + + + + +
    +
    + + + +
    +
    + + + + +
    +

    +Example of a data inconsistency

    +

    Just as the correctness of data analysis code should be tested automatically, the consistency of data should be evaluated and monitored as well. Using BacDive’s advanced search and BacDiveR’s retrieve_search_results(), several examples of geographic inconsistencies have been found. Presumably due to an overly strict location-to-country-to-continent mapping, several samples collected from seas neighbouring Russia (like the Sea of Japan) were assigned to Europe.

    +
    +Two datasets with a geo-logic fault (pun intended)

    Two datasets with a geo-logic fault (pun intended)

    +
    +

    While one may debate where exactly the border between Asia and Europe runs through Russia, it is clear that Russia’s eastern shoreline lies well within Asia. These and other datasets with East Russian locations have been reported to the BacDive team, and some of them were corrected in BacDive’s 04.07.2018 release.

    +
    library(BacDiveR)
    +  
    +inconsistent_data <- retrieve_search_results(
    +  "https://bacdive.dsmz.de/advsearch?advsearch=search&site=advsearch&searchparams[20][contenttype]=text&searchparams[20][typecontent]=contains&searchparams[20][searchterm]=Sea+of+Japan&searchparams[17][searchterm]=Europe"
    +  )
    +#> Data download in progress for BacDive-IDs: 131115 139987
    +

    As long as this specific inconsistency is not fixed, the above should display: Data download in progress for BacDive-IDs: 131115 139987.

    +
    +
    +

    +How to test datasets

    +

    If a BacDive user finds an inconsistency within the datasets they use, BacDiveR’s retrieve_search_results() can be used to construct a test-case for such a problem. In the following example, the test fails as long as BacDive contains datasets with the above-described discrepancy between the geo_loc_name and continent fields.

    +
    library(testthat)
    +
    +test_that("No inconsistent datasets exist", {
    +  expect_null(inconsistent_data)
    +})
    +#> Error: Test failed: 'No inconsistent datasets exist'
    +#> * `inconsistent_data` is not null.
    +

    Once the inconsistency is corrected in BacDive, the advanced search no longer returns any results, and the above test passes. The test can thus be used to monitor the resolution of such a problem after reporting it. Furthermore, the user is alerted (by the test failing again) if new datasets with the same inconsistency appear in BacDive.
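The monitoring idea can also be tried offline. The following is a minimal sketch in base R, not part of BacDiveR: `fake_retrieve()` is a hypothetical stand-in for retrieve_search_results(), and `check_consistency()` is an illustrative helper. It assumes that an empty search yields NULL (as the 0.6.0 changelog states) and that a non-empty one yields a list named by BacDive-ID.

```r
# Hypothetical stand-in for BacDiveR::retrieve_search_results(), so that the
# sketch runs without network access or BacDive credentials.
# Assumption: no hits -> NULL; hits -> list named by BacDive-ID.
fake_retrieve <- function(ids) {
  if (length(ids) == 0) return(NULL)
  stats::setNames(lapply(ids, function(id) list(id = id)), ids)
}

# Illustrative helper: name the datasets that still show the inconsistency,
# instead of only reporting that something is not NULL.
check_consistency <- function(datasets) {
  if (is.null(datasets)) {
    "consistent"
  } else {
    paste("still inconsistent:", paste(names(datasets), collapse = ", "))
  }
}

# Before the fix the search finds datasets; after the fix it finds none.
before_fix <- check_consistency(fake_retrieve(c("131115", "139987")))
after_fix  <- check_consistency(fake_retrieve(character(0)))
```

In a real test file, the result of retrieve_search_results() would take the place of `fake_retrieve()`, and a testthat expectation such as expect_null() would replace the string comparison.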

    +
    +
    +

    +References

    +

    See testthat.R-lib.org and the related “R Packages” chapter to learn more about testing in R (Wickham 2011; Wickham 2015).

    +
    +
    +

    Wickham, Hadley. 2011. “Testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

    +
    +
    +

    ———. 2015. R Packages: Organize, Test, Document, and Share Your Code. 1st edition. Sebastopol, CA: O’Reilly Media. http://r-pkgs.had.co.nz/.

    +
    +
    +
    +
    + + + +
    + + + +
    + + + + + diff --git a/docs/authors.html b/docs/authors.html index 2ed4df4..167aa7d 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/index.html b/docs/index.html index 9697995..fc66200 100644 --- a/docs/index.html +++ b/docs/index.html @@ -29,7 +29,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -63,6 +63,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/news/index.html b/docs/news/index.html index 452cd26..e2083dd 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -122,7 +125,26 @@

    Changelog

    Source: NEWS.md -
    +
    +

    +BacDiveR 0.6.0

    +
    +

    +Added

    + +
    +
    +

    +Changed

    +
      +
    • +retrieve_search_results() now returns NULL when no results are found, in order to ease integration of datasets into testthat tests.
    • +
    +
    +

    BacDiveR 0.5.1

    @@ -133,26 +155,26 @@

    -
    +

    BacDiveR 0.5.0

    -
    +

    -Added

    +Added
    -
    +

    -Changed

    +Changed
    • The JSON downloads are no longer purged of all space characters pre-emptively to prevent jsonlite from complaining about invalid encoding (#43). Instead, only \r, \n and \t are repaired to \\r, \\n and \\t, which jsonlite expects. This leads to different output (newlines & tabs, where previously only spaces occurred)! Thus, if you are parsing BacDiveR output in any way, you may need to adjust that. Because I consider this unlikely given the “maturing” status, and because no API surface was changed, I don’t consider this a major change in the SemVer.org sense.
    -
    +
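The repair described in the 0.5.0 entry can be illustrated with a short base-R sketch. `repair_control_chars()` is a hypothetical helper written for this illustration; BacDiveR's own implementation may differ. It replaces each raw carriage return, newline and tab with its two-character escaped form and leaves all other characters, including spaces, untouched.

```r
# Hypothetical helper, not BacDiveR's actual code.
# With fixed = TRUE, both pattern and replacement are taken literally, so
# "\\r" in R source stands for the two characters backslash + r.
repair_control_chars <- function(json) {
  json <- gsub("\r", "\\r", json, fixed = TRUE)
  json <- gsub("\n", "\\n", json, fixed = TRUE)
  gsub("\t", "\\t", json, fixed = TRUE)
}

raw      <- "first line\nsecond\tcolumn\r"
repaired <- repair_control_chars(raw)

# Each of the three control characters grew from one to two characters.
growth <- nchar(repaired) - nchar(raw)
```

Because spaces are preserved, downstream code that previously saw only spaces now sees the escaped newlines and tabs, which is the output change the entry warns about.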

    BacDiveR 0.4.2

    @@ -163,23 +185,23 @@

    -
    +

    BacDiveR 0.4.1

    -
    +

    -Changed

    +Changed
    • Don’t run all function documentation examples, and limit their output on the reference pages (see #52)
    -
    +

    BacDiveR 0.4.0

    -
    +

    -Added

    +Added
    • This changelog (see #41)
    • @@ -188,9 +210,9 @@

    -
    +

    -Changed

    +Changed
    • retrieve_data() now downloads the dataset(s) by default, not only the ID(s), see #54 & #59
    • @@ -198,7 +220,7 @@

    -
    +

    BacDiveR 0.3.1

    @@ -208,13 +230,14 @@

  • An error in the download of a single dataset found through its culture collection number (see #45)
  • -
    +

    -Added

    +Added
    • Usable example / vignette in the README.md file (see #16)
    +
    @@ -222,12 +245,7 @@

    Contents

    diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 74fb875..792a1e3 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -6,4 +6,5 @@ articles: Semi-automatic-approach: Semi-automatic-approach.html adr-001-JSON-not-XML: adr-001-JSON-not-XML.html adr-002-two-download-functions-returning-datasets: adr-002-two-download-functions-returning-datasets.html + logic-checking-bacdive-datasets: logic-checking-bacdive-datasets.html diff --git a/docs/reference/aggregate_result_URLs.html b/docs/reference/aggregate_result_URLs.html index 9f5e673..c15cca1 100644 --- a/docs/reference/aggregate_result_URLs.html +++ b/docs/reference/aggregate_result_URLs.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0
    @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/construct_Renviron_path.html b/docs/reference/construct_Renviron_path.html index fe75a00..79a4924 100644 --- a/docs/reference/construct_Renviron_path.html +++ b/docs/reference/construct_Renviron_path.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/construct_url.html b/docs/reference/construct_url.html index 9db7e83..807ae1b 100644 --- a/docs/reference/construct_url.html +++ b/docs/reference/construct_url.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/download.html b/docs/reference/download.html index dc16657..1cb0fa4 100644 --- a/docs/reference/download.html +++ b/docs/reference/download.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/get_credentials.html b/docs/reference/get_credentials.html index 4517b17..4830d2b 100644 --- a/docs/reference/get_credentials.html +++ b/docs/reference/get_credentials.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/index.html b/docs/reference/index.html index 1b3ef95..6c6905d 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -58,7 +58,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -92,6 +92,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/prepare_Renviron.html b/docs/reference/prepare_Renviron.html index ff5b4a7..40df0f6 100644 --- a/docs/reference/prepare_Renviron.html +++ b/docs/reference/prepare_Renviron.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/repair_escaping.html b/docs/reference/repair_escaping.html index 6a59972..c3a70cc 100644 --- a/docs/reference/repair_escaping.html +++ b/docs/reference/repair_escaping.html @@ -63,7 +63,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -97,6 +97,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • diff --git a/docs/reference/retrieve_data.html b/docs/reference/retrieve_data.html index 496a5d2..2fe0693 100644 --- a/docs/reference/retrieve_data.html +++ b/docs/reference/retrieve_data.html @@ -61,7 +61,7 @@ BacDiveR - 0.5.1 + 0.6.0 @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -160,7 +163,7 @@

    Value

    Examples

    -
    dataset_717 <- retrieve_data(searchTerm = 717, searchType = "bacdive_id")
    #> 717
    dataset_DSM_319 <- retrieve_data(searchTerm = "DSM 319", searchType = "culturecollectionno")
    #> 20319
    dataset_AJ000733 <- retrieve_data(searchTerm = "AJ000733", searchType = "sequence")
    #> 000733
    datasets_Bh <- retrieve_data(searchTerm = "Bacillus halotolerans")
    #>
    #> Data download in progress for BacDive-IDs:
    #> 1095
    #> 1847
    +
    dataset_717 <- retrieve_data(searchTerm = 717, searchType = "bacdive_id")
    #> 717
    #> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
    dataset_DSM_319 <- retrieve_data(searchTerm = "DSM 319", searchType = "culturecollectionno")
    #> 20319
    #> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
    dataset_AJ000733 <- retrieve_data(searchTerm = "AJ000733", searchType = "sequence")
    #> 000733
    #> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
    datasets_Bh <- retrieve_data(searchTerm = "Bacillus halotolerans")
    #>
    #> Error in if (payload$detail == "Not found") { stop("Your search returned no result, sorry. Please make sure that you provided a searchTerm, and specified the correct searchType. Please type '?retrieve_data' and read through the 'searchType' section to learn more.")} else if (is_dataset(payload)) { payload <- list(payload) names(payload) <- searchTerm return(payload)} else if (!is.null(payload$count)) { if (payload$count > 100) warn_slow_download(payload$count) aggregate_datasets(payload)}: argument is of length zero
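The message `argument is of length zero` in the regenerated example output is what R raises when an `if` condition has length zero, for instance when `payload$detail` is NULL and is compared with `==`. The sketch below reproduces that failure and shows one possible guard. The `payload` contents and `is_not_found()` are hypothetical and only illustrate the mechanism; they are not BacDiveR's code.

```r
# Hypothetical payload without a `detail` field, e.g. a successful response.
payload <- list(count = 2)

# NULL == "Not found" yields logical(0), which `if` cannot evaluate.
# tryCatch() captures the resulting error message instead of stopping.
unguarded <- tryCatch(
  if (payload$detail == "Not found") "no result" else "has result",
  error = function(e) conditionMessage(e)
)

# identical() always returns a single TRUE or FALSE, also for NULL input.
is_not_found <- function(payload) identical(payload$detail, "Not found")

guarded <- if (is_not_found(payload)) "no result" else "has result"
```

With the guard, a payload lacking `detail` falls through to the remaining branches instead of aborting the call.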
    @@ -95,6 +95,9 @@
  • ADR 2: Two download functions that return full datasets by default
  • +
  • + Logic-Checking BacDive Datasets +
  • @@ -155,7 +158,7 @@

    Value

    Examples

    -
    data_miller <- retrieve_search_results(queryURL = "https://bacdive.dsmz.de/advsearch?site=advsearch&searchparams[78][contenttype]=text&searchparams[78][typecontent]=contains&searchparams[78][searchterm]=Miller&advsearch=search")
    #> Data download in progress for BacDive-IDs:
    #> 132598
    #> 133012
    #> 140488
    +
    data_miller <- retrieve_search_results(queryURL = "https://bacdive.dsmz.de/advsearch?site=advsearch&searchparams[78][contenttype]=text&searchparams[78][typecontent]=contains&searchparams[78][searchterm]=Miller&advsearch=search")
    #> Downloading BacDive IDs:
    #> 132598
    #> 133012
    #> 140488