diff --git a/R/align_taxa.R b/R/align_taxa.R index 66333294..158a373c 100644 --- a/R/align_taxa.R +++ b/R/align_taxa.R @@ -1,10 +1,13 @@ -#' Find taxonomic alignments for a list of names to a version of the Australian Plant Census (APC) through standardizing formatting and checking for spelling issues +#' For a list of Australian plant names, find taxonomic or scientific name alignments to the APC or APNI through standardizing formatting and fixing spelling errors #' -#' This function uses Australian Plant Census (APC) & the Australian Plant Name Index (APNI) to find taxonomic alignments for a list of names. -#' It uses the internal function `match_taxa` to attempt to match input strings to taxon names in the APC/APNI. -#' It sequentially searches for matches against more than 20 different string patterns, prioritising exact matches (to accepted names as well as synonyms, orthographic variants) -#' over fuzzy matches. It prioritises matches to taxa in the APC over names in the APNI. -#' It identifies string patterns in input names that suggest a name can only be aligned to a genus (hybrids that are not in the APC/ANI; graded species; taxa not identified to species), and indicates these names only have a genus-rank match. +#' This function finds taxonomic alignments in APC or scientific name alignments in APNI. +#' It uses the internal function `match_taxa` to attempt to match input strings to taxon names in the APC/APNI. +#' It sequentially searches for matches against more than 20 different string patterns, +#' prioritising exact matches (to accepted names as well as synonyms, orthographic variants) over fuzzy matches. +#' It prioritises matches to taxa in the APC over names in the APNI. +#' It identifies string patterns in input names that suggest a name can only be aligned to a genus +#' (hybrids that are not in the APC/APNI; graded species; taxa not identified to species), +#' and indicates these names only have a genus-rank match. 
#' #' @param original_name A list of names to query for taxonomic alignments. #' @param output (optional) The name of the file to save the results to. diff --git a/R/create_species_state_origin_matrix.R b/R/create_species_state_origin_matrix.R index 81fb337c..59e6f564 100644 --- a/R/create_species_state_origin_matrix.R +++ b/R/create_species_state_origin_matrix.R @@ -1,6 +1,7 @@ -#' Process geographic data and return state level species origin and diversity counts +#' Use the taxon distribution data from the APC to determine state level native and introduced origin status #' -#' This function processes the geographic data available in the current or any version of the Australian Plant Census and returns state level diversity for native, introduced and more complicated species origins. +#' This function processes the geographic data available in the APC and +#' returns state level native, introduced and more complicated origins status for all taxa. #' #' #' @family diversity methods diff --git a/R/create_taxonomic_update_lookup.R b/R/create_taxonomic_update_lookup.R index fbf7acf7..1753781a 100644 --- a/R/create_taxonomic_update_lookup.R +++ b/R/create_taxonomic_update_lookup.R @@ -1,6 +1,8 @@ -#' Create a lookup table to help fix the taxonomy for a list of Australian plant species +#' Create a lookup table with the best-possible scientific name match for a list of Australian plant names #' -#' This function takes a list of Australian plant species that needs to be reconciled with current taxonomy and generates a lookup table to help fix the taxonomy. The lookup table contains the original species names, the aligned species names, and additional taxonomic information such as taxon IDs and genera. +#' This function takes a list of Australian plant names that need to be reconciled with current taxonomy and +#' generates a lookup table of the best-possible scientific name match for each input name. 
+#' It first uses the function `align_taxa`, then the function `update_taxonomy`, to produce the output. #' #' @family taxonomic alignment functions #' diff --git a/R/load_taxonomic_resources.R b/R/load_taxonomic_resources.R index 471f7342..90cc5e93 100644 --- a/R/load_taxonomic_resources.R +++ b/R/load_taxonomic_resources.R @@ -1,6 +1,8 @@ #' Load taxonomic resources from either stable or current versions of APC and APNI #' -#' Loads taxonomic resources into the global environment. This function accesses taxonomic data from a dataset using the provided version number or the default version. The loaded data contains two lists: APC and APNI, which contain taxonomic information about plant species in Australia. The function creates several tibbles by filtering and selecting data from the loaded lists. +#' This function loads two taxonomic datasets for Australia's vascular plants, the APC and APNI, into the global environment. +#' It accesses taxonomic data from a dataset using the provided version number or the default version. +#' The function creates several data frames by filtering and selecting data from the loaded lists. #' #' @param stable_or_current_data Type of dataset to access. The default is "stable", which loads the #' dataset from a github archived file. If set to "current", the dataset will be loaded from diff --git a/R/match_taxa.R b/R/match_taxa.R index b7852eec..e17321af 100644 --- a/R/match_taxa.R +++ b/R/match_taxa.R @@ -852,8 +852,9 @@ match_taxa <- function( return(taxa) } - # match_09a: `genus aff. species` taxa + # match_09a: `genus aff. species` and `genus cf. species` taxa # Exact match to APC-accepted or APC-known genus for names where "aff" indicates the taxon has an affinity to another taxon, but isn't the other taxon. + # Similarly, "cf" indicates that the taxon should be compared with another taxon, but again, isn't the other taxon. 
# Taxon names fitting this pattern that are not APC-accepted, APC-known, or APNI-listed species are automatically aligned to genus, # since this is the highest taxon rank that can be attached to the plant name. # This alignment can only be made after exact matches of complete taxon names to APC/APNI + fuzzy matches to APC are complete, @@ -862,7 +863,8 @@ match_taxa <- function( i <- ( stringr::str_detect(taxa$tocheck$cleaned_name, "[Aa]ff[\\.\\s]") | - stringr::str_detect(taxa$tocheck$cleaned_name, " affinis ") + stringr::str_detect(taxa$tocheck$cleaned_name, " affinis ") | + stringr::str_detect(taxa$tocheck$cleaned_name, " cf[\\.\\s]") ) & taxa$tocheck$genus %in% resources$genera_all2$canonical_name diff --git a/R/native_anywhere_in_australia.R b/R/native_anywhere_in_australia.R index edf56cae..05767c8b 100644 --- a/R/native_anywhere_in_australia.R +++ b/R/native_anywhere_in_australia.R @@ -1,9 +1,8 @@ -#' Check if a vector of species are native anywhere in Australia +#' For a vector of taxon names in the APC, check if the species are native anywhere in Australia #' -#' This function checks if the given species is native anywhere in Australia according to the loaded version of the Australian Plant Census (APC). -#' It creates a lookup table from taxonomic resources, and checks if the species -#' is listed as native in that table. Note that this will not detect within Australia invasions, -#' e.g. if a species is from Western Australia and is invasive on the east coast. And recent invasions are unlikely to be documented yet in APC. +#' This function checks if the given species is native anywhere in Australia according to the APC. +#' Note that this will not detect within-Australia introductions, e.g. if a species is from Western Australia and is invasive on the east coast. +#' And recent invasions are unlikely to be documented yet in APC. 
#' For the complete matrix of species by states that also represents within-Australia invasions, #' use \link{create_species_state_origin_matrix}. For spelling checks and taxonomy updates please see \link{create_taxonomic_update_lookup}. #' diff --git a/R/standardise_names.R b/R/standardise_names.R index 0eff691a..e1bfb4e8 100644 --- a/R/standardise_names.R +++ b/R/standardise_names.R @@ -1,13 +1,10 @@ -#' Standardise Taxon Names +#' Standardises taxon names by performing a series of text substitutions to remove common inconsistencies in taxonomic nomenclature. #' -#' This function standardises taxon names by performing a series of text -#' substitutions to remove common inconsistencies in taxonomic nomenclature. -#' The function takes a character vector of taxon names as input and returns a -#' character vector of taxon names using standardised taxonomic syntax as output. In particular it standardises -#' the abbreviations used to document infraspecific taxon ranks (subsp., var., f.), -#' as people use many variants of these terms. It also standardises or removes a few additional filler -#' words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed). +#' The function takes a character vector of taxon names as input and +#' returns a character vector of taxon names using standardised taxonomic syntax as output. +#' In particular it standardises taxon rank abbreviations and qualifiers (subsp., var., f.), as people use many variants of these terms. +#' It also standardises or removes a few additional filler words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed). #' #' @param taxon_names A character vector of taxon names that need to be standardised. 
#' diff --git a/R/state_diversity_counts.R b/R/state_diversity_counts.R index 3edccdea..36c169b7 100644 --- a/R/state_diversity_counts.R +++ b/R/state_diversity_counts.R @@ -1,6 +1,7 @@ -#' Calculate Australian plant state-level diversity for native, introduced, and more complicated species origins +#' For Australian states and territories, use data from the APC to calculate state-level diversity for native, introduced, and more complicated species origins #' -#' This function calculates state-level diversity for native, introduced, and more complicated species origins based on the geographic data available in the current Australian Plant Census. +#' This function calculates state-level diversity for native, introduced, and more complicated species origins +#' based on the geographic data available in the APC. #' #' @family diversity methods #' @param state A character string indicating the Australian state or territory to calculate the diversity for. Possible values are "NSW", "NT", "Qld", "WA", "ChI", "SA", "Vic", "Tas", "ACT", "NI", "LHI", "MI", "HI", "MDI", "CoI", "CSI", and "AR". diff --git a/R/strip_names.R b/R/strip_names.R index 5c811e82..488f23f8 100644 --- a/R/strip_names.R +++ b/R/strip_names.R @@ -1,4 +1,4 @@ -#' Strip taxonomic names of subtaxa designations and special characters +#' Strip taxonomic names of taxon rank abbreviations and qualifiers and special characters #' #' Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"), #' special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. 
The resulting vector @@ -34,10 +34,10 @@ strip_names <- function(taxon_names) { tolower() } -#' Strip taxonomic names of subtaxa designations, filled words and special characters +#' Strip taxonomic names of taxon rank abbreviations and qualifiers, filler words and special characters #' #' Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"), -#' additional filler words and characters (" x " for hybrid taxa, "sp.", "cf"), +#' additional filler words and characters (" x " for hybrid taxa, "sp."), #' special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. The resulting vector #' of names is also converted to lowercase. #' @@ -69,7 +69,6 @@ strip_names_2 <- function(taxon_names) { stringr::str_replace_all(" sp ", " ") %>% stringr::str_replace_all(" sp1", " 1") %>% stringr::str_replace_all(" sp2", " 2") %>% - stringr::str_replace_all(" cf | cf$", " ") %>% stringr::str_replace_all("\\=", " ") %>% stringr::str_replace_all(" ", " ") %>% stringr::str_squish() %>% diff --git a/R/update_taxonomy.R b/R/update_taxonomy.R index 501e6590..f49c0f25 100644 --- a/R/update_taxonomy.R +++ b/R/update_taxonomy.R @@ -1,8 +1,9 @@ -#' Use APC and APNI to update taxonomy, replacing synonyms to current taxa where relevant +#' For a list of taxon names aligned to the APC, update the name to an accepted taxon concept per the APC and add scientific name and taxon concept metadata to names aligned to either the APC or APNI. #' -#' This function uses the Australia's Virtual Herbarium's taxonomic resources, specifically the Australian Plant -#' Census (APC) and the Australian Plant Name Index (APNI), to update taxonomy of plant species, replacing any synonyms -#' to their current accepted name. +#' This function uses the APC to update the taxonomy of names aligned to a taxon concept listed in the APC to the currently accepted name for the taxon concept. 
+#' The aligned_data data frame that is input must contain 5 columns, +#' `original_name`, `aligned_name`, `taxon_rank`, `taxonomic_dataset`, and `aligned_reason`. +#' The aligned name is a plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function. #' #' @family taxonomic alignment functions #' diff --git a/_pkgdown.yml b/_pkgdown.yml index c82eaa02..2833f317 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -19,9 +19,9 @@ navbar: - text: "Data providers" - text: APC and APNI href: articles/data-providers.html - - text: "Data caching" - - text: How is APC/APNI stored in APCalign? - href: 'articles/caching.html' + - text: "Functions" + - text: Details on the 10 exported functions, including examples of usage + href: function_notes.html - text: ------- - text: "Taxon matching" - text: Our fuzzy matching algorithm diff --git a/inst/extdata/APCalign_outputs_documentation.csv b/inst/extdata/APCalign_outputs_documentation.csv index 6ced2f53..12a48d13 100644 --- a/inst/extdata/APCalign_outputs_documentation.csv +++ b/inst/extdata/APCalign_outputs_documentation.csv @@ -2,7 +2,7 @@ variable,returned by,description original_name,default,The original plant name. aligned_name,default,The input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function. accepted_name,default,The APC-accepted plant name when available. -suggested_name,default,The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the the suggested_name is the aligned_name. +suggested_name,default,The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the suggested_name is the aligned_name or the aligned name with an outdated genus updated. genus,default,The genus of the accepted (or suggested) name; only APC-accepted genus names are filled in. 
family,full,The family of the accepted (or suggested) name; only APC-accepted family names are filled in. taxon_rank,default,The taxonomic rank of the suggested (and accepted) name. @@ -18,4 +18,4 @@ taxon_ID_genus,full,An identifier for the genus; only filled in if an APC-accept scientific_name_ID,full,An identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names. taxonomic_status_aligned,full,The taxonomic status of the aligned name before any taxonomic updates have been applied. row_number,full,The row number of a specific original_name in the input. -number_of_collapsed_taxa,default,The number of possible taxon names that have been collapsed when taxonomic_splits == "collapse_to_higher_taxon". +number_of_collapsed_taxa,default,"The number of possible taxon names that have been collapsed when taxonomic_splits == ""collapse_to_higher_taxon""." diff --git a/inst/extdata/match_taxa_documentation.csv b/inst/extdata/match_taxa_documentation.csv index c3e48408..0da47f8e 100644 --- a/inst/extdata/match_taxa_documentation.csv +++ b/inst/extdata/match_taxa_documentation.csv @@ -7,12 +7,12 @@ match_03a,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word match_03b,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word (""genus"")",fuzzy,APC accepted taxon concepts,genus, match_03c,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word (""genus"")",fuzzy,other APC taxon concepts,genus, match_03d,"Detect ` -- `, `--` (intergrade taxa) and align to genus","first word (""genus"")",fuzzy,APNI,genus, -match_03e,"Detect ` -- `, `--` (intergrade taxa), but fail to align to genus",NA,no match,NA,genus, +match_03e,"Detect ` -- `, `--` (intergrade taxa), but fail to align to genus",NA,no match,NA,NA, match_04a,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",exact,"APC accepted taxon concepts, other APC taxon concepts, APNI",genus,Next find 
strings that indicate a name reflects a data collector's indecision about which of two (or more) taxa is the appropriate taxon. These names can only be aligned to a genus. match_04b,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",fuzzy,APC accepted taxon concepts,genus, match_04c,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",fuzzy,other APC taxon concepts,genus, match_04d,Detect ` \` (indecision between taxa) and align to genus.,"first word (""genus"")",fuzzy,APNI,genus, -match_04e,"Detect ` \` (indecision between taxa), but fail to align to genus",NA,no match,NA,genus, +match_04e,"Detect ` \` (indecision between taxa), but fail to align to genus",NA,no match,NA,NA, match_05a,"Detect scientific names, including authorship",original_name,exact,APC accepted taxon concepts,species/infraspecific,"Check if strings are full scientific names, including authorship." match_05b,"Detect scientific names, including authorship",original_name,exact,other APC taxon concepts,species/infraspecific, match_06a,"Detect canonical names, lacking authorship",cleaned_name,exact,APC accepted taxon concepts,species/infraspecific,"Check if strings are taxon names, lacking authorship." 
@@ -24,14 +24,14 @@ match_09a,"Detect `aff`, `affinis` (affinity to) and align to genus","first word match_09b,"Detect `aff`, `affinis` (affinity to) and align to genus","first word (""genus"")",fuzzy,APC accepted taxon concepts,genus, match_09c,"Detect `aff`, `affinis` (affinity to) and align to genus","first word (""genus"")",fuzzy,other APC taxon concepts,genus, match_09d,"Detect `aff`, `affinis` (affinity to) and align to genus","first word (""genus"")",fuzzy,APNI,genus, -match_09e,"Detect `aff`, `affinis` (affinity to), but fail to align to genus",NA,no match,NA,genus, +match_09e,"Detect `aff`, `affinis` (affinity to), but fail to align to genus",NA,no match,NA,NA, match_10a,"Detect canonical names, lacking authorship",stripped_name,imprecise fuzzy,APC accepted taxon concepts,species/infraspecific,"Further checks if strings are taxon names, lacking authorship, now with imprecise fuzzy matching" match_10b,"Detect canonical names, lacking authorship",stripped_name,imprecise fuzzy,other APC taxon concepts,species/infraspecific, match_11a,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",exact,"APC accepted taxon concepts, other APC taxon concepts, APNI",genus,"Find strings that indicate a name that is a hybrid between two taxa. Such names, unless documented in APC (i.e. matches 6, 7 above) can only be aligned to genus." 
match_11b,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",fuzzy,APC accepted taxon concepts,genus, match_11c,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",fuzzy,other APC taxon concepts,genus, match_11d,Detect ` x ` (hybrid taxon) and align to genus,"first word (""genus"")",fuzzy,APNI,genus, -match_11e,"Detect ` x ` (hybrid taxon), but fail to align to genus",NA,no match,NA,genus, +match_11e,"Detect ` x ` (hybrid taxon), but fail to align to genus",NA,no match,NA,NA, match_12a,"Detect canonical names, by checking first three words in string",trinomial (from stripped_name_2),exact,APC accepted taxon concepts,species/infraspecific,"Check if the first three words in the name string match with a taxon name, allowing notes to be discarded. Also useful for aligning phrase names." match_12b,"Detect canonical names, by checking first three words in string",trinomial (from stripped_name_2),exact,other APC taxon concepts,species/infraspecific, match_13a,"Detect canonical names, by checking first three words in string",trinomial (from stripped_name_2),fuzzy,APC accepted taxon concepts,species/infraspecific, diff --git a/inst/extdata/test_taxa.csv b/inst/extdata/test_taxa.csv new file mode 100644 index 00000000..b1dabc09 --- /dev/null +++ b/inst/extdata/test_taxa.csv @@ -0,0 +1,33 @@ +original_name +Banksia serrata +Banksia serrate +Banksee serrate +Banksia cerrata +Banksia sp. +Dryandra sp. +Argyrodendron (Whyanbeel) +Argyrodendron ssp. (Whyanbeel BH 1106RFK) +Argyrodendron Whyanbeel +Argyrodendron sp. (Whyanbeel BH 1106RFK) +Argyrodendron sp. Whyanbeel (B.P.Hyland RFK 1106) +Argyrodendron sp. Whyanbeel (B.P.Hyland RFK1106) +Dryandra aurantia +Banksia aurantia +Dryandra blechnifolia +Banksia pellaeifolia +Dryandra idiogenes +Banksia idiogenes +Dryandra lindleyana +Banksia dallanneyi +Acacia aneura +Acacia minyura +Acacia paraneura +Racosperma aneurum +Acacia aneura var. 
intermedia +Banksia (has long pink leaves) +Dryandra (has long pink leaves) +Acacia minyura / Acacia paraneura +Acacia aphanoclada x Acacia pyrifolia var. pyrifolia +Acacia minyura x Acacia paraneura +"no clue, a monocot" +Orchidaceae (epiphtye) diff --git a/inst/extdata/update_taxonomy_documentation.csv b/inst/extdata/update_taxonomy_documentation.csv index 58f2e306..6ad2bf13 100644 --- a/inst/extdata/update_taxonomy_documentation.csv +++ b/inst/extdata/update_taxonomy_documentation.csv @@ -1,14 +1,11 @@ -function name,categories of aligned names processed:,,,,columns filled in,, -,taxonomic dataset,taxon rank,updates to aligned name,format of `suggested_name`,accepted name (& taxon_ID),genus (& taxon_ID_genus),scientific_name_ID +,categories of aligned names processed:,,,,columns filled in,, +function name,taxonomic dataset,taxon rank,updates to aligned name,format of `suggested_name`,accepted name (& taxon_ID),genus (& taxon_ID_genus),scientific_name_ID update_taxonomy_APC_genus,APC,genus,to APC accepted genus,`genus sp. [notes]` *,no,yes,no update_taxonomy_APNI_genus,APNI,genus,none,`genus sp. [notes]`,no,no,no update_taxonomy_APC_family,APC,family,none,`family sp. 
[notes]`,no,no,no update_taxonomy_APC_species_and_infraspecific_taxa,APC,species & infraspecific,,APC accepted species** name,yes,yes,yes -"taxonomic_splits = ""most_likely_species""",,,to APC accepted taxon concept,most likely APC accepted species** name [alternative possible names],yes,yes,yes -"taxonomic_splits = ""return_all""",,,to APC accepted taxon concept,all possible APC accepted species** name (extra rows added),yes,yes,yes -"taxonomic_splits = ""collapse_to_higher_taxon""",,,collapsed to APC accepted genus,`genus sp.` [collapsed names],no,yes,no +" -- taxonomic_splits = ""most_likely_species""",,,to APC accepted taxon concept,most likely APC accepted species** name [alternative possible names],yes,yes,yes +" -- taxonomic_splits = ""return_all""",,,to APC accepted taxon concept,all possible APC accepted species** name (extra rows added),yes,yes,yes +" -- taxonomic_splits = ""collapse_to_higher_taxon""",,,collapsed to APC accepted genus,`genus sp.` [collapsed names],no,yes,no update_taxonomy_APNI_species_and_infraspecific_taxa,APNI,species & infraspecific,none to species name; genus to APC accepted genus if possible,APNI listed species** name*,no,sometimes,yes -not required,(not aligned),(not aligned),none,original name,no,no,no -,,,,,,, -* genus updated to APC accepted genus if possible,,,,,,, -** species or infraspecific taxon name,,,,,,, +(names not aligned),(not aligned),(not aligned),none,original name,no,no,no diff --git a/man/align_taxa.Rd b/man/align_taxa.Rd index 152bdea2..2598d66c 100644 --- a/man/align_taxa.Rd +++ b/man/align_taxa.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/align_taxa.R \name{align_taxa} \alias{align_taxa} -\title{Find taxonomic alignments for a list of names to a version of the Australian Plant Census (APC) through standardizing formatting and checking for spelling issues} +\title{For a list of Australian plant names, find taxonomic or scientific name alignments to the APC or APNI through standardizing formatting and 
fixing spelling errors} \usage{ align_taxa( original_name, @@ -70,11 +70,14 @@ A tibble with columns that include original_name, aligned_name, taxonomic_datase } } \description{ -This function uses Australian Plant Census (APC) & the Australian Plant Name Index (APNI) to find taxonomic alignments for a list of names. +This function finds taxonomic alignments in APC or scientific name alignments in APNI. It uses the internal function \code{match_taxa} to attempt to match input strings to taxon names in the APC/APNI. -It sequentially searches for matches against more than 20 different string patterns, prioritising exact matches (to accepted names as well as synonyms, orthographic variants) -over fuzzy matches. It prioritises matches to taxa in the APC over names in the APNI. -It identifies string patterns in input names that suggest a name can only be aligned to a genus (hybrids that are not in the APC/ANI; graded species; taxa not identified to species), and indicates these names only have a genus-rank match. +It sequentially searches for matches against more than 20 different string patterns, +prioritising exact matches (to accepted names as well as synonyms, orthographic variants) over fuzzy matches. +It prioritises matches to taxa in the APC over names in the APNI. +It identifies string patterns in input names that suggest a name can only be aligned to a genus +(hybrids that are not in the APC/APNI; graded species; taxa not identified to species), +and indicates these names only have a genus-rank match. 
} \examples{ \donttest{align_taxa(c("Poa annua", "Abies alba"))} diff --git a/man/create_species_state_origin_matrix.Rd b/man/create_species_state_origin_matrix.Rd index b7a4a9de..ed019678 100644 --- a/man/create_species_state_origin_matrix.Rd +++ b/man/create_species_state_origin_matrix.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/create_species_state_origin_matrix.R \name{create_species_state_origin_matrix} \alias{create_species_state_origin_matrix} -\title{Process geographic data and return state level species origin and diversity counts} +\title{Use the taxon distribution data from the APC to determine state level native and introduced origin status} \usage{ create_species_state_origin_matrix(resources = load_taxonomic_resources()) } @@ -13,7 +13,8 @@ create_species_state_origin_matrix(resources = load_taxonomic_resources()) A tibble with columns representing each state and rows representing each species. The values in each cell represent the origin of the species in that state. } \description{ -This function processes the geographic data available in the current or any version of the Australian Plant Census and returns state level diversity for native, introduced and more complicated species origins. +This function processes the geographic data available in the APC and +returns state level native, introduced and more complicated origins status for all taxa. 
} \examples{ \donttest{create_species_state_origin_matrix()} diff --git a/man/create_taxonomic_update_lookup.Rd b/man/create_taxonomic_update_lookup.Rd index bf67bd15..b72cd64e 100644 --- a/man/create_taxonomic_update_lookup.Rd +++ b/man/create_taxonomic_update_lookup.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/create_taxonomic_update_lookup.R \name{create_taxonomic_update_lookup} \alias{create_taxonomic_update_lookup} -\title{Create a lookup table to help fix the taxonomy for a list of Australian plant species} +\title{Create a lookup table with the best-possible scientific name match for a list of Australian plant names} \usage{ create_taxonomic_update_lookup( taxa, @@ -64,7 +64,9 @@ A lookup table containing the accepted and suggested names for each original nam } } \description{ -This function takes a list of Australian plant species that needs to be reconciled with current taxonomy and generates a lookup table to help fix the taxonomy. The lookup table contains the original species names, the aligned species names, and additional taxonomic information such as taxon IDs and genera. +This function takes a list of Australian plant names that need to be reconciled with current taxonomy and +generates a lookup table of the best-possible scientific name match for each input name. +It first uses the function \code{align_taxa}, then the function \code{update_taxonomy}, to produce the output. } \examples{ \donttest{resources <- load_taxonomic_resources() diff --git a/man/load_taxonomic_resources.Rd b/man/load_taxonomic_resources.Rd index 0d37a75a..cfab6cc3 100644 --- a/man/load_taxonomic_resources.Rd +++ b/man/load_taxonomic_resources.Rd @@ -23,7 +23,9 @@ a URL which is the cutting edge version, but this may change at any time without The taxonomic resources data loaded into the global environment. } \description{ -Loads taxonomic resources into the global environment. 
This function accesses taxonomic data from a dataset using the provided version number or the default version. The loaded data contains two lists: APC and APNI, which contain taxonomic information about plant species in Australia. The function creates several tibbles by filtering and selecting data from the loaded lists. +This function loads two taxonomic datasets for Australia's vascular plants, the APC and APNI, into the global environment. +It accesses taxonomic data from a dataset using the provided version number or the default version. +The function creates several data frames by filtering and selecting data from the loaded lists. } \examples{ \donttest{load_taxonomic_resources(stable_or_current_data="stable",version="0.0.2.9000")} diff --git a/man/native_anywhere_in_australia.Rd b/man/native_anywhere_in_australia.Rd index 401cb1a9..2e9e6cd4 100644 --- a/man/native_anywhere_in_australia.Rd +++ b/man/native_anywhere_in_australia.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/native_anywhere_in_australia.R \name{native_anywhere_in_australia} \alias{native_anywhere_in_australia} -\title{Check if a vector of species are native anywhere in Australia} +\title{For a vector of taxon names in the APC, check if the species are native anywhere in Australia} \usage{ native_anywhere_in_australia(species, resources = load_taxonomic_resources()) } @@ -17,10 +17,9 @@ A tibble with two columns: \code{species}, which is the same as the unique value and \code{native_anywhere_in_aus}, a vector indicating whether each species is native anywhere in Australia, introduced by humans from elsewhere, or unknown with respect to the APC resource. } \description{ -This function checks if the given species is native anywhere in Australia according to the loaded version of the Australian Plant Census (APC). -It creates a lookup table from taxonomic resources, and checks if the species -is listed as native in that table. 
Note that this will not detect within Australia invasions, -e.g. if a species is from Western Australia and is invasive on the east coast. And recent invasions are unlikely to be documented yet in APC. +This function checks if the given species is native anywhere in Australia according to the APC. +Note that this will not detect within-Australia introductions, e.g. if a species is from Western Australia and is invasive on the east coast. +And recent invasions are unlikely to be documented yet in APC. For the complete matrix of species by states that also represents within-Australia invasions, use \link{create_species_state_origin_matrix}. For spelling checks and taxonomy updates please see \link{create_taxonomic_update_lookup}. } diff --git a/man/standardise_names.Rd b/man/standardise_names.Rd index 180b2755..fc691262 100644 --- a/man/standardise_names.Rd +++ b/man/standardise_names.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/standardise_names.R \name{standardise_names} \alias{standardise_names} -\title{Standardise Taxon Names} +\title{Standardises taxon names by performing a series of text substitutions to remove common inconsistencies in taxonomic nomenclature.} \usage{ standardise_names(taxon_names) } @@ -13,13 +13,10 @@ standardise_names(taxon_names) A character vector of standardised taxon names. } \description{ -This function standardises taxon names by performing a series of text -substitutions to remove common inconsistencies in taxonomic nomenclature. -The function takes a character vector of taxon names as input and returns a -character vector of taxon names using standardised taxonomic syntax as output. In particular it standardises -the abbreviations used to document infraspecific taxon ranks (subsp., var., f.), -as people use many variants of these terms. It also standardises or removes a few additional filler -words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed). 
+The function takes a character vector of taxon names as input and +returns a character vector of taxon names using standardised taxonomic syntax as output. +In particular it standardises taxon rank abbreviations and qualifiers (subsp., var., f.), as people use many variants of these terms. +It also standardises or removes a few additional filler words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed). } \examples{ standardise_names(c("Quercus suber", diff --git a/man/state_diversity_counts.Rd b/man/state_diversity_counts.Rd index 7d527049..9f2e3f68 100644 --- a/man/state_diversity_counts.Rd +++ b/man/state_diversity_counts.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/state_diversity_counts.R \name{state_diversity_counts} \alias{state_diversity_counts} -\title{Calculate Australian plant state-level diversity for native, introduced, and more complicated species origins} +\title{For Australian states and territories, use data from the APC to calculate state-level diversity for native, introduced, and more complicated species origins} \usage{ state_diversity_counts(state, resources = load_taxonomic_resources()) } @@ -16,7 +16,8 @@ A tibble of diversity counts for the specified state or territory, including nat The tibble has three columns: "origin" indicating the origin of the species, "state" indicating the Australian state or territory, and "num_species" indicating the number of species for that origin and state. } \description{ -This function calculates state-level diversity for native, introduced, and more complicated species origins based on the geographic data available in the current Australian Plant Census. +This function calculates state-level diversity for native, introduced, and more complicated species origins +based on the geographic data available in the APC. 
} \examples{ \donttest{state_diversity_counts(state = "NSW")} diff --git a/man/strip_names.Rd b/man/strip_names.Rd index 484bb7cf..459288c4 100644 --- a/man/strip_names.Rd +++ b/man/strip_names.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/strip_names.R \name{strip_names} \alias{strip_names} -\title{Strip taxonomic names of subtaxa designations and special characters} +\title{Strip taxonomic names of taxon rank abbreviations and qualifiers and special characters} \usage{ strip_names(taxon_names) } diff --git a/man/strip_names_2.Rd b/man/strip_names_2.Rd index 9b1d2dfa..2812d9bd 100644 --- a/man/strip_names_2.Rd +++ b/man/strip_names_2.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/strip_names.R \name{strip_names_2} \alias{strip_names_2} -\title{Strip taxonomic names of subtaxa designations, filled words and special characters} +\title{Strip taxonomic names of taxon rank abbreviations and qualifiers, filler words and special characters} \usage{ strip_names_2(taxon_names) } @@ -15,7 +15,7 @@ characters, additional filler words and extra whitespace removed, and all letter } \description{ Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"), -additional filler words and characters (" x " for hybrid taxa, "sp.", "cf"), +additional filler words and characters (" x " for hybrid taxa, "sp."), special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. The resulting vector of names is also converted to lowercase. 
} diff --git a/man/update_taxonomy.Rd b/man/update_taxonomy.Rd index 921863f0..487d23a4 100644 --- a/man/update_taxonomy.Rd +++ b/man/update_taxonomy.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/update_taxonomy.R \name{update_taxonomy} \alias{update_taxonomy} -\title{Use APC and APNI to update taxonomy, replacing synonyms to current taxa where relevant} +\title{For a list of taxon names aligned to the APC, update the name to an accepted taxon concept per the APC and add scientific name and taxon concept metadata to names aligned to either the APC or APNI.} \usage{ update_taxonomy( aligned_data, @@ -54,9 +54,10 @@ A tibble with updated taxonomy for the specified plant names. The tibble contain } } \description{ -This function uses the Australia's Virtual Herbarium's taxonomic resources, specifically the Australian Plant -Census (APC) and the Australian Plant Name Index (APNI), to update taxonomy of plant species, replacing any synonyms -to their current accepted name. +This function uses the APC to update the taxonomy of names aligned to a taxon concept listed in the APC to the currently accepted name for the taxon concept. +The aligned_data data frame that is input must contain 5 columns, +\code{original_name}, \code{aligned_name}, \code{taxon_rank}, \code{taxonomic_dataset}, and \code{aligned_reason}. +The aligned name is a plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function. 
} \examples{ # Update taxonomy for two plant names and print the result diff --git a/vignettes/articles/data-providers.Rmd b/vignettes/articles/data-providers.Rmd index 6802898a..22fda6ca 100644 --- a/vignettes/articles/data-providers.Rmd +++ b/vignettes/articles/data-providers.Rmd @@ -18,23 +18,23 @@ library(dplyr) ## Australian Plant Census (APC) -The [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) is national database of accepted taxonomic names for [Australian vascular plants](https://bie.ala.org.au/species/NZOR-6-33408). The APC includes information on synonyms, and misapplications of them, as well as established status (native/introduced) and distribution across states and territories. +The [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) is the national taxonomic database of accepted names for [Australian vascular plants](https://bie.ala.org.au/species/NZOR-6-33408). The APC includes information on synonyms, and misapplications of them, as well as established status (native/introduced) and distribution across states and territories. -'APCalign' will first use the APC to align your taxonomic names to ones that exist in the database. +'APCalign' will first attempt to align your plant names to scientific names that exist in the APC. -## Australian Plant Index (APNI) +## Australian Plant Names Index (APNI) -The [Australian Plant Index (APNI)](https://www.anbg.gov.au/apni/) is a database containing names of Australian plants and their use in scientific literature. It is primarily used by the botanical community for standardising synonyms. Importantly, APNI does not provide recommendation of taxonomy or nomenclature, this is where the APC comes in. +The [Australian Plant Names Index (APNI)](https://www.anbg.gov.au/apni/) is a database containing all names used for Australian plants in scientific literature. It is primarily used by the botanical community for standardising synonyms. 
Importantly, APNI does not provide recommendations of taxonomy or nomenclature; only the APC indicates which taxonomy is considered accepted. 'APCalign' uses APNI when an alignment cannot be found in the APC. ## Data standards and meta-data -Data from both APNI and APC are formatted according to the [Darwin Core standard](https://dwc.tdwg.org/terms/), a widely used format for many databases. +Data from both APNI and APC are formatted according to the [Darwin Core standard](https://dwc.tdwg.org/terms/), a widely used data standard for biodiversity data. You can find the meta-data for the APC and APNI below: - [Meta-data for APC output](https://ibis-cloud.atlassian.net/wiki/spaces/NP/pages/1154383943/NSL+Taxon+export+format) -- [Meta-data of the APNI output](https://ibis-cloud.atlassian.net/wiki/spaces/NP/pages/1154383919/NSL+Name+export+format) +- [Meta-data for APNI output](https://ibis-cloud.atlassian.net/wiki/spaces/NP/pages/1154383919/NSL+Name+export+format) For more details on APNI and APC, we recommend taking a read of [their extensive documentation](https://ibis-cloud.atlassian.net/wiki/spaces/NP/pages/1380483087/NSL+API+Documentation#1.-Introduction). diff --git a/vignettes/articles/function_notes.Rmd b/vignettes/articles/function_notes.Rmd new file mode 100644 index 00000000..7c0c6003 --- /dev/null +++ b/vignettes/articles/function_notes.Rmd @@ -0,0 +1,232 @@ +--- +title: "Function notes" +author: "Elizabeth Wenk" +date: "2024-01-22" +output: html_document +--- + +# APCalign functions + +APCalign exports [10 functions](https://traitecoevo.github.io/APCalign/reference/index.html) to facilitate the alignment of submitted plant names to scientific names on the APC and APNI lists. They are listed in order of likelihood of use. 
+ +## Taxon name alignment and updating functions + +### create_taxonomic_update_lookup + +**description**: This function takes a list of Australian plant names that need to be reconciled with current taxonomy and generates a lookup table of the best-possible scientific name match for each input name. It first uses the function `align_taxa`, then the function `update_taxonomy`, to produce the output. The aligned name is a plant name that has been aligned to a taxon name in the APC or APNI by the `align_taxa` function. + +**usage notes**: This is APCalign's core function, merging together the alignment and updating of taxonomy. + +**arguments**: + +``` +taxa #input vector of taxon names +stable_or_current_data = "stable" +version = default_version() +taxonomic_splits = "most_likely_species" #options for names with ambiguous taxonomic histories +full = FALSE #outputs fewer (FALSE) or more (TRUE) columns +APNI_matches = TRUE #include (TRUE) or exclude (FALSE) APNI list +imprecise_fuzzy_matches = FALSE #disallow (FALSE) or allow (TRUE) imprecise fuzzy matches +identifier = NA_character_ #include a unique identifier as part of informal names +resources = load_taxonomic_resources() +output = NULL +``` + +**output**: A data frame with rows representing each taxon and columns documenting taxon metadata (*original_name, aligned_name, accepted_name, suggested_name, genus, family, taxon_rank, taxonomic_dataset, taxonomic_status, taxonomic_status_aligned, aligned_reason, update_reason, subclass, taxon_distribution, scientific_name_authorship, taxon_ID, taxon_ID_genus, scientific_name_ID, row_number, number_of_collapsed_taxa*). 
+ +**example**: + +```{r, eval = FALSE, echo = TRUE} +input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata", "Banksea serrata", "Banksia serrrrata", "Dryandra") +resources <- load_taxonomic_resources() + +updated_taxa <- + APCalign::create_taxonomic_update_lookup( + taxa = input, + identifier = "APCalign test", + full = TRUE, + resources = resources + ) +``` + +Alternatively, start with a CSV file that contains a column of taxon names to align: + +```{r, eval = FALSE, echo = TRUE} +taxon_list <- #or load data through the RStudio menu + readr::read_csv(here::here("inst", "extdata", "test_taxa.csv"), + show_col_types = FALSE + ) +resources <- load_taxonomic_resources() + +updated_taxa <- + APCalign::create_taxonomic_update_lookup( + taxa = taxon_list$original_name, + identifier = "APCalign test", + full = TRUE, + resources = resources + ) +``` + +**notes**\ +- If you will be running the function `APCalign::create_taxonomic_update_lookup` many times, it is best to load the taxonomic resources separately using `resources <- load_taxonomic_resources()`, then add the argument `resources = resources`\ +- The name `Banksia cerrata` does not align as the fuzzy matching algorithm does not allow the first letter of the genus and species epithet to change.\ +- The argument `taxonomic_splits` allows you to choose the outcome for updating the names of taxa with ambiguous taxonomic histories; this applies to scientific names that were once attached to a more broadly circumscribed taxon concept that was then split into several more narrowly circumscribed taxon concepts, one of which retains the original name. 
There are three options: `most_likely_species` returns the name that is retained, with alternative names documented in square brackets; `return_all` adds additional rows to the output, one for each possible taxon concept; `collapse_to_higher_taxon` returns the genus with possible names in square brackets.\ +- The argument `identifier` allows you to add a fixed text string to all genus- and family-level names; for example, `identifier = "Royal NP"` would return `Acacia sp. [Royal NP]`. + +### align_taxa + +**description**: This function finds taxonomic alignments in the APC or APNI. It uses the internal function `match_taxa` to attempt to match input strings to taxon names in the APC/APNI. It sequentially searches for matches against more than 20 different string patterns, prioritising exact matches (to accepted names as well as synonyms, orthographic variants) over fuzzy matches. It prioritises matches to taxa in the APC over names in the APNI. It identifies string patterns in input names that suggest a name can only be aligned to a genus (hybrids that are not in the APC/APNI; graded species; taxa not identified to species), and indicates these names only have a genus-rank match. + +**usage notes**: Users will run this function if they wish to see the details of the matching algorithms, including the many output columns against which the matching function compares names as it seeks the best alignment. They may also select this function if they want to adjust the "fuzziness" level for fuzzy matches, an option not available in `create_taxonomic_update_lookup`. This function is the first half of `create_taxonomic_update_lookup`. 
+ +**arguments**: + +``` +original_name #input vector of taxon names +output = NULL +full = FALSE #outputs fewer (FALSE) or more (TRUE) columns +resources = load_taxonomic_resources() +fuzzy_abs_dist = 3 #set number of characters allowed to be different for fuzzy match +fuzzy_rel_dist = 0.2 #set proportion of characters allowed to be different for fuzzy match +fuzzy_matches = TRUE #disallow (FALSE) or allow (TRUE) any fuzzy matches +imprecise_fuzzy_matches = FALSE #disallow (FALSE) or allow (TRUE) imprecise fuzzy matches +APNI_matches = TRUE #include (TRUE) or exclude (FALSE) APNI list +identifier = NA_character_ #include a unique identifier as part of informal names +``` + +**output**: A data frame with rows representing each taxon and with columns documenting the alignment made, the reason for this alignment, and a selection of taxon name mutations to which the original name was compared (*original_name, aligned_name, taxonomic_dataset, taxon_rank, aligned_reason, alignment_code, cleaned_name, stripped_name, stripped_name2, trinomial, binomial, genus, fuzzy_match_genus, fuzzy_match_genus_known, fuzzy_match_genus_APNI, fuzzy_match_cleaned_APC, fuzzy_match_cleaned_APC_known, fuzzy_match_cleaned_APC_imprecise, fuzzy_match_cleaned_APC_known_imprecise, fuzzy_match_binomial, fuzzy_match_binomial_APC_known, fuzzy_match_trinomial, fuzzy_match_trinomial_known, fuzzy_match_cleaned_APNI, fuzzy_match_cleaned_APNI_imprecise*). 
+ +**example**: + +```{r, eval = FALSE, echo = TRUE} +input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata", "Banksia serrrrata", "Dryandra sp.", "Banksia big red flowers") +resources <- load_taxonomic_resources() + + +aligned_taxa <- + APCalign::align_taxa( + original_name = input, + identifier = "APCalign test", + full = TRUE, + resources = resources + ) +``` + +**notes**\ +- If you will be running the function `APCalign::create_taxonomic_update_lookup` many times, it is best to load the taxonomic resources separately using `resources <- load_taxonomic_resources()`, then add the argument `resources = resources`\ +- The name `Banksia cerrata` does not align as the fuzzy matching algorithm does not allow the first letter of the genus and species epithet to change.\ +- With this function you have the option of changing the fuzzy matching parameters. The defaults, which allow a fuzzy match only when 3 or fewer characters AND 20% or less of the characters differ, have been carefully calibrated to catch nearly all typos while very rarely misaligning a name. If you wish to introduce less conservative fuzzy matching, it is recommended that you manually check the aligned names.\ +- It is recommended that you begin with `imprecise_fuzzy_matches = FALSE` (the default), as quite a few of the less precise fuzzy matches are likely to be erroneous. This argument should be turned on only if you plan to check all alignments manually.\ +- The argument `identifier` allows you to add a fixed text string to all genus- and family-level names; for example, `identifier = "Royal NP"` would return `Acacia sp. [Royal NP]`. + +### update_taxonomy + +**description**: This function uses the APC to update the taxonomy of names aligned to a taxon concept listed in the APC to the currently accepted name for the taxon concept. 
The input `aligned_data` data frame must contain 5 columns, `original_name`, `aligned_name`, `taxon_rank`, `taxonomic_dataset`, and `aligned_reason`, the columns output by the function `APCalign::align_taxa()`. The aligned name is a plant name that has been aligned to a taxon name in the APC or APNI by the `align_taxa` function. + +**usage notes**: As the input for this function is a table with 5 columns (output by `align_taxa`), this function will only be used when you explicitly want to separate the `alignment` and `updating` components of APCalign. This function is the second half of `create_taxonomic_update_lookup`. + +**arguments**: + +``` +aligned_data #input table of aligned names and information about the aligned name +taxonomic_splits = "most_likely_species" #options for names with ambiguous taxonomic histories +output = NULL +resources = load_taxonomic_resources() +``` + +**output**: A data frame with rows representing each taxon and columns documenting taxon metadata (*original_name, aligned_name, accepted_name, suggested_name, genus, family, taxon_rank, taxonomic_dataset, taxonomic_status, taxonomic_status_aligned, aligned_reason, update_reason, subclass, taxon_distribution, scientific_name_authorship, taxon_ID, taxon_ID_genus, scientific_name_ID, row_number, number_of_collapsed_taxa*). + +## Diversity and distribution functions + +### create_species_state_origin_matrix + +**description**: This function processes the geographic data available in the APC and returns state-level native, introduced and more complicated origin status for all taxa. + +**arguments**: + +``` +resources = load_taxonomic_resources() +``` + +**output**: A data frame with rows representing each species and columns for taxon name and each state. The values in each cell represent the origin of the species in that state. + +### native_anywhere_in_australia + +**description**: This function checks if the given species is native anywhere in Australia according to the APC. 
Note that this will not detect within-Australia introductions, e.g. if a species is from Western Australia and is invasive on the east coast. + +**arguments**: + +``` +species #input vector of taxon names +resources = load_taxonomic_resources() +``` + +**output**: A data frame with rows representing each taxon and two columns: `species`, which is the same as the unique values of the input `species`, and `native_anywhere_in_aus`, a vector indicating whether each species is native anywhere in Australia, introduced by humans from elsewhere, or unknown with respect to the APC resource. + +### state_diversity_counts + +**description**: This function calculates state-level diversity for native, introduced, and more complicated species origins based on the geographic data available in the APC. + +**arguments**: + +``` +state #state for which diversity should be summarised +resources = load_taxonomic_resources() +``` + +**output**: A data frame with three columns: "origin" indicating the origin of the species, "state" indicating the Australian state or territory, and "num_species" indicating the number of species for that origin and state. + +## Utility functions + +### load_taxonomic_resources + +**description**: This function loads two taxonomic datasets for Australia's vascular plants, the APC and APNI, into the global environment. It accesses taxonomic data from a dataset using the provided version number or the default version. The function creates several data frames by filtering and selecting data from the loaded lists. + +**usage notes**: This function is called by many other APCalign functions, but is unlikely to be used independently by an APCalign user. + +**arguments**: + +``` +stable_or_current_data = "stable" +version = default_version() +reload = FALSE +``` + +**output**: Several data frames that include subsets of the APC/APNI based on taxon rank and taxonomic status. 
+ +### standardise_names + +**description**: This function standardises taxon names by performing a series of text substitutions to remove common inconsistencies in taxonomic nomenclature. The function takes a character vector of taxon names as input and returns a character vector of taxon names using standardised taxonomic syntax as output. In particular it standardises taxon rank abbreviations and qualifiers (subsp., var., f.), as people use many variants of these terms. It also standardises or removes a few additional filler words used within taxon names (affinis becomes aff.; s.l. and s.s. are removed). + +**arguments**: + +``` +taxon_names #input vector of taxon names +``` + +**output**: A character vector of standardised taxon names. + +### strip_names + +**description**: Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"), special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. The resulting vector of names is also converted to lowercase. + +**arguments**: + +``` +taxon_names #input vector of taxon names +``` + +**output**: A character vector of stripped taxonomic names, with subtaxa designations, special characters, and extra whitespace removed, and all letters converted to lowercase. + +### strip_names_2 + +**description**: Given a vector of taxonomic names, this function removes subtaxa designations ("subsp.", "var.", "f.", and "ser"), additional filler words and characters (" x " [hybrid taxa], "sp."), special characters (e.g., "-", ".", "(", ")", "?"), and extra whitespace. The resulting vector of names is also converted to lowercase. + +**arguments**: + +``` +taxon_names #input vector of taxon names +``` + +**output**: A character vector of stripped taxonomic names, with subtaxa designations, special characters, additional filler words and extra whitespace removed, and all letters converted to lowercase. 
+ diff --git a/vignettes/updating-taxon-names.Rmd b/vignettes/updating-taxon-names.Rmd index 39f52ea9..e9194941 100644 --- a/vignettes/updating-taxon-names.Rmd +++ b/vignettes/updating-taxon-names.Rmd @@ -60,11 +60,11 @@ APCalign_outputs_documentation <- ) ``` -# Aligning taxa with APC and APNI +# Aligning taxon names with taxon concepts/names in APC and APNI -XXXX +The following table indicates the rules for each of the 51 separate algorithms sequentially applied to attempt to align each submitted name to a taxon concept in APC or scientific names in APNI. -The following +Note, if the table is truncated on your screen, use horizontal scroll to view the entire table. ```{r, results='show'} match_taxa_documentation %>% @@ -74,14 +74,27 @@ match_taxa_documentation %>% # Updating taxonomy -XXX +The following table indicates the separate functions used to: + +- update aligned names to accepted names in the APC +- add best-practice suggested names to all submitted names +- add identifiers to taxon concepts (in the APC) or scientific names (in the APC or APNI) + +Different functions are used depending on the taxon rank of the aligned name and the taxonomic dataset to which the name was aligned (APC vs APNI). + ```{r, results='show'} update_taxonomy_documentation %>% - my_kable_styling() + my_kable_styling() %>% + kableExtra::add_header_above(c(" " = 1, "categories of aligned names processed" = 4, "columns filled in" = 3)) ``` +-* genus updated to APC accepted genus if possible; ** species or infraspecific taxon name + + # Outputs of APCalign +The following columns are output by the core function `create_taxonomic_update_lookup` and the two component functions `align_taxa` and `update_taxonomy`. + ```{r, results='show'} APCalign_outputs_documentation %>% my_kable_styling()