check_names checks whether species names are spelled correctly and, when a name is misspelled or not found in the Flora e Funga do Brasil database, searches for suggested names.
match_names finds approximate matches to the specified pattern (species) within each element of species_to_match. It is used internally by check_names.
check_names(data, species, max_distance = 0.1,
            include_subspecies = FALSE, include_variety = FALSE,
            kingdom = "Plantae", parallel = FALSE, ncores = 1,
            progress_bar = FALSE)
match_names(
species,
species_to_match,
max_distance = 0.1,
parallel = FALSE,
ncores = 1,
progress_bar = FALSE
)
data: (data.frame) the data.frame imported with the load_florabr function.
species: (character) names of the species to be checked.
max_distance: (numeric) maximum distance (as a fraction) allowed when searching for suggestions for a misspelled name. It can be any value between 0 and 1. The higher the value, the more suggestions are returned. For more details, see agrep. Default = 0.1.
include_subspecies: (logical) whether to include subspecies. Default = FALSE.
include_variety: (logical) whether to include varieties. Default = FALSE.
kingdom: (character) the kingdom to which the species belong. It can be "Plantae" or "Fungi". Default = "Plantae".
parallel: (logical) whether to run in parallel. Setting this to TRUE is recommended for improved performance when working with 100 or more species. Default = FALSE.
ncores: (numeric) number of cores to use for parallel processing. Only applicable if parallel = TRUE. Default = 1.
progress_bar: (logical) whether to display a progress bar during processing. Default = FALSE.
species_to_match: (character) a vector of species names to match against the species parameter.
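The description of max_distance points to agrep, so a small base R sketch may help show what a fractional distance means there. It calls agrep directly on a hand-made candidate vector (an assumption for illustration, not the Flora e Funga do Brasil data) and is not necessarily how check_names calls it internally. In agrep, a fractional max.distance is multiplied by the pattern length and rounded up, so 0.1 on a 19-character name allows up to 2 edits.

```r
# Hypothetical candidate names, chosen only for illustration
candidates <- c("Butia catarinensis", "Butia capitata", "Araucaria angustifolia")

# Misspelled input: one extra "t"
pattern <- "Butia cattarinensis"

# Fraction 0.1 of a 19-character pattern rounds up to at most 2 edits
max_edits <- ceiling(0.1 * nchar(pattern))

# value = TRUE returns the matching strings instead of their indices
hits <- agrep(pattern, candidates, max.distance = 0.1, value = TRUE)
print(max_edits)
print(hits)
```

Raising max.distance loosens the match, which is why higher values return more suggestions.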
A data.frame with the following columns:
input_name: the species names supplied in the species argument.
Spelling: indicates if the species name is Correct (a perfect match with a species name in the Flora e Funga do Brasil), Probably_incorrect (partial match), or Not_found (no match with any species).
Suggested_name: If Spelling is Correct, it is the same as the input_name. If Spelling is Probably_incorrect, one or more suggested names are listed, found according to the maximum distance. If Spelling is Not_found, the value is NA.
Distance: The integer Levenshtein edit distance. It represents the number of single-character edits (insertions, deletions, or substitutions) required to transform the input_name into the Suggested_name.
taxonomicStatus: the taxonomic status of the species name ("Accepted" or "Synonym").
nomenclaturalStatus: the nomenclatural status of the species name. This information is not available for all species.
acceptedName: If the species name is not accepted or is incorrect, the accepted name of the species. If the species name is accepted and correct, the same as input_name and Suggested_name.
family: the family of the species.
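To make the Distance column concrete, the Levenshtein edit distance between two names can be computed with adist from the utils package. This is only a sketch of the metric described above, using the misspelling from the example. Whether check_names itself relies on adist is an assumption not confirmed here.

```r
# adist() returns a matrix of generalized Levenshtein distances
d <- utils::adist("Butia cattarinensis", "Butia catarinensis")

# One deletion (the extra "t") turns the input into the suggested name
print(d[1, 1])
```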
Flora e Funga do Brasil. Jardim Botânico do Rio de Janeiro. Available at: http://floradobrasil.jbrj.gov.br/
data("bf_data", package = "florabr")
spp <- c("Butia cattarinensis", "Araucaria angustifolia")
check_names(data = bf_data, species = spp)
#>               input_name           Spelling         Suggested_name Distance
#> 1 Araucaria angustifolia            Correct Araucaria angustifolia        0
#> 2    Butia cattarinensis Probably_incorrect     Butia catarinensis        1
#>   taxonomicStatus nomenclaturalStatus           acceptedName        family
#> 1        Accepted             Correct Araucaria angustifolia Araucariaceae
#> 2        Accepted                <NA>     Butia catarinensis     Arecaceae
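A possible follow-up step is to pull out the names flagged as Probably_incorrect for review and build a cleaned vector of names. The sketch below works on a hand-built copy of four columns from the output above, so it runs without the package. In practice res would be the data.frame returned by check_names. It assumes a single suggestion per input name.

```r
# Hand-built copy of part of the printed result (illustration only)
res <- data.frame(
  input_name = c("Araucaria angustifolia", "Butia cattarinensis"),
  Spelling = c("Correct", "Probably_incorrect"),
  Suggested_name = c("Araucaria angustifolia", "Butia catarinensis"),
  Distance = c(0, 1),
  stringsAsFactors = FALSE
)

# Rows worth checking by eye before accepting the suggestion
to_review <- res[res$Spelling == "Probably_incorrect",
                 c("input_name", "Suggested_name", "Distance")]
print(to_review)

# Use the suggestion where one exists, otherwise keep the original input
cleaned <- ifelse(is.na(res$Suggested_name), res$input_name, res$Suggested_name)
print(cleaned)
```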