Selecting a list of target species is a common task in macroecological and conservation studies. For example, a researcher may seek to model the distribution of arboreal mammals that occur in a particular state.
By applying different filters to the database, anyone can obtain a verified taxonomic list of Brazilian species of animals for any Brazilian state or Country. Additionally, the filter can be applied by phylum, class, order, family, genus, life form, habitat, origin, and taxonomic status. In this vignette, users will learn how to use faunabr package to select a list of species based on these features.
Before you begin, use the load_faunabr
function to load
the data. For more detailed information on obtaining and loading the
data, please refer to Getting
started with faunabr
library(faunabr)
#Folder where you stored the data with the function get_faunabr()
#Load data
bf <- load_faunabr(data_dir = my_dir,
data_version = "latest",
type = "short") #short version
#> Loading version 1.3
One of the primary objectives of this package is to assist in selecting a species list based on taxonomic classification, characteristics and distribution. Specifically, you can filter by:
To explore all available options for each filter, use the
fauna_attributes()
function with the desired attributes.
This function will provide a list of data.frames with the available
options to use in the select_fauna()
function.
#Get available options to filter by lifeForm, habitat and origin
fauna_at <- fauna_attributes(data = bf,
attribute = c("lifeForm", "habitat", "origin"))
head(fauna_at$lifeForm)
#> lifeForm
#> 1 colonial
#> 2 commensal
#> 3 ectoparasite
#> 4 ectoparasiteIDE
#> 5 endoparasite
#> 6 endoparasiteid
head(fauna_at$habitat)
#> habitat
#> 1 arboreal
#> 2 cavernicolous
#> 3 fossorial
#> 4 freshwater
#> 5 hyporheic
#> 6 marine
head(fauna_at$origin)
#> origin
#> 1 cryptogenic
#> 2 domesticated
#> 3 introduced
#> 4 invasive
#> 5 native
As an illustration, let’s consider the scenario where we aim to retrieve a list of all native arboreal insects with confirmed occurrences in Rio de Janeiro:
insects_rj <- select_fauna(data = bf, include_subspecies = FALSE,
phylum = "all",
class = "Insecta", #Select insecs
order = "all",
family = "all",
genus = "all",
lifeForm = "all", filter_lifeForm = "in",
habitat = "arboreal", filter_habitat = "in",
states = "RJ", #Rio de Janeiro
filter_states = "in", #Select IN Rio de Janeiro
country = "all", filter_country = "in",
origin = "native",
taxonomicStatus = "valid")
nrow(insects_rj)
#> [1] 114
The filter returned 114 species that meet the specified criteria. It’s important to note that these selections include species with confirmed occurrences in Rio de Janeiro, and some of them may also have confirmed occurrences in other states:
#First 7 unique values of states in the filtered dataset
unique(insects_rj$states)[1:7]
#> [1] "AC;AM;AP;BA;CE;DF;ES;GO;MA;MG;MS;MT;PA;PE;RJ;RN;RO;RR;SE;SP;TO"
#> [2] "BA;MA;MG;MT;PR;RJ;RS;SC;SP"
#> [3] "ES;MG;PE;PR;RJ;RS;SP"
#> [4] "ES;PR;RJ;SP"
#> [5] "RJ"
#> [6] "RJ;SP"
#> [7] "ES;MG;RJ;RS;SC;SP"
If you wish to select species with confirmed occurrences only Rio de Janeiro, and not in any other state, modify the filter_state parameter to “only”:
insects_rj_only <- select_fauna(data = bf, include_subspecies = FALSE,
phylum = "all",
class = "Insecta", #Select insecs
order = "all",
family = "all",
genus = "all",
lifeForm = "all", filter_lifeForm = "in",
habitat = "arboreal", filter_habitat = "in",
states = "RJ", #Rio de Janeiro
filter_states = "only", #Select ONLY in Rio de Janeiro
country = "all", filter_country = "in",
origin = "native",
taxonomicStatus = "valid")
nrow(insects_rj_only)
#> [1] 22
unique(insects_rj_only$states)
#> [1] "RJ"
Now, the filter has resulted in 22 species, all exclusively confined to Rio de Janeiro.
Furthermore, the package offers the flexibility to combine various
filters (please consult ?select_fauna
for comprehensive
details). For instance, consider the scenario where we aim to compile a
list of native insects with confirmed occurrences in the states of
Paraná (PR), Santa Catarina (SC), and Rio Grande do Sul (RS):
insects_south <- select_fauna(data = bf, include_subspecies = FALSE,
phylum = "all",
class = "Insecta", #Select insecs
order = "all",
family = "all",
genus = "all",
lifeForm = "all", filter_lifeForm = "in",
habitat = "all", filter_habitat = "in",
states = c("PR", "SC", "RS"), #States from southern Brazil
filter_states = "in", #IN any of these states
country = "all", filter_country = "in",
origin = "native",
taxonomicStatus = "valid")
nrow(insects_south)
#> [1] 11461
#First 10 unique values of states in the filtered dataset
unique(insects_south$states)[1:10]
#> [1] "PR;SP"
#> [2] "BA;MT;RJ;RS"
#> [3] "RS"
#> [4] "PR"
#> [5] "PR;RS"
#> [6] "GO;PA;RO;RR;SC"
#> [7] "MG;PR;SP"
#> [8] "GO;MG;PR;RS;SC"
#> [9] "GO;RJ;SC"
#> [10] "AC;AL;AM;AP;BA;CE;DF;ES;GO;MA;MG;MS;MT;PA;PB;PE;PI;PR;RJ;RN;RO;RR;RS;SC;SE;SP;TO"
By utilizing filter_state = “in”, our selection encompassed species occurring in all three states, as well as those appearing in only two or even one of them, in addition to other states. To impose a more rigorous criterion, selecting solely those species with confirmed occurrences in all three states, we can use filter_state = “and”:
insects_south_and <- select_fauna(data = bf, include_subspecies = FALSE,
phylum = "all",
class = "Insecta", #Select insecs
order = "all",
family = "all",
genus = "all",
lifeForm = "all", filter_lifeForm = "in",
habitat = "all", filter_habitat = "in",
states = c("PR", "SC", "RS"), #States from southern Brazil
filter_states = "and", #in PR AND SC AND RS
country = "all", filter_country = "in",
origin = "native",
taxonomicStatus = "valid")
nrow(insects_south_and)
#> [1] 1924
#First 10 unique values of states in the filtered dataset
unique(insects_south_and$states)[1:10]
#> [1] "GO;MG;PR;RS;SC"
#> [2] "AC;AL;AM;AP;BA;CE;DF;ES;GO;MA;MG;MS;MT;PA;PB;PE;PI;PR;RJ;RN;RO;RR;RS;SC;SE;SP;TO"
#> [3] "AC;AM;MG;MS;MT;PA;PE;PR;RO;RS;SC;SP"
#> [4] "AM;BA;CE;DF;ES;GO;MG;MS;MT;PA;PE;PR;RJ;RS;SC;SP"
#> [5] "AC;AM;AP;BA;CE;DF;ES;GO;MG;MS;MT;PA;PE;PR;RJ;RO;RR;RS;SC;SE;SP;TO"
#> [6] "BA;ES;MG;PR;RJ;RS;SC;SP"
#> [7] "AM;DF;ES;GO;PR;RJ;RR;RS;SC;SP"
#> [8] "ES;GO;MG;PR;RJ;RS;SC;SP"
#> [9] "AC;AM;AP;BA;CE;ES;GO;MA;MG;MT;PA;PR;RJ;RO;RS;SC;SP;TO"
#> [10] "GO;MG;MS;PR;RO;RS;SC;SP"
Now, our selection consists solely of species with confirmed occurrences in all of the specified states. However, by utilizing the “and” argument, we permit the filter to include species with occurrences in additional states. To confine the filter exclusively to species with confirmed occurrences in all three states, without any occurrences elsewhere, we can use filter_state = “only”:
insects_south_only <- select_fauna(data = bf, include_subspecies = FALSE,
phylum = "all",
class = "Insecta", #Select insecs
order = "all",
family = "all",
genus = "all",
lifeForm = "all", filter_lifeForm = "in",
habitat = "all", filter_habitat = "in",
states = c("PR", "SC", "RS"), #States from southern Brazil
filter_states = "only", #ONLY in PR, SC and RS
country = "all", filter_country = "in",
origin = "native",
taxonomicStatus = "valid")
nrow(insects_south_only)
#> [1] 134
#The unique state in the filtered dataset
unique(insects_south_only$states)
#> [1] "PR;RS;SC"
Now, the filter return only 134 species, all of them occurring in PR, SC and RS; with no recorded occurrences in any other states.
In addition to selecting a species list based on their characteristics, the package also includes a function for subsetting species by name. Please note that this function exclusively operates with binomial names (Genus + specificEpithet), such as Panthera onca, and does not support complete scientific names (including Author or infraspecificEpithet), such as Panthera onca (Linnaeus, 1758).
If you have species with complete scientific names, you can extract
the binomial names using the function get_binomial()
. Don’t
worry about excessive spaces between words; the function will remove any
extra spaces from the names.
complete_names <- c("Pantera onça (Linnaeus, 1758)",
"Zonotrichia capensis subtorquata Swainson, 1837",
"Mazama bororo Duarte, 1996",
"Mazama jucunda Thomas, 1913",
"Arrenurus tumulosus intercursor",
"Araucaria angustifolia")
#Panthera onca with typos to illustrate how the next function corrects it
#Araucaria angustifolia (a Plant) was used just as an example that will be used to illustrate the
#next function
binomial_names <- extract_binomial(species_names = complete_names)
binomial_names
#> [1] "Pantera onça" "Zonotrichia capensis"
#> [3] "Mazama bororo" "Mazama jucunda"
#> [5] "Arrenurus tumulosus" "Araucaria angustifolia"
Additionally, you can verify the spelling, nomenclatural status, and taxonomic status of species names using the check_fauna_names() function. If the function is unable to locate the name of a species in the database (due to a typo, for example), it can suggest potential names based on similarities to other entries in the database. To see how the function works, let’s utilize the previously created binomial_names dataset:
#Create example
checked_names <- check_fauna_names(data = bf,
species = binomial_names,
include_subspecies = FALSE,
max_distance = 0.1)
tibble::tibble(checked_names) #print data.frame as tibble
#> # A tibble: 6 × 8
#> input_name Spelling Suggested_name Distance taxonomicStatus validName #> family
#> <chr> <chr> <chr> <dbl> <chr> <chr> #> <chr>
#> 1 Zonotrichia capensis Correct Zonotrichia c… 0 valid Zonotrichia… #> Passe…
#> 2 Mazama bororo Correct Mazama bororo 0 synonym Mazama jucu… #> Cervi…
#> 3 Mazama jucunda Correct Mazama jucunda 0 valid Mazama jucu… #> Cervi…
#> 4 Arrenurus (Incertae… Correct Arrenurus tum… 0 valid Arrenurus t… #> Arren…
#> 5 Pantera onça Probabl… Panthera onca 2 valid Panthera on… #> Felid…
#> 6 Araucaria angustifo… Not_fou… NA NA NA NA NA
We can see that Zonotrichia capensis, Mazama jucunda, and Arrenurus tumulosus are spelling correctly and are valid names. Mazama bororo is spelling correctly, but it is a synonym of Mazama jucunda. In the case of Pantera onça, the spelling appears to be potentially incorrect (as the name wasn’t found in the database); however, a similar name, Panthera onca, is suggested by the function. The ‘Distance’ column indicates the Levenshtein edit distance between the input and the suggested name. The spelling of Araucaria angustifolia was flagged as Not_found (as expected, given that Araucaria angustifolia is not an animal!). Consequently, the name was not located in the database, and there were no comparable names available.
To retrieve species information from the Fauna do Brasil database,
employ the subset_species()
function. For optimal
performance, we highly recommend utilizing the
get_binomial()
and check_names()
functions
beforehand. This ensures that you’re exclusively working with species
present in the Fauna do Brasil database. To see how the function works,
let’s use the valid names in checked_names created
previously:
#Get only valid names
valids <- unique(checked_names$validName)
valids <- na.omit(valids) #Remove NA
#Subset species
my_sp <- subset_fauna(data = bf, species = valids,
include_subspecies = FALSE)
tibble::tibble(my_sp) #print data.frame as tibble
#> # A tibble: 4 × 19
#> species subspecies scientificName validName kingdom phylum class order family genus lifeForm habitat states countryCode #> origin
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> <chr>
#> 1 Panthera onca "" Panthera onca… "" Animal… Chord… Mamm… Carn… Felid… Pant… "free_l… "terre… AC;AM… argentina;… #> "nati…
#> 2 Arrenurus (I… "" Arrenurus (In… "" Animal… Arthr… Arac… Trom… Arren… Arre… "" "" NA brazil ""
#> 3 Mazama jucun… "" Mazama jucund… "" Animal… Chord… Mamm… Arti… Cervi… Maza… "herbiv… "terre… BA;ES… brazil #> "nati…
#> 4 Zonotrichia … "" Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" AL;AP… brazil #> "nati…
#> # ℹ 4 more variables: taxonomicStatus <chr>, nomenclaturalStatus <chr>, vernacularName <chr>, taxonRank <chr>
We can also include subspecies and/or varieties:
my_sp2 <- subset_fauna(data = bf, species = valids,
include_subspecies = TRUE)
tibble::tibble(my_sp2) #print data.frame as tibble
#> # A tibble: 14 × 19
#> species subspecies scientificName validName kingdom phylum class order family genus lifeForm habitat states countryCode #> origin
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> <chr>
#> 1 Panthera on… "" Panthera onca… "" Animal… Chord… Mamm… Carn… Felid… Pant… "free_l… "terre… AC;AM… argentina;… #> "nati…
#> 2 Arrenurus (… "" Arrenurus (In… "" Animal… Arthr… Arac… Trom… Arren… Arre… "" "" NA brazil "" #>
#> 3 Arrenurus (… "Arrenuru… Arrenurus (In… "" Animal… Arthr… Arac… Trom… Arren… Arre… "" "" NA brazil "" #>
#> 4 Arrenurus (… "Arrenuru… Arrenurus (In… "" Animal… Arthr… Arac… Trom… Arren… Arre… "" "" NA brazil "" #>
#> 5 Mazama jucu… "" Mazama jucund… "" Animal… Chord… Mamm… Arti… Cervi… Maza… "herbiv… "terre… BA;ES… brazil #> "nati…
#> 6 Zonotrichia… "" Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" AL;AP… brazil #> "nati…
#> 7 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" AP brazil #> "nati…
#> 8 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" RR brazil #> "nati…
#> 9 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" RR brazil #> "nati…
#> 10 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" BA;MA… brazil #> "nati…
#> 11 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" PA brazil #> "nati…
#> 12 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" AP;PA… brazil #> "nati…
#> 13 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" ES;MG… brazil #> "nati…
#> 14 Zonotrichia… "Zonotric… Zonotrichia c… "" Animal… Chord… Aves Pass… Passe… Zono… "free_l… "" TO brazil #> "nati…
#> # ℹ 4 more variables: taxonomicStatus <chr>, nomenclaturalStatus <chr>, vernacularName <chr>, taxonRank <chr>
#>
We can retrieve all synonyms of a species list. This can be particularly useful, for example, when searching for records of a species and all its synonyms (as listed in the Fauna do Brasil) in online databases like GBIF. To accomplish this, utilize the function fauna_synonym. To understand how the function works, let’s search for the synonyms of three species:
spp <- c("Panthera onca", "Mazama jucunda", "Subulo gouzoubira")
spp_syn <- fauna_synonym(data = bf, species = spp)
spp_syn
#> validName synonym taxonomicStatus
#> 49343 Panthera onca <NA> valid
#> 60523 Mazama jucunda Mazama bororo synonym
#> 61168 Subulo gouzoubira Mazama gouazoubira synonym
We can see that Mazama jucunda and Subulo gouzoubira have synonyms, Mazama bororo and Mazama gouazoubira, respectively. Panthera onca does not have any synonyms.