Freitas, T.M.S., Montag, L.F.A., De Marco Jr., P. & Hortal, J. (2020) How reliable are species identifications in biodiversity big data? Evaluating the records of a Neotropical fish family in online repositories. Systematics & Biodiversity, 18, 181–191. doi:10.1080/14772000.2020.1730473

The growth of free and open online biodiversity databases is of paramount importance for current research in ecology and evolution. However, little attention is paid to keeping taxonomy up to date in these “biodiversity big data” repositories, and the quality of their taxonomic information is often questioned. Here we assess how reliable the current use of nomenclatural classification is in the distributional information available from two biodiversity information networks: GBIF and the Brazilian SpeciesLink. As a case study, we use the records of Auchenipteridae, a Neotropical fish family that has been subject to recent taxonomic reviews. A data filtering procedure was applied to identify and quantify inaccuracies in the taxonomic status of the records in three steps: assessment of identification accuracy at the family, genus, or species level; current validity of the species name; and assignment of inaccurate species records to different categories of classification quality. Synonyms, nonexistent combinations, and outdated combinations were reassigned to currently valid species. A total of 9148 records of Auchenipteridae fishes were analyzed, of which 4165 were from GBIF and 4983 from SpeciesLink, deriving from 46 and 31 sources, respectively. After correcting all possible records following the taxonomic data filtering steps, 6988 records (76.4% of the original) were adequate for describing species distributions, while 2160 remained inaccurate. Most of the inaccurate records at the species level were due to the use of outdated nomenclature, resulting in non-valid combinations of species and genus, and to synonymy. Our results reveal substantial taxonomic inconsistency among records and, most importantly, show that taxonomic information obtained from repositories should be used with caution. Many inaccuracy issues may be embedded in the records of biodiversity databases, which could lead researchers to an incomplete or even mistaken view of variation in the natural world.
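
The three filtering steps described in the abstract could be sketched roughly as follows. This is only an illustrative outline, not the authors' actual procedure: the field name `scientific_name`, the lookup tables `VALID_SPECIES` and `REASSIGNMENTS`, and all taxon names in them are hypothetical placeholders, and the records are assumed to have already been restricted to the target family.

```python
from collections import Counter

# Hypothetical reference data. These are toy placeholder names, not real taxa;
# in practice they would come from a curated taxonomic checklist.
VALID_SPECIES = {"Genusa alpha", "Genusb beta"}
REASSIGNMENTS = {
    # recorded name -> (category of inaccuracy, currently valid name)
    "Genusa alfa": ("synonym", "Genusa alpha"),
    "Genusc beta": ("outdated combination", "Genusb beta"),
    "Genusa beta": ("nonexistent combination", "Genusb beta"),
}


def classify(record):
    """Return (status, resolved_name) for one occurrence record."""
    parts = (record.get("scientific_name") or "").split()

    # Step 1: at which level was the record identified?
    if not parts:
        return "family level only", None
    if len(parts) < 2 or parts[1] in {"sp.", "spp."}:
        return "genus level only", None
    binomial = " ".join(parts[:2])

    # Step 2: is the recorded species name currently valid?
    if binomial in VALID_SPECIES:
        return "valid", binomial

    # Step 3: categorize the inaccuracy and reassign where possible.
    if binomial in REASSIGNMENTS:
        return REASSIGNMENTS[binomial]
    return "unresolved", None


def summarize(records):
    """Count records per status and how many resolve to a valid species."""
    statuses = Counter()
    usable = 0
    for record in records:
        status, resolved = classify(record)
        statuses[status] += 1
        if resolved is not None:
            usable += 1
    return statuses, usable


if __name__ == "__main__":
    demo = [
        {"scientific_name": "Genusa alpha"},
        {"scientific_name": "Genusa alfa"},
        {"scientific_name": "Genusa sp."},
        {"scientific_name": ""},
    ]
    statuses, usable = summarize(demo)
    print(dict(statuses), usable)
```

Applied to the totals reported in the abstract, the same bookkeeping gives 4165 + 4983 = 9148 records, of which 6988 (76.4%) resolve to a valid species and 9148 - 6988 = 2160 do not.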