Ronquillo, C., Stropp, J., Medina, N.G. & Hortal, J. (2023) Exploring the impact of data curation criteria on the observed geographical distribution of mosses. Ecology and Evolution, 13, e10786. doi:10.1002/ece3.10786

Biodiversity data records contain inaccuracies and biases. To overcome this limitation and establish robust geographic patterns, ecologists often curate records, keeping only those that are most suitable for their analyses. Yet this choice is not straightforward, and the outcome of the analysis may vary because of a trade-off between data quality and volume. This problem is particularly recurrent for less-studied groups with patchy sampling effort. The latitudinal pattern of moss richness remains inconsistent across studies, and these inconsistencies may emerge purely from sampling artefacts. Our main objective here is to assess the effect of different curation criteria on this spatial pattern in the Temperate Northern Hemisphere (above 20° latitude). We contrasted the geographical distribution of moss species records and the latitude-species richness relationship obtained under different data curation scenarios. These scenarios comprise five sources of taxonomic standardisation and eight data cleaning filters. The analyses are based on the selection of well-surveyed cells at 100 km cell resolution. The application of some ‘data curation scenarios’ severely affects the number of records selected for analysis and substantially changes the proportion of richness per cell. The sensitivity to data curation becomes detectable at the regional and cell scales, showing a large shift in the latitudinal richness peak in Europe, from 60° N to 45° N latitude, when only preserved specimens are selected and duplicates based on date of collection and coordinates are excluded. Our results stress the importance of justifying the criteria used for filtering data retrieved from biodiversity databases to avoid detecting misleading patterns. Curating records under particular criteria compromises the information available in some areas, so that different criteria display different spatial information on mosses.
This problem can be ameliorated if data filtering is combined with identifying well-surveyed cells, which yields relatively constant results under different combinations of filters, even for less well-known groups such as mosses.
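
As a rough illustration of how one curation scenario can move a latitudinal richness peak, the following minimal Python sketch applies the two filters named above (keeping only preserved specimens, then excluding duplicates by date of collection and coordinates) and counts species per latitudinal band. The records, the field names, the 5° band width and the choice to include the species name in the duplicate key are all assumptions made for illustration; none of them comes from the study's data or code.

```python
from collections import defaultdict

# Hypothetical toy records (invented for illustration, not data from the study).
# Field names are assumptions loosely modelled on occurrence-record exports.
RECORDS = [
    {"species": "Bryum argenteum", "lat": 61.2, "lon": 10.0,
     "date": "2001-06-01", "basis": "PRESERVED_SPECIMEN"},
    {"species": "Bryum argenteum", "lat": 61.2, "lon": 10.0,
     "date": "2001-06-01", "basis": "PRESERVED_SPECIMEN"},  # exact duplicate
    {"species": "Hypnum cupressiforme", "lat": 62.8, "lon": 11.5,
     "date": "2015-07-10", "basis": "HUMAN_OBSERVATION"},
    {"species": "Hypnum cupressiforme", "lat": 46.1, "lon": 7.3,
     "date": "1998-05-20", "basis": "PRESERVED_SPECIMEN"},
    {"species": "Tortula muralis", "lat": 45.4, "lon": 8.9,
     "date": "2003-04-02", "basis": "PRESERVED_SPECIMEN"},
    {"species": "Polytrichum commune", "lat": 63.0, "lon": 12.0,
     "date": "2018-08-15", "basis": "HUMAN_OBSERVATION"},
]


def only_preserved(records):
    """Filter: keep records whose basis of record is a preserved specimen."""
    return [r for r in records if r["basis"] == "PRESERVED_SPECIMEN"]


def drop_duplicates(records):
    """Filter: keep the first record per (species, date, lat, lon) key.

    Including the species in the key is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for r in records:
        key = (r["species"], r["date"], r["lat"], r["lon"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def richness_by_band(records, band_width=5):
    """Count distinct species per latitudinal band (lower edge, in degrees)."""
    species_in_band = defaultdict(set)
    for r in records:
        band = int(r["lat"] // band_width) * band_width
        species_in_band[band].add(r["species"])
    return {band: len(sp) for band, sp in sorted(species_in_band.items())}


unfiltered = richness_by_band(RECORDS)
curated = richness_by_band(drop_duplicates(only_preserved(RECORDS)))
print("unfiltered:", unfiltered)
print("curated:   ", curated)
```

In this invented data set the richest band is 60° N before filtering and 45° N afterwards, simply because the northern records are mostly observations rather than specimens. A real analysis would also need the taxonomic standardisation and the well-surveyed-cell selection described above, which this sketch leaves out.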