Tessarolo, G., Lobo, J. M., Rangel, T. F., & Hortal, J. (2021). High uncertainty in the effects of data characteristics on the performance of species distribution models. Ecological Indicators, 121, 107147. doi:10.1016/j.ecolind.2020.107147


• The accuracy of Species Distribution Models (SDMs) varies with species and data characteristics.
• We assessed SDM performance using dung beetle species distributions in Madrid.
• Data and geographic distribution characteristics have the strongest effect on SDM accuracy.
• Stacked SDMs of species varying in distribution and data features can be flawed.

Abstract

Species distribution models (SDMs) are widely used as indicators of different aspects of geographical ranges for many purposes, from conservation to biogeographical and evolutionary analyses. However, these techniques are susceptible to various sources of uncertainty. Data coverage, species’ ecology, and the characteristics of their geographic distributions can affect SDM results, often generating critical errors in predicted distribution maps. We assess the influence of data quality, the characteristics of species distributions, and ecological traits on SDM performance. We predict the distributions of dung beetle species in the Madrid region (central Spain) using six SDM techniques and validate them on an independent dataset. We relate variations in model performance to environmental completeness, data characteristics, and species traits through a partial least squares analysis. In this analysis, body size, nesting behaviour, marginality, rarity, data prevalence, Relative Occurrence Area (ROA), range size, niche breadth, and completeness are used as predictors of six assessment metrics (sensitivity, specificity, kappa, TSS, CCR, and AUC). Marginality and data prevalence were the variables that most influenced SDM performance, followed by range size, ROA, and niche breadth: species presenting higher marginality and data prevalence, and smaller ROA and niche breadth, were associated with better models. Nesting behaviour, rarity, niche completeness, and body size had minor importance for SDM performance. Our results highlight the importance of taking species’ and data characteristics into account when modelling and comparing large groups of species using SDMs. This implies that estimates of species richness and composition based on stacked SDMs can show high levels of error if they are constructed for groups of species with diverse ecological traits and types of geographic distributions. We suggest that species with characteristics that lead to poor SDM performance should not be included when constructing composite biodiversity variables. Further effort is needed to develop SDM methodologies and protocols that account for such sources of uncertainty.
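
For readers unfamiliar with the six assessment metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions, not the authors' implementation. It assumes that presence/absence predictions have already been obtained by thresholding model output (the abstract does not state a threshold), and the names `Confusion`, `threshold_metrics`, and `auc` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing predicted vs. observed presence/absence (hypothetical helper)."""
    tp: int  # predicted present, observed present
    fp: int  # predicted present, observed absent
    fn: int  # predicted absent, observed present
    tn: int  # predicted absent, observed absent


def threshold_metrics(c: Confusion) -> dict:
    """Standard threshold-dependent metrics computed from a confusion matrix."""
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)   # proportion of observed presences predicted
    specificity = c.tn / (c.tn + c.fp)   # proportion of observed absences predicted
    ccr = (c.tp + c.tn) / n              # correct classification rate
    tss = sensitivity + specificity - 1  # true skill statistic
    # Agreement expected by chance, from the row and column marginals.
    expected = ((c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)) / n**2
    kappa = (ccr - expected) / (1 - expected)  # Cohen's kappa
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "CCR": ccr,
        "TSS": tss,
        "kappa": kappa,
    }


def auc(presence_scores, absence_scores) -> float:
    """Threshold-independent AUC: the probability that a presence site scores
    higher than an absence site, counting ties as one half."""
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in presence_scores
        for a in absence_scores
    )
    return wins / (len(presence_scores) * len(absence_scores))


# Made-up counts and scores, for illustration only.
example = threshold_metrics(Confusion(tp=40, fp=10, fn=10, tn=40))
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

Kappa depends on the chance agreement computed from the marginals, and therefore on prevalence, whereas TSS is computed only from sensitivity and specificity. This is one reason the two can diverge for species with very different data prevalence.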