Doctoral research summary and conclusions
I used a data-intensive approach to assess the species abundance distribution. To accomplish this, I compiled a new community abundance dataset to address taxonomic and distributional gaps in the available data suitable for macroecological questions. I compiled data from the literature for seven under-represented classes of animals: amphibians, spiders, beetles, reptiles, birds, and ray-finned and cartilaginous fish. The database contains over 2000 species and more than 1.3 million individuals from over 700 sites on all continents except Antarctica. I made the data publicly available prior to completing my dissertation and, now that the dissertation is complete, have submitted the dataset for publication in Ecological Archives. I then combined the compiled dataset with existing publicly available community abundance datasets for birds, mammals, trees, and butterflies to create a final dataset containing over 16,000 communities for nine taxonomic groups.
I then used these data to test a variety of models of the species abundance distribution, one of the oldest and most well-studied patterns in ecology. Despite the extensive study of this pattern, it remains an open question whether the pattern contains enough information to allow the operation of biological processes to be inferred from its shape. Using a maximum likelihood approach, I tested five species abundance distribution models from four classes. Two were purely statistical models: the logseries and the Poisson lognormal. The other three were process-based models: the Zipf distribution (a branching process model), the negative binomial distribution (a population dynamics model), and the geometric series (a niche partitioning model).
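The kind of maximum likelihood comparison described above can be illustrated with a minimal sketch. It is not the code used in the dissertation: the function names and the abundance vector are hypothetical, only two of the five models are shown, the geometric distribution on n >= 1 stands in for the geometric series, and AICc is assumed as the comparison criterion.

```python
import math

def fit_logseries(abunds, tol=1e-12):
    """MLE of the logseries parameter p, found by bisection on
    sample mean = -p / ((1 - p) * ln(1 - p))."""
    mean = sum(abunds) / len(abunds)
    if mean <= 1.0:
        raise ValueError("mean abundance must exceed 1 for a finite fit")
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        model_mean = -mid / ((1.0 - mid) * math.log(1.0 - mid))
        if model_mean < mean:   # model mean increases with p
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def logseries_loglik(abunds, p):
    """Log-likelihood under P(n) = -p**n / (n * ln(1 - p)), n >= 1."""
    log_norm = -math.log(-math.log(1.0 - p))
    return sum(log_norm + n * math.log(p) - math.log(n) for n in abunds)

def fit_geometric(abunds):
    """MLE of q for P(n) = q * (1 - q)**(n - 1), n >= 1."""
    return len(abunds) / sum(abunds)

def geometric_loglik(abunds, q):
    return sum(math.log(q) + (n - 1) * math.log(1.0 - q) for n in abunds)

def aicc(loglik, k, s):
    """Small-sample corrected AIC for k parameters and s species."""
    return 2 * k - 2 * loglik + 2 * k * (k + 1) / (s - k - 1)

# Hypothetical abundances for a single community (one value per species).
abunds = [120, 45, 30, 12, 8, 5, 3, 2, 2, 1, 1, 1, 1]

p_hat = fit_logseries(abunds)
q_hat = fit_geometric(abunds)
aicc_logseries = aicc(logseries_loglik(abunds, p_hat), 1, len(abunds))
aicc_geometric = aicc(geometric_loglik(abunds, q_hat), 1, len(abunds))
print("logseries AICc:", round(aicc_logseries, 2))
print("geometric AICc:", round(aicc_geometric, 2))
```

The model with the lower AICc is the better supported one for that community. Repeating the comparison across thousands of communities shows how often each model is favored.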
In general, I found that it is difficult to infer process from species abundance distributions alone. Part of the difficulty in identifying pattern-generating mechanisms arises because multiple mechanisms have been proposed for each formulation of the species abundance distribution. In other words, different processes can yield exactly equivalent models.
Early tests of neutral theory compared the fit of empirical species abundance distributions to the neutral prediction, but later tests suggested that species abundance comparisons alone were insufficient for a rigorous test of neutrality. However, recent work suggests that subsuming some of these differences into broad categories, such as neutral or non-neutral, may make it possible to draw inferences about general categories of models.
Following Connolly et al. (2014), I used the Poisson lognormal as the non-neutral model and the negative binomial as the neutral model. The Poisson lognormal is a classic model of species abundance distributions, which makes it a good choice for the non-neutral model. While there are many different neutral models, all of them share the negative binomial distribution as the local community prediction. I used the same data and maximum likelihood approach as for the multi-model comparison.
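As a rough illustration of the two likelihoods being compared, the sketch below evaluates zero-truncated negative binomial and Poisson lognormal log-likelihoods for one community. It is a sketch under stated assumptions, not the dissertation's implementation: the function names, the abundance vector, and the parameter values are hypothetical, zero-truncation is assumed because unobserved species cannot be counted, the Poisson lognormal integral is approximated with a simple trapezoid rule, and the numerical optimization step is omitted.

```python
import math

def nb_logpmf(n, mean, k):
    """Log pmf of the negative binomial with the given mean and size k."""
    return (math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
            + k * math.log(k / (k + mean))
            + n * math.log(mean / (k + mean)))

def pln_pmf(n, mu, sigma, steps=1000, width=8.0):
    """Poisson lognormal pmf: Poisson(n | exp(x)) averaged over
    x ~ Normal(mu, sigma), integrated by the trapezoid rule on
    mu +/- width * sigma."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / steps
    log_const = -math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        log_pois = n * x - math.exp(x) - math.lgamma(n + 1)
        log_norm = -0.5 * ((x - mu) / sigma) ** 2 + log_const
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * math.exp(log_pois + log_norm)
    return total * h

def truncated_loglik(abunds, pmf):
    """Log-likelihood under the zero-truncated pmf p(n) / (1 - p(0))."""
    log_denominator = math.log(1.0 - pmf(0))
    cache = {}  # each distinct abundance is evaluated once
    loglik = 0.0
    for n in abunds:
        if n not in cache:
            cache[n] = math.log(pmf(n))
        loglik += cache[n] - log_denominator
    return loglik

# Hypothetical abundances and illustrative (not fitted) parameter values.
abunds = [120, 45, 30, 12, 8, 5, 3, 2, 2, 1, 1, 1, 1]
ll_nb = truncated_loglik(abunds, lambda n: math.exp(nb_logpmf(n, 5.0, 0.3)))
ll_pln = truncated_loglik(abunds, lambda n: pln_pmf(n, 1.0, 1.5))
print("negative binomial log-likelihood:", round(ll_nb, 2))
print("Poisson lognormal log-likelihood:", round(ll_pln, 2))
```

In a real analysis the parameters of each model would be chosen by a numerical optimizer to maximize these functions. Because both models have two parameters, comparing them by an information criterion then amounts to comparing their maximized log-likelihoods.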
My results suggested that it may be difficult to distinguish among even these broad categories of models and their associated distributions, at least in terrestrial systems. This suggests that, in terrestrial systems, there may be no single suite of processes that has equal importance in all communities; that is, non-neutral processes may be more important in some communities than in others.
An additional outcome of using a data-driven approach with a large compilation of species abundance distributions was the mitigation of a potentially important confounding factor in identifying pattern-generating mechanisms: non-biological variation among samples (sampling intensity, spatial scale, etc.) versus biological differences. Using data from different taxonomic groups and geographic regions helps remove some of the uncertainty arising from non-biological differences among datasets, because the data cover a range of sampling intensities and scales of collection. When results are consistent across datasets collected with very different sampling approaches, as they were in this study, it provides confidence that methodological differences were not crucial in determining the results. Thus, the agreement in results among these different datasets strongly suggests that biological differences exist between marine and terrestrial systems in the dominance of non-neutral processes. Without the breadth of data in my study, it would have been difficult to differentiate biological from non-biological differences among the systems.