This archive exists to comply with additional NERC requirements to host data on the EIDC. The full set of code and extra files is archived on Zenodo: https://zenodo.org/doi/10.5281/zenodo.4745553
It contains analysis scripts and raw data to support Terry et al. (2021), ‘No pervasive relationship between species size and local abundance trends’, Nat Ecol Evol 6, 140–144 (2022), https://doi.org/10.1038/s41559-021-01624-8, which should be consulted for full background and methods. The text given below is paraphrased from the methods section of the paper.
We downloaded all studies available in the ‘open’ component of the BioTIME database of community time series from https://doi.org/10.5281/zenodo.3265871. BioTIME contains observations from both fixed plots (repeat measures from the same set of specific localized sites) and from wide-ranging surveys and transects that may not necessarily precisely align year on year. We followed previous approaches and first identified studies as ‘multi-site’ or ‘single-site’ based on the number of coordinates in the BioTIME database. Single-site studies were considered as one combined assemblage, whilst widely dispersed ‘multi-site’ studies were partitioned into assemblages based on a global hexagonal grid of 96 km² cells using dggridR. We retained records from assemblages with abundance or biomass data of at least 10 distinct species and at least 5 years between the first and last record.
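The retention rule in the last sentence can be sketched as follows. This is an illustrative Python sketch only (the archived analysis scripts are in R); the `(assemblage_id, year, species)` record layout and the function name are hypothetical, and "at least 5 years between the first and last record" is read here as last year minus first year being at least 5.

```python
from collections import defaultdict


def retain_assemblages(records, min_species=10, min_year_span=5):
    """Return the IDs of assemblages that pass the retention rule.

    `records` is an iterable of (assemblage_id, year, species) tuples.
    This layout is an assumption made for illustration.
    """
    species = defaultdict(set)
    years = defaultdict(set)
    for assemblage_id, year, name in records:
        species[assemblage_id].add(name)
        years[assemblage_id].add(year)
    return {
        a
        for a in species
        # at least 10 distinct species ...
        if len(species[a]) >= min_species
        # ... and at least 5 years between first and last record (assumed reading)
        and max(years[a]) - min(years[a]) >= min_year_span
    }
```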
Although the majority of the records are identified with binomials to species level, a portion of the records in the BioTIME database are labelled only at higher taxonomic levels. For simplicity, we refer to all distinct names as ‘species’. We identified uninformative labels (for example ‘spA’, ‘unidentified’, ‘Miscellaneous’, ‘larvae’, ‘grass’). Common names (mostly birds) were converted to binomials using the Encyclopedia of Life tool via the ‘taxize’ R package, followed by manual inspection based on study location and species distribution where multiple options were presented. We excluded studies where the species are listed using codes. Informative names were standardized against the Global Biodiversity Information Facility name backbone using ‘taxize’. The dominant kingdom represented in each study was used to distinguish homonyms. Where BioTIME included only a genus-level identification, we matched these to genus-level size trait values listed in trait databases. Where BioTIME only included taxonomic information of higher rank than genus, we did not attempt to match the traits.
We used four separate trait databases that include some measure of organism size, but we did not mix information between databases.
For amniotes, the life history database was downloaded from https://doi.org/10.6084/m9.figshare.c.3308127.v1 from which we used the ‘adult_body_mass_g’ field.
For plants, we downloaded from the TRY database (https://www.try-db.org/) all records of ‘seed dry mass’ (trait 26) and ‘plant height vegetative’ (trait 3106). We grouped these by accepted species name, and calculated the mean of the log10(seedmass) values and the maximum observed height. We did not assign a value when the standard deviation of log10(seedmass) values was greater than 1.
For fish, we downloaded a curated database of fish traits from https://store.pangaea.de/Publications/Beukhof-etal_2019/TraitCollectionFishNAtlanticNEPacificContShelf.xlsx, which in turn is largely based on data from the FishBase database. It is focused on the North Atlantic and Pacific continental shelf, but this represents the majority of the relevant BioTIME studies. It includes values for both genus and species level. We used maximum length, and when there were multiple values for a particular species, we took an average.
For marine species, we downloaded size data from the WoRMS database. Aphia identifications (IDs) for all the species in our assemblages (excluding plants and fungi) were identified and used to download all attributes associated with these IDs held on WoRMS using the ‘worrms’ R package. Quantitative ‘body size’ measurements of length were scaled to millimetre units. We discarded values from stages other than adults, and values corresponding to minimums or thicknesses, then took a mean, except where the values differed by over an order of magnitude, which we discarded. Qualitative body sizes listed on WoRMS are divided into four categories (<0.2mm, 0.2–2mm, 2–200mm, >200mm), that were carried forwards as simple numbers (1–4). Data not from adults were discarded, and where an ID was associated with multiple distinct size categories, it was discarded.
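The per-species aggregation rules described for TRY seed mass and WoRMS body size can be sketched as below. This is an illustrative Python sketch, not the archived R code; the function names are hypothetical, the use of the sample standard deviation is an assumption, and "differed by over an order of magnitude" is read as a max/min ratio above 10.

```python
import math
import statistics


def summarise_seed_mass(values_mg, max_sd=1.0):
    """Mean of log10(seed mass), or None when the values are too variable."""
    logs = [math.log10(v) for v in values_mg]
    # Sample standard deviation is an assumption; it needs at least two values.
    if len(logs) > 1 and statistics.stdev(logs) > max_sd:
        return None
    return statistics.fmean(logs)


def summarise_body_length(lengths_mm):
    """Mean adult length in mm, or None when values span over an order of magnitude."""
    if max(lengths_mm) / min(lengths_mm) > 10:
        return None
    return statistics.fmean(lengths_mm)


def summarise_size_category(categories):
    """Single qualitative category (1-4), or None when an ID has several distinct ones."""
    distinct = set(categories)
    return distinct.pop() if len(distinct) == 1 else None
```

For example, `summarise_body_length([1, 100])` gives `None` because the two values differ by a factor of 100.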
We assessed each assemblage–trait combination where ≥40% and ≥5 of the species had data for that trait and >80% of year samples contained at least 5 species. We excluded transitory species within each assemblage by including only those species that were seen in over half of the year samples. Where this filtering left data from less than 1% of the cells in the original study, we removed the whole study. Where a study included both ‘abundance’ and ‘biomass’ data, we preferentially used the abundance data. Studies with only presence–absence data were not used.
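A minimal sketch of the two species-level rules above (trait coverage and removal of transitory species) is given below in Python for illustration; the archived scripts are in R. The `presence_by_year` and `trait_values` structures and both function names are hypothetical, and the order in which the filters are applied in the real pipeline is not specified here.

```python
def core_species(presence_by_year):
    """Species seen in over half of the year samples.

    `presence_by_year` maps year -> set of species observed (assumed layout).
    """
    n_years = len(presence_by_year)
    counts = {}
    for observed in presence_by_year.values():
        for sp in observed:
            counts[sp] = counts.get(sp, 0) + 1
    return {sp for sp, c in counts.items() if c > n_years / 2}


def is_assessable(presence_by_year, trait_values):
    """Check the >=40% and >=5 species trait coverage and the >80% rich-years rule.

    `trait_values` maps species -> trait value, with None for missing data.
    """
    all_species = set().union(*presence_by_year.values())
    with_trait = [sp for sp in all_species if trait_values.get(sp) is not None]
    enough_traits = (
        len(with_trait) >= 5 and len(with_trait) >= 0.4 * len(all_species)
    )
    # Year samples that contain at least 5 species
    rich_years = sum(len(obs) >= 5 for obs in presence_by_year.values())
    enough_years = rich_years > 0.8 * len(presence_by_year)
    return enough_traits and enough_years
```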
Where a species’ time series included repeated trailing or leading zeros, these were cut to one to avoid artificial flattening of the slope. The totals for each species were square-root transformed, then scaled to a mean of 0 and a standard deviation of 1. We fit an ordinary least-squares regression model through the transformed population series against year for each species in the assemblage. The set of slopes (β) of these linear models within each dataset summarized the relative change in abundance of each species in the assemblage through time. Very small β values (<10⁻⁵), caused by model fitting errors when there is no change in rank abundance, were set to 0 to avoid spurious rankings. The main response variable τ for each assemblage was then computed as Kendall’s rank correlation coefficient between size trait values and the set of βs. Species with missing trait values were excluded from the calculation of τ. Where there were multiple assemblages per study, study-level τ was taken as a simple arithmetic mean of all assemblage-level τ values.
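The chain from a raw time series to τ can be illustrated with the standard-library Python sketch below. It is not the archived R code: the function names are hypothetical, the sample standard deviation is assumed for scaling, the small-value threshold is applied to the absolute value of β, a flat series is given a slope of 0, and the tie-corrected tau-b form of Kendall's coefficient is assumed.

```python
import math


def trim_zero_runs(counts):
    """Slice bounds that keep a single zero from each leading/trailing run of zeros."""
    start, end = 0, len(counts)
    while start + 1 < end and counts[start] == 0 and counts[start + 1] == 0:
        start += 1
    while end - 2 >= start and counts[end - 1] == 0 and counts[end - 2] == 0:
        end -= 1
    return start, end


def scaled_slope(years, counts, tolerance=1e-5):
    """OLS slope of the sqrt-transformed, scaled series against year."""
    start, end = trim_zero_runs(counts)
    x = years[start:end]
    y = [math.sqrt(c) for c in counts[start:end]]
    n = len(y)
    if n < 2:
        return 0.0
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    if sd_y == 0:
        return 0.0  # flat series treated as no trend (assumption)
    z = [(v - mean_y) / sd_y for v in y]
    mean_x = sum(x) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxz = sum((a - mean_x) * b for a, b in zip(x, z))
    beta = sxz / sxx
    return 0.0 if abs(beta) < tolerance else beta


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, or None if undefined."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            sx, sy = sign(x[i] - x[j]), sign(y[i] - y[j])
            if sx * sy > 0:
                concordant += 1
            elif sx * sy < 0:
                discordant += 1
            elif sx == 0 and sy != 0:
                only_x_tied += 1
            elif sy == 0 and sx != 0:
                only_y_tied += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return None if denom == 0 else (concordant - discordant) / denom


def assemblage_tau(trait_by_species, beta_by_species):
    """tau between size trait and slope, skipping species with missing traits."""
    shared = [
        sp for sp in beta_by_species if trait_by_species.get(sp) is not None
    ]
    return kendall_tau_b(
        [trait_by_species[sp] for sp in shared],
        [beta_by_species[sp] for sp in shared],
    )
```

A negative τ from `assemblage_tau` would indicate that larger species tend to have more negative trends within that assemblage; a study-level value would then be the plain mean of these assemblage-level values.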
We also tested two alternative transformations of the population data: (1) a ranking approach where, within each year, all n species in the assemblage were assigned relative ranks (from 1 for the highest to 1/n for the lowest) by their abundance or biomass, depending on the fields available in BioTIME. Ties were averaged, and where a species was not observed in a particular year, it was assigned a rank of zero for that year. (2) Transformation by dividing each population time series by its mean value.
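Both alternative transformations can be sketched in a few lines of Python (illustrative only; the archived scripts are in R). The function names and the dictionary layout are hypothetical, and the handling of years in which only some of the n species were observed (observed species take the top ranks, absent species get 0) is one possible reading of the description.

```python
def relative_ranks(abundance, all_species):
    """Relative rank per species for one year: 1 for the highest, 0 if unobserved.

    `abundance` maps species -> abundance or biomass for that year (assumed layout).
    """
    n = len(all_species)
    observed = sorted(
        (sp for sp in all_species if abundance.get(sp, 0) > 0),
        key=lambda sp: abundance[sp],
        reverse=True,
    )
    ranks = {sp: 0.0 for sp in all_species}
    i = 0
    while i < len(observed):
        # Find the run of species tied with position i
        j = i
        while j + 1 < len(observed) and abundance[observed[j + 1]] == abundance[observed[i]]:
            j += 1
        mean_position = (i + j) / 2  # ties share the averaged position
        for sp in observed[i : j + 1]:
            ranks[sp] = (n - mean_position) / n
        i = j + 1
    return ranks


def divide_by_mean(series):
    """Scale a population time series by its own mean value."""
    mean = sum(series) / len(series)
    return [v / mean for v in series]
```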
To examine study-level determinants of τ within each size trait, for each study we calculated: (1) the mean total species richness of each assemblage over the time frame, (2) the mean assemblage-level trait data completeness, (3) the mean number of years from which there were data, (4) the mean span of years from which there were data, (5) the log10-transformed number of assemblages within the study (that is, the spatial extent), (6) the absolute latitude of the centre of the study and (7) the range of traits in the assemblage (log10(max) − log10(min)). We fitted a set of linear models to assess whether these factors could predict either τ or τ².
All analysis used the R language, and scripts are included in the KnittedScripts folder.
Core data files that are archived elsewhere (the BioTIME database of community dynamics, the database of amniote life history traits, the fish database, as detailed above) are not re-hosted here. Equally we do not include the raw trait data downloaded from TRY or WoRMS.
See data structure - no new data here.
See collection approach above - cleaning and filtering of source databases was applied in several steps.
HTMLs detailing the analyses are included to give some more background.
bt_names_all_traits.csv is a relation table linking the tidied name in the BioTIME database with the matches in the trait databases.
Columns:
TidyBTName: Tidy name in the BioTIME database (used as key)
canonicalName: Name sourced from GBIF
rank: Taxonomic rank of name
kingdom: Kingdom of species
common_name: Common name of species
AphiaID: ID from the Aphia marine database
Aphia_scientificname: Name from the Aphia database
TRY_AccSpeciesName: Accepted name from the TRY plant database
Core traits used:
TR_BodyLength_mm: Body length in mm (marine species)
TR_QualitativeBodySize: Qualitative body size (1-4) for marine species
TR_Mean_LengthMax: Body length (cm) (fish)
TR_adult_body_mass_g: Body mass (g)
TR_Mean_SeedMass: Seed mass (log mg) from TRY
TR_Max_Height: Tree max height (m) from TRY
TRY_datasetIDs_SeedMass: References for seed mass (see https://www.try-db.org/de/Datasets.php)
TRY_datasetIDs_Height: References for plant heights (see https://www.try-db.org/de/Datasets.php)
There are two tables of results:
AllThree_All.csv is a large table detailing the species population trends as measured in three different ways (see paper). Data are grouped by the trait value used and STUDY_ID_CELL.
Columns:
TidyBTName: Tidy name in the BioTIME database (used as key)
D19_slope: Main population trend: trimmed, sqrt-transformed, then scaled (D19 = following the approach of Dornelas et al. 2019, https://doi.org/10.1111/ele.13242)
D19_pvalue: Main population trend significance
D19_StdErr: Main population trend standard error
Elas_slope: Alternative population trend, standardised by dividing by the mean population value
Elas_pvalue: Alternative population trend significance
Elas_StdErr: Alternative population trend standard error
N_times_observed: The number of times the species was observed in the community
N_used: Number of data points used
Slope: Relative rank change through time
MeanRank: Mean rank of species abundance through time
MeanAbsRankChange: Absolute change in rank through time
trait_value: Body size trait value
RelTraitRank: Relative trait rank compared to the rest of the community
trait: Trait used (secondary key)
STUDY_ID_CELL: Study ID and cell (secondary key)
Study_Corr_Predictors.csv details the final τ values associated with each trait-study combination as presented in the principal figures. The potential predictors are also listed.
Columns:
STUDY_ID: Study ID (key)
Trait: Trait category
LogCells: Log10 of the number of cells the study was split into
Mean_Sp_div: Mean species diversity per cell per time
Mean_N_Years: Mean number of years of data per cell
Mean_Completeness: Proportion of years with data between start and end
Mean_YearRange: Number of years from first to last year with data
PROTECTED_AREA: Binary, whether it is a protected area according to the BioTIME metadata
GRAIN_SQ_KM_Log: Grain of the study
Abs_Lat: Absolute latitude
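As a hedged illustration of how the keys in AllThree_All.csv fit together, the Python sketch below groups rows by the two secondary keys (trait and STUDY_ID_CELL) so that each group holds the species of one assemblage-trait combination. The sample rows, including the trait and cell labels, are made up for illustration and do not come from the real file; only the column names are taken from the table description.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real values live in AllThree_All.csv.
SAMPLE = """TidyBTName,D19_slope,trait_value,trait,STUDY_ID_CELL
Species a,0.12,5.0,example_trait_1,example_cell_1
Species b,-0.30,50.0,example_trait_1,example_cell_1
Species c,0.05,2.5,example_trait_2,example_cell_2
"""


def group_by_assemblage(handle):
    """Group (name, slope, trait value) tuples by (trait, STUDY_ID_CELL)."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["trait"], row["STUDY_ID_CELL"])
        groups[key].append(
            (row["TidyBTName"], float(row["D19_slope"]), float(row["trait_value"]))
        )
    return dict(groups)


groups = group_by_assemblage(io.StringIO(SAMPLE))
```

To use the real table, the `io.StringIO(SAMPLE)` handle would be replaced by an open file handle on AllThree_All.csv.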