EIDC Archive version

Introduction

This archive exists to comply with additional NERC requirements to host data on the EIDC. The full archive of code and extra files is held on Zenodo: https://zenodo.org/doi/10.5281/zenodo.4745553

It contains analysis scripts and raw data to support the paper: Terry et al. (2021) No pervasive relationship between species size and local abundance trends. Nat Ecol Evol 6, 140–144 (2022). https://doi.org/10.1038/s41559-021-01624-8. The paper should be consulted for full background and methods. The text given below is paraphrased from its methods section.

Collection/generation methods

Generating assemblage time series

We downloaded all studies available in the ‘open’ component of the BioTIME database of community time series from https://doi.org/10.5281/zenodo.3265871. BioTIME contains observations from both fixed plots (repeat measures from the same set of specific localized sites) and from wide-ranging surveys and transects that may not necessarily precisely align year on year. We followed previous approaches and first identified studies as ‘multi-site’ or ‘single-site’ based on the number of coordinates in the BioTIME database. Single-site studies were considered as one combined assemblage, whilst widely dispersed ‘multi-site’ studies were partitioned into assemblages based on a global hexagonal grid of 96 km² cells using dggridR. We retained records from assemblages with abundance or biomass data of at least 10 distinct species and at least 5 years between the first and last record.
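The retention rule at the end of this step can be sketched as follows. The archived scripts are written in R; this Python version is only an illustration. The function name, the tuple layout of the records and the reading of "at least 5 years between the first and last record" as last year minus first year ≥ 5 are all assumptions, and the hexagonal gridding done with dggridR is not reproduced.

```python
from collections import defaultdict

MIN_SPECIES = 10    # at least 10 distinct species
MIN_YEAR_SPAN = 5   # assumed to mean: last year - first year >= 5


def retained_assemblages(records):
    """Return the IDs of assemblages that pass both retention thresholds.

    `records` is an iterable of (assemblage_id, species, year) tuples
    (a hypothetical layout, not the BioTIME schema).
    """
    species = defaultdict(set)
    years = defaultdict(set)
    for assemblage_id, name, year in records:
        species[assemblage_id].add(name)
        years[assemblage_id].add(year)

    keep = set()
    for assemblage_id, names in species.items():
        span = max(years[assemblage_id]) - min(years[assemblage_id])
        if len(names) >= MIN_SPECIES and span >= MIN_YEAR_SPAN:
            keep.add(assemblage_id)
    return keep
```

An assemblage with ten species recorded in 2000 and revisited in 2005 would be kept, while one with the same species revisited only in 2003 would be dropped.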

Cleaning names

Although the majority of the records are identified with binomials to species level, a portion of the records in the BioTIME database are labelled only at higher taxonomic levels. For simplicity, we refer to all distinct names as ‘species’. We identified uninformative labels (for example ‘spA’, ‘unidentified’, ‘Miscellaneous’, ‘larvae’, ‘grass’), and common names (mostly birds) were converted to binomials using the Encyclopaedia of Life tool via the taxize R package followed by manual inspection based on study location and species distribution where multiple options were presented. We excluded studies where the species are listed using codes. Informative names were standardized against the Global Biodiversity Information Facility name backbone using ‘taxize’. The dominant kingdom represented in each study was used to distinguish homonyms. Where BioTIME included only a genus-level identification, we matched these to genus-level size trait values listed in trait databases. Where BioTIME only included taxonomic information of higher rank than genus, we did not attempt to match the traits.

Trait data

We used four separate trait databases that include some measure of organism size, but we did not mix information between databases.

For amniotes, the life history database was downloaded from https://doi.org/10.6084/m9.figshare.c.3308127.v1 from which we used the ‘adult_body_mass_g’ field.

For plants, we downloaded from the TRY database (https://www.try-db.org/) all records of ‘seed dry mass’ (trait 26) and ‘plant height vegetative’ (trait 3106). We grouped these by accepted species name, and calculated the mean of the log10(seedmass) values and the maximum observed height. We did not assign a value when the standard deviation of log10(seedmass) values was greater than 1.

For fish, we downloaded a curated database of fish traits from https://store.pangaea.de/Publications/Beukhof-etal_2019/TraitCollectionFishNAtlanticNEPacificContShelf.xlsx, which in turn is largely based on data from the FishBase database. It is focused on the North Atlantic and Pacific continental shelf, but this represents the majority of the relevant BioTIME studies. It includes values for both genus and species level. We used maximum length, and when there were multiple values for a particular species, we took an average.

For marine species, we downloaded size data from the WoRMS database. Aphia identifications (IDs) for all the species in our assemblages (excluding plants and fungi) were identified and used to download all attributes associated with these IDs held on WoRMS using the ‘worrms’ R package. Quantitative ‘body size’ measurements of length were scaled to millimetre units. We discarded values from stages other than adults, and values corresponding to minimums or thicknesses, then took a mean, except where the values differed by over an order of magnitude, which we discarded.

Qualitative body sizes listed on WoRMS are divided into four categories (<0.2 mm, 0.2–2 mm, 2–200 mm, >200 mm), that were carried forwards as simple numbers (1–4). Data not from adults were discarded, and where an ID was associated with multiple distinct size categories, it was discarded.
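Two of the aggregation rules above (the seed mass summary and the quantitative marine body length summary) can be sketched as follows. This is an illustrative Python sketch rather than the archived R code. The function names are hypothetical, the use of the sample standard deviation (as R's sd() would give) is an assumption, and "differed by over an order of magnitude" is assumed to mean a maximum more than ten times the minimum.

```python
import math
import statistics


def seed_mass_summary(seed_masses):
    """Mean of log10(seed mass) for one species, or None if too variable.

    `seed_masses` is a non-empty list of positive values. No value is
    assigned when the sample standard deviation of the log10 values
    exceeds 1 (sample rather than population SD is an assumption).
    """
    logs = [math.log10(m) for m in seed_masses]
    if len(logs) > 1 and statistics.stdev(logs) > 1:
        return None
    return statistics.mean(logs)


def body_length_summary(lengths_mm):
    """Mean adult body length in mm for one ID, or None if inconsistent.

    `lengths_mm` is a non-empty list of positive lengths already scaled
    to millimetres. Values are discarded when max > 10 * min (an assumed
    reading of "over an order of magnitude").
    """
    if max(lengths_mm) > 10 * min(lengths_mm):
        return None
    return statistics.mean(lengths_mm)
```

Returning None rather than a number keeps "no value assigned" distinct from a genuine trait value, so downstream steps can exclude those species.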

Abundance change–trait correlation

We assessed each assemblage–trait combination where ≥40% and ≥5 of the species had data for that trait and >80% of year samples contained at least 5 species. We excluded transitory species within each assemblage by including only those species that were seen in over half of the year samples. Where this filtering left data from fewer than 1% of the cells in the original study, we removed the whole study. Where a study included both ‘abundance’ and ‘biomass’ data, we preferentially used the abundance data. Studies with only presence–absence data were not used.
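The thresholds in this paragraph can be sketched as two small checks. This Python sketch is illustrative only. The function names and the dictionary layout are hypothetical, and the order in which the archived R scripts apply the transitory-species filter relative to the coverage checks is not stated here, so the two are kept as separate functions.

```python
def non_transitory_species(year_samples):
    """Species seen in over half of the year samples.

    `year_samples` maps each sampled year to the set of species observed
    in that year (a hypothetical layout).
    """
    n_years = len(year_samples)
    counts = {}
    for observed in year_samples.values():
        for name in observed:
            counts[name] = counts.get(name, 0) + 1
    return {name for name, n in counts.items() if n > n_years / 2}


def passes_coverage(year_samples, species_with_trait):
    """Check the trait-coverage and per-year richness thresholds."""
    all_species = set().union(*year_samples.values())
    with_trait = all_species & set(species_with_trait)
    # >= 5 species and >= 40% of species must have a value for the trait
    enough_traits = (
        len(with_trait) >= 5 and len(with_trait) >= 0.4 * len(all_species)
    )
    # > 80% of year samples must contain at least 5 species
    rich_years = sum(1 for obs in year_samples.values() if len(obs) >= 5)
    enough_years = rich_years > 0.8 * len(year_samples)
    return enough_traits and enough_years
```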

Where a species’ time series included repeated trailing or leading zeros, these were cut to one to avoid artificial flattening of the slope. The totals for each species were square-root transformed, then scaled to a mean of 0 and a standard deviation of 1. We fit an ordinary least-squares regression model through the transformed population series against year for each species in the assemblage. The set of slopes (β) of these linear models within each dataset summarized the relative change in abundance of each species in the assemblage through time. Very small β values (<10⁻⁵), caused by model fitting errors when there is no change in rank abundance, were set to 0 to avoid spurious rankings. The main response variable τ for each assemblage was then computed as Kendall’s rank correlation coefficient between size trait values and the set of βs. Species with missing trait values were excluded from the calculation of τ. Where there were multiple assemblages per study, study-level τ was taken as a simple arithmetic mean of all assemblage-level τ values.
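The main pipeline described above (trim zero runs, square-root transform, scale, fit a slope, then correlate slopes with traits) can be sketched in plain Python. The archived analysis is in R, so this is an illustration under stated assumptions: all function names are hypothetical, scaling uses the sample standard deviation (as R's scale() does), the 10⁻⁵ threshold is applied to the absolute slope, each series has at least two distinct years, and Kendall's coefficient is computed in its tie-corrected tau-b form.

```python
import math


def trim_zero_runs(years, counts):
    """Cut leading and trailing runs of zeros down to a single zero."""
    start, end = 0, len(counts)
    while end - start > 1 and counts[start] == 0 and counts[start + 1] == 0:
        start += 1
    while end - start > 1 and counts[end - 1] == 0 and counts[end - 2] == 0:
        end -= 1
    return years[start:end], counts[start:end]


def scaled_slope(years, counts):
    """OLS slope against year of sqrt-transformed, z-scaled counts."""
    years, counts = trim_zero_runs(years, counts)
    roots = [math.sqrt(c) for c in counts]
    n = len(roots)
    mean_y = sum(roots) / n
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in roots) / (n - 1))
    if sd_y == 0:
        return 0.0  # constant series: treated as no trend
    z = [(y - mean_y) / sd_y for y in roots]
    mean_x = sum(years) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * zi for x, zi in zip(years, z))
    slope = sxy / sxx
    return 0.0 if abs(slope) < 1e-5 else slope  # assumed: |beta| < 1e-5


def kendall_tau(xs, ys):
    """Kendall's tau-b rank correlation (corrects for ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: contributes to neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def assemblage_tau(trait_by_species, series_by_species):
    """tau between trait values and slopes, skipping missing traits.

    `series_by_species` maps species to a (years, counts) pair.
    """
    names = [s for s in series_by_species
             if trait_by_species.get(s) is not None]
    traits = [trait_by_species[s] for s in names]
    slopes = [scaled_slope(*series_by_species[s]) for s in names]
    return kendall_tau(traits, slopes)
```

A negative τ under this sketch means larger species tend to have more negative scaled trends within the assemblage. Study-level τ would then be the arithmetic mean of these assemblage-level values.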

We also tested two alternative transformations of the population data: (1) A ranking approach where, within each year, all n species in the assemblage were assigned relative ranks (from 1 for the highest to 1/n for the lowest) by their abundance or biomass depending on the fields available in BioTIME. Ties were averaged, and where a species was not observed in a particular year, it was assigned a rank of zero for that year. (2) Transformation by dividing each population time series by its mean value.
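The two alternative transformations can be sketched as follows. This Python sketch is illustrative, not the archived R code, and the function names are hypothetical. For the ranking approach it assumes that ranks are computed over all n species in the assemblage (highest abundance gets n/n = 1), that tied species share the mean of their ranks, and that species with zero abundance in that year are then set to 0. The exact handling of ranks when some species are absent is an assumption.

```python
def relative_ranks(abundances):
    """Relative abundance ranks for one year.

    `abundances` maps every species in the assemblage to its abundance
    in that year, with 0 for species not observed.
    """
    n = len(abundances)
    ordered = sorted(abundances, key=abundances.get)  # ascending abundance
    ranks = {}
    i = 0
    while i < n:
        # Extend j to the end of the run of species tied with ordered[i].
        j = i
        while j + 1 < n and abundances[ordered[j + 1]] == abundances[ordered[i]]:
            j += 1
        average_rank = (i + 1 + j + 1) / 2  # mean of the tied 1-based ranks
        for name in ordered[i:j + 1]:
            observed = abundances[name] > 0
            ranks[name] = average_rank / n if observed else 0.0
        i = j + 1
    return ranks


def mean_standardised(counts):
    """Divide a population time series by its mean (mean must be non-zero)."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]
```

For example, with abundances of 10, 5, 5 and 0 for four species, the most abundant species gets 1, the two tied species share (2 + 3) / 2 / 4 = 0.625, and the unobserved species gets 0.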

To examine study-level determinants of τ within each size trait, for each study we calculated: (1) the mean total species richness of each assemblage over the time frame, (2) the mean assemblage-level trait data completeness, (3) the mean number of years from which there were data, (4) the mean span of years from which there were data, (5) the log10-transformed number of assemblages within the study (that is, the spatial extent), (6) the absolute latitude of the centre of the study and (7) the range of traits in the assemblage (log10(max)-log10(min)). We fitted a set of linear models to assess whether these factors could predict either τ or τ².

All analyses used the R language, and scripts are included in the KnittedScripts folder.

Original Data Sources and Tables

Core data files that are archived elsewhere (the BioTIME database of community dynamics, the database of amniote life history traits, the fish database, as detailed above) are not re-hosted here. Equally we do not include the raw trait data downloaded from TRY or WoRMS.

Nature and Units of recorded values

See data structure - no new data here.

Quality control

See collection approach above - cleaning and filtering of source databases was applied in several steps.

Details of data structure:

Scripts

Knitted HTML versions of the scripts, detailing the analyses, are included to give some more background. In order, the scripts:

  1. Generates assemblages from the raw BioTIME data by grouping dispersed studies into cells
  2. Cleans the names assigned to records in BioTIME by cross-referencing with GBIF taxonomic backbone
  3. Gathers trait data for these species from a variety of sources
  4. Conducts the analyses described in the paper

Trait Tables

bt_names_all_traits.csv is a relation table linking the tidied name in the BioTIME database with the matches in the trait databases.

Columns:

  • TidyBTName Tidy name in BioTIME database (used as key)
  • canonicalName Name sourced from GBIF
  • rank Taxonomic rank of name
  • kingdom Kingdom of species
  • common_name Common name of species
  • AphiaID ID from Aphia marine database
  • Aphia_scientificname Name from Aphia database
  • TRY_AccSpeciesName Accepted Name from TRY plant database

Core Traits used:

  • TR_BodyLength_mm Body length in mm (marine species)
  • TR_QualitativeBodySize Qualitative body size (1-4) for marine species
  • TR_Mean_LengthMax Body length (cm) (fish)
  • TR_adult_body_mass_g Adult body mass (g) (amniotes)
  • TR_Mean_SeedMass Mean seed mass (log10 mg) from TRY
  • TR_Max_Height Maximum plant height (m) from TRY
  • TRY_datasetIDs_SeedMass References for seed mass (See: https://www.try-db.org/de/Datasets.php)
  • TRY_datasetIDs_Height References for plant heights (See: https://www.try-db.org/de/Datasets.php)

Results Tables

There are two tables of results:

AllThree_All.csv is a large table detailing the species population trends as measured in three different ways (see paper). Data is grouped by the trait value used and STUDY_ID_CELL.

  • TidyBTName Tidy name in BioTIME database (used as key)
  • D19_slope Main population trend, trimmed, sqrt-transformed, then scaled (D19 = following Dornelas et al 2019 https://doi.org/10.1111/ele.13242 approach)
  • D19_pvalue Main population trend significance
  • D19_StdErr Main population trend standard error
  • Elas_slope Alternative population trend standardising by dividing by the mean population values
  • Elas_pvalue Alternative population trend significance
  • Elas_StdErr Alternative population trend standard error
  • N_times_observed The number of times the species was observed in the community
  • N_used Number of data points used
  • Slope Relative rank change through time
  • MeanRank Mean rank of species abundance through time
  • MeanAbsRankChange Mean absolute change in rank through time
  • trait_value Body size trait value
  • RelTraitRank Relative trait rank compared to the rest of the community
  • trait Trait used (secondary key)
  • STUDY_ID_CELL Study ID and Cell (secondary key)

Study_Corr_Predictors.csv details the final τ values associated with each trait-study combination as presented in the principal figures. The potential predictors are also listed.

  • STUDY_ID Study ID (Key)
  • Trait Trait category
  • LogCells Log10 of number of cells the study was split into
  • Mean_Sp_div Mean species diversity per cell per time
  • Mean_N_Years Mean number of years of data per cell
  • Mean_Completeness Proportion of years with data between start and end
  • Mean_YearRange Number of years from first to last year with data
  • PROTECTED_AREA Binary: whether the study is in a protected area according to the BioTIME metadata
  • GRAIN_SQ_KM_Log Grain of the study (log of area in km²)
  • Abs_Lat Absolute latitude

Authorship, Reuse and Licensing

Original code in this repository was written by Chris Terry while at Queen Mary University of London, with some parts derived from other authors as mentioned in-line.

Much of the original data used is subject to some kind of restriction, normally the requirement to cite back to the original data compilations. Please consult their requirements for data reuse. Any ‘raw’ data included in this repository is for the purpose of reproducibility, and in most cases other users should return to the original sources.