Supporting Information

This deposit provides code to fit a transmission model of bovine tuberculosis spread to a population of wild badgers in Woodchester Park in the UK, as described in the pre-print here: https://doi.org/10.1101/2024.01.26.576600

We only include the code in this deposit, and not the raw data, since the latter are provided by the Animal and Plant Health Agency on behalf of the Department for the Environment, Food and Rural Affairs and are not publicly available. The data (WPbadgerData_NE_V000616_1.zip) can be requested by contacting the Animal and Plant Health Agency (see the linked manuscript above).

Collection/generation methods

This code was developed over a two-year period as part of a Natural Environment Research Council funded grant (number NE-V000616-1).

Nature and units of recorded values

N/A

Quality control

Predictive outputs from the code are compared to the observed data and show good fits to the data (see Figures section below).

Details of data structure

The code to run the iFFBS algorithm is provided in an R package called BIID, included as part of the code deposit. An additional file, runmodel.R, provides code to format the data and run the model.

Installation

You will need to install the BIID package from source. The package depends on the Rcpp and RcppArmadillo packages, which require the installation of the correct C++ compilers.

Windows

Install Rtools.

(Make sure you tick the option to add Rtools to the PATH whilst installing if requested.)

(Tested on R 4.3.1 on Windows 10.)

Mac

Install Xcode command line tools. Execute the command xcode-select --install in a Terminal.

You might also need to install the gfortran libraries from:

https://cran.r-project.org/bin/macosx/tools/gfortran-6.1.pkg

(Not tested.)

Linux

Install gcc and related packages (you might also need gcc-fortran for some of the dependencies).

In Ubuntu Linux, execute the command sudo apt-get install r-base-dev in a Terminal.

(Tested on R 4.3.1 on Ubuntu 23.04.)

Install package

NOTE: Due to this package being hosted on EIDC, the filenames of all the source code had to be re-named into lower-case. This means that installation of the package is not as straightforward as it would be usually.

As such, installing the package requires some manual steps (detailed below), and the use of the devtools, Rcpp and RcppArmadillo packages in R. Once the compilers above have been installed, then these packages can be installed from R in the usual way:

install.packages(c("devtools", "Rcpp", "RcppArmadillo"))

Then:

  1. Download the biid package folder onto your local machine.
  2. Rename the folder from biid to BIID.
  3. Rename BIID/description to BIID/DESCRIPTION.
  4. Rename BIID/namespace to BIID/NAMESPACE.
  5. Rename BIID/r to BIID/R.
  6. Rename BIID/src/makevars to BIID/src/Makevars.
  7. Rename BIID/src/makevars.win to BIID/src/Makevars.win.

Then, in R, set your working directory to the parent directory containing the BIID folder, and run:

library(devtools)
library(Rcpp)
document("BIID")

This should create the necessary auxiliary files to be able to install the package. Then run:

install("BIID")

to install the package from source. Once installed, the package can be loaded as usual using e.g.

library(BIID)

Additional packages

Additional packages required to run the model code can be installed from within R using:

install.packages(c("tidyverse", "reshape2", "MCMCpack"))

Model and data

Notation:

  • \(G\): number of social groups.
  • \(T\): number of quarters in the whole study, which is on \(t = 1, \dots, T\).
  • \(m\): number of individuals.
  • \(N\): number of iterations.
  • \(p\): number of parameters.
  • \(J\): number of diagnostic tests.

Data

Processed data required to reproduce results are stored in WPbadgerdata_NE_V000616_1.zip and can be requested as above. This must be unzipped into a folder called WPbadgerdata containing the following files:

  • birthTimes.rds: integer vector (of length \(m\)) with the birth times.

  • CaptEffort.rds: (\(G \times T\)) matrix indicating if social group \(g\) was being monitored at time \(t\) (1 if yes, 0 otherwise).

  • CaptHist.rds: (\(m \times T\)) matrix indicating if individual \(i\) was captured at time \(t\) (1 if yes, 0 otherwise).

  • capturesAfterMonit.rds: matrix of two columns showing the last capture times (second column) for individuals (first column) captured after the monitoring period. This information is used for the inference of mortality rates.

  • dfTimes.rds: (\(T \times 2\)) look-up matrix showing the times in calendar years, e.g.

    > dfTimes
        idx    time
    1     1 1980.00
    2     2 1980.25
    3     3 1980.50
    4     4 1980.75
    5     5 1981.00
  • endSamplingPeriod.rds: integer vector (of length \(m\)) showing when the individuals stopped being monitored.

  • groupNames.rds: character vector (of length \(G\)) with the social group names.

  • startSamplingPeriod.rds: integer vector (of length \(m\)) showing when the individuals started being monitored.

  • TestMat.rds: matrix where each row represents a capture event. The first three columns are time, individual number, and social group. The remaining columns are diagnostic test results (1 if positive, 0 if negative, NA if not conclusive or not taken).

  • timeVec.rds: vector (of length \(T\)) showing the dates in calendar years e.g.

    [1] 1980.00 1980.25 1980.50 1980.75 1981.00

Fitting the model

File runmodel.R is used for reading processed data and for choosing priors, initial conditions and details about HMC/RWMH parameter updates.

Outputs of the fitted model

Outputs (posterior samples for parameters and hidden states) are saved in the directory resultsDirectory which can be specified by the user in runmodel.R.

Output files are [X].Rds, where X is:

  • postPars: (\(N \times p\)) matrix with the posterior samples for all model parameters.
  • logLik and logPost: (\(N \times 1\)) vectors with the log-likelihood and log-posterior, respectively, for all iterations.
  • nSus, nExp and nInf: (\(T \times N\)) matrices showing the total number of individuals in each compartment (\(S\), \(E\), and \(I\), respectively) in the whole population, at each time point, for each iteration.

Other outputs are split into blocks of iterations, with each block stored in a different folder e.g. Iters_from1to1000. In the bullet points below \(N\) refers to the number of iterations in each block, and then within each folder there are:

  • infTimes, infectivityTimes, deathTimes: (\(m \times N\)) matrices with the infection times, times of infectiousness, and death times, respectively, for each iteration. The value -10L denotes the absence of the event.
  • nSusTested, nExpTested and nInfTested: (\(T \times N \times J\)) arrays showing the total number of individuals in each compartment which were tested by each diagnostic test.
  • nSusByGroup, nExpByGroup, nInfByGroup: (\(G \times T \times N\)) arrays showing the number of individuals in each compartment (\(S\), \(E\), and \(I\), respectively) in each social group, at each time point, for each iteration.
  • nSusTestedByGroup, nExpTestedByGroup and nInfTestedByGroup: lists of \(G\) elements (one for each social group), where each element is a (\(T \times N \times J\)) array showing the number of individuals, in each compartment, which were tested by each diagnostic test.
  • AcontribPop: (\(N \times 1\)) vector with the relative contribution of the background transmission rates for each iteration in the whole population.
  • AcontribGroup: (\(G \times N\)) matrix with the relative contribution of the background transmission rates for each iteration in each social group.
  • AcontribPopTime: (\(T \times N\)) matrix with the relative contribution of the background transmission rates for each iteration at each time point in the whole population.
  • AcontribGroupTime: (\(G \times T \times N\)) matrix with the relative contribution of the background transmission rates for each iteration at each time point in each social group.

Figures

Below we show some example figures that can be generated from the model outputs. Details about these figures can be found in the preprint here: https://doi.org/10.1101/2024.01.26.576600

Posterior predictive summaries of test results aggregated across social groups against the observed data

  1. test positives;
  2. test negatives

Plots of hidden states aggregated across all social groups

  1. Number of individuals in each state aggregated across all social groups;
  2. relative contribution of badger-to-badger transmission compared to other sources of infection;
  3. proportions of test positives in raw data over time

Posterior distribution of conditional event times against data for an example badger

Posterior summaries for effective infectious periods

  1. individual infectious periods;
  2. effective infectious period averaged across all individuals;
  3. infectious period averaged across all individuals

Posterior distribution for effective reproduction numbers

  1. individual effective reproduction numbers;
  2. population effective reproduction number

Posterior predictive survival distributions

  1. survival function;
  2. hazard function

Posterior distributions for parameters

Posterior distributions for background infection rates by social group

Traceplots for parameters

Traceplots for background rates of infection

Plots of hidden states for groups