36  Adding a Latent Stage to Make an SEIR Model

EpiModel includes an integrated SIR model, but here we show how to model an SEIR disease like COVID-19. The E compartment in this disease is an exposed state in which a person has been infected but is not infectious to others. Some infectious diseases have this latent non-infectious stage, and in general it provides a general framework for transmission risk that is dependent on one’s stage of disease progression.

Note

This tutorial has two R scripts you should download: a primary script containing the code below and a separate module script. Download both and put them in the same working directory.

36.1 Setup

First start by loading EpiModel and clearing your global environment.

Code
library(EpiModel)
rm(list = ls())

36.2 EpiModel Model Extensions

36.2.1 Conceptualization

The first step to any EpiModel extension model is to conceptually identify what new functionality, above and beyond the built-in models, is desired and where that functionality should be added. There are often many “right” answers to these questions, and this aspect is only learned over time. But in general, it is helpful to map out a model extension on a state/flow diagram to pinpoint where the additions should be concentrated.

36.2.1.1 SIR Module Set

In this particular model, we will be adding a new disease state that occurs in between the susceptible disease state and an infectious disease state in an SIR model. The standard, built-in SIR model in EpiModel uses set modules (elements of the model), each with its own set of associated functions (the realization of those elements in code). For any built-in model, you can see what set of modules and functions have been used by running the simulation with netsim and then printing the output.

Code
nw <- network_initialize(n = 100)
formation <- ~edges
target.stats <- 50
coef.diss <- dissolution_coefs(dissolution = ~offset(edges), duration = 20)
est1 <- netest(nw, formation, target.stats, coef.diss, verbose = FALSE)
param <- param.net(inf.prob = 0.3, rec.rate = 0.1)
init <- init.net(i.num = 10, r.num = 0)
control <- control.net(type = "SIR", nsteps = 25, nsims = 1, verbose = FALSE)
mod1 <- netsim(est1, param, init, control)
mod1
EpiModel Simulation
=======================
Model class: netsim

Simulation Summary
-----------------------
Model type: SIR
No. simulations: 1
No. time steps: 25
No. NW groups: 1

Fixed Parameters
---------------------------
inf.prob = 0.3
rec.rate = 0.1
act.rate = 1
groups = 1

Model Output
-----------------------
Variables: sim.num s.num i.num r.num num si.flow ir.flow
Networks: sim1
Transmissions: sim1

Formation Statistics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges     50    50.12     0.24   0.85   0.141            NA         2.891


Duration Statistics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges     20   20.919    4.596  1.149     0.8            NA         1.712

Dissolution Statistics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges   0.05    0.052    3.842  0.006   0.328            NA         0.029

36.2.1.2 Module Classes

This includes a series of modules, which we classify into Standard and Flexible modules as shown on the table below. Any modules may be modified, but standard modules are those that are typically not modified because they generalize the core internal processes for simulation object generation, network resimulation, and epidemic bookkeeping. Flexible modules, in contrast, are those that are likely necessary to modify for an EpiModel extension.

In the example above, we used all of these modules except the arrivals and departures modules because we had a closed population. In addition to these set of built-in modules, any user can add more modules to this set, depending on what is needed. Again, this involves some conceptualization of how to organize the model processes, including whether those processes are similar to the built-in modules or something new.

36.2.1.3 Modules vs. Functions

All modules have associated functions, and these are passed into the epidemic model run in netsim through control.net. Printing out the arguments for this function, you will see that each of the standard modules have default associated functions as inputs, and flexible modules have a default of NULL.

Code
args(control.net)
function (type, nsteps, start = 1, nsims = 1, ncores = 1, resimulate.network = FALSE, 
    tergmLite = FALSE, cumulative.edgelist = FALSE, truncate.el.cuml = 0, 
    attr.rules, epi.by, initialize.FUN = initialize.net, resim_nets.FUN = resim_nets, 
    summary_nets.FUN = summary_nets, infection.FUN = NULL, recovery.FUN = NULL, 
    departures.FUN = NULL, arrivals.FUN = NULL, nwupdate.FUN = nwupdate.net, 
    prevalence.FUN = prevalence.net, verbose.FUN = verbose.net, 
    module.order = NULL, save.nwstats = TRUE, nwstats.formula = "formation", 
    save.transmat = TRUE, save.network, save.other, verbose = TRUE, 
    verbose.int = 1, skip.check = FALSE, raw.output = FALSE, 
    tergmLite.track.duration = FALSE, set.control.ergm = control.simulate.formula(MCMC.burnin = 2e+05), 
    set.control.tergm = control.simulate.formula.tergm(), save.diss.stats = TRUE, 
    dat.updates = NULL, ...) 
NULL

For built-in models, EpiModel selects which modules are needed based on the model parameters and initial conditions. For each time step, the modules run in the order in which they are specified in the output here; this also matches the order in which they are listed in control.net.

36.2.1.4 Built-in Functions as Templates

Any of the built-in functions associated with flexible modules are intended to be templates for user inspection and extension for research-level models. So, for example, mod1 above shows that infection.net was used as the infection module. That function has a help page briefly describing what it does. And you can also inspect the function contents with:

Code
View(infection.net)

We’ll use an edited down version of that function, with some additional explanation, below. In addition, the disease progression state transition within an SIR model (and an SIS model too) is handled by the recovery module, with an associated recovery.net function.

Code
View(recovery.net)

We will use an edited down version of that function as a template for a new, more general disease progression module.

36.2.2 The EpiModel API

All EpiModel module functions have a set of shared design requirements. This set of requirements defines the EpiModel Application Programming Interface (API) for extension. The best way to learn this is through a concrete example like this, but here are the general API rules:

  1. Each module has an associated R function, the general design of which is:
Code
fx <- function(dat, at) {
  
  ## function processes that update dat
  
  return(dat)
}

The function takes an object called dat, which is the master data object passed around by netsim, performs some processes (e.g., infection, recovery, aging, interventions), updates the dat object, and then returns that object. The other input argument to each function must be at, which is a time step counter.

  1. Data are stored on the dat object in a particular way: in sublists that are organized by category. The main categories of data to interact with include model inputs (parameters, initial conditions, and controls) from those three associated input functions; nodal attributes (e.g., an individual disease status for each person); and summary statistics (e.g., the disease prevalence at at time step). There are accessor functions for reading (these are the get_ functions) and writing (these are the set_ functions) to the dat object in the appropriate place.

  2. The typical function design involves three steps: a) reading the relevant inputs from the dat object; b) performing some micro-level process on the nodes that is usually a function of fixed parameters and time-varying nodal attributes; c) writing the updated objects back on to the dat object.

Let’s see how this API works by extending our infection and recovery functions to transition an SIR model into an SEIR model.

36.2.3 Infection Module

The built-in infection module for an SIR model performs the following functions listed in the function below. The core process is determining which edges are eligible for a disease transmission to occur, and then randomly simulating that transmission process. Why is it necessary to update the infection function for an SEIR model? Because an SIR involves a transition between S and I disease statuses when a new infection occurs, but an SEIR model involves a transition between S and E disease statuses. It’s a small, but important change. Here’s the full modified function, with embedded comments. Note that we can use the browser function to run this function in debug mode by uncommenting the third line (we will demonstrate this).

Code
infect <- function(dat, at) {

  ## Uncomment this to run environment interactively
  # browser()

  ## Attributes ##
  active <- get_attr(dat, "active")
  status <- get_attr(dat, "status")
  infTime <- get_attr(dat, "infTime")

  ## Parameters ##
  inf.prob <- get_param(dat, "inf.prob")
  act.rate <- get_param(dat, "act.rate")

  ## Find infected nodes ##
  idsInf <- which(active == 1 & status == "i")
  nActive <- sum(active == 1)
  nElig <- length(idsInf)

  ## Initialize default incidence at 0 ##
  nInf <- 0

  ## If any infected nodes, proceed with transmission ##
  if (nElig > 0 && nElig < nActive) {

    ## Look up discordant edgelist ##
    del <- discord_edgelist(dat, at)

    ## If any discordant pairs, proceed ##
    if (!(is.null(del))) {

      # Set parameters on discordant edgelist data frame
      del$transProb <- inf.prob
      del$actRate <- act.rate
      del$finalProb <- 1 - (1 - del$transProb)^del$actRate

      # Stochastic transmission process
      transmit <- rbinom(nrow(del), 1, del$finalProb)

      # Keep rows where transmission occurred
      del <- del[which(transmit == 1), ]

      # Look up new ids if any transmissions occurred
      idsNewInf <- unique(del$sus)
      nInf <- length(idsNewInf)

      # Set new attributes and transmission matrix
      if (nInf > 0) {
        status[idsNewInf] <- "e"
        infTime[idsNewInf] <- at
        dat <- set_attr(dat, "status", status)
        dat <- set_attr(dat, "infTime", infTime)

        dat <- set_transmat(dat, del, at)
  
      }
    }
  }

  ## Save summary statistic for S->E flow
  dat <- set_epi(dat, "se.flow", at, nInf)

  return(dat)
}

Each step is relatively self-explanatory with the comments, but we will run this interactively at the end of this tutorial to step through the updated data structures. Infection is one of the more complex processes because it involves a dyadic process (a contact between an S and I node). That involves a construction of a discordant edgelist; that is, a list of edges in which there is a disease discordant dyad.

The main point here is: we have made a change to the infection module function, and it consists of updating the disease status of a newly infected persons to "e" instead of "i". Additionally, we are tracking a new summary statistic, se.flow that tracks the size of the flow from S to E based on the number of new infected, nInf at the time step.

36.2.4 Progression Module

Next up is the disease progression module. Here, we have generalized the built-in recovery module function to handle two disease progression transitions after infection: E to I (latent to infectious stages) and I to R (infectious to recovered stages). Like many individual-level transitions, this involves flipping a weighted coin with rbinom: this performs a series of random Bernoulli draws based on the specified parameters. Here is the full model function.

Code
progress <- function(dat, at) {

  ## Uncomment this to function environment interactively
  # browser()

  ## Attributes ##
  active <- get_attr(dat, "active")
  status <- get_attr(dat, "status")

  ## Parameters ##
  ei.rate <- get_param(dat, "ei.rate")
  ir.rate <- get_param(dat, "ir.rate")

  ## E to I progression process ##
  nInf <- 0
  idsEligInf <- which(active == 1 & status == "e")
  nEligInf <- length(idsEligInf)

  if (nEligInf > 0) {
    vecInf <- which(rbinom(nEligInf, 1, ei.rate) == 1)
    if (length(vecInf) > 0) {
      idsInf <- idsEligInf[vecInf]
      nInf <- length(idsInf)
      status[idsInf] <- "i"
    }
  }

  ## I to R progression process ##
  nRec <- 0
  idsEligRec <- which(active == 1 & status == "i")
  nEligRec <- length(idsEligRec)

  if (nEligRec > 0) {
    vecRec <- which(rbinom(nEligRec, 1, ir.rate) == 1)
    if (length(vecRec) > 0) {
      idsRec <- idsEligRec[vecRec]
      nRec <- length(idsRec)
      status[idsRec] <- "r"
    }
  }

  ## Write out updated status attribute ##
  dat <- set_attr(dat, "status", status)

  ## Save summary statistics ##
  dat <- set_epi(dat, "ei.flow", at, nInf)
  dat <- set_epi(dat, "ir.flow", at, nRec)
  dat <- set_epi(dat, "e.num", at,
                 sum(active == 1 & status == "e"))
  dat <- set_epi(dat, "r.num", at,
                 sum(active == 1 & status == "r"))

  return(dat)
}

This set of two progression processes involves querying who is eligible to transition, randomly transitioning some of those eligible, updating the status attribute for those who have progressed, and then recording some new summary statistics.

36.3 Network Model

With the epidemic modules defined, we will now step back to parameterize, estimate, and diagnose the TERGM. Here we will use a relatively basic model with an edges and degree(1). Here we are not using any nodal attributes in either the TERGM or the epidemic modules, but these could be added (we will get more practice with that tomorrow). Note that we are using a relatively high mean degree (2 per capita) compared to some of our prior models, with lower than expected isolates.

Code
# Initialize the network
nw <- network_initialize(500)

# Define the formation model: edges + degree terms
formation = ~edges + degree(1)

# Input the appropriate target statistics for each term
target.stats <- c(500, 180)

# Parameterize the dissolution model
coef.diss <- dissolution_coefs(dissolution = ~offset(edges), duration = 25)
coef.diss
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 25
Crude Coefficient: 3.178054
Mortality/Exit Rate: 0
Adjusted Coefficient: 3.178054

Next we fit the network model.

Code
# Fit the model
est <- netest(nw, formation, target.stats, coef.diss)

And then diagnose it. We are including a wide range of degree terms to monitor so we can see the full degree distribution . Although the degree(1) term does not look great visually, it is only off by ~5 edges in absolute terms.

Code
dx <- netdx(est, nsims = 10, ncores = 5, nsteps = 500,
            nwstats.formula = ~edges + degree(0:7), 
            keep.tedgelist = TRUE)

Network Diagnostics
-----------------------
- Simulating 10 networks
- Calculating formation statistics
Code
print(dx)
EpiModel Network Diagnostics
=======================
Diagnostic Method: Dynamic
Simulations: 10
Time Steps per Sim: 500

Formation Diagnostics
----------------------- 
        Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges      500  498.529   -0.294  2.340  -0.629         7.927        24.322
degree0     NA   51.552       NA  0.401      NA         1.281         7.824
degree1    180  174.596   -3.002  0.780  -6.931         2.846        12.538
degree2     NA  116.223       NA  0.424      NA         1.640         9.654
degree3     NA   83.598       NA  0.399      NA         1.816         8.888
degree4     NA   44.633       NA  0.357      NA         1.205         7.099
degree5     NA   19.495       NA  0.213      NA         1.149         4.832
degree6     NA    6.998       NA  0.107      NA         0.753         2.794
degree7     NA    2.191       NA  0.058      NA         0.313         1.510

Duration Diagnostics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges     25   24.867   -0.531  0.102  -1.301         0.272         1.093

Dissolution Diagnostics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges   0.04     0.04    0.207      0   0.665             0         0.009
Code
plot(dx)

36.4 Epidemic Model Parameterization

The epidemic model parameterization for any extension model consists of using the same three input functions as with built-in models.

First we start with the model parameters in param.net. Note that we have two new rates here: ei.rate and ir.rate. These control the transitions from E to I, and then I to R. We have selected these parameter names and input the values here, but note that the same parameters get pulled into the disease progression function above. So in general there must be consistency between the naming of the parameters as inputs and their references in the model functions.

Code
param <- param.net(inf.prob = 0.5, act.rate = 2,
                   ei.rate = 0.01, ir.rate = 0.01)

Next we specify the initial conditions. Here we are specifying that there are 10 infectious individuals at the epidemic outset. If we wanted to initialize the model with persons only in the latent stage, we would need to set disease status as a nodal attribute on the network instead (following the same approach in the Serosorting Tutorial in Module 8.

Code
init <- init.net(i.num = 10)

Finally, we specify the control settings in control.net. Extension models like this require some significant updates (compared to built-in models) here. First, type is set to NULL in any extension model because we are no longer using EpiModel to pre-select which modules to run; it is an entirely manual process. Second, the new input parameter for the infection.FUN argument is infect; in other words, our new function for our infection module is the one we built above. Note that it is called infect. Third, we are defining a new module, progress.FUN, with an argument that matches the name of our new progression function. We defined a new module, rather than replaced the function of the recovery module, just to show you how to define a new module here. We could have just as easily done it the latter way with the same results. Finally, we are explicitly setting resimulate.network to FALSE; this is the default, so this is not necessary, but it is just to remind you this is a model without network feedback.

Code
source("mod9-SEIR-fx.R")
control <- control.net(type = NULL,
                       nsteps = 500,
                       nsims = 1,
                       ncores = 1,
                       infection.FUN = infect,
                       progress.FUN = progress,
                       resimulate.network = FALSE)

In the R scripts that we include with this tutorial, you will see that we have two separate R script files (for the R Markdown build, they all go in this single file). One contains the module functions, and the other contains all the other code to parameterize and run the model. We do this because it allows for easier interaction with the functions in browser mode. I will demonstrate this live next. But in general, placing the functions in a separate file conceptually disentangles the model functionality from the model parameterization. It is critical, however, that you source in the file containing the functions before you run control.net (otherwise, control.net does not know what infect and progress are).

36.5 Epidemic Simulation

Finally we are ready to do the epidemic model simulation and analysis. This is done using the same approach as the built-in models. We will start with running one model in the browser mode interactively.

Code
sim <- netsim(est, param, init, control)

And then go back, comment out the browser lines, re-source the functions, and run a full-scale model with 10 simulations.

Code
source("mod9-SEIR-fx.R")
control <- control.net(type = NULL,
                       nsteps = 500,
                       nsims = 10,
                       ncores = 5,
                       infection.FUN = infect,
                       progress.FUN = progress,
                       resimulate.network = FALSE)
sim <- netsim(est, param, init, control)

Once the model simulation is complete, we can work with the model object just like a built-in model. Start by printing the output to see what is available.

Code
print(sim)
EpiModel Simulation
=======================
Model class: netsim

Simulation Summary
-----------------------
Model type:
No. simulations: 10
No. time steps: 500
No. NW groups: 1

Fixed Parameters
---------------------------
inf.prob = 0.5
act.rate = 2
ei.rate = 0.01
ir.rate = 0.01
groups = 1

Model Functions
-----------------------
initialize.FUN 
resim_nets.FUN 
summary_nets.FUN 
infection.FUN 
nwupdate.FUN 
prevalence.FUN 
verbose.FUN 
progress.FUN 

Model Output
-----------------------
Variables: sim.num s.num i.num num ei.flow ir.flow e.num 
r.num se.flow
Networks: sim1 ... sim10
Transmissions: sim1 ... sim10

Formation Statistics
----------------------- 
        Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges      500  495.887   -0.823  2.025  -2.031         6.370        22.367
degree1    180  176.377   -2.013  0.720  -5.033         2.471        11.846


Duration Statistics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges     25   24.855   -0.578  0.114  -1.264         0.372         1.201

Dissolution Statistics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges   0.04     0.04      0.4      0   1.279             0         0.009

Here is the default plot with all the compartment sizes over time. This includes the new summary statistics we tracked in the disease progression function.

Code
par(mar = c(3, 3, 1, 1), mgp = c(2, 1, 0))
plot(sim)

Here are the flow sizes, including the new se.flow incidence tracker that we established in the new infection function.

Code
plot(sim, y = c("se.flow", "ei.flow", "ir.flow"), legend = TRUE)

Finally, here is the data frame output from the model, with rows limited to time step 100 across all 10 simulations.

Code
df <- as.data.frame(sim)
df[which(df$time == 100), ]
     sim time sim.num s.num i.num num ei.flow ir.flow e.num r.num se.flow
100    1  100     500   172    77 500       2       2   211    35       5
600    2  100     500   287    46 500       1       0   143    20       4
1100   3  100     500   149    83 500       0       1   215    49       4
1600   4  100     500   296    54 500       2       0   132    15       3
2100   5  100     500   233    53 500       1       1   180    32       2
2600   6  100     500   217    71 500       2       0   180    30       2
3100   7  100     500   248    63 500       2       0   158    22       9
3600   8  100     500   166    89 500       0       0   198    45       2
4100   9  100     500   290    40 500       3       2   135    29       6
4600  10  100     500   222    74 500       5       1   176    24       4

And here are the transmission matrices and their phylogram plots (one per infection seed).

Code
tm1 <- get_transmat(sim)
plot(tm1)
found multiple trees, returning a list of 10phylo objects