32  Epidemic Models with Demography

This tutorial demonstrates how to model a Susceptible-Infected (SI) epidemic in an open population. An example of an SI epidemic is HIV, with infected persons never leaving the infected stage before mortality. Modeling HIV is quite complicated, and this tutorial is not meant to be a full-scale HIV model: it does not include many components of HIV models, such as disease stage, drug therapy, and so on. This only represents SI dynamics generally with the possibility of vital dynamics.

The population is open because we are now including arrivals and departures in the epidemic process. Simulating an epidemic in an open population requires adjustments that accommodate changes to the network structure over time, as detailed in the Module 8 slides.

For simplicity and to keep the overall population size stable, our initial example will have the departure processes not depend on disease status (no disease-dependent mortality). We will then show the implications of the “death correction” to the estimation to demonstrate why we implement that model correction, and finally explore models in which the population size does vary.

Unlike in the tutorial in Module 5, we are using an attribute here but not the special group attribute. So we will allow for heterogeneity in network structure but assume homogeneous epidemic parameters. It is straightforward to extend EpiModel (we will cover this in Module 9) to allow for any arbitrarily specified attributes to impact both the network structure and epidemic processes.

Note

Download the R script to follow along with this tutorial here.

Start with loading EpiModel:

Code
library(EpiModel)

32.1 Network Structure

The network is initialized as before, but now we set a vertex attribute of risk group onto the network. Risk group will be a binary variable, with the risk groups evenly sized in the population.

Code
nw <- network_initialize(n = 500)  
nw <- set_vertex_attribute(nw, attrname = "risk", value = rep(0:1, each = 250))

We will proceed by fitting network model in which Risk Group 1 has a higher mean degree and there is strong assortative (within-group) mixing (aka, homophily).

32.2 Model 1: Stable Population Size

The first model will assume that there is random mixing in the population, with no preference for same-risk partnerships, and that the two risk groups have the same mean degree. In fact, this means that there is no need to include separate terms for degree and homophily in the network model (it is an edges-only model or Bernoulli random graph).

32.2.1 Network Parameterization

We we will add heterogeneity in mean degree by risk group and for homophily in risk group mixing by using nodefactor and nodematch terms and associated target statistics.

The formula for the formation component of the model will be as follows. The nodefactor term represents variation in the overall mean degree by the named attribute, risk. The nodematch represents mixing by that named attribute.

Code
formation <- ~edges + nodefactor("risk") + nodematch("risk")

The overall mean degree will be 0.5, but will vary by risk group. We will specify that the mean degree for risk group 1 is 0.75. Because the overall mean degree is 0.5, and the proportions of risk groups is equal, that implies that the mean degree for risk group 0 will be 0.25.

The target statistic for the nodefactor term is the product of the mean degree of risk group 1 and the size of risk group 1: 0.75 * 250 = 187.5. Finally, we will specify that 90% of the partnerships occur between persons of the same risk group, which equals 112.5 edges.

Code
target.stats <- c(125, 187.5, 112.5)

The dissolution model components will be as follows. We specify a homogeneous dissolution model with an average relational duration of 40 weeks. The dissolution coefficients must be adjusted to accommodate the exogenous probability of edge dissolution due to mortality (or other forms of nodal departure) – this is the “death correction. As shown in the Module 8 Slides, this adjustment has the impact of increasing the coefficient. Recall that this is because the dissolution model is actually implemented as a persistence model: a larger coefficient = longer partnerships.

Code
coef.diss <- dissolution_coefs(dissolution = ~offset(edges), 
                               duration = 40, d.rate = 0.001)
coef.diss
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 40
Crude Coefficient: 3.663562
Mortality/Exit Rate: 0.001
Adjusted Coefficient: 3.7469

The d.rate parameter here is the average mortality (or other departure) rate per time step. This will correspond to the epidemic parameters specified below in param.net. In the case that there is disease-induced mortality, this parameter should represent a weighted average of the different departure rates in the full system (we will explore this below).

In this tutorial example a d.rate parameter of 0.001 corresponds to an average “lifespan” (time spent in the population) of 1/0.001 = 1000 time steps. This is probably unrealistic for most disease models (especially since the average relations last 40 time steps), but we are using a small value of lifespan for demonstration purposes. In fact, you may encounter the following message if the departure rate is too high relative to the average relational duration.

Code
coef.diss.high <- dissolution_coefs(dissolution = ~offset(edges), 
                               duration = 40, d.rate = 0.05)
# Error: The competing risk of departure is too high for the given duration of 40; specify a d.rate lower than 0.01258.

In this test case below, the average “lifespan” is half as long as the average relationship (which probably doesn’t make any sense). We have tried to catch this problem for you with an informative error message.

Ok, so let’s use our smaller d.rate parameter of 0.001.

32.2.2 Estimation and Diagnostics

The network model is estimated using the netest function with all the inputs we specified above.

Code
est1 <- netest(nw, formation, target.stats, coef.diss)

We can use the summary function to display the coefficient estimates for the nodefactor and nodematch terms, which are strong (> 0 on the log odds scale) and highly significant. The specific values of those coefficients do not matter much here (what matters is how they generate simulations of the complete network, below), but the positive sign of both terms shows these are greater than expected than under the null model, precisely as we have parameterized them.

Code
summary(est1)
Call:
ergm(formula = formation, constraints = constraints, offset.coef = coef.form, 
    target.stats = target.stats, eval.loglik = FALSE, control = set.control.ergm, 
    verbose = verbose, basis = nw)

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)    
edges              -9.1317     0.2898      0 -31.509   <1e-04 ***
nodefactor.risk.1   0.6115     0.1120      0   5.461   <1e-04 ***
nodematch.risk      2.0328     0.2927      0   6.946   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-likelihood was not estimated for this fit. To get deviances, AIC, and/or BIC, use ‘*fit* <-logLik(*fit*, add=TRUE)’ to add it to the object or rerun this function with eval.loglik=TRUE.

Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 40
Crude Coefficient: 3.663562
Mortality/Exit Rate: 0.001
Adjusted Coefficient: 3.7469

For the diagnostics for this model, we set the nwstats.formula to expand the formation formula to specify that we want to monitor the mean degree and mixing by the risk attribute. We do this with nodefactor and nodematch terms, respectively. By adding them to the nwstats.formula, but not in the formation formula for the TERGM itself, we can get summary statistics for simulations to see what the expected values of these terms would be in a null model. Setting levels = NULL in the nodefactor term will output statistics for both risk groups in the model.

We can also monitor the nodematch term with a diff = TRUE argument, which allows for differential homophily by group. In this model, there would never be a reason to specify differential homophily by group in the ERGM formula because it would be over-specified: because it is a binary attribute, individual diagonal cells are a function of edges and nodefactor terms. It can be used for models either without a nodefactor term, or for models with nodefactor terms but with a mixing attribute with more than two levels. But again we show it here for comparison, and getting you familiar with the different syntax options.

Code
dx1 <- netdx(est1, nsims = 10, nsteps = 1000, ncores = 4,
             nwstats.formula = ~edges + 
                                nodefactor("risk", levels = NULL) + 
                                nodematch("risk", diff = TRUE))

Printing the diagnostics shows a good fit between the targets and the simulated outputs. There is no difference between the simulated means for nodefactor statistics for risk group 0 compared to risk group 1, indicating no differential in mean degree by risk group.

Code
dx1
EpiModel Network Diagnostics
=======================
Diagnostic Method: Dynamic
Simulations: 10
Time Steps per Sim: 1000

Formation Diagnostics
----------------------- 
                  Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means)
edges              125.0  125.796    0.637  0.909   0.876         3.475
nodefactor.risk.0     NA   64.924       NA  0.970      NA         1.593
nodefactor.risk.1  187.5  186.669   -0.443  1.514  -0.549         6.638
nodematch.risk.0      NA   26.243       NA  0.489      NA         0.871
nodematch.risk.1      NA   87.116       NA  0.734      NA         3.151
                  SD(Statistic)
edges                    11.502
nodefactor.risk.0        11.459
nodefactor.risk.1        19.299
nodematch.risk.0          5.514
nodematch.risk.1          9.379

Duration Diagnostics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges     40   39.675   -0.812  0.287  -1.131         1.161         3.458

Dissolution Diagnostics
----------------------- 
      Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means) SD(Statistic)
edges  0.025    0.025   -0.305      0  -0.539             0         0.014

Plotting the diagnostics shows that the simulations are right on target for the terms in the formation model. We will skip plotting the dissolution model diagnostics for this example.

Code
plot(dx1)

Recall that this TERGM estimation and diagnostics has not yet been confronted with the complexities of an open population in which there is arrival and departure. So we should not expect any challenges with corrections to any changing population or mortality, because the population size and composition remains fixed during this step. Only when we move on to the next step of the epidemic simulation do these exogenous process introduce these factors.

32.2.3 Epidemic Simulation

This epidemic simulation will investigate how variation in mean degree by risk group and highly assortative risk group mixing jointly impact epidemic prevalence overall and by risk group. We will use one set of epidemic parameters, initial conditions, and control settings shared across the two counterfactual models.

The parameters are entered into the param.net function should be familiar by now. Since this is an SI epidemic, there will be no recovery rate parameter. The three new parameters will control the arrival rate, departure rate among the susceptibles, and departure rate in the infected group. Note that this is the same departure rate as what we specified in dissolution_coefs. For this step, we will assume that all three rates are equal, which implies that there is no disease-induced mortality and a stable population size.

Code
param <- param.net(inf.prob = 0.1, act.rate = 5,
                   a.rate = 0.001, ds.rate = 0.001, di.rate = 0.001)

Similar to the previous tutorial, we only need to specify the number infected at the outset. To generate stable epidemic conditions quickly, we will see that the prevalence is 10%, distributed randomly among the population.

Code
init <- init.net(i.num = 50)

The inputs for the control settings share some similarities with the control settings from the Day 3 tutorials, but we will be adding some new parameters here:

  • resimulate.network: this argument specifies that the network should be resimulated timestep-by-timestep in the model, in response to changing conditions like demography. By default this argument is FALSE, so here we must set it to TRUE.
  • epi.by: this argument provides stratified prevalence estimates by a categorical attribute in the network (if we had used the special group attribute, stratified estimates would be available by group by default). By default, this argument is NULL.
  • tergmLite: this is a streamlined approach for network simulation that speeds up the process by 20-50 fold. This is possible because of trade offs between complete versus sparse representation of the underlying network object. By default this is set to FALSE, which means that the full network object is retained. With tergmLite methods, this means that the full network object cannot be plotted at different time points (because the individual-level network history is not retained). Otherwise, the two methods should be equivalent.

For the main epidemic model comparison, we are going to run 20 simulations on 10 cores with tergmLite = TRUE. Note that we are using the nwstats.formula argument to add a couple of network terms to monitor for the post-simulation diagnostics.

Code
control <- control.net(type = "SI", nsteps = 300, nsims = 20, ncores = 10, 
                       resimulate.network = TRUE, epi.by = "risk", tergmLite = TRUE,
                       nwstats.formula = ~edges + 
                                          nodefactor("risk", levels = NULL) + 
                                          nodematch("risk", diff = TRUE) + 
                                          meandeg)

To simulate the epidemic model, we again use the netsim function.

Code
sim1 <- netsim(est1, param, init, control)

32.2.4 Epidemic Model Analysis

Our model analysis will consist of model diagnostics, and then the examination of the epidemiological outcomes overall and by risk group.

32.2.4.1 Post-Simulation Diagnostics

It is important to first examine the network model diagnostics after the epidemic simulation, since the vital dynamics within these simulations may have changed the structure of the network in unexpected ways.

Printing the netsim output object shows its contents. The output compartments now include the post-simulation diagnostics, similar to the output from a netdx object (pre-simulation diagnostics).

Code
sim1
EpiModel Simulation
=======================
Model class: netsim

Simulation Summary
-----------------------
Model type: SI
No. simulations: 20
No. time steps: 300
No. NW groups: 1

Fixed Parameters
---------------------------
inf.prob = 0.1
act.rate = 5
a.rate = 0.001
ds.rate = 0.001
di.rate = 0.001
groups = 1

Model Output
-----------------------
Variables: sim.num s.num s.num.risk0 s.num.risk1 i.num 
i.num.risk0 i.num.risk1 num num.risk0 num.risk1 si.flow 
ds.flow di.flow a.flow
Networks: sim1 ... sim20
Transmissions: sim1 ... sim20

Formation Statistics
----------------------- 
                  Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means)
edges              125.0  125.309    0.247  1.104   0.280         6.855
nodefactor.risk.0     NA   61.416       NA  0.967      NA         5.375
nodefactor.risk.1  187.5  189.202    0.907  1.919   0.887        10.927
nodematch.risk.0      NA   24.456       NA  0.440      NA         2.334
nodematch.risk.1      NA   88.348       NA  0.888      NA         5.035
meandeg               NA    0.500       NA  0.004      NA         0.024
                  SD(Statistic)
edges                    12.612
nodefactor.risk.0        10.586
nodefactor.risk.1        21.469
nodematch.risk.0          4.901
nodematch.risk.1         10.204
meandeg                   0.047


Duration and Dissolution Statistics
----------------------- 
Not available when:
- `control$tergmLite == TRUE`
- `control$save.network == FALSE`
- `control$save.diss.stats == FALSE`
- dissolution formula is not `~ offset(edges)`
- `keep.diss.stats == FALSE` (if merging)

Note that even with the introduction of vital dynamics (births and deaths), the simulation means are very close to the target statistics. While there may be a small difference here, it is quite small on the absolute scale (3-5 edges per time step), which is due to the smaller sample size (smaller network and low number of simulations) rather than a model bias. Note that dissolution statistics are not currently available in this model, but something that we plan to add in the future.

Here we show how to plot model diagnostics recorded within the netsim object. Although we have saved all the network statistics in the formation formula, here we plot the number of edges over time. We are happy with the diagnostics because the interquartile range of the simulations overlaps with the target statistic lines.

Code
plot(sim1, type = "formation", plots.joined = FALSE)

We can also pull out one specific formula term, like edges and meandeg (for mean degree), to plots separately here. We also show the unique values of each of the 10 simulations.

Code
par(mfrow = c(1, 2))
plot(sim1, type = "formation", stats = "edges", qnts = FALSE, sim.lines = TRUE)
plot(sim1, type = "formation", stats = "meandeg", qnts = FALSE, sim.lines = TRUE)

Overall, these diagnostics suggest that any of the exogenous vital dynamic processes that we added to the model did not have unintended consequences on the network structure. Good news!

32.2.4.2 Demographic Outcomes

Next, let’s explore the demographic outcomes that are now part of the epidemic model processes. First, we can look at the overall population size, num, and the population size by risk group attribute. Because we have balanced birth and death rates, we expect the total population size to remain stable at 500 (our initial network size); it does! We also expect that the 50/50 distribution of the two groups to remain stable; it does!

Code
par(mfrow = c(1, 2))
plot(sim1, y = "num", legend = TRUE, sim.lines = TRUE)
plot(sim1, y = c("num.risk0", "num.risk1"), legend = TRUE, sim.lines = TRUE)

Note however that we are evaluating stability based on the means across simulations. Some individual outlier simulations has the population changing +/- 30 nodes, and the same goes for the risk group-specific numbers. That’s ok.

Next, we can look at the flows for the three vital dynamics process. a.flow is the count of new arrivals into the model, ds.flow is the number of departures from the susceptible group, and di.flow is the number of departures from the infected group.

Code
par(mfrow = c(1, 1))
plot(sim1, y = c("a.flow", "ds.flow", "di.flow"), mean.lwd = 2.5,
     qnts = FALSE, legend = TRUE, ylim = c(0, 0.6))

The overall count for the arrivals is stable, and right around the expected value of 0.5 per time step (because the per capita arrival rate is 0.001 and the population size is 500, then 500 * 0.001 = 0.5). The two death flow counts are changing. Why? Because (as we will see below), the state size of the susceptibles is shrinking as the state size of the infecteds is growing.

To demonstrate this changing epidemiology, we can calculate standardized death rates, and add them to the model object using the mutate_epi function.

Code
sim1 <- mutate_epi(sim1, ds.flow.st = ds.flow / s.num,
                         di.flow.st = di.flow / i.num)

With that added, let’s plot it. Now, the numbers converge to stable and equal values (however there is some flucuation at the beginning of the model due to low numbers of infecteds).

Code
plot(sim1, y = c("ds.flow.st", "di.flow.st"), 
     qnts = FALSE, legend = TRUE, mean.lwd = 2.5)

32.2.4.3 Epidemic Outcomes

Finally, let’s take a look at the epidemic outcomes for disease prevalence. Overall, total population prevalence is contained in i.num, and we can plot that here.

Code
plot(sim1, y = "i.num", qnts = 1, main = "Total Prevalence", ylim = c(0, 500))

The next plot shows the prevalence by risk group. Our model demonstrates that Risk Group 1, with the higher mean degree and strong assortative mixing, has the higher disease prevalence.

Code
plot(sim1, y = c("i.num.risk0", "i.num.risk1"),  legend = TRUE, qnts = 1,
     ylim = c(0, 500), main = "Prevalence by Group")

32.3 Model 2: Impact of Death Correction

To clearly demonstrate the importance of the “death correction” for the dissolution component of the model, we will re-estimate the network model from above, but incorrectly setting the d.rate parameter in dissolution_coefs to 0 (whereas it was properly specified before as the weighted average of the death rates in the epidemic parameters). Notice that the crude and adjusted coefficients are the same.

Code
coef.diss2 <- dissolution_coefs(dissolution = ~offset(edges), 
                                duration = 40, d.rate = 0)
coef.diss2
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 40
Crude Coefficient: 3.663562
Mortality/Exit Rate: 0
Adjusted Coefficient: 3.663562

We next reestimate the network model (new diagnostics with netdx are not needed because the death correction does not impact the TERGM estimation), and then resimulate the epidemic model with netsim (all the other inputs are recycled from Model 1).

Code
est2 <- netest(nw, formation, target.stats, coef.diss2)
sim2 <- netsim(est2, param, init, control)

Let’s first review the demographics of the model as we did above, looking at the total population size and the group-specific population size. Both are stable, so any changes to the network stats should not be influenced by population size variation.

Code
par(mfrow = c(1, 2))
plot(sim2, y = "num", legend = TRUE, sim.lines = TRUE)
plot(sim2, y = c("num.risk0", "num.risk1"), legend = TRUE, sim.lines = TRUE)

Next, we plot the edges in the post-simulation network diagnostics. Now there is a clear downward trend in the count of edges.

Code
par(mfrow = c(1, 1))
plot(sim2, type = "formation", stats = "edges", qnts = FALSE, sim.lines = TRUE)

If the population size declined, we might expect the total edge count to decline by the mean degree (meandeg) should remain constant (because we used that network size correction). Alas, the mean degree declined too.

Code
par(mfrow = c(1, 1))
plot(sim2, type = "formation", stats = "meandeg", qnts = FALSE, sim.lines = TRUE)
abline(h = 0.5, lty = 2, lwd = 2)

The numerical summary of the statistics in the print of the netsim object reveals that the simulations for edges are a bit lower than the target, and the meandeg is lower than the expected 0.5.

Code
print(sim2)
EpiModel Simulation
=======================
Model class: netsim

Simulation Summary
-----------------------
Model type: SI
No. simulations: 20
No. time steps: 300
No. NW groups: 1

Fixed Parameters
---------------------------
inf.prob = 0.1
act.rate = 5
a.rate = 0.001
ds.rate = 0.001
di.rate = 0.001
groups = 1

Model Output
-----------------------
Variables: sim.num s.num s.num.risk0 s.num.risk1 i.num 
i.num.risk0 i.num.risk1 num num.risk0 num.risk1 si.flow 
ds.flow di.flow a.flow
Networks: sim1 ... sim20
Transmissions: sim1 ... sim20

Formation Statistics
----------------------- 
                  Target Sim Mean Pct Diff Sim SE Z Score SD(Sim Means)
edges              125.0  117.790   -5.768  0.886  -8.139         7.172
nodefactor.risk.0     NA   60.456       NA  0.811      NA         4.639
nodefactor.risk.1  187.5  175.123   -6.601  1.629  -7.598        13.553
nodematch.risk.0      NA   24.104       NA  0.369      NA         1.981
nodematch.risk.1      NA   81.437       NA  0.807      NA         6.423
meandeg               NA    0.471       NA  0.003      NA         0.027
                  SD(Statistic)
edges                    11.493
nodefactor.risk.0        10.221
nodefactor.risk.1        20.794
nodematch.risk.0          4.571
nodematch.risk.1         10.191
meandeg                   0.045


Duration and Dissolution Statistics
----------------------- 
Not available when:
- `control$tergmLite == TRUE`
- `control$save.network == FALSE`
- `control$save.diss.stats == FALSE`
- dissolution formula is not `~ offset(edges)`
- `keep.diss.stats == FALSE` (if merging)

This example highlights why we add that death rate correction in dissolution_coefs: to increase the persistence of edges from natural dissolution to account for the competing risk of edge dissolution from departure (death).

32.4 Model 3: Varying Population Size

Finally, let’s build a model in which there is a higher rate of disease-specific mortality. Specifically, we will assume here that the mortality rate from disease is twice that of the natural mortality rate (due to all other causes). Therefore, we will parameterize a di.rate = 0.002 and a ds.rate = 0.001. We will not change any other epidemic parameters.

Code
param3 <- param.net(inf.prob = 0.1, act.rate = 5,
                    a.rate = 0.001, ds.rate = 0.001, di.rate = 0.002)

Going back to the network model parameterization, no changes are needed for the formation model target statistics. However, we do need to update the d.rate parameter in dissolution_coefs to account for the higher average mortality in this new model parameterization.

What value of d.rate should we choose, since there are now two death rates in param.net? One natural starting point would be 0.0015, the average of the two rates above. However, what we also need to consider the weighted average of person-time spent in each disease state. If there were equal time spent in S and I, then the simple average would do. But in our model, more total person time is spent in the infected state.

In practice, coming up with a d.rate that works here will involve some trial and error based on analysis of the post-simulation diagnostics. We have selected the d.rate = 0.0018, indicating that more person-time is spent in the I state than the S state.

Code
coef.diss3 <- dissolution_coefs(dissolution = ~offset(edges), 
                                duration = 40, d.rate = 0.0018)
coef.diss3
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 40
Crude Coefficient: 3.663562
Mortality/Exit Rate: 0.0018
Adjusted Coefficient: 3.818895

Next, we estimate the network model and simulate the epidemic model again.

Code
est3 <- netest(nw, formation, target.stats, coef.diss3)
sim3 <- netsim(est3, param3, init, control)

First, we review the total population size. Now there is a pretty steep decline in the population size as prevalence increases.

Code
par(mfrow = c(1, 2))
plot(sim3, y = "num", sim.lines = TRUE, 
     main = "Population Size", legend = FALSE)
plot(sim3, y = "i.num", sim.lines = TRUE, 
     main = "Disease Prevalence", legend = FALSE)

What is the impact on the edges count in the network simulations? It also declines as the population size declines. That is OK! The edges are a function of the mean degree and the population size (N), so as N decreases the edges should decrease.

Code
par(mfrow = c(1, 1))
plot(sim3, type = "formation", stats = "edges", qnts = FALSE, sim.lines = TRUE)

To account for the declining population size, the mean degree (meandeg) should remain constant if the population size correction and the death correction were implemented correction. And Voila! This is a good model diagnostic.

Code
par(mfrow = c(1, 1))
plot(sim3, type = "formation", stats = "meandeg", qnts = FALSE, sim.lines = TRUE)
abline(h = 0.5, lty = 2, lwd = 2)