Working with Custom Attributes and Summary Statistics in EpiModel
EpiModel v2.6.1
2026-04-18
Source:vignettes/attributes-and-summary-statistics.Rmd
attributes-and-summary-statistics.RmdIntroduction
This vignette covers how to work with nodal attributes, attribute
histories, and epidemic summary statistics in EpiModel network models
with custom extension modules. It assumes familiarity with setting up
and running network models with netsim() and with the
extension API. See the Network Modeling for
Epidemics (NME) course materials and the EpiModel Gallery
for background.
For working with network objects and edgelists, see the companion vignette Working with Network Objects in EpiModel. For parameter management, see Working with Model Parameters in EpiModel.
In network simulations with netsim, all simulation state
is stored in the main list object (referred to as
dat). This vignette covers three types of data on
dat:
- Current Nodal Attributes: Per-node characteristics at the current time step.
- Historical Nodal Attributes: Selective recordings of attribute values over time.
- Epidemic Trackers: Population-level summary statistics computed at each time step.
Current Nodal Attributes
Attributes are characteristics of the nodes (e.g., persons)
in the model at the current time step. They are stored as vectors in
dat$attr, all of the same length as the number of
nodes.
Core Attributes
Every node has four core attributes, managed automatically by EpiModel:
-
active: 1 if the node is currently in the population, 0 if departed. -
entrTime: the time step when the node entered the population. -
exitTime: the time step when the node departed (NAif still active). -
unique_id: a globally unique integer identifier (never reused).
You can retrieve the list of core attributes and their default types
with get_core_attributes():
get_core_attributes()
#> $active
#> [1] 1
#>
#> $entrTime
#> [1] 1
#>
#> $exitTime
#> [1] NA
#>
#> $unique_id
#> [1] 1In addition, the built-in epidemic modules in core EpiModel use the
status attribute (disease status, with values like
"s", "i", "r") and
infTime (time of infection). More complex disease-specific
EpiModel implementations (e.g., for HIV or COVID) may handle disease
status differently. Custom attributes of any name and scalar value can
be added, such as age, race, or
viral_load.
Positional Indexing vs Unique IDs
Each node can be referenced in two ways:
By position: Think of it like a row number in a spreadsheet.
dat$attr$active[3]accesses the third node’s value directly. This is the standard way to look up node information and is very fast.By
unique_id: A globally unique integer attribute assigned at creation and never reused. Slower to look up, but allows referencing nodes that have already departed.
Consider a simple example. A model starts with 5 nodes:
| Position (row number) | unique_id |
active |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 4 | 1 |
| 5 | 5 | 1 |
Node 3 departs at step 10. The vectors still have 5 elements, but
active[3] = 0:
| Position (row number) | unique_id |
active |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 0 |
| 4 | 4 | 1 |
| 5 | 5 | 1 |
A new node arrives at step 15. In many simulations, departed nodes
are dropped from the attribute vectors, so the new node takes position 3
(the freed slot) but receives a new unique_id = 6:
| Position (row number) | unique_id |
active |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 6 | 1 |
| 4 | 4 | 1 |
| 5 | 5 | 1 |
Positional indices are used by current edgelists and attribute
accessors. Unique IDs are used by cumulative edgelists and attribute
histories because they can refer to nodes no longer in the model. See
help("unique_id-tools", package = "EpiModel") for
conversion functions.
Working With Attributes
Accessing Attributes
Use get_attr() to extract an attribute vector:
active <- get_attr(dat, "active")This returns a vector of length equal to the current number of nodes. Requesting a nonexistent attribute throws an error.
Modifying Attributes
Below is a minimal module that increments the age of all nodes by 1:
aging_module <- function(dat, at) {
# Extract current attribute
age <- get_attr(dat, "age")
# Aging process
new_age <- age + 1
# Output updated attributes
dat <- set_attr(dat, "age", new_age)
return(dat)
}Key points about set_attr():
- It returns a modified copy of
dat, which must be assigned back. (This does not cause performance issues due to R’s shallow copy semantics since R 3.1.) - The replacement vector must have the same length as the number of nodes, or the function throws an error.
Summary
The recommended workflow for attribute manipulation in modules:
- Extract attributes into local vectors with
get_attr(). - Modify the local vectors with your dynamic process.
- Update
datwith the revised vectors usingset_attr(). - Return
datat the end of the function.
These functions have additional arguments described in
help("net-accessor", package = "EpiModel").
Historical Nodal Attributes
Current attributes reflect the state of each node at the present time step. Sometimes you need to track what happened to specific nodes over time. Because saving the full history of all attributes would consume too much memory, EpiModel provides a selective recording mechanism: you choose which attributes, for which nodes, at which time steps to record.
This is useful for diagnostics (verifying that a dynamic process is coded correctly) or for tracking a limited set of individual-level outcomes for further analysis.
Recording Attribute Histories
The following module records the viral load of infected nodes every
10 time steps. It assumes the model defines a status
attribute (with values "s" for susceptible,
"i" for infected) and a viral_load attribute
(continuous for infected nodes, NA otherwise).
viral_load_logger_module <- function(dat, at) {
# Run every 10 time steps
if (at %% 10 == 0) {
# Attributes
status <- get_attr(dat, "status")
viral_load <- get_attr(dat, "viral_load")
infected <- which(status == "i")
dat <- record_attr_history(
dat, at,
"viral_load",
infected,
viral_load[infected]
)
}
return(dat)
}record_attr_history() takes five arguments:
-
dat: the main list object. -
at: the time step to associate with the recording (usually the current step). - A label for the attribute (here
"viral_load"). - A vector of
posit_ids for the nodes of interest. Internally, these are converted tounique_ids so the history remains valid even after nodes depart. - The values to record for those IDs. The number of values must match the number of IDs, unless a single value is provided (which is recycled for all IDs, using less memory).
Use attribute histories sparingly—storage can grow large in open-population models with many nodes over many time steps.
Accessing Attribute Histories
Attribute histories are accessed after the simulation is complete:
sim <- netsim(est, param, init, control)
attr_history <- get_attr_history(sim)The result is a named list of data.frames, one for each
recorded attribute label. Suppose we recorded "viral_load"
(as above) and also "status" transitions in another module.
The output would look like:
get_attr_history(sim)
# $viral_load
# sim step attribute uids values
# 1 1 10 viral_load 1001 2000
# 2 1 10 viral_load 1002 1878
# 3 1 20 viral_load 1001 1500
# 4 1 20 viral_load 1002 300
# ...
#
# $status
# sim step attribute uids values
# 1 1 22 status 1001 i
# 2 1 64 status 1002 i
# 3 1 110 status 1001 r
# 4 1 220 status 1002 r
# ...Each data.frame has columns sim (simulation
number), step (time step), attribute (the
label), uids (unique IDs), and values (the
recorded values). The list structure allows each history to have
different dimensions.
Epidemic Trackers
Epidemic trackers are population-level summary statistics computed and stored at each time step—historical summary statistics about the population, not individual nodes.
For example, the built-in tracker num records the total
population size at each step, while s.num,
i.num, and r.num record compartment sizes.
Working With Epidemic Trackers in Modules
Inside a module, access and modify epidemic trackers with
get_epi() and set_epi(). Here is an extended
aging module that also tracks mean age and its change over time:
aging_track_module <- function(dat, at) {
# Attributes
age <- get_attr(dat, "age")
# Aging process
new_age <- age + 1
# Calculate summary statistics
mean_age <- mean(new_age)
prev_mean_age <- get_epi(dat, at - 1, "mean_age")
age_change <- mean_age - prev_mean_age
# Update nodal attributes
dat <- set_attr(dat, "age", new_age)
# Update epidemic trackers
dat <- set_epi(dat, "mean_age", at, mean_age)
dat <- set_epi(dat, "age_change", at, age_change)
return(dat)
}Key points:
Accessing Epidemic Trackers After a Simulation
Epidemic trackers are the main quantities of interest after a
simulation. Access them by calling as.data.frame() on a
netsim object, or use plot() and
summary():
sim <- netsim(est, param, init, control)
# Raw per-simulation data
as.data.frame(sim)
# Means across simulations
as.data.frame(sim, out = "mean")
# Plot and summary
plot(sim)
summary(sim, at = 50)Derived Statistics with mutate_epi()
After a simulation, you can compute derived epidemic statistics
without re-running the model. mutate_epi() works like
dplyr::mutate() but operates on the epi
list:
# Add incidence rate and prevalence
sim <- mutate_epi(sim, prev = i.num / num)
sim <- mutate_epi(sim, ir = (si.flow / s.num) * 100)
# These new variables appear in the data.frame output
as.data.frame(sim)Stratified Prevalence with epi.by
The epi.by parameter in control.net()
automatically generates stratified compartment counts by a nodal
attribute. For example, to track infection by a "risk"
attribute:
set.seed(0)
nw <- network_initialize(n = 100)
nw <- set_vertex_attribute(nw, "risk", rep(0:1, each = 50))
est <- netest(
nw, formation = ~edges + nodefactor("risk"),
target.stats = c(40, 20),
coef.diss = dissolution_coefs(~offset(edges), 20), # duration = 20
verbose = FALSE
)
param <- param.net(inf.prob = 0.3, act.rate = 1)
init <- init.net(i.num = 5)
control <- control.net(
type = "SI", nsims = 1, nsteps = 50,
epi.by = "risk",
verbose = FALSE
)
sim <- netsim(est, param, init, control)With epi.by = "risk", the output includes stratified
columns automatically:
d <- as.data.frame(sim)
names(d)
#> [1] "sim" "time" "s.num" "s.num.risk0" "s.num.risk1"
#> [6] "i.num" "i.num.risk0" "i.num.risk1" "num" "num.risk0"
#> [11] "num.risk1" "si.flow"The naming pattern is {tracker}.{attribute}{value}
(e.g., i.num.risk0, i.num.risk1,
num.risk0, num.risk1).
Custom Epidemic Trackers
For summary statistics beyond built-in compartment counts, you can
define custom tracker functions and pass them to
control.net(). This avoids writing a full module for simple
calculations.
Writing Tracker Functions
A tracker function must take exactly one argument
(dat) and return a scalar value (numeric,
logical, or character). It is called automatically at the end of each
time step, after all modules have run.
epi_s_num <- function(dat) {
needed_attributes <- c("status")
output <- with(get_attr_list(dat, needed_attributes), {
sum(status == "s", na.rm = TRUE)
})
return(output)
}Here is a tracker that computes the proportion of the population infected:
epi_prop_infected <- function(dat) {
needed_attributes <- c("status", "active")
output <- with(get_attr_list(dat, needed_attributes), {
pop <- active == 1
infected <- status == "i"
sum(infected & pop, na.rm = TRUE) / sum(pop, na.rm = TRUE)
})
return(output)
}We recommend this pattern for tracker functions:
- Define
needed_attributesas a vector of attribute names. - Use
with(get_attr_list(dat, needed_attributes), { ... })to work in a clean environment. - The last expression inside
withbecomes the output. It must be a scalar. - Return the output.
Using Custom Tracker Functions
Pass tracker functions as a named list via .tracker.list
in control.net(). The names become column names in the
output data.frame:
some.trackers <- list(
prop_infected = epi_prop_infected,
s_num = epi_s_num
)
control <- control.net(
type = "SI",
nsims = 1,
nsteps = 50,
verbose = FALSE,
.tracker.list = some.trackers
)
param <- param.net(
inf.prob = 0.3,
act.rate = 0.1
)
nw <- network_initialize(n = 50)
nw <- set_vertex_attribute(nw, "race", rbinom(50, 1, 0.5))
est <- netest(
nw,
formation = ~edges,
target.stats = 25,
coef.diss = dissolution_coefs(~offset(edges), 10, 0), # duration = 10, closed population
verbose = FALSE
)
#> Starting simulated annealing (SAN)
#> Iteration 1 of at most 4
#> Finished simulated annealing
#> Starting maximum pseudolikelihood estimation (MPLE):
#> Obtaining the responsible dyads.
#> Evaluating the predictor and response matrix.
#> Maximizing the pseudolikelihood.
#> Finished MPLE.
init <- init.net(i.num = 10)
sim <- netsim(est, param, init, control)
d <- as.data.frame(sim)
knitr::kable(tail(d, n = 15))| sim | time | s.num | i.num | num | si.flow | prop_infected | s_num | |
|---|---|---|---|---|---|---|---|---|
| 36 | 1 | 36 | 26 | 24 | 50 | 0 | 0.48 | 26 |
| 37 | 1 | 37 | 26 | 24 | 50 | 0 | 0.48 | 26 |
| 38 | 1 | 38 | 26 | 24 | 50 | 0 | 0.48 | 26 |
| 39 | 1 | 39 | 26 | 24 | 50 | 0 | 0.48 | 26 |
| 40 | 1 | 40 | 25 | 25 | 50 | 1 | 0.50 | 25 |
| 41 | 1 | 41 | 25 | 25 | 50 | 0 | 0.50 | 25 |
| 42 | 1 | 42 | 25 | 25 | 50 | 0 | 0.50 | 25 |
| 43 | 1 | 43 | 24 | 26 | 50 | 1 | 0.52 | 24 |
| 44 | 1 | 44 | 23 | 27 | 50 | 1 | 0.54 | 23 |
| 45 | 1 | 45 | 22 | 28 | 50 | 1 | 0.56 | 22 |
| 46 | 1 | 46 | 22 | 28 | 50 | 0 | 0.56 | 22 |
| 47 | 1 | 47 | 21 | 29 | 50 | 1 | 0.58 | 21 |
| 48 | 1 | 48 | 21 | 29 | 50 | 0 | 0.58 | 21 |
| 49 | 1 | 49 | 20 | 30 | 50 | 1 | 0.60 | 20 |
| 50 | 1 | 50 | 20 | 30 | 50 | 0 | 0.60 | 20 |
Note that in the .tracker.list we pass the function
objects (epi_prop_infected, epi_s_num) without
parentheses—we are storing the functions, not calling them.
Trackers are always run at the end of a simulation step. The values they output reflect the state of the simulation after all modules have completed their work for that step.
Recording Arbitrary Objects (Advanced Debugging)
When debugging complex research models, you may want to inspect the
state of an object during simulation without stopping it.
record_raw_object() saves any R object at a given time step
for later inspection.
introspect_module <- function(dat, at) {
age <- get_attr(dat, "age")
if (mean(age, na.rm = TRUE) > 50) {
obj <- data.frame(
age = age,
status = get_attr(dat, "status")
)
dat <- record_raw_object(dat, at, "old pop", obj)
}
return(dat)
}After simulation, access recorded objects from the
netsim output:
sim <- netsim(est, param, init, control)
# Access raw records for simulation 1
# Returns a nested list keyed by label, then time step
raw <- sim[["raw.records"]][[1]]
# Access the "old pop" data.frame recorded at step 75
raw[["old pop"]][[75]]Use sparingly: Raw records are not truncated and can consume substantial memory, especially for large attribute vectors recorded at many time steps. This feature is intended for debugging, not routine output collection.