Vector addition is also frustrating. Say you write x=1:10
, i.e., make the vector
> x
[1] 1 2 3 4 5 6 7 8 9 10
Then x+1
gives
> x+1
[1] 2 3 4 5 6 7 8 9 10 11
i.e., adds 1 to all entries in the vector
Beware of this in particular when addressing sets of indices in lists, vectors or matrices
end
to access the last entry in a matrix/vector/list..length
(lists or vectors), nchar
(character chains), dim
(matrices.. careful, of course returns 2 values)if (condition is true) {
list of stuff to do
}
Even if list of stuff to do
is a single instruction, best to use curly braces
if (condition is true) {
list of stuff to do
} else if (another condition) {
...
} else {
...
}
for
applies to lists or vectors
for (i in 1:10) {
something using integer i
}
for (j in c(1,3,4)) {
something using integer j
}
for (n in c("truc", "muche", "chose")) {
something using string n
}
for (m in list("truc", "muche", "chose", 1, 2)) {
something using string n or integer n, depending
}
Very useful function (a few others in the same spirit: sapply
, vapply
, mapply
)
Applies a function to each entry in a list/vector/matrix
Because there is a parallel version (parLapply
) that we will see later, worth learning
l = list()
for (i in 1:10) {
l[[i]] = runif(i)
}
lapply(X = l, FUN = mean)
or, to make a vector
unlist(lapply(X = l, FUN = mean))
or
sapply(X = l, FUN = mean)
Can "pick up" nontrivial list entries
l = list()
for (i in 1:10) {
l[[i]] = list()
l[[i]]$a = runif(i)
l[[i]]$b = runif(2*i)
}
sapply(X = l, FUN = function(x) length(x$b))
gives
[1] 2 4 6 8 10 12 14 16 18 20
Just recall: the argument to the function you define is a list entry (l[[1]]
, l[[2]]
, etc., here)
# Suppose we want to vary 3 parameters
variations = list(
p1 = seq(1, 10, length.out = 10),
p2 = seq(0, 1, length.out = 10),
p3 = seq(-1, 1, length.out = 10)
)
# Create the list
tmp = expand.grid(variations)
PARAMS = list()
for (i in 1:dim(tmp)[1]) {
PARAMS[[i]] = list()
for (k in 1:length(variations)) {
PARAMS[[i]][[names(variations)[k]]] = tmp[i, k]
}
}
There is still a loop, but you can split this list, use it on different machines, etc. And can use parLapply
(more on that later)
Recent movement (5-10 years): governments (local or higher) create portals where data are centralised and published
library(wbstats)
source("useful_functions.R")
pop_data_CTRY <- wb_data(country = "CAN", indicator = "SP.POP.TOTL",
mrv = 100, return_wide = FALSE)
y_range = range(pop_data_CTRY$value)
y_axis <- make_y_axis(y_range)
png(file = "pop_CAN.png",
width = 800, height = 400)
plot(pop_data_CTRY$date, pop_data_CTRY$value * y_axis$factor,
xlab = "Year", ylab = "Population", type = "b", lwd = 2,
yaxt = "n")
axis(2, at = y_axis$ticks, labels = y_axis$labels, las = 1)
dev.off()
crop_figure("pop_CAN.png")
dplyr
vs sqldf
dplyr
is part of the tidyverse
set of libraries. Load magrittr
and its pipe %>%
sqldf
allows to use SQL on dataframes.. interesting alternative if you know SQL
library(sqldf)
library(dplyr)
SARS = read.csv("../DATA/SARS-CoV-1_data.csv")
## Three ways to keep only the data for one country
ctry = "Canada"
# The basic one
idx = which(SARS$country == ctry)
SARS_selected = SARS[idx,]
# The sqldf way
SARS_selected = sqldf(paste0("SELECT * FROM SARS WHERE country = '",
ctry, "'"))
# The dplyr way
SARS_selected = SARS %>%
filter(country == ctry)
# Create incidence for the selected country. diff does difference one by one,
# so one less entry than the vector on which it is being used, thus we pad with a 0.
SARS_selected$incidence = c(0, diff(SARS_selected$totalNumberCases))
# Keep only positive incidences (discard 0 or negative adjustments)
SARS_selected = SARS_selected %>%
filter(incidence > 0)
# Plot the result
# Before plotting, we need to make the dates column we will use be actual dates..
SARS_selected$toDate = lubridate::ymd(SARS_selected$toDate)
EpiCurve(SARS_selected,
date = "toDate", period = "day",
freq = "incidence",
title = "SARS-CoV-1 incidence in Canada in 2003")
deSolve
:The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps
So you are benefiting from years and year of experience: ODEPACK is a set of Fortran (77!) solvers developed at Lawrence Livermore National Laboratory (LLNL) starting in the late 70s
Other good solvers are also included, those written in C
Refer to the package help for details
As with more numerical solvers, you need to write a function returning the value of the right hand side of your equation (the vector field) at a given point in phase space, then call this function from the solver
library(deSolve)
rhs_logistic <- function(t, x, p) {
with(as.list(x), {
dN <- p$r * N *(1-N/p$K)
return(list(dN))
})
}
params = list(r = 0.1, K = 100)
IC = c(N = 50)
times = seq(0, 100, 1)
sol <- ode(IC, times, rhs_logistic, params)
This also works: add p
to arguments of as.list
and thus use without p$
prefix
library(deSolve)
rhs_logistic <- function(t, x, p) {
with(as.list(c(x, p)), {
dN <- r * N *(1-N/K)
return(list(dN))
})
}
params = list(r = 0.1, K = 100)
IC = c(N = 50)
times = seq(0, 100, 1)
sol <- ode(IC, times, rhs_logistic, params)
In this case, beware of not having a variable and a parameter with the same name..
lsoda
lsoda
switches automatically between stiff and nonstiff methods
You can also specify other methods: "lsode", "lsodes", "lsodar", "vode", "daspk", "euler", "rk4", "ode23", "ode45", "radau", "bdf", "bdf_d", "adams", "impAdams" or "impAdams_d" ,"iteration" (the latter for discrete-time systems)
ode(y, times, func, parms, method = "ode45")
or, if using standard incidence
See the code SIR_KMK_fitting.R, which we will go over now
R
Life results from the interaction of countless processes
Each process happens with some degree of imprecision. E.g., when a virus replicates, it misses a base here and there; when a cell undergoes mitosis, material is not split 50-50; when you meet another human, you forget to shake their hand. Etc.
Because of repetition, this gives rise to uncertainty
Stochastic systems incorporate uncertainty
Constant total population
Basic reproduction number
Next slides:
In following slide, we
CTMC similar to DTMC except in way they handle time between events (transitions)
DTMC: transitions occur each
CTMC:
CTMC are roughly equivalent to ODE
A continuous time Markov chain can be formulated in terms of
Here, time is in
For small
with
Easy as
Transition | Effect | Weight | Probability |
---|---|---|---|
new infection | |||
recovery of an infectious |
States are
Transition | Effect | Weight | Probability |
---|---|---|---|
birth of a susceptible | |||
death of a susceptible | |||
new infection | |||
death of an infectious | |||
recovery of an infectious |
States are
Transition | Effect | Weight | Probability |
---|---|---|---|
new infection | |||
recovery of an infectious |
States are
Suppose system has state
set
while {
If you do not have an exponential distribution random number generator.. We want
Use cumulative distribution function
which has values in
set
while {
(See simulate_birth_death_CTMC.R
on course GitHub repo)
set
while {
b = 0.01 # Birth rate
d = 0.01 # Death rate
t_0 = 0 # Initial time
N_0 = 100 # Initial population
# Vectors to store time and state. Initialise with initial condition.
t = t_0
N = N_0
t_f = 1000 # Final time
# We'll track the current time and state (could also just check last entry in t
# and N, but will take more operations)
t_curr = t_0
N_curr = N_0
while (t_curr<=t_f) {
xi_t = (b+d)*N_curr
# The exponential number generator does not like a rate of 0 (when the
# population crashes), so we check if we need to quit
if (N_curr == 0) {
break
}
tau_t = rexp(1, rate = xi_t)
t_curr = t_curr+tau_t
v = c(b*N_curr, xi_t)/xi_t
zeta_t = runif(n = 1)
pos = findInterval(zeta_t, v)+1
switch(pos,
{
# Birth
N_curr = N_curr+1
},
{
# Death
N_curr = N_curr-1
})
N = c(N, N_curr)
t = c(t, t_curr)
}
> tail(diff(t))
[1] 1.282040e-05 5.386999e-04 5.468540e-04 1.779985e-04 6.737294e-05 2.618084e-04
GillespieSSA2
and adaptivetau
, two standard packages for SSA in R
, implement tau leapinglibrary(GillespieSSA2)
IC <- c(S = (Pop-I_0), I = I_0)
params <- c(gamma = gamma, beta = beta)
reactions <- list(
reaction("beta*S*I", c(S=-1,I=+1), "new_infection"),
reaction("gamma*I", c(S=+1,I=-1), "recovery")
)
set.seed(NULL)
sol <- ssa(
initial_state = IC,
reactions = reactions,
params = params,
method = ssa_exact(),
final_time = t_f,
)
plot(sol$time, sol$state[,"I"], type = "l",
xlab = "Time (days)", ylab = "Number infectious")
R
To see multiple realisations: good idea to parallelise, then interpolate results. Write a function, e.g., run_one_sim
that .. runs one simulation, then..
nb_cores <- detectCores()
if (nb_cores > 124) {
nb_cores = 124
}
cl <- makeCluster(nb_cores)
clusterEvalQ(cl,{
library(GillespieSSA2)
})
clusterExport(cl,
c("params",
"run_one_sim"),
envir = .GlobalEnv)
SIMS = parLapply(cl = cl,
X = 1:params$number_sims,
fun = function(x) run_one_sim(params))
stopCluster(cl)
See SIS_CTMC_parallel.R
on course GitHub repo
Run the parallel code for 50 sims between tictoc::tic()
and tictoc::toc()
, giving 4.785 sec elapsed
, then the sequential version
tictoc::tic()
SIMS = lapply(X = 1:params$number_sims,
FUN = function(x) run_one_sim(params))
tictoc::toc()
which gives 158.306 sec elapsed
on a 64C/128T AMD Ryzen Threadripper 3990X. That is, parallel ran 33.08