One major advantage in my view: uses a lot of C and Fortran code. E.g., deSolve:
The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps
Very active community on the web, easy to find solutions (same true of Python, I just prefer R)
Rmarkdown: markdown that incorporates R commands. Useful for generating reports in html or pdf, can make slides as well..
RSweave: LaTeX incorporating R commands. Useful for generating reports. Not used as much as Rmarkdown these days
R is a scripted language
Interactive
Allows you to work in real time
Be careful: what is in memory might involve steps not written down in a script
If you want to reproduce your steps, it is good to write all the steps down in a script and to test from time to time running using Rscript: this will ensure that all that is required to run is indeed loaded to memory when it needs to, i.e., that it is not already there..
Programming in R
Similar to matlab..
.. with some differences, of course! Otherwise, where would the fun be? ;)
Assignment
Two ways:
X <- 10
or
X = 10
First version is preferred by R purists.. I don't really care
Lists
A very useful data structure, quite flexible and versatile. Empty list: L <- list(). Convenient for things like parameters. For instance
x = 1:10
y <- c(x, 12)
> y
[1] 1 2 3 4 5 6 7 8 9 10 12
z = c("red", "blue")
> z
[1] "red" "blue"
z = c(z, 1)
> z
[1] "red" "blue" "1"
Note that in z, since the first two entries are characters, the added entry is also a character. Contrary to lists, vectors have all entries of the same type
Matrices
Matrix (or vector) of zeros
A <- mat.or.vec(nr = 2, nc = 3)
Matrix with prescribed entries
B <- matrix(c(1,2,3,4), nr = 2, nc = 2)
> B
[,1] [,2]
[1,] 1 3
[2,] 2 4
C <- matrix(c(1,2,3,4), nr = 2, nc = 2, byrow = TRUE)
> C
[,1] [,2]
[1,] 1 2
[2,] 3 4
Remark that here and elsewhere, naming the arguments (e.g., nr = 2) allows to use arguments in any order
Matrix operations
Probably the biggest annoyance in R compared to other languages
The notation A*B is the Hadamard product (what would be denoted A.*B in matlab), not the standard matrix multiplication
Matrix multiplication is written A %*% B
Vector operations
Vector addition is also frustrating. Say you write x=1:10, i.e., make the vector
> x
[1] 1 2 3 4 5 6 7 8 9 10
Then x+1 gives
> x+1
[1] 2 3 4 5 6 7 8 9 10 11
i.e., adds 1 to all entries in the vector
Beware of this in particular when addressing sets of indices in lists, vectors or matrices
For the matlab-ers here
R does not have the keyword end to access the last entry in a matrix/vector/list..
Use length (lists or vectors), nchar (character chains), dim (matrices.. careful, of course returns 2 values)
Flow control
if (condition is true) {
list of stuff to do
}
Even if list of stuff to do is a single instruction, best to use curly braces
if (condition is true) {
list of stuff to do
} else if (another condition) {
...
} else {
...
}
For loops
for applies to lists or vectors
for (i in 1:10) {
something using integer i
}
for (j in c(1,3,4)) {
something using integer j
}
for (n in c("truc", "muche", "chose")) {
something using string n
}
for (m in list("truc", "muche", "chose", 1, 2)) {
something using string n or integer n, depending
}
lapply
Very useful function (a few others in the same spirit: sapply, vapply, mapply)
Applies a function to each entry in a list/vector/matrix
Because there is a parallel version (parLapply) that we will see later, worth learning
l = list()
for (i in 1:10) {
l[[i]] = runif(i)
}
lapply(X = l, FUN = mean)
or, to make a vector
unlist(lapply(X = l, FUN = mean))
or
sapply(X = l, FUN = mean)
"Advanced" lapply
Can "pick up" nontrivial list entries
l = list()
for (i in 1:10) {
l[[i]] = list()
l[[i]]$a = runif(i)
l[[i]]$b = runif(2*i)
}
sapply(X = l, FUN = function(x) length(x$b))
gives
[1] 2 4 6 8 10 12 14 16 18 20
Just recall: the argument to the function you define is a list entry (l[[1]], l[[2]], etc., here)
Avoid parameter variation loops with expand.grid
# Suppose we want to vary 3 parameters
variations = list(
p1 = seq(1, 10, length.out = 10),
p2 = seq(0, 1, length.out = 10),
p3 = seq(-1, 1, length.out = 10)
)
# Create the list
tmp = expand.grid(variations)
PARAMS = list()
for (i in 1:dim(tmp)[1]) {
PARAMS[[i]] = list()
for (k in 1:length(variations)) {
PARAMS[[i]][[names(variations)[k]]] = tmp[i, k]
}
}
There is still a loop, but you can split this list, use it on different machines, etc. And can use parLapply
library(tidyverse)
# Get bounding polygon for Winnipeg
bb_poly = osmdata::getbb(place_name = "winnipeg",
format_out = "polygon")
# Get roads
roads <- osmdata::opq(bbox = bb_poly) %>%
osmdata::add_osm_feature(key = 'highway',
value = 'residential') %>%
osmdata::osmdata_sf () %>%
osmdata::trim_osmdata (bb_poly)
# Get rivers
rivers <- osmdata::opq(bbox = bb_poly) %>%
osmdata::add_osm_feature(key = 'waterway',
value = "river") %>%
osmdata::osmdata_sf () %>%
osmdata::trim_osmdata (bb_poly)
And we finish easily
We have the pairs of trees potentially in contact with each other
We have the roads and rivers of the city, which is a collection of line segments
If there is an intersection between a tree pair and a road/river, then we can forget this tree pair as their root systems cannot come into contact
st_crs(tree_pairs) = sf::st_crs(roads$osm_lines$geometry)
iroads = sf::st_intersects(x = roads$osm_lines$geometry,
y = tree_pairs)
irivers = sf::st_intersects(x = rivers$osm_lines$geometry,
y = tree_pairs)
tree_pairs_roads_intersect = c()
for (i in 1:length(iroads)) {
if (length(iroads[[i]])>0) {
tree_pairs_roads_intersect = c(tree_pairs_roads_intersect,
iroads[[i]])
}
}
tree_pairs_roads_intersect = sort(tree_pairs_roads_intersect)
to_keep = 1:dim(tree_locs_orig)[1]
to_keep = setdiff(to_keep,tree_pairs_roads_intersect)
Data wrangling
Data wrangling: dplyr vs sqldf
dplyr is part of the tidyverse set of libraries. Load magrittr and its pipe %>%
sqldf allows to use SQL on dataframes.. interesting alternative if you know SQL
library(sqldf)
library(dplyr)
SARS = read.csv("../DATA/SARS-CoV-1_data.csv")
## Three ways to keep only the data for one country
ctry = "Canada"
# The basic one
idx = which(SARS$country == ctry)
SARS_selected = SARS[idx,]
# The sqldf way
SARS_selected = sqldf(paste0("SELECT * FROM SARS WHERE country = '",
ctry, "'"))
# The dplyr way
SARS_selected = SARS %>%
filter(country == ctry)
# Create incidence for the selected country. diff does difference one by one,
# so one less entry than the vector on which it is being used, thus we pad with a 0.
SARS_selected$incidence = c(0, diff(SARS_selected$totalNumberCases))
# Keep only positive incidences (discard 0 or negative adjustments)
SARS_selected = SARS_selected %>%
filter(incidence > 0)
# Plot the result
# Before plotting, we need to make the dates column we will use be actual dates..
SARS_selected$toDate = lubridate::ymd(SARS_selected$toDate)
EpiCurve(SARS_selected,
date = "toDate", period = "day",
freq = "incidence",
title = "SARS-CoV-1 incidence in Canada in 2003")
The functions provide an interface to the FORTRAN functions 'lsoda', 'lsodar', 'lsode', 'lsodes' of the 'ODEPACK' collection, to the FORTRAN functions 'dvode', 'zvode' and 'daspk' and a C-implementation of solvers of the 'Runge-Kutta' family with fixed or variable time steps
So you are benefiting from years and year of experience: ODEPACK is a set of Fortran (77!) solvers developed at Lawrence Livermore National Laboratory (LLNL) starting in the late 70s
Other good solvers are also included, those written in C
As with more numerical solvers, you need to write a function returning the value of the right hand side of your equation (the vector field) at a given point in phase space, then call this function from the solver
library(deSolve)
rhs_logistic <- function(t, x, p) {
with(as.list(x), {
dN <- p$r * N *(1-N/p$K)
return(list(dN))
})
}
params = list(r = 0.1, K = 100)
IC = c(N = 50)
times = seq(0, 100, 1)
sol <- ode(IC, times, rhs_logistic, params)
This also works: add p to arguments of as.list and thus use without p$ prefix
library(deSolve)
rhs_logistic <- function(t, x, p) {
with(as.list(c(x, p)), {
dN <- r * N *(1-N/K)
return(list(dN))
})
}
params = list(r = 0.1, K = 100)
IC = c(N = 50)
times = seq(0, 100, 1)
sol <- ode(IC, times, rhs_logistic, params)
In this case, beware of not having a variable and a parameter with the same name..
Default method: lsoda
lsoda switches automatically between stiff and nonstiff methods
You can also specify other methods: "lsode", "lsodes", "lsodar", "vode", "daspk", "euler", "rk4", "ode23", "ode45", "radau", "bdf", "bdf_d", "adams", "impAdams" or "impAdams_d" ,"iteration" (the latter for discrete-time systems)
ode(y, times, func, parms,
method = "ode45")
You can even implement your own integration method
Example - Fitting data
Example - Fitting data
Note that this is a super simplified version of what to do