R for modellers - Vignette 04

Data types and simple operations

Julien Arino

Department of Mathematics

University of Manitoba*

* The University of Manitoba campuses are located on original lands of Anishinaabeg, Cree, Oji-Cree, Dakota and Dene peoples, and on the homeland of the Métis Nation.

Assignment

Two ways:

X <- 10

X = 10

The first version is preferred by R purists. I don’t really care

Lists

A very useful data structure, quite flexible and versatile

Empty list

L <- list()

Convenient for things like parameters

L$a <- 10
L$b <- 3
L[["another_name"]] <- "Plouf plouf"

L[1]

$a
[1] 10

L[[2]]

[1] 3

L$a

[1] 10

L[["b"]]

[1] 3

L$another_name

[1] "Plouf plouf"
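The outputs above hint at a difference between single and double brackets. A minimal sketch, rebuilding the same list so that it runs on its own (the names sublist and entry are only for illustration): single brackets return a sub-list, double brackets (or $) return the entry itself.

```r
# Same list as above, rebuilt so the example is self-contained
L <- list()
L$a <- 10
L$b <- 3
L[["another_name"]] <- "Plouf plouf"

sublist <- L[1]   # single brackets: a list of length 1, still named "a"
entry <- L[[1]]   # double brackets: the entry itself, a number

class(sublist)    # "list"
class(entry)      # "numeric"
entry + 1         # arithmetic works on the entry (gives 11)
# sublist + 1 would fail: arithmetic is not defined for a list
```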

Accessing subsets of list entries

L = list()
for (i in 1:10) {
  L[[i]] = 2*i
}

Then to access entries 3 and 4

L[3:4]

[[1]]
[1] 6

[[2]]
[1] 8

List names can be used as variables

L <- list()
L$a <- 10
L$b <- 3
L[["another_name"]] <- "Plouf plouf"
for (n in names(L)) {
  writeLines(paste0("n=", n, ", L[[n]]=", L[[n]]))
}

n=a, L[[n]]=10
n=b, L[[n]]=3
n=another_name, L[[n]]=Plouf plouf
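One way these two ideas (parameters stored in a list, entries looked up by name) might be combined in a model. This is only a sketch: the function logistic_rhs, the parameter names r and K and their values are made up for the illustration.

```r
# Hypothetical parameters for a logistic growth model
params <- list()
params$r <- 0.5   # growth rate (illustrative value)
params$K <- 100   # carrying capacity (illustrative value)

# Hypothetical function: right-hand side of N' = r N (1 - N/K)
logistic_rhs <- function(N, p) {
  p$r * N * (1 - N / p$K)
}

logistic_rhs(N = 50, p = params)   # 0.5 * 50 * (1 - 50/100) = 12.5

# The name of the entry to read can itself be stored in a variable
which_param <- "K"
params[[which_param]]              # 100
```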

List of lists

L <- list()
L[["2024"]] = list()
L[["2024"]]$population = 200
L[["2024"]]$v = 1:5

$`2024`
$`2024`$population
[1] 200

$`2024`$v
[1] 1 2 3 4 5

Convenient: we could replicate the same list elements for “2023”, for instance
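A possible way to do this replication, assuming the same placeholder values are wanted for both years:

```r
# Build the same structure for several years in a loop
L <- list()
for (year in c("2023", "2024")) {
  L[[year]] <- list()
  L[[year]]$population <- 200   # placeholder value, as in the example above
  L[[year]]$v <- 1:5
}

names(L)                 # "2023" "2024"
L[["2023"]]$population   # 200
```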

Vectors

x = 1:10
(y <- c(x, 12))

 [1]  1  2  3  4  5  6  7  8  9 10 12

(Line 2 is wrapped in parentheses so that the result is printed)

Concatenating two vectors

The c() command is ubiquitous in R
Used to make vectors, concatenate them, etc.

x = 1:5
y = 10:12
(z = c(x, y))

[1]  1  2  3  4  5 10 11 12

Vectors have a single entry type

z = c("red", "blue")
(z = c(z, 1))

[1] "red"  "blue" "1"

Since the first two entries are character strings, the added entry is converted to a character string as well. Unlike lists, vectors have all entries of the same type
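A short sketch of this conversion, with a list for comparison (the variable names are only for illustration):

```r
z <- c("red", "blue", 1)
class(z)                 # "character": the number 1 became the string "1"

# A list keeps each entry's own type
l <- list("red", "blue", 1)
class(l[[3]])            # "numeric"

# Mixing numbers and logicals: logicals are converted to numbers
w <- c(1.5, TRUE, FALSE)
w                        # 1.5 1.0 0.0
```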

Populating an empty vector

v = c()
for (i in 1:10) {
  v = c(v, 2*i)
}
v

 [1]  2  4  6  8 10 12 14 16 18 20

Very useful method to create a vector if you don’t know in advance how many entries it will have
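For instance, with a while loop the number of entries is not known before the loop ends. A minimal sketch (the stopping threshold 1000 is arbitrary):

```r
# Collect successive powers of 2 as long as they do not exceed 1000
v <- c()
p <- 1
while (p <= 1000) {
  v <- c(v, p)
  p <- 2 * p
}
v           # 1 2 4 8 16 32 64 128 256 512
length(v)   # 10
```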

Vector operations - Beware !

Say

x = 1:10

Then x+1 gives

x+1

 [1]  2  3  4  5  6  7  8  9 10 11

i.e., adds 1 to all entries in the vector
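The same entry-by-entry rule applies to other operations. A sketch with illustrative values; the last line shows recycling, a likely source of surprises:

```r
x <- 1:10
x * x          # entry-by-entry product: 1 4 9 ... 100
x * 2          # every entry doubled
sum(x * x)     # the dot product of x with itself: 385

# Recycling: the shorter vector is repeated to match the longer one
x + c(0, 100)  # 1 102 3 104 5 106 7 108 9 110
```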

Use seq to make more complex sequences

(x = seq(from = 2, to = 10, by = 1.5))

[1] 2.0 3.5 5.0 6.5 8.0 9.5

The (from, to, by) form is the default; others exist

y = seq(from = 2, to = 100, length.out = 6)

round(y, 2)

[1]   2.0  21.6  41.2  60.8  80.4 100.0
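A few of the other forms, as a sketch (seq_len and seq_along are base R functions; the values are arbitrary):

```r
seq(from = 10, to = 1, by = -3)   # decreasing: 10 7 4 1
seq_len(5)                        # 1 2 3 4 5
seq_len(0)                        # empty, whereas 1:0 gives 1 0

x <- c(2.0, 3.5, 5.0)
seq_along(x)                      # 1 2 3, the indices of x
```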

Naming vector entries

It is possible (and often useful) to name vector entries

x = seq(from = 2, to = 10, by = 1.5)
names(x) = sprintf("v%d", 1:length(x))
x

 v1  v2  v3  v4  v5  v6 
2.0 3.5 5.0 6.5 8.0 9.5

x["v5"]

v5 
 8

Matrices

Matrix (or vector) of zeros

(A <- mat.or.vec(nr = 2, nc = 3))

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

Matrix with prescribed entries

(B <- matrix(c(1,2,3,4), nr = 2, nc = 2))

     [,1] [,2]
[1,]    1    3
[2,]    2    4

(C <- matrix(c(5,6,7,8), nc = 2, nr = 2, 
             byrow = TRUE))

     [,1] [,2]
[1,]    5    6
[2,]    7    8

Here and elsewhere, naming the arguments (e.g., nr = 2) lets you pass them in any order

Matrix operations

Probably the biggest annoyance in R compared to other languages!

A*B is the Hadamard (entry-by-entry) product \(A\circ B\) (denoted A.*B in MATLAB), not the standard matrix multiplication

Standard matrix multiplication is A %*% B
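To see the difference, a sketch with two 2 x 2 matrices holding the same entries as B and C defined earlier (H and P are names chosen for the illustration):

```r
B <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
C <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2, byrow = TRUE)

(H <- B * C)     # entry-by-entry: rows (5, 18) and (14, 32)
(P <- B %*% C)   # matrix product: rows (26, 30) and (38, 44)
```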

For the matlab-ers here

R does not have the keyword end to access the last entry in a matrix/vector/list.

Use length (lists or vectors), nchar (character strings), dim (matrices; careful, it returns two values)
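A sketch of how these functions can stand in for end (all objects below are made up for the illustration):

```r
x <- c(4, 8, 15, 16)
x[length(x)]                    # last entry of a vector: 16

L <- list(a = 1, b = "two")
L[[length(L)]]                  # last entry of a list: "two"

s <- "modeller"
substr(s, nchar(s), nchar(s))   # last character: "r"

M <- matrix(1:6, nrow = 2)
dim(M)                          # 2 3 (rows, then columns)
M[nrow(M), ncol(M)]             # bottom-right entry: 6
```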

Concatenating matrices

Dimensions must be compatible

rbind(B, C)

     [,1] [,2]
[1,]    1    3
[2,]    2    4
[3,]    5    6
[4,]    7    8

cbind(B, C)

     [,1] [,2] [,3] [,4]
[1,]    1    3    5    6
[2,]    2    4    7    8

Concatenating vectors and matrices

v = c(9, 10)
rbind(B, v)

  [,1] [,2]
     1    3
     2    4
v    9   10

cbind(B, v)

          v
[1,] 1 3  9
[2,] 2 4 10

Naming matrix rows/columns

Can be useful sometimes

rownames(B) = c("before", "after")
colnames(B) = c("Jane", "John")
B

       Jane John
before    1    3
after     2    4

Calling rownames or colnames without assigning a value returns the existing names, or NULL if there are none

rownames(B)

[1] "before" "after"

colnames(C)

NULL

Access matrix/vector entries

By position

B[1,2]

[1] 3

v[1]

[1] 9

By name, if present, and combining

B["before", "Jane"]

[1] 1

B["before", 2]

[1] 3

Whole rows/columns

B["before", ]

Jane John 
   1    3

B[, "Jane"]

before  after 
     1      2

C[,]

     [,1] [,2]
[1,]    5    6
[2,]    7    8

Submatrices

D = matrix(data = runif(100), nc = 10)
D[2:3, 5:7]

          [,1]      [,2]       [,3]
[1,] 0.2398867 0.3167605 0.04903629
[2,] 0.8981800 0.7612056 0.60225418

runif(100): generate 100 uniformly distributed random numbers between the default min=0 and max=1

Note that indices are “local” to the result

Submatrix of a named matrix

D = matrix(data = runif(100), nc = 10)
rownames(D) = sprintf("R%d", 1:dim(D)[1])
colnames(D) = sprintf("C%d", 1:dim(D)[2])
(E = D[2:3, 5:7])

          C5        C6        C7
R2 0.3275763 0.5880480 0.3150221
R3 0.8573982 0.3703588 0.7112473

Indices are “local” but names are those “extracted”

E["R3", "C6"]

[1] 0.3703588

E[2, 2]

[1] 0.3703588

Data frames

From the R documentation:

[data frames are] tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R’s modeling software

Data frames are lists and matrices

Easier to access elements than in lists

More flexible than matrices (columns can be of different types)

L3 <- LETTERS[1:3]
fac <- sample(L3, 8, replace = TRUE)
(df <- data.frame(x = 1, y = 1:8, fac = fac))

is.character(df$x)

[1] FALSE

is.character(df$fac)

[1] TRUE

Data frames are lists and matrices (2)

df$fac

[1] "B" "A" "A" "B" "C" "B" "A" "A"

df[["fac"]]

[1] "B" "A" "A" "B" "C" "B" "A" "A"

df[, "fac"]

[1] "B" "A" "A" "B" "C" "B" "A" "A"

df[, 3]

[1] "B" "A" "A" "B" "C" "B" "A" "A"

df$fac[2]

[1] "A"

df[["fac"]][2]

[1] "A"

df[2, "fac"]

[1] "A"

df[2, 3]

[1] "A"
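A sketch of the list-like and matrix-like sides of a data frame. It uses a fixed fac column instead of sample so that the result is reproducible; that column is an assumption of this example, not the data shown above.

```r
df <- data.frame(x = 1, y = 1:8, fac = rep(c("A", "B"), 4))

# List-like behaviour: a data frame is a list of columns
is.list(df)              # TRUE
length(df)               # 3, the number of columns
names(df)                # "x" "y" "fac"

# Matrix-like behaviour: it has rows and columns
dim(df)                  # 8 3
df[2:3, c("y", "fac")]   # rows 2 and 3 of columns y and fac

# Each column keeps its own type
sapply(df, class)
```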

which

The `which` function

Extremely useful

Important to learn how to use

Give the TRUE indices of a logical object, allowing for array indices

TRUE indices of a logical object?

We return to logical tests in Vignette 05, on flow control

TRUE indices: those indices for which a property is TRUE

E.g., \(x<1\)?

df

df$y < 5

[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE

which(df$y < 5)

[1] 1 2 3 4

df$fac == "A"

[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE

which(df$fac == "A")

[1] 2 3 7 8

`which` is useful

df$fac[which(df$fac == "A")] = "Z"
df
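The same pattern on a plain vector, as a sketch with made-up values (idx is a name chosen for the illustration):

```r
y <- c(3, 8, 1, 9, 4)
y > 3                   # FALSE TRUE FALSE TRUE TRUE
(idx <- which(y > 3))   # 2 4 5
y[idx]                  # 8 9 4
length(idx)             # 3 entries satisfy the condition
which(y > 100)          # integer(0): no entry satisfies the condition
```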

`which` can return array indices

E = matrix(data = runif(25), nr = 5)
(rc = which(E < 0.1, arr.ind = TRUE))

     row col
[1,]   3   1
[2,]   3   3
[3,]   5   3

E[rc] = Inf
round(E, digits = 2)

     [,1] [,2] [,3] [,4] [,5]
[1,] 0.58 0.87 0.63 0.14 0.87
[2,] 0.11 0.69 0.43 0.33 0.26
[3,]  Inf 0.13  Inf 0.46 0.80
[4,] 0.60 0.67 0.97 0.35 0.29
[5,] 0.36 0.57  Inf 0.30 0.58
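Since runif gives different values on each run, here is a reproducible sketch of the same idea on a small matrix with known entries:

```r
E <- matrix(1:6, nrow = 2)             # filled column by column
(rc <- which(E > 3, arr.ind = TRUE))   # one (row, col) pair per matching entry
which(E > 3)                           # same entries as linear indices: 4 5 6
E[rc]                                  # 4 5 6
E[rc] <- 0
E                                      # rows (1, 3, 0) and (2, 0, 0)
```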

Type checking/casting

Checking types

A function is.type is typically defined for each type

is.array, is.atomic, is.character, is.data.frame, is.double, is.function, is.integer, is.list, is.logical, is.matrix, is.numeric, is.object, is.vector

Many packages also define specific types

Casting

Typically, if is.type exists for a given type, then as.type also exists

as.array, as.data.frame, as.list, as.matrix, as.numeric, as.vector

Often: matrix \(\leftrightarrow\) data frame, list \(\leftrightarrow\) matrix
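A few casts as a sketch, with made-up objects (M, D and M2 are names chosen for the illustration):

```r
as.numeric("3.14") + 1   # 4.14: string converted to a number
as.character(10)         # "10"

M <- matrix(1:4, nrow = 2)
D <- as.data.frame(M)    # matrix to data frame
is.data.frame(D)         # TRUE
M2 <- as.matrix(D)       # and back to a matrix
is.matrix(M2)            # TRUE

as.list(1:3)             # vector to list with 3 entries
```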

Example: matrix to list

to_vary_m = 
  expand.grid(p1= seq(1, 3, length.out = 10),
              p2 = seq(0.8, 3, length.out = 10))
to_vary_l = split(to_vary_m, seq(nrow(to_vary_m)))

expand.grid: makes a data frame (not strictly a matrix) with one row for every combination of the values of the vectors p1 and p2; split then turns it into a list with one entry per row

to_vary_m[3,]

        p1  p2
3 1.444444 0.8

to_vary_l[3]

$`3`
        p1  p2
3 1.444444 0.8
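One reason to want such a list: functions like lapply and sapply can then run something once per parameter combination. A sketch on a smaller grid; the function run_one is a made-up stand-in for a real model run.

```r
# Smaller grid than above so the result is easy to inspect
grid <- expand.grid(p1 = c(1, 2), p2 = c(10, 20, 30))
grid_l <- split(grid, seq(nrow(grid)))   # list of one-row data frames

# Hypothetical "model": here simply the product of the two parameters
run_one <- function(row) row$p1 * row$p2

results <- sapply(grid_l, run_one)
length(results)   # 6, one result per parameter combination
```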
