R for modellers - Vignette 04

Data types and simple operations

Julien Arino

Department of Mathematics

University of Manitoba*

* The University of Manitoba campuses are located on original lands of Anishinaabeg, Cree, Oji-Cree, Dakota and Dene peoples, and on the homeland of the Métis Nation.


Two ways to assign a value to a variable:

X <- 10


X = 10

The first version is preferred by R purists. I don’t really care


Lists are a very useful data structure, quite flexible and versatile

Empty list

L <- list()

Convenient for things like parameters

L$a <- 10
L$b <- 3
L[["another_name"]] <- "Plouf plouf"
[1] 10
[1] 3
[1] 10
[1] 3
[1] "Plouf plouf"
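
To illustrate why lists are convenient for parameters, here is a minimal sketch. The parameter names (beta, gamma, label) and their values are made up for this example. It shows that `$` and `[[ ]]` reach the same entry and that entries of different types can sit side by side.

```r
# Hypothetical parameter list; names and values are illustrative only
params <- list()
params$beta <- 0.5
params$gamma <- 0.25
params[["label"]] <- "test run"

# $ and [[ ]] access the same entry
params$beta
params[["beta"]]

# Entries can be used directly in computations
R0 <- params$beta / params$gamma
R0

# The names of all entries
names(params)
```

The `[[ ]]` form is the one to use when the entry name is itself stored in a variable.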

Accessing subsets of list entries

L = list()
for (i in 1:10) {
  L[[i]] = 2*i
}

Then to access entries 3 and 4

L[3:4]
[[1]]
[1] 6

[[2]]
[1] 8

List names can be parameters

L <- list()
L$a <- 10
L$b <- 3
L[["another_name"]] <- "Plouf plouf"
for (n in names(L)) {
  writeLines(paste0("n=", n, ", L[[n]]=", L[[n]]))
}
n=a, L[[n]]=10
n=b, L[[n]]=3
n=another_name, L[[n]]=Plouf plouf

List of lists

L <- list()
L[["2024"]] = list()
L[["2024"]]$population = 200
L[["2024"]]$v = 1:5
L[["2024"]]
$population
[1] 200

$v
[1] 1 2 3 4 5

Convenient: we could replicate the same list elements for “2023”, for instance
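
One possible way to replicate the same structure for several years is to loop over the year names. This is only a sketch: the vector `pops` and the population value for 2023 are invented for the example.

```r
# Hypothetical populations, indexed by year (the 2023 value is made up)
pops <- c("2023" = 190, "2024" = 200)

L <- list()
for (year in names(pops)) {
  L[[year]] <- list()                    # one sub-list per year
  L[[year]]$population <- pops[[year]]
  L[[year]]$v <- 1:5
}

names(L)
L[["2023"]]$population
```

Every year then has the same entries, so code written for one year works unchanged for the others.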


x = 1:10
(y <- c(x, 12))
 [1]  1  2  3  4  5  6  7  8  9 10 12

(The second line is wrapped in ( ) so that the result is printed)

Concatenating two vectors

  • The c() command is ubiquitous in R

  • Used to make vectors, concatenate them, etc.

x = 1:5
y = 10:12
(z = c(x, y))
[1]  1  2  3  4  5 10 11 12

Vectors have a single entry type

z = c("red", "blue")
(z = c(z, 1))
[1] "red"  "blue" "1"   

Since the first two entries are characters, the added entry is coerced to a character. Unlike lists, vectors have all entries of the same type

Populating an empty vector

v = c()
for (i in 1:10) {
  v = c(v, 2*i)
}
v
 [1]  2  4  6  8 10 12 14 16 18 20

Very useful method to create a vector if you don’t know in advance how many entries it will have
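
As a sketch of the case where the number of entries really is unknown in advance, the loop below collects powers of 2 until a threshold is reached. The threshold of 100 is arbitrary.

```r
# Collect the powers of 2 that stay below 100; the number of entries
# is not known before the loop runs
v <- c()
x <- 1
while (x < 100) {
  v <- c(v, x)   # append the current value
  x <- 2 * x
}
v
length(v)
```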

Vector operations - Beware!


x = 1:10

Then x+1 gives

 [1]  2  3  4  5  6  7  8  9 10 11

i.e., adds 1 to all entries in the vector
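
A short sketch of two behaviours that are worth keeping in mind here (the choice of examples is mine, not the slides'): arithmetic on vectors is entrywise, and a shorter vector is silently recycled to match a longer one.

```r
x <- 1:10

x * 2        # every entry is doubled
x * x        # entrywise product, not a dot product
sum(x * x)   # the dot product of x with itself

# Recycling: c(0, 100) is reused five times to match the length of x
y <- x + c(0, 100)
y
```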

Use seq to make more complex sequences

(x = seq(from = 2, to = 10, by = 1.5))
[1] 2.0 3.5 5.0 6.5 8.0 9.5

The (from, to, by) form is the default; others exist

y = seq(from = 2, to = 100, length.out = 6)

round(y, 2)
[1]   2.0  21.6  41.2  60.8  80.4 100.0

Naming vector entries

It is possible (and often useful) to name vector entries

x = seq(from = 2, to = 10, by = 1.5)
names(x) = sprintf("v%d", 1:length(x))
x
 v1  v2  v3  v4  v5  v6 
2.0 3.5 5.0 6.5 8.0 9.5 
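
Names are mostly useful because they can be used for access. A minimal sketch, reusing the same vector:

```r
x <- seq(from = 2, to = 10, by = 1.5)
names(x) <- sprintf("v%d", 1:length(x))

x["v3"]            # access by name, the name is kept
x[["v3"]]          # access by name, the name is dropped
x[c("v1", "v6")]   # several entries at once
```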



Matrix (or vector) of zeros

(A <- mat.or.vec(nr = 2, nc = 3))
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

Matrix with prescribed entries

(B <- matrix(c(1,2,3,4), nr = 2, nc = 2))
     [,1] [,2]
[1,]    1    3
[2,]    2    4
(C <- matrix(c(5,6,7,8), nc = 2, nr = 2, 
             byrow = TRUE))
     [,1] [,2]
[1,]    5    6
[2,]    7    8

Here and elsewhere, naming the arguments (e.g., nr = 2) lets you pass them in any order

Matrix operations

Probably the biggest annoyance in R compared to other languages!

  • A*B is the Hadamard product \(A\circ B\) (denoted A.*B in MATLAB), not the standard matrix multiplication

  • Standard matrix multiplication is A %*% B
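
The difference between the two products can be checked on a small example. The matrices P and Q below are arbitrary and used only for this sketch.

```r
P <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
Q <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

P * Q      # Hadamard (entrywise) product
P %*% Q    # standard matrix product
```

With these values, the entry in row 1, column 1 is 1*5 = 5 for the Hadamard product and 1*5 + 3*6 = 23 for the matrix product.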

For the MATLAB users here

  • R does not have the keyword end to access the last entry in a matrix/vector/list

  • Use length (lists or vectors), nchar (character strings), dim (matrices; careful, it returns 2 values)
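
A minimal sketch of how the last entry can be reached in each case. The objects vec, lst, s and M are made up for the example.

```r
vec <- c(2, 4, 6, 8)
vec[length(vec)]                # last entry of a vector

lst <- list(a = 10, b = "last")
lst[[length(lst)]]              # last entry of a list

s <- "modelling"
substr(s, nchar(s), nchar(s))   # last character of a string

M <- matrix(1:6, nrow = 2)
M[dim(M)[1], dim(M)[2]]         # bottom-right entry of a matrix
M[nrow(M), ncol(M)]             # same entry, using nrow() and ncol()
```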

Concatenating matrices

Dimensions must be compatible

rbind(B, C)
     [,1] [,2]
[1,]    1    3
[2,]    2    4
[3,]    5    6
[4,]    7    8
cbind(B, C)
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    6
[2,]    2    4    7    8

Concatenating vectors and matrices

v = c(9, 10)
rbind(B, v)
  [,1] [,2]
     1    3
     2    4
v    9   10
cbind(B, v)
          v
[1,] 1 3  9
[2,] 2 4 10

Naming matrix rows/columns

Can be useful sometimes

rownames(B) = c("before", "after")
colnames(B) = c("Jane", "John")
B
       Jane John
before    1    3
after     2    4

Not assigning a value returns the existing values, if any

rownames(B)
[1] "before" "after" 

Access matrix/vector entries

By position

B[1, 2]
[1] 3
v[1]
[1] 9

By name, if present, and combining

B["before", "Jane"]
[1] 1
B["before", 2]
[1] 3

Whole rows/columns

B["before", ]
Jane John 
   1    3 
B[, "Jane"]
before  after 
     1      2 
     [,1] [,2]
[1,]    5    6
[2,]    7    8


D = matrix(data = runif(100), nc = 10)
D[2:3, 5:7]
          [,1]      [,2]       [,3]
[1,] 0.2398867 0.3167605 0.04903629
[2,] 0.8981800 0.7612056 0.60225418

runif(100): generate 100 uniformly distributed random numbers between the default min=0 and max=1

Note that indices are “local” to the result: the extracted block is indexed from 1, whatever its position in D

Submatrix of a named matrix

D = matrix(data = runif(100), nc = 10)
rownames(D) = sprintf("R%d", 1:dim(D)[1])
colnames(D) = sprintf("C%d", 1:dim(D)[2])
(E = D[2:3, 5:7])
          C5        C6        C7
R2 0.3275763 0.5880480 0.3150221
R3 0.8573982 0.3703588 0.7112473

Indices are “local” but names are those “extracted”

E["R3", "C6"]
[1] 0.3703588
E[2, 2]
[1] 0.3703588

Data frames

From the R documentation:

[data frames are] tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R’s modeling software

Data frames are lists and matrices

  • Easier to access elements than in lists

  • More flexible than matrices (columns can be of different types)

L3 <- LETTERS[1:3]
fac <- sample(L3, 8, replace = TRUE)
(df <- data.frame(x = 1, y = 1:8, fac = fac))
  x y fac
1 1 1   B
2 1 2   A
3 1 3   A
4 1 4   B
5 1 5   C
6 1 6   B
7 1 7   A
8 1 8   A

[1] TRUE

Data frames are lists and matrices (2)

df$fac
[1] "B" "A" "A" "B" "C" "B" "A" "A"
df[["fac"]]
[1] "B" "A" "A" "B" "C" "B" "A" "A"
df[, "fac"]
[1] "B" "A" "A" "B" "C" "B" "A" "A"
df[, 3]
[1] "B" "A" "A" "B" "C" "B" "A" "A"

df$fac[2]
[1] "A"
df[["fac"]][2]
[1] "A"
df[2, "fac"]
[1] "A"
df[2, 3]
[1] "A"


The which function

  • Extremely useful

  • Important to learn how to use

Give the TRUE indices of a logical object, allowing for array indices

TRUE indices of a logical object?

  • We return to logical tests in Vignette 05, on flow control

  • TRUE indices: those indices for which a property is TRUE

  • E.g., \(x<1\)?

df
  x y fac
1 1 1   B
2 1 2   A
3 1 3   A
4 1 4   B
5 1 5   C
6 1 6   B
7 1 7   A
8 1 8   A
df$y < 5
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
which(df$y < 5)
[1] 1 2 3 4
df$fac == "A"
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE
which(df$fac == "A")
[1] 2 3 7 8

which is useful

df$fac[which(df$fac == "A")] = "Z"
df
  x y fac
1 1 1   B
2 1 2   Z
3 1 3   Z
4 1 4   B
5 1 5   C
6 1 6   B
7 1 7   Z
8 1 8   Z
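
One reason to prefer which over indexing directly with the logical vector is the treatment of missing values. The sketch below uses a made-up vector z containing an NA to show the difference.

```r
z <- c(3, NA, 7, 1)

z < 5             # the comparison with NA gives NA
which(z < 5)      # which() keeps only the TRUE positions
z[which(z < 5)]   # no NA in the result
z[z < 5]          # direct logical indexing keeps an NA entry
```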

which can return array indices

E = matrix(data = runif(25), nr = 5)
(rc = which(E < 0.1, arr.ind = TRUE))
     row col
[1,]   3   1
[2,]   3   3
[3,]   5   3
E[rc] = Inf
round(E, digits = 2)
     [,1] [,2] [,3] [,4] [,5]
[1,] 0.58 0.87 0.63 0.14 0.87
[2,] 0.11 0.69 0.43 0.33 0.26
[3,]  Inf 0.13  Inf 0.46 0.80
[4,] 0.60 0.67 0.97 0.35 0.29
[5,] 0.36 0.57  Inf 0.30 0.58

Type checking/casting

Checking types

For most types, a function is.type is defined

is.array, is.atomic, is.character, is.data.frame, is.double, is.function, is.integer, is.list, is.logical, is.matrix, is.numeric, is.object, is.vector

Many packages also define specific types


Typically, if is.type exists for type type, then as.type also exists

as.array, as.data.frame, as.list, as.matrix, as.numeric, as.vector

Often: matrix \(\leftrightarrow\) data frame, list \(\leftrightarrow\) matrix
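
A few of these conversions, as a minimal sketch. The matrix M and its column names are made up for the example.

```r
M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

df_from_M <- as.data.frame(M)     # matrix -> data frame
is.data.frame(df_from_M)

M_again <- as.matrix(df_from_M)   # data frame -> matrix
is.matrix(M_again)

as.numeric("3.14")                # character -> numeric
as.character(42)                  # numeric -> character
as.list(1:3)                      # vector -> list with three entries
```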

Example: data frame to list

to_vary_m = 
  expand.grid(p1 = seq(1, 3, length.out = 10),
              p2 = seq(0.8, 3, length.out = 10))
to_vary_l = split(to_vary_m, seq(nrow(to_vary_m)))

expand.grid: makes a data frame with one row for every combination of the values of the vectors p1 and p2

to_vary_m[3, ]
        p1  p2
3 1.444444 0.8
to_vary_l[[3]]
        p1  p2
3 1.444444 0.8
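
Once the combinations are in a list, each entry can be passed to a function with lapply. The sketch below uses a smaller grid and a placeholder function f, both hypothetical, to show the pattern.

```r
# Small grid: 2 values of p1 times 3 values of p2
grid <- expand.grid(p1 = c(1, 2), p2 = c(10, 20, 30))
nrow(grid)

# One list entry (a one-row data frame) per combination
grid_l <- split(grid, seq(nrow(grid)))
length(grid_l)

# Placeholder for a model run that depends on p1 and p2
f <- function(row) row$p1 * row$p2

out <- unlist(lapply(grid_l, f))
out
```

The first column of expand.grid varies fastest, so the results are ordered with p1 cycling through its values for each p2.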