Rscript: running the code this way ensures that everything it needs is actually loaded into memory when it is needed, i.e., that it does not rely on objects that happen to be there already.
Two ways to assign a value:
X <- 10
or
X = 10
The first version is preferred by R purists. I don't really care
x = 1:10
y <- c(x, 12)
> y
[1] 1 2 3 4 5 6 7 8 9 10 12
z = c("red", "blue")
> z
[1] "red" "blue"
z = c(z, 1)
> z
[1] "red" "blue" "1"
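Note that in z, since the first two entries are characters, the added entry is converted to a character as well. Vectors have all entries of the same type: when types are mixed, R coerces every entry to the most general one (logical, then integer, then numeric, then character), whatever the order in which you put them in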
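To make the coercion rule concrete, here is a small sketch (the variable names v1, v2 and v3 are only for illustration). It suggests that the order of the entries does not matter, only the most general type present.

```r
# Mixing types in c() coerces every entry to the most general type,
# whatever the order of the entries
v1 <- c(1, "red")   # number first: still a character vector
v2 <- c(TRUE, 2L)   # logical and integer: integer vector
v3 <- c(2L, 3.5)    # integer and double: numeric (double) vector
class(v1)
class(v2)
class(v3)
```

Checking with class() after building a vector is a cheap way to catch an unintended conversion.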
Let us do something inefficient but illustrative
x = c()
for (i in 1:10) {
  x = c(x, i)
}
would give us the same as x = 1:10
Vector addition can be frustrating. Say you write x=1:10, i.e., make the vector
> x
[1] 1 2 3 4 5 6 7 8 9 10
Then x+1 gives
> x+1
[1] 2 3 4 5 6 7 8 9 10 11
i.e., adds 1 to all entries in the vector
Beware of this in particular when addressing sets of indices in lists, vectors or matrices
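A minimal sketch of where this can bite when indexing (the names idx, a and b are illustrative). It also shows a related trap that is not in the text above but is standard R behaviour: the colon operator binds more tightly than +.

```r
x <- 1:10
idx <- c(2, 5, 7)
shifted <- x[idx + 1]   # adds 1 to every index: entries 3, 6 and 8

# The colon operator is evaluated before +
a <- 1:3 + 1     # same as (1:3) + 1
b <- 1:(3 + 1)   # the sequence from 1 to 4
```

When in doubt, put parentheses around the range you mean.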
Matrix (or vector) of a given size, filled with zeros (here 2 rows and 3 columns)
A <- mat.or.vec(nr = 2, nc = 3)
Matrix with prescribed entries
B <- matrix(c(1,2,3,4), nr = 2, nc = 2)
> B
     [,1] [,2]
[1,]    1    3
[2,]    2    4
C <- matrix(c(1,2,3,4), nr = 2, nc = 2, byrow = TRUE)
> C
     [,1] [,2]
[1,]    1    2
[2,]    3    4
Note that here and elsewhere, naming the arguments (e.g., nr = 2) allows you to pass arguments in any order. In matrix, nr and nc are accepted as abbreviations of nrow and ncol
Probably the biggest annoyance in R compared to other languages:
A*B is the Hadamard (entrywise) product (A.*B in matlab), not the standard matrix multiplication
Standard matrix multiplication is written A %*% B
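A small sketch contrasting the two products on 2 by 2 matrices (the entries are chosen only for illustration; B is a permutation matrix so the difference is easy to see).

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns (1,2) and (3,4)
B <- matrix(c(0, 1, 1, 0), nrow = 2)   # swaps the two coordinates

hadamard <- A * B     # entrywise product
product  <- A %*% B   # matrix product: here, A with its columns swapped
```

Note that A * B requires both matrices to have the same dimensions, while A %*% B requires the number of columns of A to equal the number of rows of B.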
R has no end keyword (as in matlab) to access the last entry in a matrix/vector/list. Use instead
length (lists or vectors), nchar (character strings), dim (matrices; careful, it of course returns 2 values)
A[i,j] # Entry (i,j)
A[i,] # Row i
A[,j] # Column j
A[i,dim(A)[2]] # Last entry in row i
A[dim(A)[1],dim(A)[2]] # Last entry in matrix
A[i,] <- c(1,2,3) # Replace row i by 1,2,3
B[,j] <- c(1,2) # Replace column j by 1,2
Beware, in this case, the dimensions must make sense: the above operations will fail if the number of entries to replace is not a multiple of the length of the replacement
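The following sketch puts these indexing idioms together on a small matrix (the values are arbitrary). The failing replacement is wrapped in tryCatch so that the script keeps running; the variable res is just an illustrative way to record what happened.

```r
v <- c(4, 8, 15)
last_v <- v[length(v)]             # last entry of a vector

A <- matrix(1:6, nrow = 2)         # 2 rows, 3 columns
last_A <- A[nrow(A), ncol(A)]      # same as A[dim(A)[1], dim(A)[2]]

A[1, ] <- c(10, 20, 30)            # fine: row 1 has 3 entries

# Replacing a row of 3 entries with 2 values raises an error
res <- tryCatch({
  A[2, ] <- c(1, 2)
  "ok"
}, error = function(e) "error")
```

nrow(A) and ncol(A) are shorthand for dim(A)[1] and dim(A)[2].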
head() - shows first 6 rows; override with, e.g., head(dataframe, n = 10)
tail() - shows last 6 rows
dim() - returns number of rows and number of columns
nrow() - number of rows
ncol() - number of columns
str() - structure of data frame - name, type and preview of data in each column
names() or colnames() - show the names attribute for a data frame
sapply(dataframe, class) - shows the class of each column in the data frame
Lists: a very useful data structure, quite flexible and versatile. Empty list: L <- list(). Convenient for things like parameters. For instance
L <- list()
L$a <- 10
L$b <- 3
L[["another_name"]] <- "Plouf plouf"
> L[1]
$a
[1] 10
> L[[2]]
[1] 3
> L$a
[1] 10
> L[["b"]]
[1] 3
> L$another_name
[1] "Plouf plouf"
is.type functions (e.g., is.numeric, is.character, is.matrix, is.list) allow you to check the type of an object; as.type functions (e.g., as.numeric, as.character, as.matrix, as.list) allow you to convert an object to a given type
The result of a type conversion can be surprising, so it may take a few tries to get things right
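A few conversions that tend to surprise, as a sketch (the values are arbitrary). The factor case is probably the most common trap: as.numeric returns the internal level codes, not the labels.

```r
n1 <- as.numeric("3.14")             # character to number
n2 <- as.integer(3.9)                # truncates, does not round
f  <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)              # internal level codes
values <- as.numeric(as.character(f))  # the labels, converted to numbers
is_num <- is.numeric(f)              # a factor is not numeric
bad <- suppressWarnings(as.numeric("abc"))  # NA, with a warning if not suppressed
```

Running an is.type check right after a conversion helps confirm you got what you expected.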
In if statements (see later) and many other places, you need to evaluate the truth value of a statement. Use == for equality, != for inequality, >, <, >=, <= for the obvious. & is the logical and, | is the logical or, ! is the logical not
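A short sketch of these operators on a vector (values chosen for illustration). The last lines use && which is not mentioned above: it is the single-value counterpart of & and is the usual choice inside if statements.

```r
x <- c(1, 5, 10)
both    <- x > 2 & x < 8   # entrywise and
either  <- x < 2 | x > 8   # entrywise or
negated <- !(x == 5)       # entrywise not

# && and || work on single values
a <- 3
single <- a > 0 && a != 5
```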
which
Make a vector of 5 uniformly distributed numbers (by default in [0, 1])
> v = runif(5)
> v
[1] 0.682311734 0.612788785 0.681121278 0.003132367 0.842270188
Then using logical statements and which helps for selection
> v <= 0.5
[1] FALSE FALSE FALSE TRUE FALSE
> which(v <= 0.5)
[1] 4
(which returns the indices for which the statement is TRUE)
which (1)
Make a matrix of 9 uniformly distributed numbers
> A = matrix(data = runif(9), nr = 3)
> A
          [,1]       [,2]      [,3]
[1,] 0.1605460 0.18508003 0.6043105
[2,] 0.7762981 0.02225763 0.3739177
[3,] 0.8170578 0.88845646 0.5842683
Then using logical statements and which helps for selection
> A <= 0.5
      [,1]  [,2]  [,3]
[1,]  TRUE  TRUE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE FALSE FALSE
which (2)
Note that by default, which returns indices of the matrix enumerated column-wise (1-3 are first column, 4-6 are second, etc.)
> which(A <= 0.5)
[1] 1 4 5 8
If you want "proper" matrix indices, use
> which(A <= 0.5, arr.ind = TRUE)
     row col
[1,]   1   1
[2,]   1   2
[3,]   2   2
[4,]   2   3
which to set vector/matrix entries
Make a vector of 5 / matrix of 9 uniformly distributed numbers (by default in [0, 1])
> v = runif(5)
> v[which(v<0.5)] = 0
> v
[1] 0.8877751 0.9500462 0.0000000 0.0000000 0.0000000
> A = matrix(data = runif(9), nr = 3)
> A[which(A<0.5)] = 0
> A
          [,1]      [,2]      [,3]
[1,] 0.9365261 0.0000000 0.0000000
[2,] 0.8927255 0.0000000 0.0000000
[3,] 0.7267821 0.8341371 0.6286996
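As a side note, the logical vector can also be used directly as the index, without which. The sketch below uses fixed values instead of runif so that the result is reproducible; w is an illustrative copy of v.

```r
v <- c(0.9, 0.2, 0.7, 0.1)   # fixed values instead of runif
w <- v

v[which(v < 0.5)] <- 0       # indices returned by which
w[w < 0.5] <- 0              # logical vector used directly as the index

same <- identical(v, w)
```

The two forms agree here. They can differ when the vector contains NA, since which drops NA while a logical index does not.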
if (condition is true) {
  list of stuff to do
}
Even if list of stuff to do is a single instruction, best to use curly braces
if (condition is true) {
  list of stuff to do
} else if (another condition) {
  ...
} else {
  # This is the default if none of the above conditions are true
  ...
}
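A runnable instance of this template, as a sketch. The function classify is hypothetical and only serves to exercise the three branches; functions themselves are covered further on.

```r
# Hypothetical helper: returns a label describing the sign of x
classify <- function(x) {
  if (x < 0) {
    out <- "negative"
  } else if (x == 0) {
    out <- "zero"
  } else {
    # Default when none of the above conditions are true
    out <- "positive"
  }
  return(out)
}

labels <- c(classify(-3), classify(0), classify(7))
```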
for applies to lists or vectors
for (i in 1:10) {
  something using integer i
}
for (j in c(1,3,4)) {
  something using integer j
}
for (n in c("truc", "muche", "chose")) {
  something using string n
}
for (m in list("truc", "muche", "chose", 1, 2)) {
  something using string m or number m, depending
}
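The last case, a loop over a list with entries of different types, could look as follows. This is a sketch: the names items and descriptions are illustrative.

```r
items <- list("truc", "muche", 1, 2)
descriptions <- c()
for (m in items) {
  # m is a string for the first two entries, a number for the last two
  if (is.character(m)) {
    descriptions <- c(descriptions, paste("string:", m))
  } else {
    descriptions <- c(descriptions, paste("number:", m))
  }
}
```

With a vector instead of a list, c("truc", "muche", 1, 2) would turn every entry into a string before the loop even starts.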
lapply is a very useful function (a few others in the same spirit: sapply, vapply, mapply)
Applies a function to each entry in a list/vector/matrix and returns the results as a list
Because there is a parallel version (parLapply) that we will see later, worth learning
l = list()
for (i in 1:10) {
  l[[i]] = runif(i)
}
lapply(X = l, FUN = mean)
or, to make a vector
unlist(lapply(X = l, FUN = mean))
or sapply(X = l, FUN = mean)
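For the other two functions mentioned, a brief sketch with fixed inputs (chosen for illustration): vapply is like sapply but you state the type and length of each result, and mapply walks through several vectors in parallel.

```r
l <- list(c(1, 2, 3), c(10, 20), 5)

# vapply: each call to mean must return a single number
means <- vapply(X = l, FUN = mean, FUN.VALUE = numeric(1))

# mapply: applies the function to the first entries, then the second, etc.
sums <- mapply(FUN = function(a, b) a + b, 1:3, c(10, 20, 30))
```

vapply stops with an error if a result does not match FUN.VALUE, which makes it a safer choice than sapply inside scripts.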
Can "pick up" nontrivial list entries
l = list()
for (i in 1:10) {
  l[[i]] = list()
  l[[i]]$a = runif(i)
  l[[i]]$b = runif(2*i)
}
sapply(X = l, FUN = function(x) length(x$b))
gives
[1] 2 4 6 8 10 12 14 16 18 20
Just recall: the argument to the function you define is a list entry (l[[1]], l[[2]], etc., here)
# Suppose we want to vary 3 parameters
variations = list(
p1 = seq(1, 10, length.out = 10),
p2 = seq(0, 1, length.out = 10),
p3 = seq(-1, 1, length.out = 10)
)
# Create the list
tmp = expand.grid(variations)
PARAMS = list()
for (i in 1:dim(tmp)[1]) {
  PARAMS[[i]] = list()
  for (k in 1:length(variations)) {
    PARAMS[[i]][[names(variations)[k]]] = tmp[i, k]
  }
}
There is still a loop, but you can split this list, use it on different machines, etc. And you can use parLapply on it
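Once such a list exists, running something for every parameter combination is a single call. The sketch below uses a smaller grid and builds the list with lapply rather than the nested loops above, which is a different but equivalent construction. The function run_one is a hypothetical stand-in for whatever computation you actually want to run.

```r
variations <- list(p1 = c(1, 2), p2 = c(0, 0.5))
grid <- expand.grid(variations)

# One list entry per row of the grid, with named parameters
PARAMS <- lapply(seq_len(nrow(grid)), function(i) as.list(grid[i, ]))

# Hypothetical stand-in for a simulation using the parameters
run_one <- function(p) p$p1 + p$p2

results <- sapply(X = PARAMS, FUN = run_one)
```

Because each entry of PARAMS is self-contained, the sapply call is the part that could later be swapped for a parallel version.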
A function needs three things
Generic form ([ ] indicates something optional)
function_name <- function([arguments]) {
  set of instructions
  [return value]
}
_ is allowed in names and often used in place of a space
Camel case: firstName, lastName
Pascal case: FirstName, LastName
Snake case: first_name, last_name
Kebab case: first-name, last-name. Not allowed in R (the - is read as a minus sign)!
A function can have no arguments, in which case it looks like this
function_name <- function() {
  set of instructions
  [return value]
}
and is used by calling function_name(). E.g.,
print_date = function() {
  print(Sys.Date())
}
is used as
> print_date()
[1] "2023-10-13"
You can (and should when possible) set default values for arguments to a function
print_date = function(date_format = "YYYY-MM-DD") {
  date = as.character(Sys.Date())
  tmp = strsplit(date, "-")
  YYYY = tmp[[1]][1]
  MM = tmp[[1]][2]
  DD = tmp[[1]][3]
  if (date_format == "MM-DD-YYYY") {
    OUT = sprintf("%s-%s-%s", MM, DD, YYYY)
  } else if (date_format == "DD-MM-YYYY") {
    OUT = sprintf("%s-%s-%s", DD, MM, YYYY)
  } else {
    OUT = date
  }
  return(OUT)
}
> print_date()
[1] "2023-10-13"
> print_date("DD-MM-YYYY")
[1] "13-10-2023"
Often, you will create a function of several variables, but will want to use it as a function of fewer, e.g., in a minimisation routine
my_silly_function = function(x,y) {
  return(x+y)
}
To use as a function of, say, x with y=5,
function(t) my_silly_function(x = t, y = 5)
whereas to use as a function of y with x=2
function(t) my_silly_function(x = 2, y = t)
You can use any letter in the call to function; I am not using x here to make it obvious, but you could do function(x) my_silly_function(x = x, y = 5)
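To illustrate the minimisation use case, here is a sketch using the base R routine optimise, which minimises a function of one variable over an interval. The function sq_dist is hypothetical; it is used instead of my_silly_function because a sum has no interior minimum.

```r
# Hypothetical function of two variables
sq_dist <- function(x, y) {
  return((x - y)^2)
}

# Minimise over x only, with y fixed at 5
res <- optimise(f = function(t) sq_dist(x = t, y = 5), interval = c(0, 10))
res$minimum   # location of the minimum, close to 5
```

The anonymous function is what turns the two-variable function into the one-variable function that optimise expects.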
Such documents mix text and R commands
When R is run, it goes through the document: text is formatted as prescribed by the type of program used (markdown for Rmarkdown and Quarto), and R commands are run, with the output incorporated into the text
# Section
## Subsection
### Subsubsection
**bold text**
*italicised text*
[linked text](https://www.google.ca/)
R code chunks are included in the text as
```{r}
Some R code
```
You must use {r} after the first three backticks. Blocks like
```R
Some R code
```
or
```
Some R code
```
are pure markdown code blocks; R does not execute the R commands there
It is a good idea to name chunks: when your code gets lengthy or complicated, debugging is greatly facilitated with named chunks, since errors will refer to the chunk name; with unnamed chunks, they will just refer to the chunk number. RStudio also shows chunk names in the quick selection box
Chunk names appear in the {r} statement at the beginning of a chunk, e.g., in the RStudio Rmd skeleton file, the first chunk
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
is named setup
Chunk options follow the chunk name, if any, separated with commas. For instance, in the RStudio Rmd skeleton file, the first chunk
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
is set with include=FALSE, which prevents the code and results from showing in the rendered file. A full list of chunk options is available in the knitr documentation
Note that the RStudio skeleton file includes the statement
knitr::opts_chunk$set(echo = TRUE)
which sets the chunk options globally (unless overridden in a specific chunk). For instance, if you want the default behaviour to be that your code is not shown, you could do
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
and override this in a specific chunk with
```{r some name, echo=TRUE}
Some commands
```
You can show R code as a code block (unless you choose to hide the code or the output)
You can also run R "inline", that is, within a regular markdown statement instead of a code chunk, using `r r-command`, where r-command is the R command you want to use
For example, the default Rmd file generated by RStudio uses the R example dataset cars. To show the number of rows in a regular sentence, outside of a code chunk, you could write
The cars dataset contains `r dim(cars)[1]` rows.
which renders as
The cars dataset contains 50 rows.
Documents can be rendered as html, pdf or as a Word file
To render as pdf, you will need to have