Functions

Using functions is a great way to generalize and automate processes. R comes with several basic functions implemented in packages (which are essentially collections of functions and objects) such as stats and graphics. Functions usually take arguments provided by the user and perform an action based on them. It is always important to read the Help page of a function to learn which arguments are available and how to use them.

For example, you can generate a sequence more complex than 1:10 with the function seq and its arguments by and length.out:

seq(from = 1, to = 20, by = 2)  ## increment the sequence by 2
 [1]  1  3  5  7  9 11 13 15 17 19
seq(from = 1, to = 20, length.out = 5)  ## only 5 numbers between 1 and 20
[1]  1.00  5.75 10.50 15.25 20.00
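The same arguments also work for decreasing or fractional sequences. As a small sketch (the values and the object names seq.down and seq.frac are only illustrative), by can be negative or smaller than 1:

```r
seq.down <- seq(from = 10, to = 1, by = -3)  ## count down in steps of 3
seq.down
## [1] 10  7  4  1
seq.frac <- seq(from = 0, to = 1, by = 0.25)  ## fractional increment
seq.frac
## [1] 0.00 0.25 0.50 0.75 1.00
```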

A sequence can also be created by repetition using the function rep and arguments such as each (each element is repeated x times) and times (the whole sequence is repeated x times). When times is a vector, each element is repeated the corresponding number of times.

rep(1:4, each = 2)
[1] 1 1 2 2 3 3 4 4
rep(c("cat", "dog", "mouse"), times = 2)
[1] "cat"   "dog"   "mouse" "cat"   "dog"   "mouse"
rep(c("cat", "dog", "mouse"), times = 1:3)
[1] "cat"   "dog"   "dog"   "mouse" "mouse" "mouse"
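The arguments each and times can also be combined in a single call, in which case each is applied first and the result is then repeated. A minimal sketch, where the object names rep.both and rep.len are only illustrative and length.out is an additional argument of rep that cuts or extends the result to a fixed length:

```r
rep.both <- rep(1:2, each = 2, times = 2)  ## each first, then times
rep.both
## [1] 1 1 2 2 1 1 2 2
rep.len <- rep(1:2, length.out = 5)  ## repeat until 5 elements are reached
rep.len
## [1] 1 2 1 2 1
```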

We saw before that a logical operation can be applied to each element of a vector using logical operators. If you want to check whether all or any of the elements meet a certain criterion, you can use the functions all() and any():

vec.1 <- c(2, 4, 6)
all(vec.1<4)
[1] FALSE
any(vec.1<4)
[1] TRUE
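Both functions accept any logical vector, so they can be combined with other comparisons. A short sketch using the same vector (the conditions chosen here are only examples):

```r
vec.1 <- c(2, 4, 6)
all(vec.1 > 0)  ## are all elements positive?
## [1] TRUE
any(vec.1 > 10)  ## is any element larger than 10?
## [1] FALSE
all(vec.1 %% 2 == 0)  ## are all elements even?
## [1] TRUE
```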

The base packages in R also include functions to perform more complex but very useful calculations, such as logarithms (with different bases) and antilogs, square roots, sums, means and medians.

log(42)  ## natural log
[1] 3.73767
log10(42)  ## base 10 log
[1] 1.623249
exp(3.73767)  ## antilog
[1] 42.00002
X <- 13^2
sqrt(X) == 13
[1] TRUE
vec <- seq(1, 100, by = 2)
sum(vec)
[1] 2500
mean(vec)
[1] 50
median(vec)
[1] 50
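For logarithms with other bases, log() has a base argument. A brief sketch (the object names log.2 and log.10 are only illustrative):

```r
log.2 <- log(8, base = 2)  ## base 2 log
log.2
## [1] 3
log.10 <- log(42, base = 10)  ## same result as log10(42)
log.10
## [1] 1.623249
exp(log(42))  ## exp() reverses the natural log
## [1] 42
```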

Checking the range of values in a vector, its minimum and maximum values, and its length can be very useful, as can sorting a vector or finding which values meet a criterion.

range(vec)
[1]  1 99
max(vec)
[1] 99
min(vec)
[1] 1
length(vec)
[1] 50
sort(vec)
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
[26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
rev(vec)
 [1] 99 97 95 93 91 89 87 85 83 81 79 77 75 73 71 69 67 65 63 61 59 57 55 53 51
[26] 49 47 45 43 41 39 37 35 33 31 29 27 25 23 21 19 17 15 13 11  9  7  5  3  1
which(vec == 3)  ## gives the position in the vector
[1] 2
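The function which() works with any logical condition, and the same condition can be used inside square brackets to get the values instead of the positions. A minimal sketch with the same vector (the threshold 95 is only an example):

```r
vec <- seq(1, 100, by = 2)
pos <- which(vec > 95)  ## positions of the values larger than 95
pos
## [1] 49 50
vals <- vec[vec > 95]  ## the values themselves
vals
## [1] 97 99
```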

We saw before how to see the class of an object using the class() function, but we can also check whether an object is of a specific class using the group of functions “is.X”:

is.numeric(vec)
[1] TRUE
is.character(vec)
[1] FALSE

A factor is an object type that represents a categorical variable. Factors have a fixed number of levels (categories), and some functions can be used to check those.

color.names <- factor(c("black", "white", "pink", "pink", "white", "white"))
class(color.names)
[1] "factor"
length(color.names)
[1] 6
levels(color.names)
[1] "black" "pink"  "white"
length(levels(color.names))
[1] 3
# there is also a function to do that
nlevels(color.names)
[1] 3
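To see how many elements belong to each level, one option is the base function table(). This is only a sketch of one possible check, and the object name color.counts is illustrative:

```r
color.names <- factor(c("black", "white", "pink", "pink", "white", "white"))
color.counts <- table(color.names)  ## number of elements per level
color.counts
## color.names
## black  pink white 
##     1     2     3 
```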

Let’s say you want to calculate the mean value of a sequence that contains NA. This will return an NA:

y <- c(4, NA, 7)
mean(y)
[1] NA

So, you need to deal with this NA before obtaining the mean value. You can identify its position and manually remove it, but this becomes impractical as the dimensions of an object increase. You can replace the NA with another value using the ifelse() function, but that will change the original data and, as Option 2 below shows, the resulting mean. Alternatively, you can omit the NAs from the calculation, either beforehand (using the function na.omit()) or with an argument implemented in the mean() function:

# Option 1. Manually removing NAs. Work through each line to see what it does
is.na(y)  ## tells you which positions are NAs
[1] FALSE  TRUE FALSE
y[!is.na(y)]
[1] 4 7
y.removed <- y[!is.na(y)]  
y.removed
[1] 4 7
mean(y.removed)
[1] 5.5
# Option 2. Replacing NAs
y.replaced <- ifelse(test = is.na(y), yes = 0, no = y) 
y.replaced
[1] 4 0 7
mean(y.replaced)
[1] 3.666667
# Option 3. Omitting NAs
y.omit <- na.omit(y)
y.omit
[1] 4 7
attr(,"na.action")
[1] 2
attr(,"class")
[1] "omit"
mean(y.omit)
[1] 5.5
# Option 4. Use an argument in mean()
mean(y, na.rm = TRUE)
[1] 5.5
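The na.rm argument is not specific to mean(); other summary functions such as sum() and max() accept it too. A short sketch with the same vector, which also counts the NAs first (the object name n.na is only illustrative):

```r
y <- c(4, NA, 7)
n.na <- sum(is.na(y))  ## TRUE counts as 1, so this is the number of NAs
n.na
## [1] 1
sum(y, na.rm = TRUE)
## [1] 11
max(y, na.rm = TRUE)
## [1] 7
```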

In the next section we will see how you can write your own function.

Packages

In this section we will install and load packages to use their functions. Packages available on the official R repository (the CRAN repository) can be installed using the function install.packages():

install.packages("ggplot2")

Installing a package does not make its functions readily available; the package first needs to be loaded into the session with library().

library(ggplot2)
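Two related tools can be useful here. The function requireNamespace() reports whether a package is installed, and the package::function notation calls a single function without loading the whole package. The sketch below uses the stats package, which ships with R, so that it runs without installing anything; the object name is.installed is only illustrative:

```r
is.installed <- requireNamespace("stats", quietly = TRUE)  ## TRUE if available
is.installed
## [1] TRUE
stats::median(c(1, 5, 9))  ## call median() from stats explicitly
## [1] 5
```

The same check could be used before install.packages() to avoid reinstalling a package that is already present.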