# Writing functions in R

# Anonymous functions

An anonymous function is, as the name implies, not assigned a name. This can be useful when the function is part of a larger operation but does not take up much space by itself. One frequent use case for anonymous functions is within the *apply family of base R functions.

Calculate the square root of the sum of squares (the Euclidean norm) of each column in a data.frame:

df <- data.frame(first=5:9, second=(0:4)^2, third=-1:3)

apply(df, 2, function(x) { sqrt(sum(x^2)) })
    first    second     third 
15.968719 18.814888  3.872983 
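
Note that sqrt(sum(x^2)) is the root of the sum of squares; a root mean square would use mean() in place of sum(). The following is a minimal sketch contrasting the two on the same data frame. The names root_sum_sq and rms are only illustrative, and sapply() is used here instead of apply() because it iterates over the columns of a data frame directly.

```r
df <- data.frame(first = 5:9, second = (0:4)^2, third = -1:3)

# Root of the sum of squares, one value per column
root_sum_sq <- sapply(df, function(x) sqrt(sum(x^2)))

# Root mean square: average the squares before taking the root
rms <- sapply(df, function(x) sqrt(mean(x^2)))

# The two differ by a factor of sqrt(number of rows)
all.equal(rms * sqrt(nrow(df)), root_sum_sq)
```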

Create a sequence with step length one from the smallest to the largest value of each row in a matrix:

x <- sample(1:6, 12, replace=TRUE)
mat <- matrix(x, nrow=3)

apply(mat, 1, function(x) { seq(min(x), max(x)) })
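
Because the input above is random, the shape of the result varies from run to run. As a small sketch with fixed values (mat_fixed and row_seqs are illustrative names, not part of the original example), the following shows that apply() returns a list when the anonymous function produces vectors of different lengths:

```r
# Fixed values (an assumption for illustration) instead of sample(),
# so the result is reproducible
mat_fixed <- matrix(c(1L, 5L, 2L,
                      3L, 2L, 6L), nrow = 3)

# The matrix is filled column by column, so the rows are
# (1, 3), (5, 2) and (2, 6)
row_seqs <- apply(mat_fixed, 1, function(x) seq(min(x), max(x)))

is.list(row_seqs)   # TRUE: the sequences have different lengths
lengths(row_seqs)   # one length per row
```

If every row happened to yield a sequence of the same length, apply() would return a matrix instead, so code that consumes the result should not assume one shape.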

An anonymous function can also stand on its own:

(function() { 1 })()
[1] 1

is equivalent to

f <- function() { 1 }
f()
[1] 1
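
Anonymous functions can also take arguments and can be handed to other base R functions besides the *apply family. A brief sketch, with variable names chosen only for illustration:

```r
# Immediately invoked anonymous function with two arguments
sum_of_two <- (function(x, y) x + y)(2, 3)

# Filter() keeps the elements for which the anonymous predicate is TRUE
evens <- Filter(function(x) x %% 2 == 0, 1:10)

# Map() applies the anonymous function pairwise and returns a list
products <- Map(function(a, b) a * b, 1:3, 4:6)
```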

# RStudio code snippets

This is a small shortcut for those who often write their own functions.
Type "fun" in the RStudio IDE and hit TAB.

The result will be a skeleton of a new function.

name <- function(variables) {
        
}

You can easily define your own snippet template, e.g. like the one below:

name <- function(df, x, y) {
        require(tidyverse)
        out <- 
        return(out)
}

Snippets are edited via the Edit Snippets option under Global Options -> Code.

# Named functions

R is full of functions; it is, after all, a functional programming language. Sometimes, however, the precise function you need isn't provided in base R. You could install a package containing the function, but maybe your requirements are so specific that no pre-made function fits the bill. Then you're left with the option of writing your own.

A function can be very simple, to the point of being pretty much pointless. It doesn't even need to take an argument:

one <- function() { 1 }
one()
[1] 1

two <- function() { 1 + 1 }
two()
[1] 2

What's between the curly braces { } is the body of the function. The braces aren't strictly needed when the body is a single expression, but they can be useful to keep things organized.
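
As a minimal sketch of when braces matter (square and square_then_sum are hypothetical names used only for this illustration):

```r
# Single-expression body: braces are optional
square <- function(x) x^2

# Several statements: braces are required to group them into one body
square_then_sum <- function(x) {
    squared <- x^2
    sum(squared)
}

square(3)
square_then_sum(1:3)
```

The value of the last evaluated expression is what the function returns, so no explicit return() is needed in either case.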

A function can be very simple, yet highly specific. This function takes as input a vector (vec in this example) and outputs the same vector with the vector's length (6 in this case) subtracted from each of the vector's elements.

vec <- 4:9
subtract.length <- function(x) { x - length(x) }
subtract.length(vec)
[1] -2 -1  0  1  2  3

Notice that length() is in itself a pre-supplied (i.e. Base) function. You can of course use a previously self-made function within another self-made function, as well as assign variables and perform other operations while spanning several lines:

vec2 <- (4:7)/2

msdf <- function(x, multiplier=4) {
    mult <- x * multiplier
    subl <- subtract.length(x)
    data.frame(mult, subl)
}

msdf(vec2, 5)
  mult subl
1 10.0 -2.0
2 12.5 -1.5
3 15.0 -1.0
4 17.5 -0.5

multiplier=4 sets 4 as the default value of the argument multiplier: if no value is given when calling the function, 4 is used.
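
To illustrate the default, the sketch below repeats the definitions from above so that it runs on its own, then calls msdf() once without and once with an explicit multiplier. The names default_call and named_call are only illustrative.

```r
subtract.length <- function(x) { x - length(x) }

msdf <- function(x, multiplier=4) {
    mult <- x * multiplier
    subl <- subtract.length(x)
    data.frame(mult, subl)
}

vec2 <- (4:7)/2

default_call <- msdf(vec2)                  # multiplier falls back to 4
named_call   <- msdf(vec2, multiplier = 5)  # same as msdf(vec2, 5)
```

Naming the argument in the call is optional here, but it tends to make calls easier to read once a function has several arguments with defaults.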

The above are all examples of named functions, so called simply because they have been given names (one, two, subtract.length, etc.).

# Passing column names as argument of a function

Sometimes you want to pass the names of data frame columns to a function. They can be provided as strings and used inside the function with [[. The following example prints basic statistics of the selected variables to the R console:

basic.stats <- function(dset, vars){
    for(i in seq_along(vars)){
        print(vars[i])
        print(summary(dset[[vars[i]]]))
    }
}

basic.stats(iris, c("Sepal.Length", "Petal.Width"))

Running the code above prints the names of the selected variables and their basic summary statistics (minimum, first quartile, median, mean, third quartile and maximum) to the R console. The expression dset[[vars[i]]] takes the i-th element of the argument vars and selects the corresponding column of the input data set dset. For example, evaluating iris[["Sepal.Length"]] on its own would print the Sepal.Length column of the iris data set as a vector.
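
The same [[ technique also works when the function should return values instead of printing them. The following is a sketch only: column.means is a hypothetical helper, not part of the original example, and the check for unknown column names is an added assumption about how such a function might guard its input.

```r
# Hypothetical helper: mean of each requested column, returned as a named vector
column.means <- function(dset, vars) {
    # Fail early if a requested column does not exist
    missing_vars <- setdiff(vars, names(dset))
    if (length(missing_vars) > 0) {
        stop("Unknown column(s): ", paste(missing_vars, collapse = ", "))
    }
    # vapply() checks that each column yields exactly one numeric value
    vapply(vars, function(v) mean(dset[[v]]), numeric(1))
}

means <- column.means(iris, c("Sepal.Length", "Petal.Width"))
```

Returning a named vector makes the result easy to reuse in later computations, which printed output does not allow.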