# Column wise operation

# sum of each column

Suppose we need to do the sum of each column in a dataset

set.seed(20)
df1 <- data.frame(ID = rep(c("A", "B", "C"), each = 3), V1 = rnorm(9), V2 = rnorm(9))
m1 <- as.matrix(df1[-1])

There are many ways to do this. Using base R, the best option would be colSums

colSums(df1[-1], na.rm = TRUE)

Here, we removed the first column as it is non-numeric and did the sum of each column, specifying the na.rm = TRUE (in case there are any NAs in the dataset)

This also works with matrix

colSums(m1, na.rm = TRUE)

This can be done in a loop with lapply/sapply/vapply


lapply(df1[-1], sum, na.rm = TRUE)

It should be noted that the output is a list. If we need a vector output


sapply(df1[-1], sum, na.rm = TRUE)


vapply(df1[-1], sum, na.rm = TRUE, numeric(1))

For matrices, if we want to loop through columns, then use apply with MARGIN = 1


apply(m1, 2, FUN = sum, na.rm = TRUE)

There are ways to do this with packages like dplyr or data.table


library(dplyr)
 df1 %>%
     summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)

Here, we are passing a regular expression to match the column names that we need to get the sum in summarise_at. The regex will match all columns that start with V followed by one or more numbers (\\d+).

A data.table option is

library(data.table)   
setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), .SDcols = 2:ncol(df1)]

We convert the 'data.frame' to 'data.table' (setDT(df1)), specified the columns to be applied the function in .SDcols and loop through the Subset of Data.table (.SD) and get the sum.

If we need to use a group by operation, we can do this easily by specifying the group by column/columns


df1 %>%
   group_by(ID) %>%   
   summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)

In cases where we need the sum of all the columns, summarise_each can be used instead of summarise_at

df1 %>%
    group_by(ID) %>%
    summarise_each(funs(sum(., na.rm = TRUE)))

The data.table option is

setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), by = ID]

← Control flow structures JSON →