# # Column wise operation

## # sum of each column

Suppose we need to do the `sum`

of each column in a dataset

```
set.seed(20)
df1 <- data.frame(ID = rep(c("A", "B", "C"), each = 3), V1 = rnorm(9), V2 = rnorm(9))
m1 <- as.matrix(df1[-1])
```

There are many ways to do this. Using `base R`

, the best option would be `colSums`

```
colSums(df1[-1], na.rm = TRUE)
```

Here, we removed the first column as it is non-numeric and did the `sum`

of each column, specifying the `na.rm = TRUE`

(in case there are any NAs in the dataset)

This also works with `matrix`

```
colSums(m1, na.rm = TRUE)
```

This can be done in a loop with `lapply/sapply/vapply`

```
lapply(df1[-1], sum, na.rm = TRUE)
```

It should be noted that the output is a `list`

. If we need a `vector`

output

```
sapply(df1[-1], sum, na.rm = TRUE)
```

Or

```
vapply(df1[-1], sum, na.rm = TRUE, numeric(1))
```

For matrices, if we want to loop through columns, then use `apply`

with `MARGIN = 1`

```
apply(m1, 2, FUN = sum, na.rm = TRUE)
```

There are ways to do this with packages like `dplyr`

or `data.table`

```
library(dplyr)
df1 %>%
summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)
```

Here, we are passing a regular expression to match the column names that we need to get the `sum`

in `summarise_at`

. The regex will match all columns that start with `V`

followed by one or more numbers (`\\d+`

).

A `data.table`

option is

```
library(data.table)
setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), .SDcols = 2:ncol(df1)]
```

We convert the 'data.frame' to 'data.table' (`setDT(df1)`

), specified the columns to be applied the function in `.SDcols`

and loop through the Subset of Data.table (`.SD`

) and get the `sum`

.

If we need to use a group by operation, we can do this easily by specifying the group by column/columns

```
df1 %>%
group_by(ID) %>%
summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)
```

In cases where we need the `sum`

of all the columns, `summarise_each`

can be used instead of `summarise_at`

```
df1 %>%
group_by(ID) %>%
summarise_each(funs(sum(., na.rm = TRUE)))
```

The `data.table`

option is

```
setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), by = ID]
```