# boxplot

## Create a box-and-whisker plot with boxplot() {graphics}

This example uses the default `boxplot()` function and the `iris` data frame.

```
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
```

### Simple boxplot (Sepal.Length)

Create a box-and-whisker plot of a numerical variable:

```
boxplot(iris[, 1], xlab = "Sepal.Length", ylab = "Length (in centimeters)",
        main = "Summary Characteristics of Sepal.Length (Iris Data)")
```
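The box and whiskers summarise a handful of numbers. As a minimal sketch (the variable name `five` is only illustrative), `fivenum()` returns Tukey's five-number summary, and `boxplot.stats()` returns the values a boxplot of a single variable is based on:

```r
# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(iris$Sepal.Length)
five

# The statistics boxplot() relies on for one variable:
# whisker ends, hinges and median
boxplot.stats(iris$Sepal.Length)$stats
```

The two results differ only when some points are classified as outliers, because the whiskers then stop short of the minimum or maximum.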

### Boxplot of sepal length grouped by species

Create a boxplot of a numerical variable grouped by a categorical variable:

```
boxplot(Sepal.Length~Species,data = iris)
```
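To check the plot against numbers, the median of each group can be computed directly. This is a small sketch; `group_medians` is an illustrative name, not part of the original example:

```r
# Median sepal length per species, in the order of the factor levels
group_medians <- tapply(iris$Sepal.Length, iris$Species, median)
group_medians
```

Each value should coincide with the thick line inside the corresponding box.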

### Bring order

To change the order of the boxes in the plot, change the order of the categorical variable's levels.

For example, to get the order `virginica - versicolor - setosa`:

```
newSpeciesOrder <- factor(iris$Species, levels=c("virginica","versicolor","setosa"))
boxplot(Sepal.Length~newSpeciesOrder,data = iris)
```
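An alternative sketch keeps the reordered factor inside a copy of the data frame instead of a separate global variable, so the formula refers only to columns of `data`. The name `iris_ordered` is an assumption made for this illustration:

```r
# Copy of iris whose Species column has the desired level order
iris_ordered <- transform(
  iris,
  Species = factor(Species, levels = c("virginica", "versicolor", "setosa"))
)

# The boxes follow the new level order
boxplot(Sepal.Length ~ Species, data = iris_ordered)
```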

### Change group names

If you want to give your groups better names, you can use the `names` parameter. It takes a vector with one entry per level of the categorical variable:

```
boxplot(Sepal.Length~newSpeciesOrder,data = iris,names= c("name1","name2","name3"))
```
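The labels do not have to be typed by hand. As a hedged sketch, they can be built from the data, here by appending the group size to each level (`group_sizes` and `group_labels` are illustrative names):

```r
# Number of observations per species, in level order
group_sizes <- table(iris$Species)

# One label per level, e.g. "setosa (n=50)"
group_labels <- paste0(names(group_sizes), " (n=", as.vector(group_sizes), ")")

boxplot(Sepal.Length ~ Species, data = iris, names = group_labels)
```

Because the labels are derived from the same factor that defines the groups, they stay in the same order as the boxes.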

### Small improvements

### Color

`col`: a vector of colors, one per level of the categorical variable.

```
boxplot(Sepal.Length~Species,data = iris,col=c("green","yellow","orange"))
```
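Colors are matched to boxes by position, so it is easy to mix them up if the level order changes. One possible sketch is to keep a named vector and index it by the factor levels (`species_cols` is a hypothetical name):

```r
# Map each species to a color by name rather than by position
species_cols <- c(setosa = "green", versicolor = "yellow", virginica = "orange")

# Index by the current level order so each box gets its own color
box_cols <- species_cols[levels(iris$Species)]

boxplot(Sepal.Length ~ Species, data = iris, col = box_cols)
```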

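### Proximity of the boxes

`boxwex`: a scale factor applied to the width of all boxes. Smaller values give narrower boxes and therefore more space between them.

Narrow boxes: `boxplot(Sepal.Length~Species,data = iris,boxwex = 0.1)`

Wide boxes: `boxplot(Sepal.Length~Species,data = iris,boxwex = 1)`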
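To compare the two settings directly, both plots can be drawn side by side. This is a minimal sketch using `par(mfrow = ...)`; the names `narrow`, `wide` and `old_par` are illustrative:

```r
# Two panels in one row; keep the old settings so they can be restored
old_par <- par(mfrow = c(1, 2))

# boxplot() invisibly returns the summary statistics, which are captured here
narrow <- boxplot(Sepal.Length ~ Species, data = iris,
                  boxwex = 0.1, main = "boxwex = 0.1")
wide <- boxplot(Sepal.Length ~ Species, data = iris,
                boxwex = 1, main = "boxwex = 1")

# Restore the previous graphical parameters
par(old_par)
```

Only the drawing changes: the statistics returned by the two calls are the same.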

### See the summaries on which the boxplots are based: `plot=FALSE`

To see the summaries instead of a plot, set the parameter `plot` to `FALSE`.

Several results are returned:

```
> boxplot(Sepal.Length~newSpeciesOrder,data = iris,plot=FALSE)
$stats   # summary of the numerical variable for the 3 groups
     [,1] [,2] [,3]
[1,]  5.6  4.9  4.3   # lower whisker end
[2,]  6.2  5.6  4.8   # lower hinge (approximately the first quartile)
[3,]  6.5  5.9  5.0   # median
[4,]  6.9  6.3  5.2   # upper hinge (approximately the third quartile)
[5,]  7.9  7.0  5.8   # upper whisker end
$n   # number of observations in each group
[1] 50 50 50
$conf   # lower and upper extremes of the notches
         [,1]     [,2]     [,3]
[1,] 6.343588 5.743588 4.910622
[2,] 6.656412 6.056412 5.089378
$out   # outliers (points beyond the whiskers)
[1] 4.9
$group   # group to which each outlier belongs
[1] 1
$names   # group names
[1] "virginica" "versicolor" "setosa"
```
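The returned list can be used like any other R object. The sketch below pulls out the medians and the outliers and then draws the plot from the stored summary with `bxp()`. The names `res`, `medians` and `outliers` are illustrative:

```r
newSpeciesOrder <- factor(iris$Species,
                          levels = c("virginica", "versicolor", "setosa"))
res <- boxplot(Sepal.Length ~ newSpeciesOrder, data = iris, plot = FALSE)

# Row 3 of $stats holds the medians; label them with the group names
medians <- setNames(res$stats[3, ], res$names)
medians

# Pair each outlier with the name of the group it belongs to
outliers <- data.frame(group = res$names[res$group], value = res$out)
outliers

# Draw the boxplot later from the precomputed summary
bxp(res)
```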

## Additional boxplot style parameters

### Box

- boxlty - box line type
- boxlwd - box line width
- boxcol - box line color
- boxfill - box fill colors

### Median

- medlty - median line type ("blank" for no line)
- medlwd - median line width
- medcol - median line color
- medpch - median point (NA for no symbol)
- medcex - median point size
- medbg - median point background color

### Whisker

- whisklty - whisker line type
- whisklwd - whisker line width
- whiskcol - whisker line color

### Staple

- staplelty - staple line type
- staplelwd - staple line width
- staplecol - staple line color

### Outliers

- outlty - outlier line type ("blank" for no line)
- outlwd - outlier line width
- outcol - outlier line color
- outpch - outlier point type (NA for no symbol)
- outcex - outlier point size
- outbg - outlier point background color

### Example

Default and heavily modified plots side by side:

```
par(mfrow = c(1, 2))
# Default
boxplot(Sepal.Length ~ Species, data = iris)
# Modified
boxplot(Sepal.Length ~ Species, data = iris,
        boxlty = 2, boxlwd = 3, boxfill = "cornflowerblue", boxcol = "darkblue",
        medlty = 2, medlwd = 2, medcol = "red", medpch = 21, medcex = 1, medbg = "white",
        whisklty = 2, whisklwd = 3, whiskcol = "darkblue",
        staplelty = 2, staplelwd = 2, staplecol = "red",
        outlty = 3, outlwd = 3, outcol = "grey", outpch = NA
)
```
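When the same style is needed for several plots, the style parameters can be collected once in a list and passed through the `pars` argument of `boxplot()`. This is a sketch under the assumption that a shared style is wanted; `my_style` is a hypothetical name and the chosen values are arbitrary:

```r
# A reusable set of style parameters (values chosen only for illustration)
my_style <- list(
  boxlty = 2, boxfill = "cornflowerblue",
  medcol = "red",
  whisklty = 1,
  staplecol = "red",
  outpch = 4
)

old_par <- par(mfrow = c(1, 2))
# The same style applied to two different variables
len <- boxplot(Sepal.Length ~ Species, data = iris, pars = my_style,
               main = "Sepal.Length")
wid <- boxplot(Sepal.Width ~ Species, data = iris, pars = my_style,
               main = "Sepal.Width")
par(old_par)
```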

#### Syntax

#### Parameters

Parameters | Details (source R Documentation) |
---|---|
formula | a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor). |
data | a data.frame (or list) from which the variables in formula should be taken. |
subset | an optional vector specifying a subset of observations to be used for plotting. |
na.action | a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group. |
boxwex | a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower. |
plot | if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned. |
col | if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour. |
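As a short illustration of the `subset` parameter, the sketch below restricts the summary to two species. The variable name `res_subset` is illustrative. Note that the excluded level may still be listed as an empty group, since the factor keeps all of its levels:

```r
# Summaries for every species except setosa; plot = FALSE returns the statistics
res_subset <- boxplot(Sepal.Length ~ Species, data = iris,
                      subset = Species != "setosa", plot = FALSE)

# Observations per group after subsetting
res_subset$n
```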
