# tidyverse: an overview
# What is tidyverse?
tidyverse is the fast and elegant way to turn basic R into an enhanced tool, designed by Hadley Wickham and RStudio. All packages included in tidyverse are developed according to the principles of The tidy tools manifesto. Let the authors describe their masterpiece:
> The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command.
>
> The best place to learn about all the packages in the tidyverse and how they fit together is R for Data Science. Expect to hear more about the tidyverse in the coming months as I work on improved package websites, making citation easier, and providing a common home for discussions about data analysis with the tidyverse. ([source](https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/))
# How to use it?
Just as with ordinary R packages, you need to install and load the package. The difference is that a single command installs a couple of dozen packages, and another single command loads the core ones. As a bonus, you can rest assured that all the installed/loaded packages are of compatible versions.
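The two steps look like this in practice. This is a minimal sketch, assuming a CRAN mirror is configured; the requireNamespace() guard is an addition of this example that simply skips the installation when tidyverse is already present.

```r
# Install the whole collection once (skipped if tidyverse is already installed)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, ...) in one call
library(tidyverse)
```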
# What are those packages?
The commonly known and widely used packages:
- ggplot2: advanced data visualisation
- dplyr: fast (Rcpp) and coherent approach to data manipulation
- tidyr: tools for data tidying
- readr: for data import
- purrr: makes your pure functions purr by completing R's functional programming tools with important features from other languages, in the style of the JS packages underscore.js, lodash and lazy.js
- tibble: a modern re-imagining of data frames
- magrittr: piping to make code more readable
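A short sketch of how these core packages fit together, assuming dplyr and tibble are installed: tibble provides the data representation, dplyr provides the verbs, and the magrittr pipe %>% (re-exported by dplyr) chains them. Grouping mtcars by cyl is only an illustrative choice.

```r
library(dplyr)   # data manipulation verbs; also re-exports the magrittr pipe %>%
library(tibble)  # tibble data representation

mtcars_tbl <- as_tibble(mtcars)

# Average fuel economy and number of cars for each cylinder count
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

print(mpg_by_cyl)
```

The result is itself a tibble, so it can be piped straight into further dplyr verbs or into ggplot2.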
Packages for manipulating specific data formats:
- hms: for storing and formatting time-of-day values
- stringr: provides a cohesive set of functions designed to make working with strings as easy as possible
- lubridate: advanced date/time manipulation
- forcats: advanced work with factors
- DBI: defines a common interface between R and database management systems (DBMS)
- haven: easily import SPSS, SAS and Stata files
- httr: a wrapper for the curl package, customised to the demands of modern web APIs
- jsonlite: a fast JSON parser and generator optimized for statistical data and the web
- readxl: reads .xls and .xlsx files without external dependencies
- rvest: helps you scrape information from web pages
- xml2: for working with XML
- modelr: provides functions that help you create elegant pipelines when modelling
- broom: easily converts model output into tidy data
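To illustrate the broom entry, here is a minimal sketch assuming broom is installed. tidy() and glance() are broom's generics; the model (mpg regressed on wt from mtcars) is an arbitrary example choice.

```r
library(broom)

# Fit a simple linear model of fuel economy on car weight
fit <- lm(mpg ~ wt, data = mtcars)

# tidy(): the coefficient table as a tibble, one row per model term
coefs <- tidy(fit)
print(coefs)

# glance(): a one-row tibble of model-level statistics (R squared, AIC, ...)
fit_stats <- glance(fit)
print(fit_stats$r.squared)
```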
tidyverse suggests the use of:
- knitr: the amazing general-purpose literate programming engine, with lightweight APIs designed to give users full control of the output without heavy coding work
- rmarkdown: RStudio's package for reproducible documents and reports
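To see which packages your installed version of tidyverse actually brings in, you can ask the package itself. A sketch, assuming tidyverse is installed: tidyverse_packages() lists the imported packages, while suggested packages such as knitr and rmarkdown are only recorded in the Suggests field of its DESCRIPTION file. The exact lists depend on the tidyverse version.

```r
library(tidyverse)

# Packages that tidyverse imports (and therefore installs alongside itself)
pkgs <- tidyverse_packages(include_self = FALSE)
print(pkgs)

# Suggested packages are not installed by default; they are listed in DESCRIPTION
suggested <- utils::packageDescription("tidyverse")$Suggests
cat(suggested, "\n")
```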
# Creating tbl_dfs
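Use the as_data_frame function from the tibble package to turn a data frame into a tbl_df. In newer versions of tibble, as_data_frame is deprecated in favour of as_tibble, which works the same way:

```r
library(tibble)

# Convert the built-in mtcars data frame into a tbl_df (a tibble)
mtcars_tbl <- as_tibble(mtcars)
```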
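A quick way to check the conversion is to look at the class of the result: a tbl_df is still a data.frame underneath, with extra classes on top. This sketch also shows the rownames argument of as_tibble (not mentioned in the text above), since mtcars stores the car names as row names, which as_tibble drops by default. The column name "model" is a hypothetical choice.

```r
library(tibble)

mtcars_tbl <- as_tibble(mtcars)

# The tibble classes sit on top of the ordinary data.frame class
class(mtcars_tbl)  # "tbl_df" "tbl" "data.frame"

# Keep the row names as a regular column named "model" (hypothetical name)
mtcars_named <- as_tibble(mtcars, rownames = "model")
head(mtcars_named$model, 3)
```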
One of the most notable differences between data.frames and tbl_dfs is how they print:
```
# A tibble: 32 x 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
*  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
2   21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
3   22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
4   21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
5   18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
6   18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
7   14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
8   24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
9   22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
# ... with 22 more rows
```
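- The printed output includes a summary of the dimensions of the table (32 x 11).
- It includes the type of each column (`<dbl>`).
- It prints a limited number of rows. To change this, use `options(tibble.print_max = [number], tibble.print_min = [number])`: a table with more than `tibble.print_max` rows shows only its first `tibble.print_min` rows.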
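Both ways of controlling the number of printed rows are sketched below, assuming the tibble package is installed. print() on a tibble accepts an n argument for a one-off change; the option values 40 and 5 are arbitrary choices for the example, and saving the old options lets you restore them afterwards.

```r
library(tibble)

mtcars_tbl <- as_tibble(mtcars)

# Show 15 rows for this call only
print(mtcars_tbl, n = 15)

# Change the defaults for the session: a table with more than
# tibble.print_max rows shows only tibble.print_min rows
old <- options(tibble.print_max = 40, tibble.print_min = 5)
print(mtcars_tbl)  # 32 rows does not exceed 40, so every row is printed

# Restore the previous settings
options(old)
```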
Many functions in the dplyr package work naturally with tbl_dfs, such as