# Reproducible R
With 'Reproducibility' we mean that someone else (perhaps you in the future) can repeat the steps you performed and get the same result. See the Reproducible Research Task View.
# Data reproducibility
The easiest way to share a (preferable small) data frame is to use a basic function
dput(). It will export an R object in a plain text form.
Note: Before making the example data below, make sure you're in an empty folder you can write to. Run
getwd() and read
?setwd if you need to change folders.
dput(mtcars, file = 'df.txt')
Then, anyone can load the precise R object to their GlobalEnvironment using the
df <- dget('df.txt')
For larger R objects, there are a number of ways of saving them reproducibly. See Input and output .
# Package reproducibility
Package reproducibility is a very common issue in reproducing some R code. When various packages get updated, some interconnections between them may break. The ideal solution for the problem is to reproduce the image of the R code writer's machine on your computer at the date when the code was written. And here comes
Starting from 2014-09-17, the authors of the package make daily copies of the whole CRAN package repository to their own mirror repository -- Microsoft R Archived Network. So, to avoid package reproduciblity issues when creating a reproducible R project, all you need is to:
- Make sure that all your packages (and R version) are up-to-date.
checkpoint::checkpoint('YYYY-MM-DD')line in your code.
checkpoint will create a directory
.checkpoint in your R_home directory (
"~/"). To this technical directory it will install all the packages, that are used in your project. That means,
checkpoint looks through all the
.R files in your project directory to pick up all the
require() calls and install all the required packages in the form they existed at CRAN on the specified date.
PRO You are freed from the package reproducibility issue.
CONTRA For each specified date you have to download and install all the packages that are used in a certain project that you aim to reproduce. That may take quite a while.
To create reproducible results, all sources of variation need to be fixed. For instance, if a (pseudo)random number generator is used, the seed needs to be fixed if you want to recreate the same results. Another way to reduce variation is to combine text and computation in the same document.