# Fault-tolerant/resilient code
# Using tryCatch()
We're defining a robust version of a function that reads the HTML code from a given URL. Robust in the sense that we want it to handle situations where something either goes wrong (error) or not quite the way we planned it to (warning). The umbrella term for errors and warnings is condition
# Function definition using tryCatch
readUrl <- function(url) {
out <- tryCatch(
########################################################
# Try part: define the expression(s) you want to "try" #
########################################################
{
# Just to highlight:
# If you want to use more than one R expression in the "try part"
# then you'll have to use curly brackets.
# Otherwise, just write the single expression you want to try and
message("This is the 'try' part")
readLines(con = url, warn = FALSE)
},
########################################################################
# Condition handler part: define how you want conditions to be handled #
########################################################################
# Handler when a warning occurs:
warning = function(cond) {
message(paste("Reading the URL caused a warning:", url))
message("Here's the original warning message:")
message(cond)
# Choose a return value when such a type of condition occurs
return(NULL)
},
# Handler when an error occurs:
error = function(cond) {
message(paste("This seems to be an invalid URL:", url))
message("Here's the original error message:")
message(cond)
# Choose a return value when such a type of condition occurs
return(NA)
},
###############################################
# Final part: define what should happen AFTER #
# everything has been tried and/or handled #
###############################################
finally = {
message(paste("Processed URL:", url))
message("Some message at the end\n")
}
)
return(out)
}
# Testing things out
Let's define a vector of URLs where one element isn't a valid URL
urls <- c(
"http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html",
"http://en.wikipedia.org/wiki/Xz",
"I'm no URL"
)
And pass this as input to the function we defined above
y <- lapply(urls, readUrl)
# Processed URL: http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html
# Some message at the end
#
# Processed URL: http://en.wikipedia.org/wiki/Xz
# Some message at the end
#
# URL does not seem to exist: I'm no URL
# Here's the original error message:
# cannot open the connection
# Processed URL: I'm no URL
# Some message at the end
#
# Warning message:
# In file(con, "r") : cannot open file 'I'm no URL': No such file or directory
# Investigating the output
length(y)
# [1] 3
head(y[[1]])
# [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"
# [2] "<html><head><title>R: Functions to Manipulate Connections</title>"
# [3] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
# [4] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"
# [5] "</head><body>"
# [6] ""
y[[3]]
# [1] NA
# Parameters
Parameter | Details |
---|---|
expr | In case the "try part" was completed successfully tryCatch will return the last evaluated expression. Hence, the actual value being returned in case everything went well and there is no condition (i.e. a warning or an error) is the return value of readLines . Note that you don't need to explicilty state the return value via return as code in the "try part" is not wrapped insided a function environment (unlike that for the condition handlers for warnings and error below) |
warning/error/etc | Provide/define a handler function for all the conditions that you want to handle explicitly. AFAIU, you can provide handlers for any type of conditions (not just warnings and errors, but also custom conditions; see simpleCondition and friends for that) as long as the name of the respective handler function matches the class of the respective condition (see the Details part of the doc for tryCatch ). |
finally | Here goes everything that should be executed at the very end, regardless if the expression in the "try part" succeeded or if there was any condition. If you want more than one expression to be executed, then you need to wrap them in curly brackets, otherwise you could just have written finally = <expression> (i.e. the same logic as for "try part". |
# Remarks
# tryCatch
tryCatch
returns the value associated to executing expr
unless there's a condition: a warning or an error. If that's the case, specific return values (e.g. return(NA)
above) can be specified by supplying a handler function for the respective conditions (see arguments warning
and error
in ?tryCatch
). These can be functions that already exist, but you can also define them within tryCatch
(as we did above).
# Implications of choosing specific return values of the handler functions
As we've specified that NA
should be returned in case of an error in the "try part", the third element in y
is NA
. If we'd have chosen NULL
to be the return value, the length of y
would just have been 2
instead of 3
as lapply
will simply "ignore/drop" return values that are NULL
. Also note that if you don't specify an explicit return value via return
, the handler functions will return NULL
(i.e. in case of an error or a warning condition).
# "Undesired" warning message
When the third element of our urls
vector hits our function, we get the following warning in addition to the fact that an error occurs (readLines
first complains that it can't open the connection via a warning before actually failing with an error):
Warning message:
In file(con, "r") : cannot open file 'I'm no URL': No such file or directory
An error "wins" over a warning, so we're not really interested in the warning in this particular case. Thus we have set warn = FALSE
in readLines
, but that doesn't seem to have any effect. An alternative way to suppress the warning is to use
suppressWarnings(readLines(con = url))
instead of
readLines(con = url, warn = FALSE)