# Debugging and testing drake projects {#debugging}
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning = TRUE
)
options(
drake_make_menu = FALSE,
drake_clean_menu = FALSE,
warnPartialMatchArgs = FALSE,
crayon.enabled = FALSE,
readr.show_progress = FALSE,
tidyverse.quiet = TRUE
)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(drake)
library(tidyverse)
```
This chapter aims to help users detect and diagnose problems in large, complex workflows.
## Debugging failed targets
### Diagnosing errors
When a target fails, `drake` tries to tell you.
```{r, error = TRUE}
large_dataset <- function() {
data.frame(x = rnorm(1e6), y = rnorm(1e6))
}
expensive_analysis <- function(data) {
# More operations go here.
tricky_operation(data)
}
tricky_operation <- function(data) {
# Expensive code here.
stop("there is a bug somewhere.")
}
plan <- drake_plan(
data = large_dataset(),
analysis = expensive_analysis(data)
)
make(plan)
```
`diagnose()` recovers the metadata on targets. For failed targets, this includes an error object.
```{r}
error <- diagnose(analysis)$error
error
names(error)
```
Using the call stack, you can trace back the location of the error. Once you know roughly where to find the bug, you can troubleshoot interactively.
```{r}
invisible(lapply(tail(error$calls, 3), print))
```
### Interactive debugging
The clues from `diagnose()` help us go back and inspect the failing code. `debug()` is an interactive debugging tool that lets you verify exactly what is going wrong. Below, `make(plan)` pauses execution and turns interactive control over to you inside `tricky_operation()`.
```{r, eval = FALSE}
debug(tricky_operation)
make(plan) # Pauses at tricky_operation(data).
undebug(tricky_operation) # Undoes debug().
```
`drake`'s own `drake_debug()` function is nearly equivalent.
```{r, eval = FALSE}
drake_debug(analysis, plan) # Pauses at the command expensive_analysis(data).
```
`browser()` is similar, but it affords you finer control over where to pause execution.
```{r, eval = FALSE}
tricky_operation <- function(data) {
# Expensive code here.
browser() # Pauses right here to give you control.
stop("there is a bug somewhere.")
}
make(plan)
```
### Efficient trial and error
If you are using `drake`, chances are your targets are computationally expensive, and long runtimes make debugging difficult. To speed up trial and error, run the plan on a small dataset while you debug and repair things.
```{r, eval = FALSE}
plan <- drake_plan(
data = head(large_dataset()), # Just work with the first few rows.
analysis = expensive_analysis(data) # Runs faster now.
)
```
```{r, eval = FALSE}
tricky_operation <- ... # Try to fix the function.
debug(tricky_operation) # Set up to debug interactively.
make(plan) # Try to run the workflow.
```
After a lot of quick trial and error, we finally fix the function and run it on the small data.
```{r}
tricky_operation <- function(data) {
# Good code goes here.
}
make(plan)
```
Now that the code works, it is time to scale back up to the large data. Use `make(plan, recover = TRUE)` to salvage old targets from before the debugging process.
```{r}
plan <- drake_plan(
data = large_dataset(), # Use the large data again.
analysis = expensive_analysis(data) # Should be repaired now.
)
make(plan, recover = TRUE)
```
## Why do my targets keep rerunning?
Consider the following completed workflow.
```{r}
load_mtcars_example()
make(my_plan)
```
At this point, if you change the `reg1()` function, the next `make()` will automatically detect the change and rerun downstream targets such as `regression1_large`.
```{r}
reg1 <- function(d) {
lm(y ~ 1 + x, data = d)
}
make(my_plan)
```
```{r, echo = FALSE}
reg1 <- function(d) {
lm(y ~ 2 + x, data = d)
}
```
In general, targets are "outdated" or "invalidated" when they are out of sync with their dependencies. If a target is outdated, the next `make()` automatically detects the discrepancies and rebuilds the affected targets. Usually, this automation adds convenience, saves time, and ensures reproducibility in the face of long runtimes.
However, it can be frustrating when `drake` flags targets as outdated even though you think everything is up to date. If this happens, it is important to understand:
1. How your workflow fits together.
2. Which targets are outdated.
3. Why your targets are outdated.
4. Strategies to prevent unexpected changes in the future.
`drake`'s utility functions offer clues to guide you.
### How your workflow fits together
`drake` automatically analyzes your plan and functions to understand how your targets depend on each other. It assembles this information into a directed acyclic graph (DAG), which you can visualize and explore.
```{r}
vis_drake_graph(my_plan)
```
To get a more localized view, use `deps_target()`. Unlike `vis_drake_graph()`, `deps_target()` shows the dependencies of a single target at a more granular level.
```{r}
deps_target(regression1_large, my_plan)
deps_target(report, my_plan)
```
To understand how `drake` detects dependencies in the first place, use `deps_code()`. It shows what `drake` sees when it first reads your plan and functions.
```{r}
deps_code(quote(
suppressWarnings(summary(regression1_large$residuals))
))
deps_code(quote(
knit(knitr_in("report.Rmd"), file_out("report.md"), quiet = TRUE)
))
```
If `drake` detects new dependencies you were unaware of, that could be a reason why your targets are out of date.
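Hidden global dependencies are a common source of surprise. Below is a hypothetical sketch (`summarize_data()` and `threshold` are made up for illustration): because the function body references the global object `threshold`, `drake` treats `threshold` as a dependency, so changing it invalidates every target that calls the function.
```{r, eval = FALSE}
# Hypothetical function that silently depends on
# the global object `threshold`.
threshold <- 3
summarize_data <- function(data) {
  data[data$x > threshold, ] # `threshold` is a global, not an argument.
}
deps_code(summarize_data) # Should list `threshold` among the dependencies.
```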
### Which targets are outdated
Graphing utilities like `vis_drake_graph()` label the outdated targets, but sometimes it is helpful to get a more programmatic view.
```{r}
outdated(my_plan)
```
### Why your targets are outdated
The `deps_profile()` function offers clues.
```{r}
deps_profile(regression1_small, my_plan)
```
From the data frame above, `regression1_small` is outdated because an R object dependency changed since the last `make()`. `drake` does not hold on to enough information to tell you precisely which object is the culprit, but functions like `vis_drake_graph()`, `deps_target()`, and `deps_code()` can help narrow down the possibilities.
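For example, here is one way to narrow things down, assuming the culprit is among the target's immediate dependencies:
```{r, eval = FALSE}
# List the candidate dependencies of the outdated target.
deps_target(regression1_small, my_plan)
# Inspect the candidates one at a time. Printing reg1 shows
# its current definition, which you can compare against what you expect.
reg1
```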
### Strategies to prevent unexpected changes in the future
`drake` is sensitive to changes in the functions in your global environment, and this sensitivity can invalidate targets unexpectedly. Whenever you plan to run `make()`, it is best to restart your R session and load your packages and functions into a fresh, clean workspace. [`r_make()`](https://docs.ropensci.org/drake/reference/r_make.html) does all this cleaning and prep work for you automatically, and it is more robust and dependable (and childproofed) than ordinary `make()`. To read more, visit <https://books.ropensci.org/drake/projects#safer-interactivity>.
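As a minimal sketch of that setup (the file paths and plan below are illustrative assumptions, not requirements): `r_make()` looks for a script called `_drake.R`, runs it in a fresh R process, and uses the `drake_config()` object it produces.
```{r, eval = FALSE}
# _drake.R: the setup script that r_make() runs in a fresh R process.
library(drake)
source("R/functions.R") # Illustrative path to your custom functions.
plan <- drake_plan(
  data = large_dataset(),
  analysis = expensive_analysis(data)
)
drake_config(plan) # r_make() uses the config this returns.
```
```{r, eval = FALSE}
r_make() # Like make(plan), but in a clean new session.
```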
## More help
The [GitHub issue tracker](https://github.com/ropensci/drake/issues) is the best place to request help with your specific use case.