# Debugging and testing drake projects {#debugging}
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning = TRUE
)
options(
drake_make_menu = FALSE,
drake_clean_menu = FALSE,
warnPartialMatchArgs = FALSE,
crayon.enabled = FALSE,
readr.show_progress = FALSE,
tidyverse.quiet = TRUE
)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(drake)
library(tidyverse)
```
This chapter aims to help users detect and diagnose problems in large, complex workflows.
## Debugging failed targets
### Diagnosing errors
When a target fails, `drake` tries to tell you.
```{r, error = TRUE}
large_dataset <- function() {
data.frame(x = rnorm(1e6), y = rnorm(1e6))
}
expensive_analysis <- function(data) {
# More operations go here.
tricky_operation(data)
}
tricky_operation <- function(data) {
# Expensive code here.
stop("there is a bug somewhere.")
}
plan <- drake_plan(
data = large_dataset(),
analysis = expensive_analysis(data)
)
make(plan)
```
`diagnose()` recovers the metadata on targets. For failed targets, this includes an error object.
```{r}
error <- diagnose(analysis)$error
error
names(error)
```
Using the call stack, you can trace back the location of the error. Once you know roughly where to find the bug, you can troubleshoot interactively.
```{r}
invisible(lapply(tail(error$calls, 3), print))
```
### Interactive debugging
The clues from `diagnose()` help us go back and inspect the failing code. `debug()` is an interactive debugging tool that lets you verify exactly what is going wrong. Below, `make(plan)` pauses execution and turns interactive control over to you inside `tricky_operation()`.
```{r, eval = FALSE}
debug(tricky_operation)
make(plan) # Pauses at tricky_operation(data).
undebug(tricky_operation) # Undoes debug().
```
`drake`'s own `drake_debug()` function is nearly equivalent.
```{r, eval = FALSE}
drake_debug(analysis, plan) # Pauses at the command expensive_analysis(data).
```
`browser()` is similar, but it affords you finer control over where to pause execution.
```{r, eval = FALSE}
tricky_operation <- function(data) {
# Expensive code here.
browser() # Pauses right here to give you control.
stop("there is a bug somewhere.")
}
make(plan)
```
### Efficient trial and error
If you are using `drake`, chances are your targets are computationally expensive, and long runtimes make debugging difficult. To speed up trial and error, run the plan on a small dataset while you debug and repair things.
```{r, eval = FALSE}
plan <- drake_plan(
data = head(large_dataset()), # Just work with the first few rows.
analysis = expensive_analysis(data) # Runs faster now.
)
```
```{r, eval = FALSE}
tricky_operation <- ... # Try to fix the function.
debug(tricky_operation) # Set up to debug interactively.
make(plan) # Try to run the workflow.
```
After a lot of quick trial and error, we finally fix the function and run it on the small data.
```{r}
tricky_operation <- function(data) {
# Good code goes here.
}
make(plan)
```
Now that the code works, it is time to scale back up to the large data. Use `make(plan, recover = TRUE)` to salvage old targets from before the debugging process.
```{r}
plan <- drake_plan(
data = large_dataset(), # Use the large data again.
analysis = expensive_analysis(data) # Should be repaired now.
)
make(plan, recover = TRUE)
```
## Why do my targets keep rerunning?
Consider the following completed workflow.
```{r}
load_mtcars_example()
make(my_plan)
```
At this point, if you change the `reg1()` function, the next `make()` will automatically detect the change and rerun downstream targets such as `regression1_large`.
```{r}
reg1 <- function(d) {
lm(y ~ 1 + x, data = d)
}
make(my_plan)
```
```{r, echo = FALSE}
reg1 <- function(d) {
lm(y ~ 2 + x, data = d)
}
```
In general, targets are "outdated" or "invalidated" when they are out of sync with their dependencies. If a target is outdated, the next `make()` automatically detects the discrepancies and rebuilds the affected targets. Usually, this automation adds convenience, saves time, and ensures reproducibility in the face of long runtimes.
However, it can be frustrating when `drake` flags targets as outdated even though you think everything is up to date. If this happens, it is important to understand:
1. How your workflow fits together.
2. Which targets are outdated.
3. Why your targets are outdated.
4. Strategies to prevent unexpected changes in the future.
`drake`'s utility functions offer clues to guide you.
### How your workflow fits together
`drake` automatically analyzes your plan and functions to understand how your targets depend on each other. It assembles this information into a directed acyclic graph (DAG), which you can visualize and explore.
```{r}
vis_drake_graph(my_plan)
```
To get a more localized view, use `deps_target()`. Unlike `vis_drake_graph()`, `deps_target()` shows the dependencies of a single target at a more granular level.
```{r}
deps_target(regression1_large, my_plan)
deps_target(report, my_plan)
```
To understand how `drake` detects dependencies in the first place, use `deps_code()`. It shows what `drake` sees when it first reads your plan and functions.
```{r}
deps_code(quote(
suppressWarnings(summary(regression1_large$residuals))
))
deps_code(quote(
knit(knitr_in("report.Rmd"), file_out("report.md"), quiet = TRUE)
))
```
If `drake` detects new dependencies you were unaware of, that could be a reason why your targets are out of date.
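Hidden global dependencies are a common source of surprise. Below is a hypothetical sketch (`summarize_data()` and `threshold` are made up for illustration): because the function body references the global object `threshold`, `drake` treats `threshold` as a dependency, so changing it invalidates every target that calls the function.
```{r, eval = FALSE}
# Hypothetical function that silently depends on
# the global object `threshold`.
threshold <- 3
summarize_data <- function(data) {
  data[data$x > threshold, ] # `threshold` is a global, not an argument.
}
deps_code(summarize_data) # Should list `threshold` among the dependencies.
```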
### Which targets are outdated
Graphing utilities like `vis_drake_graph()` label the outdated targets, but sometimes it is helpful to get a more programmatic view.
```{r}
outdated(my_plan)
```
### Why your targets are outdated
The `deps_profile()` function offers clues.
```{r}
deps_profile(regression1_small, my_plan)
```
From the data frame above, `regression1_small` is outdated because an R object dependency changed since the last `make()`. `drake` does not hold on to enough information to tell you precisely which object is the culprit, but functions like `vis_drake_graph()`, `deps_target()`, and `deps_code()` can help narrow down the possibilities.
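For example, here is one way to narrow things down, assuming the culprit is among the target's immediate dependencies:
```{r, eval = FALSE}
# List the candidate dependencies of the outdated target.
deps_target(regression1_small, my_plan)
# Inspect the candidates one at a time. Printing reg1 shows
# its current definition, which you can compare against what you expect.
reg1
```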
### Strategies to prevent unexpected changes in the future
`drake` is sensitive to changes in the functions in your global environment, and this sensitivity can invalidate targets unexpectedly. Whenever you plan to run `make()`, it is best to restart your R session and load your packages and functions into a fresh, clean workspace. [`r_make()`](https://docs.ropensci.org/drake/reference/r_make.html) does all this cleaning and prep work for you automatically, and it is more robust and dependable (and childproofed) than ordinary `make()`. To read more, visit <https://books.ropensci.org/drake/projects#safer-interactivity>.
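As a minimal sketch of that setup (the file paths and plan below are illustrative assumptions, not requirements): `r_make()` looks for a script called `_drake.R`, runs it in a fresh R process, and uses the `drake_config()` object it produces.
```{r, eval = FALSE}
# _drake.R: the setup script that r_make() runs in a fresh R process.
library(drake)
source("R/functions.R") # Illustrative path to your custom functions.
plan <- drake_plan(
  data = large_dataset(),
  analysis = expensive_analysis(data)
)
drake_config(plan) # r_make() uses the config this returns.
```
```{r, eval = FALSE}
r_make() # Like make(plan), but in a clean new session.
```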
## More help
The [GitHub issue tracker](https://github.com/ropensci/drake/issues) is the best place to request help with your specific use case.