Reproducible session management #765

wlandau · 2019-03-03T03:24:48Z

Summary

This PR implements the proposal in #761 (comment). We can achieve reproducible session management by relying on a master R script whose last expression returns a drake_config() object. See the help file of r_make() for details. Usage:

```r
writeLines(
  c(
    "library(drake)",
    "load_mtcars_example()",
    "drake_config(my_plan)"
  ),
  "_drake.R" # default value of the `source` argument
)
cat(readLines("_drake.R"), sep = "\n")
#> library(drake)
#> load_mtcars_example()
#> drake_config(my_plan)

library(drake)
r_outdated()
#>  [1] "coef_regression1_large" "coef_regression1_small"
#>  [3] "coef_regression2_large" "coef_regression2_small"
#>  [5] "large"                  "regression1_large"     
#>  [7] "regression1_small"      "regression2_large"     
#>  [9] "regression2_small"      "report"                
#> [11] "small"                  "summ_regression1_large"
#> [13] "summ_regression1_small" "summ_regression2_large"
#> [15] "summ_regression2_small"
r_make()
r_outdated()
#> character(0)
```

Created on 2019-03-02 by the reprex package (v0.2.1)
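The usage above depends on the master script evaluating to a configuration object. One base-R way to read such a value out of a script is `source()`, whose `$value` element holds the result of the script's last expression. Whether `r_make()` does exactly this internally is an assumption here (see its help file). In this minimal sketch a plain list stands in for the `drake_config()` object and a temporary file stands in for `_drake.R`, so it runs without drake installed:

```r
# Hypothetical stand-in for _drake.R: its last expression is the "config".
script <- tempfile(fileext = ".R")
writeLines(
  c(
    "x <- 1:3",
    "list(plan = 'my_plan', n = length(x))"  # placeholder for drake_config(my_plan)
  ),
  script
)

# source() returns a list whose $value element is the last expression's value.
# Evaluating in a new environment keeps the script's objects out of the global one.
config <- source(script, local = new.env())$value
str(config)
```

Because only the final value is used, the script is free to load packages and define functions along the way, as long as it ends with the configuration call.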

Combining callr with forked processes from the parallel package can be dangerous (ref: r-lib/processx#113), but simple tests of multicore clustermq and future parallelism seem to run fine in r_make().

cc:

Related GitHub issues and pull requests

Ref: Environments, reproducibility, and consistent target validation #761

Checklist

I have read drake's code of conduct, and I agree to follow its rules.
I have listed any substantial changes in the development news.
I have added testthat unit tests to tests/testthat to confirm that any new features or functionality work correctly.
I have tested this pull request locally with devtools::check()
This pull request is ready for review.
I think this pull request is ready to merge.

codecov-io · 2019-03-03T03:30:21Z

Codecov Report

Merging #765 into master will not change coverage.
The diff coverage is 100%.

```diff
@@          Coverage Diff          @@
##           master   #765   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          72     73    +1     
  Lines        6121   6178   +57     
=====================================
+ Hits         6121   6178   +57
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| R/api-make.R | `100% <ø> (ø)` | ⬆️ |
| R/exec-session.R | `100% <ø> (ø)` | ⬆️ |
| R/api-callr.R | `100% <100%> (ø)` | |
| R/preprocess-config.R | `100% <0%> (ø)` | ⬆️ |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed72097...5bb65c1. Read the comment docs.

wlandau · 2019-03-03T13:05:12Z

Merging, re #761 (comment).


lorenzwalthert · 2019-03-10T22:33:38Z

This might be a dumb question, but what exactly is the advantage of r_make() over starting a fresh R session from the terminal, sourcing the relevant scripts, and building the plan with make() (e.g. as a build routine in RStudio)? I think we talked about it at some point, but I could not find it anymore. I wrote a blog post about this (and other things) that I have not published yet (and that I think I need to review in light of the 7.0 release). You can preview it here: https://5c6f2832b2861fbc63604c01--lorenzwalthert.netlify.com/post/drake-workflow-proposal/

wlandau · 2019-03-11T04:06:11Z

It is mainly about childproofing. It is easy to forget to start a fresh session and source the correct setup scripts in the correct order, especially for new users, and the consequences of getting it wrong are painful. For a package that claims to aid reproducibility, this is important.

It also turns out to be convenient in practice. I am using r_vis_drake_graph() a lot for my own internal workflows now, and I feel like I can relax and trust the tool more, even if I am not being super disciplined about session management.
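The childproofing argument comes down to running the workflow in a separate R process that cannot see objects left over in the interactive session. drake delegates this to callr; the sketch below swaps in base R's `system2()` and `Rscript` so that it runs without extra packages, and the object name `leftover` is hypothetical:

```r
# An object that exists only in the current interactive session.
leftover <- "stale value"

# A script that reports whether it can see that object.
script <- tempfile(fileext = ".R")
writeLines("writeLines(as.character(exists('leftover')))", script)

# Run the script in a fresh R process; --vanilla skips startup files.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE)
out  # the child process cannot see `leftover`
```

If the same script were sourced in the interactive session instead, it would report `TRUE`, which is exactly the kind of hidden dependency a fresh process rules out.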

wlandau · 2019-03-11T04:14:40Z

Also, I skimmed the draft of your blog post, and those are some super valuable workarounds for drake <= 6.2.1. Sorry about the timing. As you were writing, the functionality and best practices were changing. You might have a look at the new DSL and the newly revamped chapter on projects.

Things I like about your post that are still current:

Running drake in the background. If your workflow runs long enough for drake to be useful, you are probably going to want to run it in batch mode in the background anyway.
Unit tests inside a drake workflow.
drogger. drake tries not to work too hard at console logging, and I think this fills a gap.
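For the first point above, a minimal base-R sketch of launching a script in the background with `system2(wait = FALSE)` and sending its output to a log file. In a real project the script would call `r_make()` or `make()`; here a hypothetical placeholder that sleeps briefly and prints `done` stands in, and the temporary file names are illustrative only:

```r
# Placeholder for a long-running workflow script (e.g. one that calls make()).
script <- tempfile(fileext = ".R")
writeLines(c("Sys.sleep(1)", "writeLines('done')"), script)

# Launch it in a separate R process without blocking this session.
# Passing the same file for stdout and stderr merges both streams into one log.
log_file <- tempfile(fileext = ".log")
rscript <- file.path(R.home("bin"), "Rscript")
system2(rscript, c("--vanilla", shQuote(script)),
        stdout = log_file, stderr = log_file, wait = FALSE)

# The console is free immediately; inspect the log once the job finishes.
```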

lorenzwalthert · 2019-03-11T10:28:45Z

Thanks, I will have a look at the two links. The post is unpublished because drogger and teamtools are not ready for prime time. Also, I think other packages such as drogger can fill the logging gap. That helps keep drake lightweight, and logging is somewhat orthogonal to drake anyway because it has many other applications.

@tjmahr

* Change the video player in the drake tech note. The default PowToon player is having problems, so I propose a switch to Wistia.
* Add technote: drake version 7.0.0
* Try to fix authors field
* Rm mtcars beforehand. Needed to consistently reproduce `lock_envir = FALSE` behavior. Also add a link.
* Reknit drake tech note ropensci/drake#756
* Reknit to remove superfluous message
* Add a tag
* Update drake 7.0.0 tech note... ...re ropensci/drake#762
* Add more thanks
* Update a header
* Clean up error messages in drake itself
* Minor edit
* Minor elaboration
* Minor update: reduced verbosity in drake
* Mention some future work
* Update drake tech note re ropensci/drake#765
* Clean up the section on interactive sessions
* Fix a typo
* Fix a link
* Edit recap section in drake tech note
* Sync recap
* Mention @tjmahr Ref: ropensci/drake#775
* Last-minute edits to the drake v7 tech note
* Update word choice
* Knit in batch mode
* Mention literate programming
* Address https://github.com/ropensci/roweb2/pull/423/files#r264418051
* Address https://github.com/ropensci/roweb2/pull/423/files#r264418238
* Add a mention. Reduced trigger verbosity suggested by @aedobbyn: ropensci/drake@ec050a4
* Alphabetize mentions by first name

wlandau-lilly added 3 commits March 2, 2019 22:02

Propose callr-like API for #761

f59a3f8

Add a test

9fc8f55

Add backup tests

c0406a8

wlandau added difficulty: advanced Chicago R Unconference labels Mar 3, 2019

wlandau self-assigned this Mar 3, 2019

wlandau added topic: reproducibility DO NOT MERGE ⚠️ labels Mar 3, 2019

wlandau mentioned this pull request Mar 3, 2019

Environments, reproducibility, and consistent target validation #761

Closed

Try to make AppVeyor install drake

e9fe905

wlandau-lilly added 3 commits March 2, 2019 22:41

Tweak appveyor.yml

87bd22d

Minor edit

27c2fa8

Add a test

4ff065c

wlandau force-pushed the 761 branch from 30e12eb to 4ff065c March 3, 2019 04:12

wlandau-lilly added 3 commits March 2, 2019 23:18

Use callr::r() rather than callr::r_vanilla()

c0bc4d4

Mention r_make() in README

4a49cac

Label r_*() experimental

5bb65c1

wlandau removed the DO NOT MERGE ⚠️ label Mar 3, 2019

wlandau merged commit d2cf41d into master Mar 3, 2019

wlandau deleted the 761 branch March 3, 2019 13:05

wlandau pushed a commit to wlandau/roweb2 that referenced this pull request Mar 3, 2019

Update drake tech note

479edf7

re ropensci/drake#765

This was referenced Mar 3, 2019

Document the r_*() functions ropensci-books/drake#66

Closed

Document the new r_*() functions wlandau/drake-examples#15

Closed

wlandau mentioned this pull request Aug 15, 2019

Replace the make() / r_make() menu with a message. #987

Closed

2 tasks
