Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible session management #765

Merged
merged 10 commits into from
Mar 3, 2019
Merged

Reproducible session management #765

merged 10 commits into from
Mar 3, 2019

Conversation

wlandau
Copy link
Member

@wlandau wlandau commented Mar 3, 2019

Summary

This PR implements the proposal in #761 (comment). We can achieve reproducible session management if we rely on a master R script file that returns a drake_config() object. See the help file of r_make() for details. Usage:

writeLines(
  c(
    "library(drake)",
    "load_mtcars_example()",
    "drake_config(my_plan)"
  ),
  "_drake.R" # default value of the `source` argument
)
cat(readLines("_drake.R"), sep = "\n")
#> library(drake)
#> load_mtcars_example()
#> drake_config(my_plan)

library(drake)
r_outdated()
#>  [1] "coef_regression1_large" "coef_regression1_small"
#>  [3] "coef_regression2_large" "coef_regression2_small"
#>  [5] "large"                  "regression1_large"     
#>  [7] "regression1_small"      "regression2_large"     
#>  [9] "regression2_small"      "report"                
#> [11] "small"                  "summ_regression1_large"
#> [13] "summ_regression1_small" "summ_regression2_large"
#> [15] "summ_regression2_small"
r_make()
r_outdated()
#> character(0)

Created on 2019-03-02 by the reprex package (v0.2.1)

Packages callr + parallel (forks specifically) can be a dangerous combination (ref: r-lib/processx#113) but simple tests of multicore clustermq and future parallelism seem to run just fine in r_make().

cc:

Related GitHub issues and pull requests

Checklist

  • I have read drake's code of conduct, and I agree to follow its rules.
  • I have listed any substantial changes in the development news.
  • I have added testthat unit tests to tests/testthat to confirm that any new features or functionality work correctly.
  • I have tested this pull request locally with devtools::check()
  • This pull request is ready for review.
  • I think this pull request is ready to merge.

@codecov-io
Copy link

codecov-io commented Mar 3, 2019

Codecov Report

Merging #765 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #765   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          72     73    +1     
  Lines        6121   6178   +57     
=====================================
+ Hits         6121   6178   +57
Impacted Files Coverage Δ
R/api-make.R 100% <ø> (ø) ⬆️
R/exec-session.R 100% <ø> (ø) ⬆️
R/api-callr.R 100% <100%> (ø)
R/preprocess-config.R 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed72097...5bb65c1. Read the comment docs.

@wlandau
Copy link
Member Author

wlandau commented Mar 3, 2019

Merging, re #761 (comment).

@wlandau wlandau merged commit d2cf41d into master Mar 3, 2019
@wlandau wlandau deleted the 761 branch March 3, 2019 13:05
wlandau pushed a commit to wlandau/roweb2 that referenced this pull request Mar 3, 2019
@lorenzwalthert
Copy link
Contributor

lorenzwalthert commented Mar 10, 2019

This might be a dumb question but what exactly is the advantage of the approach with r_make() compared to running a fresh R session from the terminal, sourcing relevant scripts and build the plan (e.g. in RStudio as a build rutine) using make()? I think we talked about it at some point but I could not find it anymore. I wrote a blog post about this (and other things) which I have not published yet (and which I think I have to review in the light of the 7.0 release). You can preview here: https://5c6f2832b2861fbc63604c01--lorenzwalthert.netlify.com/post/drake-workflow-proposal/

@wlandau
Copy link
Member Author

wlandau commented Mar 11, 2019

It is mainly about childproofing. It is easy to forget to start a fresh session and source the correct setup scripts in the correct order, especially for new users, and the consequences of getting it wrong are painful. For a package that claims to aid reproducibility, this is important.

It also turns out to be convenient in practice. I am using r_vis_drake_graph() a lot for my own internal workflows now, and I feel like I can relax and trust the tool more, even if I am not being super disciplined about session management.

@wlandau
Copy link
Member Author

wlandau commented Mar 11, 2019

Also, I skimmed the draft of your blog post, and those are some super valuable workarounds for drake <= 6.2.1. Sorry about the timing. As you were writing, the functionality and best practices were changing. You might have a look at the new DSL and the newly revamped chapter on projects.

Things I like about your post that are still current:

  • Running drake in the background. If your workflow runs long enough for drake to be useful, you are probably going to want to run it in batch mode in the background anyway.
  • Unit tests inside a drake workflow.
  • drogger. drake tries not to work too hard at console logging, and I think this fills a gap.

@lorenzwalthert
Copy link
Contributor

Thanks, I will have a look at the two links. The reason this post was not published is because drogger and teamtools is not ready for prime time. Also, I think other packages such as drogger can fill the logging gap. It helps keeping drake lightweight and also it is kind of orthogonal to drake because logging has many other applications.

stefaniebutland pushed a commit to ropensci-archive/roweb2 that referenced this pull request Mar 18, 2019
* Change the video player in the drake tech note

The default PowToon player is having problems,
so I propose a switch to Wistia.

* Add technote: drake version 7.0.0

* Try to fix authors field

* Rm mtcars beforehand

Needed to consistently reproduce `lock_envir = FALSE` behavior.
Also add a link.

* Reknit drake tech note

ropensci/drake#756

* Reknit to remove superfluous message

* Add a tag

* Update drake 7.0.0 tech note...

...re ropensci/drake#762

* Add more thanks

* Update a header

* Clean up error messages in drake itself

* Minor edit

* Minor elaboration

* Minor update: reduced verbosity in drake

* Mention some future work

* Update drake tech note

re ropensci/drake#765

* Clean up the section on interactive sessions

* Fix a typo

* Fix a link

* Edit recap section in drake tech note

* Sync recap

* Mention @tjmahr

Ref: ropensci/drake#775

* Last-minute edits to the drake v7 tech note

* Update word choice

* Knit in batch mode

* Mention literate programming

* Address https://github.com/ropensci/roweb2/pull/423/files#r264418051

* Address https://github.com/ropensci/roweb2/pull/423/files#r264418238

* Add a mention

Reduced trigger verbosity suggested by @aedobbyn:
ropensci/drake@ec050a4

* Alphabetize mentions by first name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants