Documentation of strings_in_dots warning is unclear #758

Closed
MilesMcBain opened this issue Feb 28, 2019 · 4 comments

@MilesMcBain
Contributor

MilesMcBain commented Feb 28, 2019

Hi,

I've cooked up a drake plan that looks like this:

analysis_plan_monthly <-
  drake_plan(
    hexes = get_hexes(),

    incidents = get_hex_incidents(hexes),

    categorised_incidents = categorise_incidents(incidents),

    incident_time_series_mth = incident_monthly_timeseries(categorised_incidents,
                                                           data_epoch,
                                                           data_end_date),

    monthly_model_fits = fit_monthly_models(incident_time_series_mth),

    forecast_input = forecast_input_df(incident_time_series_mth,
                                       years_ahead,
                                       forecast_epoch,
                                       data_epoch),

    future_rates = forecast_means(monthly_model_fits, forecast_input),

    future_states = mutate(future_rates, n_incidents = round(.fitted)),
    ## not a true realisation of a simulation at the forecast rate. instead of
    ## this I just output a rounded version of the rate, assuming simulations
    ## would converge to this.

    output_db = write_db(future_states, file_out("demand_forecast.db"))
  )

And I get a warning:

Warning message:
Converting double-quotes to single-quotes because the `strings_in_dots` argument is missing. Use the file_in(), file_out(), and knitr_in() functions to work with files in your commands. To remove this warning, either call `drake_plan()` with `strings_in_dots = "literals"` or use `pkgconfig::set_config("drake::strings_in_dots" = "literals")`. 

I don't understand what this means and it is not mentioned in the manual. I thought it was trying to suggest that I might not use literal file names and instead use a variable. So I changed the plan to look like:

analysis_plan_monthly <-
  drake_plan(
    hexes = get_hexes(),

    incidents = get_hex_incidents(hexes),

    categorised_incidents = categorise_incidents(incidents),

    incident_time_series_mth = incident_monthly_timeseries(categorised_incidents,
                                                           data_epoch,
                                                           data_end_date),

    monthly_model_fits = fit_monthly_models(incident_time_series_mth),

    forecast_input = forecast_input_df(incident_time_series_mth,
                                       years_ahead,
                                       forecast_epoch,
                                       data_epoch),

    future_rates = forecast_means(monthly_model_fits, forecast_input),

    future_states = mutate(future_rates, n_incidents = round(.fitted)),
    ## not a true realisation of a simulation at the forecast rate. instead of
    ## this I just output a rounded version of the rate, assuming simulations
    ## would converge to this.

    output_db = write_db(future_states, file_out(output_db_file))
  )

Here output_db_file is defined in my make.R: output_db_file <- "demand_forecast.db". If I do this, drake no longer recognises the output file as a target. That is, if I delete the file, it is not regenerated.

My understanding from the help for file_out() is that strings_in_dots is deprecated. Please help me understand what the correct way to specify a target output file is. I may be able to suggest some changes to the documentation from there.

@wlandau
Member

wlandau commented Feb 28, 2019

TL;DR

Sorry about the confusion, @MilesMcBain. Your first usage of file_out() is completely correct. In the current development version, which will go to CRAN on March 10, your first plan will work as is.

library(drake)
packageVersion("drake")
#> [1] '6.2.1.9005'
drake_plan(
  output_db = write_db(x, file_out("demand_forecast.db"))
)
#> # A tibble: 1 x 2
#>   target    command                                    
#>   <chr>     <expr>                                     
#> 1 output_db write_db(x, file_out("demand_forecast.db"))

Created on 2019-02-28 by the reprex package (v0.2.1)

But in the current CRAN version (<= 6.2.1), to fully utilize the most current file API (file_out() etc.) users must either

  1. Set strings_in_dots = "literals" inside every call to drake_plan() that has strings, or
  2. Call pkgconfig::set_config("drake::strings_in_dots" = "literals") once per session.

library(drake)
packageVersion("drake")
#> [1] '6.2.1'
drake_plan(
  output_db = write_db(x, file_out("demand_forecast.db")),
  strings_in_dots = "literals"
)
#> # A tibble: 1 x 2
#>   target    command                                        
#>   <chr>     <chr>                                          
#> 1 output_db "write_db(x, file_out(\"demand_forecast.db\"))"

Created on 2019-02-28 by the reprex package (v0.2.1)

This odd requirement has to do with the process of gradually deprecating strings_in_dots.

Background

strings_in_dots dates back to the early days of drake in late 2016. Aspiring to emulate remake (whose development had waned prematurely), I wanted drake to detect each target's dependencies implicitly from each command. That made files tricky. Some character strings were file names, while others had no special meaning. Knowing almost no metaprogramming at the time, I decided that single-quoted strings would denote files and double-quoted ones would denote regular literals. As I inevitably found out, parse() turns single quotes into double quotes, so I implemented the strings_in_dots argument to fight the parser. Not my best design choice.

plan_fragment_1 <- drake_plan(
  data = read.csv('data.csv')
)
plan_fragment_2 <- drake_plan(
  out = analyze_data(data, method = "rf"),
  strings_in_dots = "literals"
)
full_plan <- rbind(plan_fragment_1, plan_fragment_2)
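
The parser behavior described above can be checked in base R. This is a minimal sketch, independent of drake: once a command is parsed, the original quote style is gone, so quote style alone cannot separate file names from ordinary strings.

```r
# Parse a command written with single quotes, as in the old file interface.
expr <- parse(text = "read.csv('data.csv')")[[1]]

# The argument is stored as a plain character value; the quote style is not kept.
arg <- expr[[2]]
is.character(arg)
#> [1] TRUE

# Deparsing emits double quotes, whatever the source text used.
deparsed <- deparse(expr)
deparsed
#> [1] "read.csv(\"data.csv\")"
```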

In early 2018, @krlmlr suggested that we use language to mark input and output files (#232). I then learned more about static code analysis and implemented file_out(), file_in(), and knitr_in().

drake_plan(
  data = read.csv(file_in("data.csv")),
  out = analyze_data(data, method = "rf"),
  strings_in_dots = "literals"
)

From that point on, I have been trying to gracefully deprecate the single-quoted file interface and switch to the newer API (file_out(), file_in(), and knitr_in()). To use that new API, all strings must be interpreted as literals. To make sure old projects did not abruptly break, however, the default value of strings_in_dots unfortunately needed to remain "filenames" for a while longer. In version 7.0.0, we will finally be rid of strings_in_dots altogether. (drake_plan(strings_in_dots = "some_value") will throw a deprecation warning and otherwise have no effect.)
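
To illustrate why the newer API wants literal strings, here is a rough base R sketch of static code analysis. It is not drake's actual implementation, and find_file_out() is a hypothetical helper invented for this example. It walks an unevaluated command and collects the string literals passed to file_out(). A variable such as output_db_file is only a symbol at that stage, so no file name can be read from it.

```r
# Hypothetical helper (not part of drake): recursively collect string
# literals that appear as arguments to file_out() in an unevaluated command.
find_file_out <- function(expr) {
  # Symbols and constants cannot contain a file_out() call.
  if (!is.call(expr)) {
    return(character(0))
  }
  # At a file_out() call, keep only the arguments that are literal strings.
  if (identical(expr[[1]], quote(file_out))) {
    args <- as.list(expr)[-1]
    return(as.character(unlist(Filter(is.character, args), use.names = FALSE)))
  }
  # Otherwise, descend into every element of the call.
  as.character(unlist(lapply(as.list(expr), find_file_out), use.names = FALSE))
}

literal_cmd <- quote(write_db(future_states, file_out("demand_forecast.db")))
symbol_cmd <- quote(write_db(future_states, file_out(output_db_file)))

find_file_out(literal_cmd)
#> [1] "demand_forecast.db"
find_file_out(symbol_cmd)
#> character(0)
```

Under this simplified model, the second command exposes no file name to the analysis, which is consistent with the output file no longer being tracked when a variable is passed to file_out().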

Documentation

I try to keep the pkgdown site and the manual up to date with the development version, which unfortunately means the CRAN version is not always synced. This was not a problem originally when the vignettes were part of drake itself, but then the docs grew too big to fit comfortably in the actual package (#332). I wonder if there is a way to host two different branches of the same repo at two different URLs using GitHub Pages.

@wlandau wlandau closed this as completed Feb 28, 2019
@wlandau
Member

wlandau commented Feb 28, 2019

From this post, it looks like repos and URLs are 1:1.

@MilesMcBain
Contributor Author

Okay cool, so I have come in at an awkward moment. I really appreciate the explanation @wlandau, the warning now makes sense to me with that context.

My first experience with drake has been otherwise extremely positive and I am enjoying it a lot. 👍

@wlandau
Member

wlandau commented Mar 1, 2019

I think your timing is great, re #761 (comment).
