Releases: tidyverse/dbplyr
dbplyr 2.1.0
New features
-
Thanks to @mgirlich, dbplyr gains support for key verbs from tidyr:
pivot_longer()
(#532),pivot_wider()
(#543),expand()
(#538),
complete()
(#538),replace_na()
(#538),fill()
(#566). -
@mgirlich is now a dbplyr author in recognition of his significant and
sustained contributions. -
across()
implementation has been rewritten to support more inputs:
it now translates formulas (#525), works with SQL functions that don't have
R translations (#534), and work withNULL
(#554) -
summarise()
now supports argument.groups
(@mgirlich, #584).
SQL translation
-
All backends:
str_sub()
,substr()
andsubstring()
get better
translations (#577). Most importantly, the results of using negative
locations should match the underlying R implementations more closely. -
MS SQL:
-
as.integer()
andas.integer64()
translations cast first toNUMERIC
to avoid CASTing weirdness (@DavidPatShuiFong, #496). -
Assumes a boolean context inside of
[
(#546) -
str_sub()
withend = -1
now works (#577).
-
-
Redshift:
lag()
andlead()
lose thedefault
parameter since it's
not supported (@hdplsa, #548). -
SQLite: custom translation of
full_join()
andright_join()
(@mgirlich, #536).
Minor improvements and bug fixes
-
RPostgreSQL backend warns if
temporary = TRUE
since temporary tables are
not supported byRPostgres::dbWriteTable()
(#574). -
count()
method provides closer match to dplyr semantics (#347). -
db_connection_describe()
no longer uses partial matching (@mgirlich, #564). -
pull()
no longerselect()
s the result when there's already only
one variable (#562). -
select()
no longer relocates grouping variables to the front
(@mgirlich, #568). and informs when adding missing grouping variables
(@mgirlich, #559). -
tbl.src_dbi(...)
now passed on totbl_sql()
(#530).
dbplyr 2.0.0
(To become dbplyr 2.0.0)
dplyr 1.0.0 compatibility
-
across()
is now translated into individual SQL statements (#480). -
rename()
andselect()
support dplyr 1.0.0 tidyselect syntax (apart from
predicate functions which can't easily work on computed queries) (#502). -
relocate()
makes it easy to move columns (#494) andrename_with()
makes
it easy to rename columns programmatically (#502). -
slice_min()
,slice_max()
, andslice_order()
are now supported.
slice_head()
andslice_tail()
throw clear error messages (#394)
SQL generation
-
Documentation has been radically improved with new topics for each major
verb and each backend giving more details about the SQL translation. -
intersect()
,union()
andsetdiff()
gain anall
argument to add the
ALL
argument (#414). -
Join functions gains a
na_matches
argument that allows you to control
whether or notNA
(NULL
) values match otherNA
values. The default is
"never"
, which is the usual behaviour in databases. You can set
na_matches = "na"
to match R's usual join behaviour (#180). Additional
arguments error (instead of being silently swallowed) (#382). -
Joins now only use aliases where needed to disambiguate columns; this should
make generated queries more readable. -
Subqueries no longer include an
ORDER BY
clause. This is not part of the
SQL spec, and has very limited support across databases. Now such queries
generate a warning suggesting that you move yourarrange()
call later in
the pipeline (#276). (There's one exception:ORDER BY
is still generated
ifLIMIT
is present; this tends to affect the returns rows but not
necessarily their order). -
Subquery names are now scoped within the query. This makes query text
deterministic which helps some query optimisers/cachers (#336). -
sql_optimise()
now can partially optimise a pipeline; due to an unfortunate
bug it previously gave up too easily. -
in_schema()
quotes each input individually (#287) (usesql()
to opt out
of quoting, if needed). AndDBI::Id()
should work anywhere that
in_schema()
does.
SQL translation
-
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
-
All backends:
-
You can now use
::
in translations, so that (e.g.)dbplyr::n()
is
translated tocount(*)
(#207). -
[[
can now also translate numeric indices (#520). -
%/%
now generates a clear error message; previously it was translated to
/
which is not correct (#108). -
n()
is translated tocount(*)
instead ofcount()
(#343). -
sub_str()
translation is more consistent in edge cases (@ianmcook). -
All
median()
(@lorenzwalthert, #483),pmin()
,pmax()
(#479),sd()
andvar()
functions have anna.rm
argument that warns once when not
TRUE
. This makes them consistent withmean()
andsum()
. -
substring()
is now translated the same way assubstr()
(#378).
-
-
blob vectors can now be used with
!!
and
!!!
operators, for example infilter()
(@okhoma, #433) -
MySQL uses standard SQL for index creation.
-
MS SQL translation does better a distinguishing between bit and boolean
(#377, #318).if
andifelse
once again generateIIF
, creating
simpler expressions.as.*()
function usesTRY_CAST()
instead
ofCAST()
for version 11+ (2012+) (@DavidPatShuiFong, #380). -
odbc no longer translates
count()
; this was an accidental inclusion. -
Oracle translation now depends on Oracle 12c, and uses a "row-limiting"
clause forhead()
. It gains translations fortoday()
andnow()
, and
improvedas.Date()
translation (@rlh1994, #267). -
PostgreSQL: new translations for lubridate period functions
years()
,
months()
,days()
, andfloor_date()
(@bkkkk, #333) and stringr functions
str_squish()
,str_remove()
, andstr_remove_all()
(@shosaco). -
New RedShift translations when used with
RPostgres::Redshift()
. -
SQLite gains translations for lubridate functions
today()
,now()
,
year()
,month()
,day()
,hour()
,minute()
,second()
,yday()
(#262), and correct translation formedian()
(#357).
Extensibility
If you are the author of a dbplyr backend, please see vignette("backend-2")
for details.
-
New
dbplyr_edition()
generic allows you to opt-in to the 2nd edition of
the dbplyr API. -
db_write_table()
now callsDBI::dbWriteTable()
instead of nine generics
that formerly each did a small part:db_create_indexes()
,db_begin()
,
db_rollback()
,db_commit()
,db_list_tables()
,drop_drop_table()
,
db_has_table()
,db_create_table()
, anddb_data_types()
. You can
now delete the methods for these generics.db_query_rows()
is no longer used; it appears that it hasn't been used
for some time, so if you have a method, you can delete it. -
DBI::dbQuoteIdentifier()
is now used instead ofsql_escape_ident()
and
DBI::dbQuoteString()
instead ofsql_escape_string()
. -
A number of
db_*
generics have been replaced with new SQL generation
generics:dplyr::db_analyze()
->dbplyr::sql_table_analyze()
dplyr::db_create_index()
->dbplyr::sql_table_index()
dplyr::db_explain()
->dbplyr::sql_queriy_explain()
dplyr::db_query_fields()
->dbplyr::sql_query_fields()
dplyr::db_save_query()
->dbplyr::sql_query_save()
This makes them easier to test and is an important part of the process of
moving all database generics in dbplyr (#284). -
A number of other generics have been renamed to facilitate the move from
dplyr to dbplyr:dplyr::sql_select()
->dbplyr::sql_query_select()
dplyr::sql_join()
->dbplyr::sql_query_join()
dplyr::sql_semi_join()
->dbplyr::sql_query_semi_join()
dplyr::sql_set_op()
->dbplyr::sql_query_set_op()
dplyr::sql_subquery()
->dbplyr::sql_query_wrap()
dplyr::db_desc()
->dbplyr::db_connection_describe()
-
New
db_temporary_table()
generic makes it easier to work with databases
that require temporary tables to be specially named. -
New
sql_expr_matches()
generic allows databases to use more efficient
alternatives when determine if two values "match" (i.e. like equality but
a pair ofNULL
s will also match). For more details, see
https://modern-sql.com/feature/is-distinct-from -
New
sql_join_suffix()
allows backends to control the default suffixes
used (#254).
Minor improvements and bug fixes
-
All old lazy eval shims have been removed. These have been deprecated for
some time. -
Date-time escaping methods for Athena and Presto have moved to the packages
where they belong. -
Attempting to embed a Shiny reactive in a query now gives a helpful error
(#439). -
copy_lahman()
andcopy_nycflights13()
(and hencenycflights13_sqlite()
)
and friends now return DBI connections rather than the now deprecated
src_dbi()
(#440). -
copy_to()
can nowoverwrite
when table is specified with schema (#489),
and gains anin_transaction
argument used to optionally suppress the
transaction wrapper (#368). -
distinct()
no longer duplicates column if grouped (#354). -
transmute()
now correctly tracks variables it needs when creating
subqueries (#313). -
mutate()
grouping variables no longer generates a downstream error (#396) -
mutate()
correctly generates subqueries when you re-use the same variable
three or more times (#412). -
window_order()
overrides ordering, rather than appending to it.
dbplyr 1.4.4
-
Internally
DBI::dbExecute()
now usesimmediate = TRUE
; this improves
support for session-scoped temporary tables in MS SQL (@krlmlr, #438). -
Subqueries with
ORDER BY
useTOP 9223372036854775807
instead of
TOP 100 PERCENT
on SQL Server for compatibility with Azure Data Warehouse
(#337, @alexkyllo). -
escape()
now supportsblob
vectors using newsql_escape_raw()
generic. It enables using blob variables in
dplyr verbs, for example to filter nvarchar values by UTF-16 blobs
(see r-dbi/DBI#215 (comment)).
(@okhoma, #433) -
Added
setOldClass()
calls for"ident"
and"ident_q"
classes for
compatibility with dplyr 1.0.0 (#448, @krlmlr). -
Postgres
str_detect()
translation uses same argument names as stringr,
and gains anegate
argument (#444). -
semi_join()
andanti_join()
now correctly support thesql_on
argument
(#443, @krlmlr).
dbplyr 1.4.3
-
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of
RMySQL) for its internal tests and data functions (#427). -
The Date and POSIXt methods for
escape()
now use exported
sql_escape_date()
andsql_escape_datetime()
generics to allow backend
specific formatting of date and datetime literals. These are used to
provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391). -
first()
,last()
,nth()
,lead()
andlag()
now respect the
window_frame()
(@krlmlr, #366). -
SQL server: new translations for
str_flatten()
(@PauloJhonny, #405). -
SQL server: temporary datasets are now session-local, not global (#401).
-
Postgres: correct
str_detect()
,str_replace()
andstr_replace_all()
translation (@shosaco, #362).
dbplyr 1.4.2
-
Fix bug when partially evaluating unquoting quosure containing a single
symbol (#317) -
Fixes for rlang and dpylr compatibility.
dbplyr 1.4.1
Minor improvements to SQL generation
dbplyr 1.4.0
Breaking changes
-
Error: `con` must not be NULL
: If you see this error, it probably means
that you have probably forgotten to passcon
down to a dbplyr function.
Previously, dbplyr defaulted to usingsimulate_dbi()
which introduced
subtle escaping bugs. (It's also possible I have forgotten to pass it
somewhere that the dbplyr tests don't pick up, so if you can't figure it
out, please let me know). -
Subsetting (
[[
,$
, and[
) functions are no longer evaluated locally.
This makes the translation more consistent and enables useful new idioms
for modern databases (#200).
New features
-
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/)
and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain
support for window functions, available in Maria DB 10.2, MySQL 8.0, and
SQLite 3.25 (#191). -
Overall, dplyr generates many fewer subqueries:
-
Joins and semi-joins no longer add an unneeded subquery (#236). This is
faciliated by the newbare_identifier_ok
argument tosql_render()
;
the previous argument was calledroot
and confused me. -
Many sequences of
select()
,rename()
,mutate()
, andtransmute()
can
be collapsed into a single query, instead of always generating a subquery
(#213).
-
-
New
vignette("sql")
describes some advantages of dbplyr over SQL (#205) and
gives some advice about writing how to write literal SQL inside of dplyr,
when you you need to (#196). -
New
vignette("reprex")
gives some hints on creating reprexes that work
anywhere (#117). This is supposrted by a newtbl_memdb()
that
matches the existingtbl_lazy()
. -
All
..._join()
functions gain ansql_on
argument that allows specifying
arbitrary join predicates in SQL code (#146, @krlmlr).
SQL translations
-
New translations for some lubridate functions:
today()
,now()
,
year()
,month()
,day()
,hour()
,minute()
,
second()
,quarter()
, ``yday()(@colearendt, @derekmorr). Also added new translation for
as.POSIXct()`. -
New translations for stringr functions:
str_c()
,str_sub()
,
str_length()
,str_to_upper()
,str_to_lower()
, andstr_to_title()
(@colearendt). Non-translated stringr functions throw a clear error. -
New translations for bitwise operations:
bitwNot()
,bitwAnd()
,bitwOr()
,
bitwXor()
,bitwShiftL()
, andbitwShiftR()
. Unlike the base R functions,
the translations do not coerce arguments to integers (@davidchall, #235). -
New translation for
x[y]
toCASE WHEN y THEN x END
. This enables
sum(a[b == 0])
to work as you expect from R (#202).y
needs to be
a logical expression; if not you will likely get a type error from your
database. -
New translations for
x$y
andx[["y"]]
tox.y
, enabling you to index
into nested fields in databases that provide them (#158). -
The
.data
and.env
pronouns of tidy evaluation are correctly translated
(#132). -
New translation for
median()
andquantile()
. Works for all ANSI compliant
databases (SQL Server, Postgres, MariaDB, Teradata) and has custom
translations for Hive. Thanks to @edavidaja for researching the SQL variants!
(#169) -
na_if()
is correct translated toNULLIF()
(rather thanNULL_IF
) (#211). -
n_distinct()
translation throws an error when given more than one argument.
(#101, #133). -
New default translations for
paste()
,paste0()
, and the hyperbolic
functions (these previously were only available for ODBC databases). -
Corrected translations of
pmin()
andpmax()
toLEAST()
andGREATEST()
for ANSI compliant databases (#118), toMIN()
andMAX()
for SQLite, and
to an error for SQL server. -
New translation for
switch()
to the simple form ofCASE WHEN
(#192).
SQL simulation
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
-
SQL simulation has been overhauled. It now works reliably, is better
documented, and always uses ANSI escaping (i.e.`
for field
names and'
for strings). -
tbl_lazy()
now actually puts adbplyr::src
in the$src
field. This
shouldn't affect any downstream code unless you were previously working
around this weird difference betweentbl_lazy
andtbl_sql
classes.
It also includes thesrc
class in its class, and when printed,
shows the generated SQL (#111).
Database specific improvements
-
MySQL/MariaDB
-
Translations also applied to connections via the odbc package
(@colearendt, #238) -
Basic support for regular expressions via
str_detect()
and
str_replace_all()
(@colearendt, #168). -
Improved translation for
as.logical(x)
toIF(x, TRUE, FALSE)
.
-
-
Oracle
-
Postgres
- Basic support for regular expressions via
str_detect()
and
str_replace_all()
(@colearendt, #168).
- Basic support for regular expressions via
-
SQLite
explain()
translation now generatesEXPLAIN QUERY PLAN
which
generates a higher-level, more human friendly explanation.
-
SQL server
-
Improved translation for
as.logical(x)
toCAST(x as BIT)
(#250). -
Translates
paste()
,paste0()
, andstr_c()
to+
. -
copy_to()
method applies temporary table name transformation
earlier so that you can now overwrite temporary tables (#258). -
db_write_table()
method uses correct argument name for
passing along field types (#251).
-
Minor improvements and bug fixes
-
Aggregation functions only warn once per session about the use of
na.rm = TRUE
(#216). -
table names generated by
random_table_name()
have the prefix
"dbplyr_", which makes it easier to find them programmatically
(@mattle24, #111) -
Functions that are only available in a windowed (
mutate()
) query now
throw an error when called in a aggregate (summarise()
) query (#129) -
arrange()
understands the.by_group
argument, making it possible
sort by groups if desired. The default isFALSE
(#115) -
distinct()
now handles computed variables likedistinct(df, y = x + y)
(#154). -
escape()
,sql_expr()
andbuild_sql()
no longer acceptcon = NULL
as
a shortcut forcon = simulate_dbi()
. This made it too easy to forget to
passcon
along, introducing extremely subtle escaping bugs.win_over()
gains acon
argument for the same reason. -
New
escape_ansi()
always uses ANSI SQL 92 standard escaping (for use
in examples and documentation). -
mutate(df, x = NULL)
dropsx
from the output, just like when working with
local data frames (#194). -
partial_eval()
processes inlined functions (including rlang lambda
functions). This makes dbplyr work with more forms of scoped verbs like
df %>% summarise_all(~ mean(.))
,df %>% summarise_all(list(mean))
(#134). -
sql_aggregate()
now takes an optional argumentf_r
for passing to
check_na_rm()
. This allows the warning to show the R function name rather
than the SQL function name (@sverchkov, #153). -
sql_infix()
gains apad
argument for the rare operator that doesn't
need to be surrounded by spaces. -
sql_prefix()
no longer turns SQL functions into uppercase, allowing for
correct translation of case-sensitive SQL functions (#181, @mtoto). -
summarise()
gives a clear error message if you refer to a variable
created in that samesummarise()
(#114). -
New
sql_call2()
which is torlang::call2()
assql_expr()
is to
rlang::expr()
. -
show_query()
andexplain()
usecat()
rather than message. -
union()
,union_all()
,setdiff()
andintersect()
do a better job
of matching columns across backends (#183).
dbplyr 1.3.0
- Now support for dplyr 0.8.0 (#190) and R 3.1.0
API changes
-
Calls of the form
dplyr::foo()
are now evaluated in the database,
rather than locally (#197). -
vars
argument totbl_sql()
has been formally deprecated; it hasn't
actually done anything for a while (#3254). -
src
andtbl
objects now include a class generated from the class of
the underlying connection object. This makes it possible for dplyr backends
to implement different behaviour at the dplyr level, when needed. (#2293)
SQL translation
-
x %in% y
is now translated toFALSE
ify
is empty (@mgirlich, #160). -
New
as.integer64(x)
translation toCAST(x AS BIGINT)
(#3305) -
case_when
now translates with a ELSE clause if a formula of the form
TRUE~<RHS>
is provided . (@cderv, #112) -
cummean()
now generatesAVG()
notMEAN()
(#157) -
str_detect()
now uses correct parameter order (#3397) -
MS SQL
-
Oracle
- Custom
db_drop_table()
now only drops tables if they exist (#3306) - Custom
setdiff()
translation (#3493) - Custom
db_explain()
translation (#3471)
- Custom
-
SQLite
- Correct translation for
as.numeric()
/as.double()
(@chris-park, #171).
- Correct translation for
-
Redshift
substr()
translation improved (#3339)
Minor improvements and bug fixes
-
copy_to()
will only remove existing table whenoverwrite = TRUE
and the
table already exists, eliminating a confusing "NOTICE" from PostgreSQL
(#3197). -
partial_eval()
handles unevaluated formulas (#184). -
pull.tbl_sql()
now extracts correctly from grouped tables (#3562). -
sql_render.op()
now correctly forwards thecon
argument (@kevinykuo, #73).
dbplyr 1.2.1
- Forward compatibility fixes for rlang 0.2.0
dbplyr 1.2.0
New top-level translations
-
New translations for
- MS Access (#2946) (@DavisVaughan)
- Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz)
- Teradata.
- Redshift.
-
dbplyr now supplies appropriate translations for the RMariaDB and
RPostgres packages (#3154). We generally recommend using these packages
in favour of the older RMySQL and RPostgreSQL packages as they are
fully DBI compliant and tested with DBItest.
New features
-
copy_to()
can now "copy" tbl_sql in the same src, providing another
way to cache a query into a temporary table (#3064). You can also
copy_to
tbl_sqls from another source, andcopy_to()
will automatically
collect then copy. -
Initial support for stringr functions:
str_length()
,str_to_upper()
,
str_to_lower()
,str_replace_all()
,str_detect()
,str_trim()
.
Regular expression support varies from database to database, but most
simple regular expressions should be ok.
Tools for developers
-
db_compute()
gains ananalyze
argument to matchdb_copy_to()
. -
New
remote_name()
,remote_con()
,remote_src()
,remote_query()
and
remote_query_plan()
provide a standard API for get metadata about a
remote tbl (#3130, #2923, #2824). -
New
sql_expr()
is a more convenient building block for low-level SQL
translation (#3169). -
New
sql_aggregate()
andwin_aggregate()
for generating SQL and windowed
SQL functions for aggregates. These take one argument,x
, and warn if
na.rm
is notTRUE
(#3155).win_recycled()
is equivalent to
win_aggregate()
and has been soft-deprecated. -
db_write_table
now needs to return the table name
Minor improvements and bug fixes
-
Multiple
head()
calls in a row now collapse to a single call. This avoids
a printing problem with MS SQL (#3084). -
escape()
now works with integer64 values from the bit64 package (#3230) -
if
,ifelse()
, andif_else()
now correctly scope the false condition
so that it only applies to non-NULL conditions (#3157) -
ident()
andident_q()
handle 0-length inputs better, and should
be easier to use with S3 (#3212) -
in_schema()
should now work in more places, particularly incopy_to()
(#3013, @baileych) -
SQL generation for joins no longer gets stuck in a endless loop if you
request an empty suffix (#3220). -
mutate()
has better logic for splitting a single mutate into multiple
subqueries (#3095). -
Improved
paste()
andpaste0()
support in MySQL, PostgreSQL (#3168),
and RSQLite (#3176). MySQL and PostgreSQL gain support forstr_flatten()
which behaves likepaste(x, collapse = "-")
(but for technical reasons
can't be implemented as a straightforward translation ofpaste()
). -
same_src.tbl_sql()
now performs correct comparison instead of always
returningTRUE
. This means thatcopy = TRUE
once again allows you to
perform cross-database joins (#3002). -
select()
queries no longer alias column names unnecessarily
(#2968, @DavisVaughan). -
select()
andrename()
are now powered by tidyselect,
fixing a few renaming bugs (#3132, #2943, #2860). -
summarise()
once again performs partial evaluation before database
submission (#3148). -
test_src()
makes it easier to access a single test source.
Database specific improvements
-
MS SQL
-
Better support for temporary tables (@Hong-Revo)
-
Different translations for filter/mutate contexts for:
NULL
evaluation
(is.na()
,is.null()
), logical operators (!
,&
,&&
,|
,||
),
and comparison operators (==
,!=
,<
,>
,>=
,<=
)
-
-
MySQL:
copy_to()
(viadb_write_table()
) correctly translates logical
variables to integers (#3151). -
odbc: improved
n()
translation in windowed context. -
SQLite: improved
na_if
translation (@cwarden) -
PostgreSQL: translation for
grepl()
added (@zozlak)