Skip to content

Latest commit

 

History

History
93 lines (75 loc) · 2.66 KB

README.md

File metadata and controls

93 lines (75 loc) · 2.66 KB

dbt-materialization-load-with

A dbt package for materialization with data loading Note: This is a PoC-like DBT package for Redshift.

This DBT package mainly provides materialization for Redshift. There are two materializations in the package, load_table and load_incremental. These are based on the concept of loading data into a temporary table using Redshift's COPY command, and then executing the table or incremental materialization based on the loaded temporary table.

Installation

Add to your packages.yml

packages:
  - git: "https://github.com/mashiike/dbt-materialization-load-with"
    revision: v0.0.0

It is also useful to prepare the following macro.

{% macro is_incremental() %}
  {{ return(materialization_load_with.is_incremental()) }}
{% endmacro %}

{% macro load_temporary_table() %}
  {{ return(materialization_load_with.load_temporary_table()) }}
{% endmacro %}

QuickStart

Assuming you have a CSV like the following, when you load it, it will look like this

csv:

id,name,age
1,hoge,18
2,fuga,28
3,piyo,38

simple_load_table.sql:

{{config(
    materialized='load_table',
    load_columns=[
      {'name':'id',   'data_type':'integer'},
      {'name':'name', 'data_type':'varchar'},
      {'name':'age', 'data_type':'integer'},
    ],
    from='s3://example-com/path/to/csv',
    iam_role=env_var('IAM_ROLE_ARN'),
    copy_option="REGION '"~env_var('AWS_DEFAULT_REGION','us-east-1')~"' FORMAT csv IGNOREHEADER 1",
)}}

select *
from {{load_temporary_table()}}

If you want to update it with an addendum, it will look like this simple_load_incremental.sql:

{{config(
    materialized='load_table',
    load_columns=[
      {'name':'id',   'data_type':'integer'},
      {'name':'name', 'data_type':'varchar'},
      {'name':'age', 'data_type':'integer'},
    ],
    from='s3://example-com/path/to/csv',
    iam_role=env_var('IAM_ROLE_ARN'),
    copy_option="REGION '"~env_var('AWS_DEFAULT_REGION','us-east-1')~"' FORMAT csv IGNOREHEADER 1",
)}}

select *
from {{load_temporary_table()}} as l
{%- if is_incremental() %}
where not exists (
  select 1 
  from {{ this }} as t
  where l.id = t.id
)
{%- endif %}

Note: In the case of load_incremental, it is difficult to switch the config dynamically; it is very useful if the from is fixed or can be specified by an environment variable.

LICENSE

MIT

However, some of the code has been modified from https://github.com/dbt-labs/dbt-core, the original license of which is here