Rin is a Redshift data Importer by SQS messaging.
- (Someone) creates a S3 object.
- S3 event notifications will send to a message to SQS.
- Rin will fetch messages from SQS, and publish a "COPY" query to Redshift.
$ brew install fujiwara/tap/rin
$ docker pull ghcr.io/fujiwara/rin:v1.1.3
Configuring Amazon S3 Event Notifications.
- Create SQS queue.
- Attach SQS access policy to the queue. Example Walkthrough 1:
- Enable Event Notifications on a S3 bucket.
- Run
rin
process with configuration for using the SQS and S3.
queue_name: my_queue_name # SQS queue name
credentials:
aws_region: ap-northeast-1
redshift:
host: localhost
port: 5439
dbname: test
user: test_user
password: '{{ must_env "REDSHIFT_PASSWORD" }}'
schema: public
reconnect_on_error: true # disconnect Redshift on error occurred
s3:
bucket: test.bucket.test
region: ap-northeast-1
sql_option: "JSON 'auto' GZIP" # COPY SQL option
# define import target mappings
targets:
- s3:
key_prefix: test/foo/ignore
discard: true # Do not import and do not try following targets. Matches only.
- redshift:
table: foo
s3:
key_prefix: test/foo
- redshift:
schema: xxx
table: bar
s3:
key_prefix: test/bar
break: true # Do not try following targets.
- redshift:
schema: $1 # expand by key_regexp captured value.
table: $2
s3:
key_regexp: test/schema-([a-z]+)/table-([a-z]+)/
- redshift:
host: redshift.example.com # override default section in this target
port: 5439
dbname: example
user: example_user
password: example_pass
schema: public
table: example
s3:
bucket: redshift.example.com
region: ap-northeast-1
key_prefix: logs/example/
sql_option: "CSV DELIMITER ',' ESCAPE"
A configuration file is parsed by kayac/go-config.
go-config expands environment variables using syntax {{ env "FOO" }}
or {{ must_env "FOO" }}
in a configuration file.
When the password for Redshift is empty, Rin will try call GetClusterCredentials API to get a temporary password for the cluster.
Rin requires credentials for SQS and Redshift.
credentials.aws_access_key_id
andcredentials.aws_secret_access_key
- used for SQS and Redshift(COPY query and Data API access).
credentials.aws_iam_role
- used for Redshift COPY query only.
- for SQS and Redshift Data API, Rin will try to get a instance credentials.
Rin waits new SQS messages and processing it continually.
$ rin -config config.yaml [-debug]
-config
also accepts HTTP/S3/File URL to specify the location of configuration file.
For example,
$ rin -config s3://rin-config.my-bucket/config.yaml
Rin process new SQS messages and exit.
$ rin -config config.yaml -batch [-debug]
A CLI option -max-execution-time
is set max execution time for running SQS worker and batch process.
Rin has two ways to connect to Redshift.
postgres
driver is the default. Rin connects to Redshift with PostgreSQL protocol over TCP in the VPC network.
host
, port
, user
and password
fields are required in the redshift
section.
redshift:
driver: postgres # default
host: localhost
port: 5439
user: test_user
password: '{{ must_env "REDSHIFT_PASSWORD" }}'
redshift-data
driver connects to Redshift via Redshift Data API.
Redshift Data API does not require a VPC network.
With provisoned cluster, driver
, cluster
and user
are required.
redshift:
driver: redshift-data
cluster: your-cluster-name
user: test_user
With Redshift serverless, driver
, workgroup
are required.
redshift:
driver: redshift-data
workgroup: your-workgroup-name