Releases: snowplow-incubator/snowplow-bigquery-loader

Version 1.7.0

20 Nov 12:59

Starting with this release, a customizable user-agent header is set for the BigQuery and Pubsub clients.

Changelog

  • Set GCP user agent header for BQ and Pubsub (#363)

Version 1.6.8

16 Oct 09:55

The BQ Loader sends messages to Pubsub topics in a few different places. Starting from this version, the size of each message is checked before it is sent to a topic. If the message exceeds the maximum allowed size, the loader creates a SizeViolation bad row that includes a trimmed version of the original message and sends it to the bad row topic.
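The size check can be sketched roughly as follows. This is a minimal Python illustration, not the loader's actual code: the function name `check_size`, the 10,000,000-byte limit, the 1,000-byte trim length, and the layout of the bad row are all assumptions made for the example.

```python
import json

# Hypothetical values: the real limit and trim length are decided by the loader.
MAX_MESSAGE_BYTES = 10_000_000
TRIMMED_PAYLOAD_BYTES = 1_000


def check_size(message: str, max_bytes: int = MAX_MESSAGE_BYTES):
    """Return ("good", message) if it fits, else ("bad", bad row as JSON)."""
    encoded = message.encode("utf-8")
    if len(encoded) <= max_bytes:
        return ("good", message)

    # Keep only a prefix of the payload so the bad row itself stays small.
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    trimmed = encoded[:TRIMMED_PAYLOAD_BYTES].decode("utf-8", errors="ignore")
    bad_row = {
        "failure": {
            "maximumAllowedSizeBytes": max_bytes,
            "actualSizeBytes": len(encoded),
        },
        "payload": trimmed,
    }
    return ("bad", json.dumps(bad_row))
```

In this sketch the caller would route a "good" result to the intended topic and a "bad" result to the bad row topic.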

This version also includes a few dependency bumps to address potential security vulnerabilities.

Changelog

  • Create SizeViolation bad row for oversized messages sent to Pubsub (#361)
  • Bump sbt-snowplow-release to 0.3.1 (#353)
  • Bump netty to 4.1.100.Final (#362)
  • Bump org.json to 20231013 (#362)

Version 1.6.7

13 Sep 07:39

This version updates the Snowplow Analytics SDK to 3.1.0 and mitigates vulnerabilities in downstream libraries.
Analytics SDK 3.1.0 disables strict field length validation, which prevents loader_recovery_error bad rows when certain fields exceed a hardcoded limit.

Changelog

  • Bump analytics-sdk to 3.1.0 (#356)
  • Bump google-cloud-bigquery version (#358)

Version 1.6.6

04 Sep 11:09
Prepare for 1.6.6 release

Version 1.6.4

27 Jan 12:22

This is a patch release to address performance issues found with the previous release, version 1.6.3.

Anyone using the 1.6.3 version may see these kinds of problems with their loader:

  • High number of undelivered Pubsub messages
  • Low rate of events getting loaded into BigQuery
  • High CPU usage and memory usage by the loader

Upgrading your loader to version 1.6.4 should fix these problems immediately. Upgrading is as simple as pulling the latest docker images:

docker pull snowplow/snowplow-bigquery-streamloader:1.6.4
docker pull snowplow/snowplow-bigquery-mutator:1.6.4
docker pull snowplow/snowplow-bigquery-repeater:1.6.4

Changelog

  • Bump http4s to 0.23.18 (#337)
  • Remove dropwizard from the config (#341)
  • Remove loadMode from the config (#339)
  • Fix benchmarks (#336)

Version 1.6.3

13 Jan 11:56

A maintenance release to bump dependencies to newer versions.

Starting from this release, we no longer publish the deprecated Dataflow variant of the BigQuery loader. We only support the streaming version of the app. See the deprecation notice from July 2022.

Changelog

  • Update copyright notice to 2023 (#333)
  • Bump sbt-snowplow-release to 0.2.1 (#332)
  • Bump protobuf-java to 3.21.12 (#331)
  • Bump netty to 4.1.86.Final (#330)
  • Remove support for dataflow variant of BigQuery loader (#329)
  • Bump jackson to 2.14.1 (#328)
  • Bump http4s to 0.23.17 (#327)

Version 1.6.2

23 Dec 10:06

A patch release to fix a bug in the 1.6.1-distroless docker images in which non-ASCII characters would get replaced with the "�" character when loading to BigQuery.

The bug was introduced in version 1.6.1, and only affected the "distroless" docker images, not the regular docker images.
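This symptom is typical of text being decoded with a charset that cannot represent the characters, which is consistent with the "Explicitly use UTF-8 charset" fix in the changelog. A minimal Python sketch of the effect, offered only as an illustration: the loader itself is not written in Python, and the sample string is made up.

```python
original = "café ☕"
raw = original.encode("utf-8")

# ASCII cannot decode the non-ASCII bytes, so each undecodable byte
# becomes the U+FFFD replacement character.
wrong = raw.decode("ascii", errors="replace")

# Naming UTF-8 explicitly round-trips the text unchanged.
right = raw.decode("utf-8")
```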

Changelog

  • Bump sbt-snowplow-release to 0.2.0 (#325)
  • Explicitly use UTF-8 charset (#325)

Version 1.6.1

07 Dec 12:47

This release adds some resilience improvements to how the loader retries writing events to BigQuery and writing failed events to GCS in case of certain transient errors. We want all Snowplow apps to be robust to error scenarios, so we are always looking for opportunities to make improvements to retries and recoveries.
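As a rough illustration of retrying on transient errors, here is a minimal Python sketch using exponential backoff. The release notes do not describe the loader's actual retry policy, so `TransientError`, `retry_with_backoff`, the attempt count, and the delays are all hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a temporary outage)."""


def retry_with_backoff(action, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call action(); on TransientError wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)
```

Only errors marked as transient are retried here; any other exception propagates immediately, so permanent failures are not masked by repeated attempts.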

Changelog

  • Use sbt-snowplow-release plugin to build docker images (#322)
  • Repeater: Retry on errors writing to storage (#301)
  • Streamloader: Retry stopped valid inserts from a failed batch (#320)

Version 1.6.0

29 Nov 09:39

This release introduces a cache. We compared 1.6.0 against the latest stable version under high load and found that the cached version was able to load events faster than the control.

Each time an event is loaded into BigQuery in the stream loader, that event will conform to one of a limited number of schemas defined as valid for your pipeline.

We identified an operation that ran every time an event was loaded into BigQuery, but that only needs to happen the first time each schema is loaded. So, we introduced a cache for that operation.
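The idea of doing per-schema work only once can be sketched as below. This is a minimal Python illustration under stated assumptions: the release notes do not say which operation is cached, so `prepare_schema`, `load_event`, and the event layout are hypothetical stand-ins.

```python
from functools import lru_cache

calls = []  # Records each time the expensive work actually runs.


@lru_cache(maxsize=None)
def prepare_schema(schema_key: str) -> str:
    """Hypothetical stand-in for the expensive per-schema operation."""
    calls.append(schema_key)
    return schema_key.upper()  # Placeholder for the real computed result.


def load_event(event: dict) -> str:
    # Every event asks for its schema, but only the first request
    # per schema does the work; later requests hit the cache.
    return prepare_schema(event["schema"])
```

Because the number of distinct schemas in a pipeline is limited, an unbounded cache like this stays small while removing repeated work from the per-event path.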

Version 1.5.2

02 Nov 17:43

A maintenance release to bump the docker base image to a newer version.