id: api_python_sql
namespace: company.team

tasks:
  - id: api
    type: io.kestra.plugin.core.http.Request
    uri: https://dummyjson.com/products

  - id: python
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: python:slim
    beforeCommands:
      - pip install polars
    outputFiles:
      - products.csv
    warningOnStdErr: false
    script: |
      import polars as pl
      data = {{ outputs.api.body | jq('.products') | first }}
      df = pl.from_dicts(data)
      df.glimpse()
      df.select(["brand", "price"]).write_csv("products.csv")

  - id: sql_query
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      in.csv: "{{ outputs.python.outputFiles['products.csv'] }}"
    sql: |
      SELECT brand, round(avg(price), 2) as avg_price
      FROM read_csv_auto('{{ workingDir }}/in.csv', header=True)
      GROUP BY brand
      ORDER BY avg_price DESC;
    store: true

extend:
  title: Extract data from a REST API, process it in Python with Polars in a
    Docker container, then run a DuckDB query and preview the results as a
    table in the Outputs tab
  description: >-
    This flow fetches product data from a REST API, processes it with Python
    and SQL, and stores the result in internal storage.

    - the `api` HTTP Request task calls a public API; to interact with a
    private API endpoint, check the task documentation for examples of how to
    authenticate your request
    - the `python` task runs in a separate Docker container and installs
    `polars` before starting the script
    - the DuckDB query task reads the CSV file produced by the `python` task
    and writes the query result to internal storage.

    The Outputs tab previews the query results as a formatted table.
  tags:
    - API
    - Python
    - DuckDB
    - SQL
    - Outputs
  ee: false
  demo: true
  meta_description: This flow fetches product data from a REST API, processes
    it with Python and SQL, and stores the result in internal storage.
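
# Sketch only, not part of the original flow: for a private endpoint, the
# `api` task could send credentials through its `headers` property. The
# secret name API_TOKEN and the Bearer scheme are assumptions made for
# illustration; check the task documentation for the authentication options
# your Kestra version supports.
#
#   - id: api
#     type: io.kestra.plugin.core.http.Request
#     uri: https://dummyjson.com/products
#     headers:
#       Authorization: "Bearer {{ secret('API_TOKEN') }}"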