Commit bba9e8a

Support running actions after insertion

francojreyes committed Feb 20, 2024 · 1 parent baae05e

Showing 2 changed files with 19 additions and 17 deletions.
25 changes: 13 additions & 12 deletions README.md
@@ -169,16 +169,17 @@ When a table is created, it is automatically tracked in Hasura and added to the

#### Parameters

- | name                   | type         | required | description                                                                                                                                                                                                     |
- |------------------------|--------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
- | `metadata`             | object       | Yes      | Instructions for creating/inserting into PostgreSQL tables.                                                                                                                                                     |
- | `metadata.table_name`  | str          | Yes      | Name of table to create/insert into.<br/><br/>Must match name of table created in `metadata.sql_up` (case insensitive).                                                                                         |
- | `metadata.columns`     | list[str]    | Yes      | List of column names that require insertion.<br/><br/>Must match column names in table created in `metadata.sql_up`, as well as the keys of each object in `payload` (case sensitive).                          |
- | `metadata.write_mode`  | str          | No       | One of `"overwrite"` or `"append"`.<br/><br/>Defaults to `"overwrite"`.                                                                                                                                         |
- | `metadata.sql_execute` | str          | No       | SQL command to run *before* the insertion.                                                                                                                                                                      |
- | `metadata.sql_up`      | str          | Yes      | SQL commands used to set UP (create) a table to store the scraped data, as well as any related data types.                                                                                                      |
- | `metadata.sql_down`    | str          | Yes      | SQL commands to tear DOWN (drop) all objects created by `metadata.sql_up`.<br/><br/>Should use the CASCADE option when dropping, otherwise the script may fail unexpectedly when other tables rely on this one. |
- | `payload`              | list[object] | Yes      | List of objects to insert into the database.<br/><br/>Ideally, this is simply the JSON output of the scraper.                                                                                                   |
+ | name                  | type         | required | description                                                                                                                                                                                                     |
+ |-----------------------|--------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | `metadata`            | object       | Yes      | Instructions for creating/inserting into PostgreSQL tables.                                                                                                                                                     |
+ | `metadata.table_name` | str          | Yes      | Name of table to create/insert into.<br/><br/>Must match name of table created in `metadata.sql_up` (case insensitive).                                                                                         |
+ | `metadata.columns`    | list[str]    | Yes      | List of column names that require insertion.<br/><br/>Must match column names in table created in `metadata.sql_up`, as well as the keys of each object in `payload` (case sensitive).                          |
+ | `metadata.write_mode` | str          | No       | One of `"overwrite"` or `"append"`.<br/><br/>Defaults to `"overwrite"`.                                                                                                                                         |
+ | `metadata.sql_before` | str          | No       | SQL command to run *before* the insertion.                                                                                                                                                                      |
+ | `metadata.sql_after`  | str          | No       | SQL command to run *after* the insertion.                                                                                                                                                                       |
+ | `metadata.sql_up`     | str          | Yes      | SQL commands used to set UP (create) a table to store the scraped data, as well as any related data types.                                                                                                      |
+ | `metadata.sql_down`   | str          | Yes      | SQL commands to tear DOWN (drop) all objects created by `metadata.sql_up`.<br/><br/>Should use the CASCADE option when dropping, otherwise the script may fail unexpectedly when other tables rely on this one. |
+ | `payload`             | list[object] | Yes      | List of objects to insert into the database.<br/><br/>Ideally, this is simply the JSON output of the scraper.                                                                                                   |
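The new `sql_before` and `sql_after` fields slot into the request body alongside the existing parameters. A minimal sketch of such a body follows; the table name, columns, SQL commands, and payload values here are hypothetical examples, not part of Hasuragres itself:

```python
# Sketch of an insert request body using the new sql_before/sql_after fields.
# All table/column names and SQL commands below are hypothetical examples.
body = {
    "metadata": {
        "table_name": "courses",
        "columns": ["code", "title"],
        "write_mode": "append",
        # Runs before the insertion
        "sql_before": "DELETE FROM courses WHERE code LIKE 'COMP%';",
        # Runs after the insertion
        "sql_after": "ANALYZE courses;",
        "sql_up": "CREATE TABLE courses (code TEXT PRIMARY KEY, title TEXT);",
        "sql_down": "DROP TABLE courses CASCADE;",
    },
    "payload": [
        {"code": "COMP1511", "title": "Programming Fundamentals"},
        {"code": "COMP2521", "title": "Data Structures and Algorithms"},
    ],
}
```

Note that the keys of each payload object match `metadata.columns` exactly, as the parameter table requires.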


#### Example Request
@@ -208,9 +209,9 @@ If you want to connect multiple scrapers to the same table, for example if you h

Both scrapers should maintain an up-to-date copy of the `sql_up` and `sql_down` commands sent to Hasuragres. Furthermore, if you need to update these commands, please be sure to update all scrapers around the same time without much delay between each. If at any point the scrapers have different versions of the SQL, then any inserts will simply drop the table and all data from the other scraper(s).

- It is also important that you make use of the `sql_execute` and `write_mode` fields of the insert metadata. By default, inserts are set to truncate the table they insert to, which would only allow data from one scraper at any one time. For multiple scrapers, they should each be in `"append"` mode so that scrapers can add on to the data from other scrapers.
+ It is also important that you make use of the `sql_before` and `write_mode` fields of the insert metadata. By default, inserts are set to truncate the table they insert to, which would only allow data from one scraper at any one time. For multiple scrapers, they should each be in `"append"` mode so that scrapers can add on to the data from other scrapers.

- Also, `sql_execute` should contain command(s) to remove only those rows that were previously inserted by the scraper - it may be useful to add some field to the schema that identifies the source of each row if there is no easy way to distinguish between the data sources.
+ Also, `sql_before` should contain command(s) to remove only those rows that were previously inserted by the scraper - it may be useful to add some field to the schema that identifies the source of each row if there is no easy way to distinguish between the data sources.
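One way to scope `sql_before` to a single scraper's rows is a discriminator column. The sketch below assumes a shared table with a hypothetical `src` column identifying each row's source; none of these names come from Hasuragres itself:

```python
# Sketch: two scrapers append to a shared "events" table, and each clears
# only its own rows before inserting. The "src" column, table name, and
# scraper names are hypothetical.
def make_metadata(source: str, sql_up: str, sql_down: str) -> dict:
    return {
        "table_name": "events",
        "columns": ["id", "name", "src"],
        "write_mode": "append",  # don't truncate the other scrapers' rows
        # Remove only the rows this scraper inserted on its previous run
        "sql_before": f"DELETE FROM events WHERE src = '{source}';",
        "sql_up": sql_up,
        "sql_down": sql_down,
    }
```

Each scraper would call `make_metadata` with its own source name while passing identical `sql_up`/`sql_down` strings, keeping the SQL in sync across scrapers as required above.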

## Testing Scrapers

11 changes: 6 additions & 5 deletions app/main.py
@@ -15,7 +15,8 @@

  class Metadata(BaseModel):
      table_name: str
-     sql_execute: Optional[str] = Field(None, description='command to execute before running anything else')
+     sql_before: Optional[str] = Field(None, description='command to execute before running the insert')
+     sql_after: Optional[str] = Field(None, description='command to execute after running the insert')
      sql_up: str  # SQL to set UP table and related data types/indexes
      sql_down: str  # SQL to tear DOWN a table (should be the opp. of up)
      columns: list[str]  # list of column names that require insertion
@@ -149,16 +150,16 @@ def insert(metadata: Metadata, payload: list[Any]):
          raise HTTPException(status_code=400, detail=err_msg)

      try:
-         # Execute whatever SQL is required
-         if metadata.sql_execute:
-             cur.execute(metadata.sql_execute)
+         if metadata.sql_before:
+             cur.execute(metadata.sql_before)

          execute_upsert(metadata, payload)

          if metadata.write_mode == 'overwrite':
              # Delete rows not in payload
              execute_delete(metadata, payload)

+         if metadata.sql_after:
+             cur.execute(metadata.sql_after)
      except (Exception, Error) as error:
          err_msg = "Error while inserting into PostgreSQL table: " + str(error)
          print(err_msg)
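The ordering that `insert` enforces after this change can be summarised with the database work stubbed out. This is a sketch mirroring the control flow above, not the actual implementation; the recording helper is hypothetical:

```python
# Sketch of the insert lifecycle in app/main.py, with database calls replaced
# by a recorded step list so the ordering is easy to see.
def run_insert(metadata: dict, payload: list, execute) -> list[str]:
    steps = []
    if metadata.get("sql_before"):
        execute(metadata["sql_before"])   # pre-insert hook
        steps.append("sql_before")
    steps.append("upsert")                # execute_upsert(metadata, payload)
    if metadata.get("write_mode", "overwrite") == "overwrite":
        steps.append("delete")            # delete rows not present in payload
    if metadata.get("sql_after"):
        execute(metadata["sql_after"])    # post-insert hook
        steps.append("sql_after")
    return steps
```

So `sql_before` always runs first, the upsert and (in `"overwrite"` mode) the delete happen in the middle, and `sql_after` runs last, all inside the same `try` block so a failure in any step is reported the same way.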
