Hey! Just found out about erdantic through Python Bytes. Looks great and love messing with it so far.
I work with a lot of Pydantic models to facilitate PySpark transformations - given a model, you can read, transform, and validate a raw file or loaded DataFrame against a model. Since it's built on Pydantic, it allows some nice features (nesting models, ease of documentation, etc.) and encourages declarative/composable pipelines.
However, instead of composing fields as collections of other models, I convert data from one model to the next. Most examples look like this:
import datetime
import decimal

from pydantic import BaseModel, Field
from pyspark.sql import functions as F


# describe raw data as model to facilitate read and preprocessing steps
class RawFinancialStatement(BaseModel):
    acct: str = Field(pattern=r"\d{5}")
    descr: str
    posted: datetime.date = Field(
        ge=datetime.date(2024, 1, 1), le=datetime.date(2024, 12, 31)
    )
    amount: decimal.Decimal


# for all files, read-in using model's schema, union together, then transform and validate against model
raw_data = RawFinancialStatement.read(
    source=["path/to/file.csv", "path/to/another_file.csv"]
)


# convert intermediate model to expected model for analytical workflows
class CommonFinancialStatement(BaseModel):  # or inherits from a defined business model
    account_number: str = Field(alias="acct")
    account_description: str = Field(alias="descr")
    date_effective: datetime.date = Field(alias="posted")
    date_posted: datetime.date = Field(alias="posted")
    net_amount: decimal.Decimal = Field(alias="amount")
    user_posted: str = Field(
        default=F.when(F.col("acct").startswith("A"), "USER1").otherwise("USER2")
    )


# transform and validate data against model
processed_data = CommonFinancialStatement.transform(data=raw_data).validate()
Using erdantic, would it be possible to construct an ER diagram between multiple models that simply describe how data is transformed? Please let me know if I need to explain my use case some more - thank you!
lucas-nelson-uiuc changed the title from "Using erdantic to Display Custom Data Transformations" to "Feature Request: Using erdantic to Display Custom Data Transformations" on Oct 13, 2024
lucas-nelson-uiuc changed the title from "Feature Request: Using erdantic to Display Custom Data Transformations" to "Feature Request: Visualize Data Transformations Between Pydantic Models" on Oct 13, 2024
I'd like to better understand your use case. Some questions here:
Where does the CommonFinancialStatement.transform come from? Is this a custom factory method on the CommonFinancialStatement that you've written?
Where is the metadata that explicitly links the RawFinancialStatement and CommonFinancialStatement models? As a practical consideration, we need this metadata to know how the two models relate before we can build the diagram.
Can you sketch what you think this diagram would look like?
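To make the metadata question concrete, here is a rough sketch of one way the link could be expressed, assuming the `alias` values in the example are meant to name the source field on the raw model. This is not erdantic's API: the `COMMON_FROM_RAW` mapping and the `transformation_edges` and `to_dot` helpers are hypothetical names made up for illustration. The mapping is written out by hand so the snippet runs without Pydantic installed; with Pydantic v2 the same information could presumably be read from `CommonFinancialStatement.model_fields[name].alias`.

```python
# Hypothetical alias map: target field name -> source field name.
# Hand-copied from the example models; `None` marks a field with no alias.
COMMON_FROM_RAW = {
    "account_number": "acct",
    "account_description": "descr",
    "date_effective": "posted",
    "date_posted": "posted",
    "net_amount": "amount",
    "user_posted": None,  # computed from a default expression, not aliased
}


def transformation_edges(source_model, target_model, alias_map):
    """Return (source_field, target_field) pairs for every aliased field."""
    return [
        (f"{source_model}.{src}", f"{target_model}.{dst}")
        for dst, src in alias_map.items()
        if src is not None
    ]


def to_dot(edges):
    """Render the edges as Graphviz DOT text (a plain string; nothing is drawn)."""
    lines = ["digraph transformations {", "  rankdir=LR;"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


edges = transformation_edges(
    "RawFinancialStatement", "CommonFinancialStatement", COMMON_FROM_RAW
)
dot = to_dot(edges)
```

The resulting `dot` string could be passed to any Graphviz renderer. Whether field-level edges like these, or a single model-to-model edge, is the right shape for the diagram is exactly what the sketch question above is trying to pin down.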