A Flyte SDK (v2) version of this plugin is available as flyteplugins-pandera.
Pandera
flytekitplugins-pandera
Flytekit Python natively supports many data types, including a FlyteSchema type for type-annotating pandas DataFrames. The Flytekit Pandera plugin provides an alternative way to define DataFrame schemas by integrating with Pandera, a runtime data validation tool for pandas DataFrames.
pip install flytekitplugins-pandera
Quick Start (example, may need adjustment)
from flytekit import task, workflow
from flytekitplugins.pandera import ValidationConfig, PandasReportRenderer, PanderaPandasTransformer

config = ValidationConfig(...)

@task
def my_task() -> None:
    ...

@workflow
def my_workflow() -> None:
    my_task()

Available Imports (3)
Configuration type for Pandera.
extends dataclass — configuration or data structure for plugin setup
from flytekitplugins.pandera import ValidationConfig
Report renderer for Pandera validation results on pandas DataFrames.
extends class — configuration or data structure for plugin setup
from flytekitplugins.pandera import PandasReportRenderer
Type transformer for Pandera-annotated pandas DataFrames.
extends TypeTransformer — converts python types to/from flyte-native types
from flytekitplugins.pandera import PanderaPandasTransformer
Dependencies
Related Plugins
Great Expectations
Great Expectations helps enforce data quality. The plugin supports the usage of Great Expectations as task and type.
whylogs
whylogs is an open source library for logging any kind of data. With whylogs,
Dask
Flyte can execute Dask jobs natively on a Kubernetes cluster, which manages the virtual Dask cluster's lifecycle.
Modin
Modin is a pandas accelerator that helps handle large datasets. It is a lightweight extension with an API similar to pandas. It uses parallelism to reduce overhead and improve the performance of pandas operations by leveraging the available compute resources.