
✨ validate_or_build function introduced#34

Merged
benrutter merged 1 commit into main from validate-or-build on Apr 11, 2025



@benrutter
Owner

I'm pretty excited about this one. I'm not sure how many people's workflows it'll fit, but it seems like a really great way to get started with testing across a whole range of datasets.

Functionality looks something like this:

import pandas as pd
import wimsey

from somewhere import storage_options

df = (
    pd.read_csv("some_csv.csv")
    .pipe(wimsey.validate_or_build, "s3://my-test-store/some_csv/tests.json")
    .assign(i_dont_know=lambda df: df["just_giving_an_example"] + df["i guess"])
    # ... further transformations
)

If this is, say, a data engineering pipeline, then the first time it runs it'll build a set of tests from the data and store them at the given path; on every run after that, it'll validate the data against those stored tests.
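A rough, plain-Python sketch of the validate-or-build pattern described above. This is not wimsey's implementation: the function name `validate_or_build_sketch`, the JSON test format, and the three test types (column present, min bound, max bound) are hypothetical, and a dict of column lists stands in for a DataFrame so the example runs without pandas. It assumes every column is non-empty.

```python
import json
import os
from typing import Any

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Data = dict[str, list[float]]


def build_tests(data: Data) -> list[dict[str, Any]]:
    """Derive simple tests from the data as first seen (hypothetical test format)."""
    tests: list[dict[str, Any]] = []
    for column, values in data.items():
        tests.append({"test": "column_exists", "column": column})
        tests.append({"test": "min_at_least", "column": column, "value": min(values)})
        tests.append({"test": "max_at_most", "column": column, "value": max(values)})
    return tests


def run_test(data: Data, test: dict[str, Any]) -> bool:
    """Return True if the data passes a single stored test."""
    column = test["column"]
    if column not in data:
        return False
    if test["test"] == "column_exists":
        return True
    if test["test"] == "min_at_least":
        return min(data[column]) >= test["value"]
    if test["test"] == "max_at_most":
        return max(data[column]) <= test["value"]
    raise ValueError(f"unknown test type: {test['test']}")


def validate_or_build_sketch(data: Data, path: str) -> Data:
    """Build and save tests on the first run; validate against them afterwards."""
    if not os.path.exists(path):
        # First run: no stored tests yet, so derive them and write them out.
        with open(path, "w") as f:
            json.dump(build_tests(data), f)
        return data
    # Later runs: load the stored tests and check the data against each one.
    with open(path) as f:
        tests = json.load(f)
    failures = [t for t in tests if not run_test(data, t)]
    if failures:
        raise ValueError(f"{len(failures)} test(s) failed: {failures}")
    return data
```

Returning the data unchanged on success is what lets a function with this shape sit inside a `.pipe(...)` chain.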

benrutter merged commit ca9cac9 into main on Apr 11, 2025
1 check passed
