Feat/add optional prek prepush by tetelio · Pull Request #3997 · dlt-hub/dlt

tetelio · 2026-05-29T11:46:32Z

Not a general prek/pre-commit rollout. This PR does not adopt prek as an incremental formatter or per-file lint hook on staged changes. We use prek only to install a pre-push hook; the logic lives in tools/prek.py — a scoped fingerprint gate that runs full make fl / make test-common-p when needed. One clear purpose: optional local safety before push, without re-running those commands on every push.

Summary

Remote destination CI is slow and expensive. When those jobs start earlier in the pipeline, pushing without local lint and common tests costs everyone — failed runs block the queue and burn shared minutes.

This PR adds an optional, opt-in pre-push helper (prek) so developers can catch issues before push without re-running heavy checks every time. CI is unchanged. Nothing is enforced repo-wide.

What it does

Check	Runs when stale on push
Lint	`make fl`
Common tests	`make test-common-p`

For each check, the gate hashes tracked files in scope (see [tool.dlt.prepush.fingerprints.*] in pyproject.toml) and compares to local pass history in .prek/.state.toml (gitignored, up to 50 fingerprints per check for branch hopping).

Pre-push only: git push skips a check when its fingerprint is already in history. Out-of-scope edits (e.g. .github/workflows/) do not invalidate lint.
Manual commands always run: make fl and make test-common-p always execute in full when you invoke them. The cache is not consulted — only the hook (git push), make prek, and make prek-dry use it.
Manual runs still record: After make install-prepush-hooks, successful manual runs update the same pass history, so the next push can skip.
Lint before tests when both are due. Bypass once: git push --no-verify.

Setup

make dev (and cd docs && make dev for docs lint in make fl)
Copy .prek/local.example.toml → .prek/local.toml
Set mode per check: off / auto / confirm (default example: lint auto, common tests off)
make install-prepush-hooks

Remove with make uninstall-prepush-hooks. All local config and state stay gitignored.

Modes

Mode	When scope is stale on push
`off`	Skip this check
`auto`	Run the command; abort push on failure
`confirm`	Prompt first; declining aborts push

Preview: make prek-dry. Run the gate now: make prek.

More detail: .prek/README.md. Short pointer in CONTRIBUTING.md.

Examples

Stale tree — hook runs lint

# edit dlt/common/schema.py, commit, push (no make fl since edit)
git push → make fl (~10–20s) → pass recorded → push continues
# next push, same scoped tree → lint skipped

Already ran locally — push skips

make fl                    # always full run; records pass on success
git push                   # fingerprint in history → no make fl on push

Failure — push blocked, no state update

git push → make fl fails → push aborted → fix, commit, push again

cloudflare-workers-and-pages · 2026-05-29T11:52:50Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	docs	`a2ea0ed`	Commit Preview URL Branch Preview URL	May 29 2026, 08:52 PM

zilto

Typical usage of `prek` / `pre-commit` hooks

This PR shows the little surface required for adding prek: https://github.com/dlt-hub/dlthub/pull/413/changes

If you want to use it, uv run prek install (optional args for pre-push hooks). If you don't, don't install it.

The .pre-commit-config.yaml defines a flat list of things to run. They will automatically be ran in parallel with isolated uv environments (e.g., a venv with ruff only for formatting).

For dlt-hub/dlt, the simplest is probably having several local hooks and pointing to make commands. Then, we can progressively build better solutions
example: https://github.com/dlt-hub/dlthub/blob/366768e9fc89a2bbec7a8634bc77e916df8281f4/.pre-commit-config.yaml#L38 (I suggest using the native ruff-format hooks rather than piping to make format because it removes incremental processing)

No explicit invocation required, it runs on every commit. It runs incrementally on the files that are staged only, it ignores your random files that are pending. It prints are report to stdout.

If you want to skip it for a single commit, add -n / --no-verify

Current path

There seems to be a few layers of abstractions / redirection that we could remove. Each layer we add is something we need to maintain (e.g., tools/prek.py, tools/tests/test_prek.py) and adds cognitive load for humans and agents.

My understanding of the current path (this is not a sequence, this is nesting subprocesses):

make: manually run make fl; this triggers several commands sequentially, in parallel, and conditionally
- python: make fl calls uv run python -m tools.prek with a flag --record lint. This is a custom CLI with custom parsing
  - prek(?): dynamically call the other make command defined in Makefile. By default prek uses its own Python virtual environment, but language: system disables that (in .prek/prek.toml)
    - make: call make test_common_p which will pickup the uv environment to run tests in parallel
      - python: make test_common_p calls pytest

zilto · 2026-05-29T13:03:22Z

The configuration can be defined in pyproject.toml. This would reduce file sprawl.

If devs want to override stuff locally, then can:

not install the prek hook (those are installed under .git/); this PR will have no effect on them

override prek config with a local config that is not committed

zilto · 2026-05-29T13:09:27Z

move to pyproject.toml

zilto · 2026-05-29T13:14:23Z

I have a very hard time treasoning about this file. What is the motivation?

prek is intented to run incrementally. If you set pass_filenames = true, it will only run for files that are on staging.

zilto · 2026-05-29T13:14:43Z

+	prek install --hook-type pre-push --config .prek/prek.toml -f
+	@touch .prek/.enabled
+
+uninstall-prepush-hooks: ## Remove prek pre-push hook (no-op if none; fails if hook is not from prek)


uv run prek uninstall

zilto · 2026-05-29T13:34:49Z

Also, we'll definitely want to unify the env and tooling under ./ and ./docs. Though, maybe this is out of scope for this PR if it adds complexity?

tetelio · 2026-05-29T19:18:02Z

@zilto quick context on this PR.

It targets one problem: we want destination CI to start earlier without everyone clogging the queue with pushes that would fail lint or common tests anyway. prek is just the hook installer and stash helper. The fingerprint cache lives in tools/prek.py because prek cannot do scoped skip logic on its own.

This system is not for devs who already run individual lints and test files. Its for people who want a simple before push safety net so parallel CI can be more aggressive with less risk.

Regarding your current path concern:
Two different worlds:

When you run make fl yourself Make does format and lint like always. If hooks are installed, the Makefile appends one line at the end: python -m tools.prek --record lint. That only hashes scoped files and updates .prek/.state.toml. prek is not involved. Python does not call make again.

When you run make test-common-p yourself Same pattern, separate command. Make runs pytest. Optional --record test_common_p at the end. Nothing to do with make fl.

When you git push This is the only time prek appears. The hook runs python -m tools.prek (no --record). That script checks the fingerprint against pass history. If stale, it subprocesses make fl or make test-common-p. If already recorded, it skips.

So prek is hook installer plus stash on push. tools/prek.py is the cache logic prek cannot do. Manual makes stay normal makes with a tiny record tail. The extra layer only activates on push, not on every make invocation.

zilto · 2026-05-29T19:52:50Z

It targets one problem: we want destination CI to start earlier without everyone clogging the queue with pushes that would fail lint or common tests anyway.

AFAIU, this is the whole point of git hooks / pre-commit / prek. The code stays on my laptop until it succeeds some basic checks. If the code succeeds these checks on my laptop THEN I can push it to GitHub where CI is triggered

The problem

As I understand, the issue is that this DAG is deemed too slow. We would want the later nodes (on the right) to run earlier.

As I explain in details in #3224 our make commands for linting, formatting, and type checking run in sequence and are highly inefficient. With prek, we could run them in parallel, isolated environment with shared uv cache, and fail-early flag. We could even cache the setup step in GitHub Action Artifacts because the versions of prek tools are pinned (mypy, ruff, etc.) and don't change every commit.

It's also possible to rearchitecture the DAG

Concerns

I'm concerned by having complex and custom built developer tooling. This means we have to document everything, test everything, teach and learn a custom tool vs. aligning on standards used by other projects. This is what is currently producing a ton of pain in our docs

I read every file and I'm not confident I understand what this those or the problem it solves.

zilto

prek has a workspace mode that is worth consulting for combining the ./docs and ./ environments

Instead of this custom concept of scopes, a user can simply install the hooks they care about from the flat list in .pre-commit-config.yaml with prek install HOOK

zilto · 2026-05-29T20:00:41Z

 line-length = 100
 preview = true

+[tool.prek.scopes.lint]


If this is a custom tool, it shouldn't write under tool.prek namespace. This would break if prek introduced this in their config.

Also, the word scope has a definition in prek https://prek.j178.dev/configuration/?h=scop#scope-per-project

tetelio · 2026-05-29T20:36:39Z

@zilto i have renamed the scopes so there is no name collision, you were right ther.

On the other hand, this PR already parallelizes make lint a lot. I dont know how much time it took for you in the past, but now it takes 7s to run make lint-paralllel (without web) and it takes 13s to make fl (including docs/website)

The main problem is that you are looking at a problem that i am not trying to solve and measure the PR by if it solves that problem. From your PR #3224 :

Commit hooks should be fast i.e., something you run every time you save a file. This includes formatting, linting, but excludes type checking, testing IMO.

I am simply not addressing that. If you wnat to add precommit hooks for your own dev flow i think its a very good idea (and even it would be nice if you share them).
But this PR is about what happens before I push to remote. And when i push to remote i want to make sure that i have linted like CI expectes and that the "faster" tests that i can run locally have been run. Thats it. Incrementally formatting is just not what the goal is here.

About restructuring the DAG: maybe it could help. But quite more complex.

One last comment:

AFAIU, this is the whole point of git hooks / pre-commit / prek. The code stays on my laptop until it succeeds some basic checks. If the code succeeds these checks on my laptop THEN I can push it to GitHub where CI is triggered

Yes, taht is what hooks are for. But if hooks implmeented the idiomatic way (prek) are not enough to cover our custom needs, then they need custom logic (in this case a fingerprinted history of commands run to not repeeat them at push time).

zilto · 2026-05-29T21:04:33Z

But this PR is about what happens before I push to remote. And when i push to remote i want to make sure that i have linted like CI expectes and that the "faster" tests that i can run locally have been run.

When I read this, I see:

fail_fast: true
repos:
  # hypothetical formatter
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.1
    hooks:
      # Run the formatter
      - id: ruff-format
        args: ["--output-format", "concise"]
        # default stage is pre-commit
        
  # optional pre-push test hook
  - repo: local  # defined in this project
    hooks:
      - id: common-tests
        name: common-tests
        entry: make   
        args: ["test-common-p"]
        language: system  # use local uv project venv instead of prek's venv
        types: ["python"]  # only trigger if commit includes Python files
        pass_filenames: False   # run command "once" instead of "once per file"
        files: [...]  # hook only triggers if files matching this pattern are included in the `git push`
        verbose: true
        stages: ["pre-push"] # run only on `git push`

This is opt-in. As a developer, I can completely ignore this, I can. This is the default config and this can be tuned per developer:

uv run prek install will install only pre-commit hooks; will run before allowing to commit.
git commit -m "my message" -n will skip the commit hook
uv run prek install -t pre-push will also install the "run tests before push"
git push -n to skip the hook
if you don't want to run the hooks on every event, you can manually run them uv run prek run HOOK

zilto · 2026-05-29T21:19:27Z

+    "docs/docs_tools/snippets/lint_setup/mypy.ini",
+]
+
+[tool.dlt.prepush.fingerprints.test_common_p]


shouldn't be dlt in case we introduce configuration in pyproject.toml (requested feature #2377; a pattern found in many tools). I suggest a name that won't have collision like _tools, _devtools, _dlt-dev-tools

tetelio added 7 commits May 29, 2026 09:03

Add prek checks

284058c

Add better structure of commands

038bd47

Better code orga

63e8e02

Fix typing

0c68673

FL

7a71894

Fix makefile and contributing

7a04500

Update toml

3c295c0

tetelio mentioned this pull request May 29, 2026

Feat/even more faster testing #3996

Open

tetelio added 3 commits May 29, 2026 14:22

Use tomli to support 3.10

aa5b630

ADdd tomli to uv lock from docs too

15ba2a0

Parallel linting with make fl

0f1999f

tetelio requested review from burnash, rudolfix and zilto May 29, 2026 12:58

tetelio marked this pull request as ready for review May 29, 2026 12:59

zilto reviewed May 29, 2026

View reviewed changes

tetelio self-assigned this May 29, 2026

tetelio added 3 commits May 29, 2026 21:07

Move config to pyproject.toml. Add multiline cache

1d624e1

Fix docs

2e0f2ae

FL

96708d4

This comment has been minimized.

Sign in to view

zilto reviewed May 29, 2026

View reviewed changes

tetelio added 2 commits May 29, 2026 22:17

Rename scopes

9363cf9

FL

a2ea0ed

zilto reviewed May 29, 2026

View reviewed changes

Conversation

tetelio commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What it does

Setup

Modes

Examples

Uh oh!

cloudflare-workers-and-pages Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

zilto left a comment

Choose a reason for hiding this comment

Typical usage of prek / pre-commit hooks

Current path

Uh oh!

zilto May 29, 2026

Choose a reason for hiding this comment

Uh oh!

zilto May 29, 2026

Choose a reason for hiding this comment

Uh oh!

zilto May 29, 2026

Choose a reason for hiding this comment

Uh oh!

zilto May 29, 2026

Choose a reason for hiding this comment

Uh oh!

zilto commented May 29, 2026

Uh oh!

This comment has been minimized.

tetelio commented May 29, 2026

Uh oh!

zilto commented May 29, 2026

The problem

Concerns

Uh oh!

zilto left a comment

Choose a reason for hiding this comment

Uh oh!

zilto May 29, 2026

Choose a reason for hiding this comment

Uh oh!

tetelio commented May 29, 2026

Uh oh!

zilto commented May 29, 2026

Uh oh!

zilto May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tetelio commented May 29, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented May 29, 2026 •

edited

Loading

Typical usage of `prek` / `pre-commit` hooks

zilto May 29, 2026 •

edited

Loading