Feat/add optional prek prepush #3997
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | docs | a2ea0ed | Commit Preview URL, Branch Preview URL | May 29 2026, 08:52 PM |
zilto left a comment
Typical usage of prek / pre-commit hooks
This PR shows how little surface is required to add prek: https://github.com/dlt-hub/dlthub/pull/413/changes
If you want to use it, run `uv run prek install` (with optional args for pre-push hooks). If you don't, don't install it.
The `.pre-commit-config.yaml` defines a flat list of things to run. They are automatically run in parallel, each in an isolated uv environment (e.g., a venv with only ruff for formatting).
- For dlt-hub/dlt, the simplest option is probably several `local` hooks pointing to `make` commands. Then we can progressively build better solutions. Example: https://github.com/dlt-hub/dlthub/blob/366768e9fc89a2bbec7a8634bc77e916df8281f4/.pre-commit-config.yaml#L38 (I suggest using the native `ruff-format` hooks rather than piping to `make format`, because piping removes incremental processing.)
No explicit invocation is required; it runs on every commit. It runs incrementally on staged files only and ignores other modified files you haven't staged. It prints a report to stdout.
If you want to skip it for a single commit, pass `-n` / `--no-verify` to `git commit`.
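A minimal sketch of what "runs incrementally on the files that are staged" means for a single hook. It assumes the staged list comes from `git diff --cached --name-only -z` (a real git command, but not necessarily the exact call prek makes) and treats `files`/`exclude` as regexes searched against each path, as pre-commit-style configs do. `types: [python]` is crudely approximated by a `.py` suffix check, and all function names are hypothetical.

```python
import re
import subprocess
from typing import List


def parse_name_only(raw: bytes) -> List[str]:
    """Split the NUL-separated output of `git diff --name-only -z` into paths."""
    return [p for p in raw.decode("utf-8").split("\0") if p]


def staged_files(repo_dir: str = ".") -> List[str]:
    """List staged paths only; unstaged and untracked files never show up here."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    ).stdout
    return parse_name_only(out)


def files_for_hook(
    paths: List[str],
    files_regex: str = "",      # empty pattern matches every path
    exclude_regex: str = "^$",  # never matches a non-empty path
    python_only: bool = False,  # rough stand-in for `types: [python]`
) -> List[str]:
    """Keep paths matching `files`, not matching `exclude`, and of the right type."""
    keep = []
    for path in paths:
        if not re.search(files_regex, path):
            continue
        if re.search(exclude_regex, path):
            continue
        if python_only and not path.endswith(".py"):
            continue
        keep.append(path)
    return keep
```

Usage: `files_for_hook(staged_files(), python_only=True)` would return only the staged Python files, so a hook with no match can be skipped without running anything.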
Current path
There seem to be a few layers of abstraction / redirection that we could remove. Each layer we add is something we need to maintain (e.g., `tools/prek.py`, `tools/tests/test_prek.py`) and adds cognitive load for humans and agents.
My understanding of the current path (this is not a sequence, this is nesting subprocesses):
- make: manually run `make fl`; this triggers several commands sequentially, in parallel, and conditionally
  - python: `make fl` calls `uv run python -m tools.prek` with the flag `--record lint`. This is a custom CLI with custom parsing
    - prek(?): dynamically call the other `make` command defined in `Makefile`. By default prek uses its own Python virtual environment, but `language: system` disables that (in `.prek/prek.toml`)
      - make: call `make test_common_p`, which will pick up the `uv` environment to run tests in parallel
        - python: `make test_common_p` calls `pytest`
The configuration can be defined in pyproject.toml. This would reduce file sprawl.
If devs want to override stuff locally, they can:
- not install the prek hook (hooks are installed under `.git/`); this PR will then have no effect on them
- override the prek config with a local config that is not committed
I have a very hard time reasoning about this file. What is the motivation?
prek is intended to run incrementally. If you set `pass_filenames = true`, it will only run on files that are staged.
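The effect of `pass_filenames` can be sketched as a question of how the hook's command line is assembled. This is a simplified model assuming one invocation per hook, not prek's implementation; `build_command` is a hypothetical name, and options such as `always_run` or splitting very long file lists into several invocations are ignored.

```python
import shlex
from typing import List, Optional


def build_command(
    entry: str, args: List[str], files: List[str], pass_filenames: bool
) -> Optional[List[str]]:
    """Return the argv a hook would run, or None when it would be skipped."""
    if not files:
        # No staged file matched the hook's filters: nothing to run.
        return None
    argv = shlex.split(entry) + list(args)
    if pass_filenames:
        # Incremental: the tool only sees the staged files that matched.
        argv += files
    # With pass_filenames false, the command runs once and picks its own targets.
    return argv
```

With `pass_filenames = true`, a formatter receives only the staged files; a hook like `make test-common-p` sets it to false because the Make target decides what to run on its own.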
```makefile
	prek install --hook-type pre-push --config .prek/prek.toml -f
	@touch .prek/.enabled

uninstall-prepush-hooks: ## Remove prek pre-push hook (no-op if none; fails if hook is not from prek)
```
Also, we'll definitely want to unify the env and tooling under
@zilto quick context on this PR. It targets one problem: we want destination CI to start earlier without everyone clogging the queue with pushes that would fail lint or common tests anyway. prek is just the hook installer and stash helper. The fingerprint cache lives in `tools/prek.py` because prek cannot do scoped skip logic on its own.

This system is not for devs who already run individual lints and test files. It's for people who want a simple before-push safety net, so parallel CI can be more aggressive with less risk.

Regarding your current path concern:

When you run `make fl` yourself
Make does format and lint like always. If hooks are installed, the Makefile appends one line at the end: `python -m tools.prek --record lint`. That only hashes scoped files and updates `.prek/.state.toml`. prek is not involved. Python does not call make again.

When you run `make test-common-p` yourself
Same pattern, separate command. Make runs pytest, with an optional `--record test_common_p` at the end. Nothing to do with `make fl`.

When you `git push`
This is the only time prek appears. The hook runs `python -m tools.prek` (no `--record`). That script checks the fingerprint against pass history. If it is stale, it runs `make fl` or `make test-common-p` as a subprocess. If it is already recorded, it skips.

So prek is the hook installer plus stash on push, and `tools/prek.py` is the cache logic prek cannot do. Manual makes stay normal makes with a tiny record tail. The extra layer only activates on push, not on every make invocation.
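The push-time flow described above (hash the scoped files, skip if the fingerprint is already in pass history, otherwise run the make target and record on success) can be sketched as follows. This is an illustration under assumptions, not the code in `tools/prek.py`: file contents are passed in as a dict instead of being read from tracked files, scope patterns use `fnmatch`, history is an in-memory dict rather than `.prek/.state.toml`, the cap of 50 is taken from the PR description, and every function name is hypothetical.

```python
import fnmatch
import hashlib
import subprocess
from typing import Callable, Dict, List

MAX_HISTORY = 50  # per check, the cap quoted in the PR description

History = Dict[str, List[str]]  # check name -> passing fingerprints, oldest first


def fingerprint(files: Dict[str, bytes], scope: List[str]) -> str:
    """Hash path and content of in-scope files only, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        if any(fnmatch.fnmatch(path, pattern) for pattern in scope):
            digest.update(path.encode("utf-8") + b"\0")
            digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def record(history: History, check: str, fp: str) -> None:
    """What a `--record` tail would do: remember a pass, keep the newest entries."""
    entries = [e for e in history.get(check, []) if e != fp]
    entries.append(fp)
    history[check] = entries[-MAX_HISTORY:]


def run_make(target: str) -> bool:
    """Run a make target and report whether it succeeded."""
    return subprocess.run(["make", target]).returncode == 0


def gate(
    check: str,
    target: str,
    fp: str,
    history: History,
    runner: Callable[[str], bool] = run_make,
) -> bool:
    """Push-time check: skip if this fingerprint already passed, else run and record."""
    if fp in history.get(check, []):
        return True  # same in-scope tree already passed: skip
    ok = runner(target)
    if ok:
        record(history, check, fp)  # a failure leaves history untouched
    return ok
```

For example, with a lint scope of `["dlt/*", "pyproject.toml"]`, an edit under `.github/workflows/` leaves the fingerprint unchanged, so `gate` would skip lint on the next push. Note that `fnmatch`'s `*` also matches `/`, unlike gitignore-style globs.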
AFAIU, this is the whole point of git hooks / pre-commit / prek. The code stays on my laptop until it passes some basic checks. If the code passes these checks on my laptop, THEN I can push it to GitHub, where CI is triggered.

The problem

As I understand it, the issue is that this DAG is deemed too slow. We would want the later nodes (on the right) to run earlier. As I explain in detail in #3224 our

It's also possible to re-architect the DAG.

Concerns

I'm concerned about having complex, custom-built developer tooling. This means we have to document everything, test everything, and teach and learn a custom tool instead of aligning on standards used by other projects. This is what is currently producing a ton of pain in our docs.

I read every file and I'm not confident I understand what this does or the problem it solves.
zilto left a comment
prek has a workspace mode that is worth consulting for combining the `./docs` and `./` environments.
Instead of this custom concept of scopes, a user can simply install the hooks they care about from the flat list in `.pre-commit-config.yaml` with `prek install HOOK`.
```toml
line-length = 100
preview = true

[tool.prek.scopes.lint]
```
If this is a custom tool, it shouldn't write under the `tool.prek` namespace. This would break if prek introduced this in their config.
Also, the word scope has a definition in prek: https://prek.j178.dev/configuration/?h=scop#scope-per-project
@zilto I have renamed the scopes so there is no name collision; you were right there. On the other hand, this PR already parallelizes

The main problem is that you are looking at a problem I am not trying to solve, and measuring the PR by whether it solves that problem. From your PR #3224:

I am simply not addressing that. If you want to add pre-commit hooks for your own dev flow, I think it's a very good idea (and it would even be nice if you shared them). About restructuring the DAG: maybe it could help, but it is quite a bit more complex. One last comment:

Yes, that is what hooks are for. But if hooks implemented the idiomatic way (prek) are not enough to cover our custom needs, then they need custom logic (in this case, a fingerprinted history of commands already run, so they are not repeated at push time).
When I read this, I see:

```yaml
fail_fast: true
repos:
  # hypothetical formatter
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.1
    hooks:
      # Run the formatter
      - id: ruff-format
        args: ["--output-format", "concise"]
        # default stage is pre-commit
  # optional pre-push test hook
  - repo: local  # defined in this project
    hooks:
      - id: common-tests
        name: common-tests
        entry: make
        args: ["test-common-p"]
        language: system  # use local uv project venv instead of prek's venv
        types: ["python"]  # only trigger if commit includes Python files
        pass_filenames: false  # run command "once" instead of "once per file"
        files: [...]  # hook only triggers if files matching this pattern are included in the `git push`
        verbose: true
        stages: ["pre-push"]  # run only on `git push`
```

This is opt-in. As a developer, I can completely ignore this. This is the default config and this can be tuned per developer:
```toml
    "docs/docs_tools/snippets/lint_setup/mypy.ini",
]

[tool.dlt.prepush.fingerprints.test_common_p]
```
This shouldn't be under `dlt`, in case we introduce configuration in `pyproject.toml` (requested feature #2377; a pattern found in many tools). I suggest a name that won't collide, like `_tools`, `_devtools`, or `_dlt-dev-tools`.
Summary
Remote destination CI is slow and expensive. When those jobs start earlier in the pipeline, pushing without local lint and common tests costs everyone — failed runs block the queue and burn shared minutes.
This PR adds an optional, opt-in pre-push helper (prek) so developers can catch issues before push without re-running heavy checks every time. CI is unchanged. Nothing is enforced repo-wide.
What it does
- `make fl`
- `make test-common-p`

For each check, the gate hashes tracked files in scope (see `[tool.dlt.prepush.fingerprints.*]` in `pyproject.toml`) and compares the result to local pass history in `.prek/.state.toml` (gitignored, up to 50 fingerprints per check for branch hopping).
- `git push` skips a check when its fingerprint is already in history. Out-of-scope edits (e.g. `.github/workflows/`) do not invalidate lint.
- `make fl` and `make test-common-p` always execute in full when you invoke them. The cache is not consulted; only the hook (`git push`), `make prek`, and `make prek-dry` use it.
- After `make install-prepush-hooks`, successful manual runs update the same pass history, so the next push can skip.
- To bypass the hook: `git push --no-verify`.

Setup
- `make dev` (and `cd docs && make dev` for docs lint in `make fl`)
- Copy `.prek/local.example.toml` to `.prek/local.toml` and set `mode` per check: `off` / `auto` / `confirm` (default example: lint `auto`, common tests `off`)
- `make install-prepush-hooks`

Remove with `make uninstall-prepush-hooks`. All local config and state stay gitignored.

Modes
- `off`
- `auto`
- `confirm`

Preview: `make prek-dry`. Run the gate now: `make prek`.

More detail: `.prek/README.md`. Short pointer in `CONTRIBUTING.md`.

Examples
Stale tree — hook runs lint
Already ran locally — push skips
Failure — push blocked, no state update