Skip to content

Feat/add optional prek prepush#3997

Open
tetelio wants to merge 15 commits into
develfrom
feat/add-optional-prek-prepush
Open

Feat/add optional prek prepush#3997
tetelio wants to merge 15 commits into
develfrom
feat/add-optional-prek-prepush

Conversation

@tetelio
Copy link
Copy Markdown
Contributor

@tetelio tetelio commented May 29, 2026

Not a general prek/pre-commit rollout. This PR does not adopt prek as an incremental formatter or per-file lint hook on staged changes. We use prek only to install a pre-push hook; the logic lives in tools/prek.py — a scoped fingerprint gate that runs full make fl / make test-common-p when needed. One clear purpose: optional local safety before push, without re-running those commands on every push.

Summary

Remote destination CI is slow and expensive. When those jobs start earlier in the pipeline, pushing without local lint and common tests costs everyone — failed runs block the queue and burn shared minutes.

This PR adds an optional, opt-in pre-push helper (prek) so developers can catch issues before push without re-running heavy checks every time. CI is unchanged. Nothing is enforced repo-wide.

What it does

Check Runs when stale on push
Lint make fl
Common tests make test-common-p

For each check, the gate hashes tracked files in scope (see [tool.dlt.prepush.fingerprints.*] in pyproject.toml) and compares to local pass history in .prek/.state.toml (gitignored, up to 50 fingerprints per check for branch hopping).

  • Pre-push only: git push skips a check when its fingerprint is already in history. Out-of-scope edits (e.g. .github/workflows/) do not invalidate lint.
  • Manual commands always run: make fl and make test-common-p always execute in full when you invoke them. The cache is not consulted — only the hook (git push), make prek, and make prek-dry use it.
  • Manual runs still record: After make install-prepush-hooks, successful manual runs update the same pass history, so the next push can skip.
  • Lint before tests when both are due. Bypass once: git push --no-verify.

Setup

  1. make dev (and cd docs && make dev for docs lint in make fl)
  2. Copy .prek/local.example.toml.prek/local.toml
  3. Set mode per check: off / auto / confirm (default example: lint auto, common tests off)
  4. make install-prepush-hooks

Remove with make uninstall-prepush-hooks. All local config and state stay gitignored.

Modes

Mode When scope is stale on push
off Skip this check
auto Run the command; abort push on failure
confirm Prompt first; declining aborts push

Preview: make prek-dry. Run the gate now: make prek.

More detail: .prek/README.md. Short pointer in CONTRIBUTING.md.

Examples

Stale tree — hook runs lint

# edit dlt/common/schema.py, commit, push (no make fl since edit)
git push → make fl (~10–20s) → pass recorded → push continues
# next push, same scoped tree → lint skipped

Already ran locally — push skips

make fl                    # always full run; records pass on success
git push                   # fingerprint in history → no make fl on push

Failure — push blocked, no state update

git push → make fl fails → push aborted → fix, commit, push again

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 29, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
docs a2ea0ed Commit Preview URL

Branch Preview URL
May 29 2026, 08:52 PM

@tetelio tetelio requested review from burnash, rudolfix and zilto May 29, 2026 12:58
@tetelio tetelio marked this pull request as ready for review May 29, 2026 12:59
Copy link
Copy Markdown
Collaborator

@zilto zilto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typical usage of prek / pre-commit hooks

This PR shows the little surface required for adding prek: https://github.com/dlt-hub/dlthub/pull/413/changes

If you want to use it, uv run prek install (optional args for pre-push hooks). If you don't, don't install it.

The .pre-commit-config.yaml defines a flat list of things to run. They will automatically be ran in parallel with isolated uv environments (e.g., a venv with ruff only for formatting).

No explicit invocation required, it runs on every commit. It runs incrementally on the files that are staged only, it ignores your random files that are pending. It prints are report to stdout.

If you want to skip it for a single commit, add -n / --no-verify

Current path

There seems to be a few layers of abstractions / redirection that we could remove. Each layer we add is something we need to maintain (e.g., tools/prek.py, tools/tests/test_prek.py) and adds cognitive load for humans and agents.

My understanding of the current path (this is not a sequence, this is nesting subprocesses):

  • make: manually run make fl; this triggers several commands sequentially, in parallel, and conditionally
    • python: make fl calls uv run python -m tools.prek with a flag --record lint. This is a custom CLI with custom parsing
      • prek(?): dynamically call the other make command defined in Makefile. By default prek uses its own Python virtual environment, but language: system disables that (in .prek/prek.toml)
        • make: call make test_common_p which will pickup the uv environment to run tests in parallel
          • python: make test_common_p calls pytest

Comment thread .prek/prek.toml
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration can be defined in pyproject.toml. This would reduce file sprawl.

If devs want to override stuff locally, then can:

  1. not install the prek hook (those are installed under .git/); this PR will have no effect on them
  2. override prek config with a local config that is not committed

Comment thread .prek/scopes.toml Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to pyproject.toml

Comment thread tools/prek.py
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a very hard time treasoning about this file. What is the motivation?

prek is intented to run incrementally. If you set pass_filenames = true, it will only run for files that are on staging.

Comment thread Makefile
prek install --hook-type pre-push --config .prek/prek.toml -f
@touch .prek/.enabled

uninstall-prepush-hooks: ## Remove prek pre-push hook (no-op if none; fails if hook is not from prek)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uv run prek uninstall

@zilto
Copy link
Copy Markdown
Collaborator

zilto commented May 29, 2026

Also, we'll definitely want to unify the env and tooling under ./ and ./docs. Though, maybe this is out of scope for this PR if it adds complexity?

@tetelio tetelio self-assigned this May 29, 2026
@blacksmith-sh

This comment has been minimized.

@tetelio
Copy link
Copy Markdown
Contributor Author

tetelio commented May 29, 2026

@zilto quick context on this PR.

It targets one problem: we want destination CI to start earlier without everyone clogging the queue with pushes that would fail lint or common tests anyway. prek is just the hook installer and stash helper. The fingerprint cache lives in tools/prek.py because prek cannot do scoped skip logic on its own.

This system is not for devs who already run individual lints and test files. Its for people who want a simple before push safety net so parallel CI can be more aggressive with less risk.

Regarding your current path concern:
Two different worlds:

When you run make fl yourself Make does format and lint like always. If hooks are installed, the Makefile appends one line at the end: python -m tools.prek --record lint. That only hashes scoped files and updates .prek/.state.toml. prek is not involved. Python does not call make again.

When you run make test-common-p yourself Same pattern, separate command. Make runs pytest. Optional --record test_common_p at the end. Nothing to do with make fl.

When you git push This is the only time prek appears. The hook runs python -m tools.prek (no --record). That script checks the fingerprint against pass history. If stale, it subprocesses make fl or make test-common-p. If already recorded, it skips.

So prek is hook installer plus stash on push. tools/prek.py is the cache logic prek cannot do. Manual makes stay normal makes with a tiny record tail. The extra layer only activates on push, not on every make invocation.

@zilto
Copy link
Copy Markdown
Collaborator

zilto commented May 29, 2026

It targets one problem: we want destination CI to start earlier without everyone clogging the queue with pushes that would fail lint or common tests anyway.

AFAIU, this is the whole point of git hooks / pre-commit / prek. The code stays on my laptop until it succeeds some basic checks. If the code succeeds these checks on my laptop THEN I can push it to GitHub where CI is triggered

The problem

image

As I understand, the issue is that this DAG is deemed too slow. We would want the later nodes (on the right) to run earlier.

As I explain in details in #3224 our make commands for linting, formatting, and type checking run in sequence and are highly inefficient. With prek, we could run them in parallel, isolated environment with shared uv cache, and fail-early flag. We could even cache the setup step in GitHub Action Artifacts because the versions of prek tools are pinned (mypy, ruff, etc.) and don't change every commit.

It's also possible to rearchitecture the DAG

Concerns

I'm concerned by having complex and custom built developer tooling. This means we have to document everything, test everything, teach and learn a custom tool vs. aligning on standards used by other projects. This is what is currently producing a ton of pain in our docs

I read every file and I'm not confident I understand what this those or the problem it solves.

Copy link
Copy Markdown
Collaborator

@zilto zilto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prek has a workspace mode that is worth consulting for combining the ./docs and ./ environments

Instead of this custom concept of scopes, a user can simply install the hooks they care about from the flat list in .pre-commit-config.yaml with prek install HOOK

Comment thread pyproject.toml Outdated
line-length = 100
preview = true

[tool.prek.scopes.lint]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a custom tool, it shouldn't write under tool.prek namespace. This would break if prek introduced this in their config.

Also, the word scope has a definition in prek https://prek.j178.dev/configuration/?h=scop#scope-per-project

@tetelio
Copy link
Copy Markdown
Contributor Author

tetelio commented May 29, 2026

@zilto i have renamed the scopes so there is no name collision, you were right ther.

On the other hand, this PR already parallelizes make lint a lot. I dont know how much time it took for you in the past, but now it takes 7s to run make lint-paralllel (without web) and it takes 13s to make fl (including docs/website)

The main problem is that you are looking at a problem that i am not trying to solve and measure the PR by if it solves that problem. From your PR #3224 :

Commit hooks should be fast i.e., something you run every time you save a file. This includes formatting, linting, but excludes type checking, testing IMO.

I am simply not addressing that. If you wnat to add precommit hooks for your own dev flow i think its a very good idea (and even it would be nice if you share them).
But this PR is about what happens before I push to remote. And when i push to remote i want to make sure that i have linted like CI expectes and that the "faster" tests that i can run locally have been run. Thats it. Incrementally formatting is just not what the goal is here.

About restructuring the DAG: maybe it could help. But quite more complex.

One last comment:

AFAIU, this is the whole point of git hooks / pre-commit / prek. The code stays on my laptop until it succeeds some basic checks. If the code succeeds these checks on my laptop THEN I can push it to GitHub where CI is triggered

Yes, taht is what hooks are for. But if hooks implmeented the idiomatic way (prek) are not enough to cover our custom needs, then they need custom logic (in this case a fingerprinted history of commands run to not repeeat them at push time).

@zilto
Copy link
Copy Markdown
Collaborator

zilto commented May 29, 2026

But this PR is about what happens before I push to remote. And when i push to remote i want to make sure that i have linted like CI expectes and that the "faster" tests that i can run locally have been run.

When I read this, I see:

fail_fast: true
repos:
  # hypothetical formatter
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.1
    hooks:
      # Run the formatter
      - id: ruff-format
        args: ["--output-format", "concise"]
        # default stage is pre-commit
        
  # optional pre-push test hook
  - repo: local  # defined in this project
    hooks:
      - id: common-tests
        name: common-tests
        entry: make   
        args: ["test-common-p"]
        language: system  # use local uv project venv instead of prek's venv
        types: ["python"]  # only trigger if commit includes Python files
        pass_filenames: False   # run command "once" instead of "once per file"
        files: [...]  # hook only triggers if files matching this pattern are included in the `git push`
        verbose: true
        stages: ["pre-push"] # run only on `git push`

This is opt-in. As a developer, I can completely ignore this, I can. This is the default config and this can be tuned per developer:

  • uv run prek install will install only pre-commit hooks; will run before allowing to commit.
  • git commit -m "my message" -n will skip the commit hook
  • uv run prek install -t pre-push will also install the "run tests before push"
  • git push -n to skip the hook
  • if you don't want to run the hooks on every event, you can manually run them uv run prek run HOOK

Comment thread pyproject.toml
"docs/docs_tools/snippets/lint_setup/mypy.ini",
]

[tool.dlt.prepush.fingerprints.test_common_p]
Copy link
Copy Markdown
Collaborator

@zilto zilto May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be dlt in case we introduce configuration in pyproject.toml (requested feature #2377; a pattern found in many tools). I suggest a name that won't have collision like _tools, _devtools, _dlt-dev-tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants