deslop

A lint-style detector for common AI writing patterns. Built on spaCy for grammatical understanding (POS tags, dependency parsing, lemmatisation) plus targeted regex for character-level patterns. No LLM in the loop. Designed to run as a pre-commit hook so that when you (or your coding agent) write prose, the obvious AI tells get caught before you ship.

What it catches

Seven categories of detection, drawn from Wikipedia's Signs of AI writing, the Max Planck (2024) word-frequency study, and community-cataloged AI tells:

Lexical / phrase rules (lemma-aware via spaCy)

| Rule | What it flags |
| --- | --- |
| SLOP001 | Cliché phrases (a testament to, navigate the complexities of, ~50 more) |
| SLOP002 | Single-word AI tells (delve, tapestry, multifaceted, ...) |
| SLOP003 | Density-flagged words (robust, pivotal, leverage, ... when overused) |
| SLOP020 | Parallelism clichés (not just X but Y, it's not X, it's Y, ...) |
| SLOP021 | Imperative-negation parallelism (Sandbox the call. Not the process.) |
| SLOP022 | Cliché sentence openers (Certainly, Moreover, Furthermore, ...) |
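The phrase rules can be approximated with a plain regex pass; the real implementation matches on spaCy lemmas, so inflected forms are caught too. A minimal sketch (the phrase list here is a tiny illustrative subset of SLOP001, and `find_cliches` is a hypothetical name, not deslop's API):

```python
import re

# Tiny illustrative subset of the ~50 cliché phrases.
CLICHE_PHRASES = ["a testament to", "navigate the complexities of"]
PATTERN = re.compile(
    "|".join(re.escape(p) for p in CLICHE_PHRASES), re.IGNORECASE
)

def find_cliches(text: str) -> list[tuple[int, str]]:
    """Return (offset, matched phrase) pairs for every cliché hit."""
    return [(m.start(), m.group(0)) for m in PATTERN.finditer(text)]
```

Because this regex version has no lemmatisation, "navigating the complexities of" would slip through; that is the gap spaCy's lemma matching closes.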

Punctuation / formatting rules

| Rule | What it flags |
| --- | --- |
| SLOP010 | Em-dash density (only when both density and raw count are high) |
| SLOP011 | Exclamation density |
| SLOP012 | Em-dash listicle (dash followed by 3+ comma-separated items) |
| SLOP030 | Emoji-led bullet points |
| SLOP031 | Repeated bold-headed bullet pattern (`* **Header:** description`) |
| SLOP032 | Decorative unicode characters in prose |
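SLOP010's two-condition design (density AND raw count must both be high) can be pictured as a small predicate. The defaults here mirror the configuration keys in the Configure section, but the function itself is illustrative, not deslop's internals:

```python
def em_dash_flag(text: str, per_500_words: float = 14, min_count: int = 6) -> bool:
    """Flag only when BOTH the raw em-dash count and the density are high."""
    words = max(len(text.split()), 1)
    dashes = text.count("\u2014")  # U+2014 em dash
    density_per_500 = dashes / words * 500
    return dashes >= min_count and density_per_500 >= per_500_words
```

This is why two dashes in a short paragraph never fire: they fail the raw-count floor even though their density is high.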

Density / readability rules (catches dense business prose, not just AI tells)

| Rule | What it flags |
| --- | --- |
| SLOP040 | Uniform sentence length (low coefficient of variation; info) |
| SLOP050 | Wall-of-text paragraph (over 120 words by default) |
| SLOP051 | Semicolon stitching (more than 1 semicolon per paragraph) |
| SLOP053 | Difficult readability (Flesch reading ease below 40; info) |

POS-aware prose rules (require spaCy)

| Rule | What it flags |
| --- | --- |
| SLOP060 | Passive voice density (>30% of finite verbs are passive) |
| SLOP061 | Nominalisation overuse (-tion/-ment/-ity density per 100 words) |
| SLOP062 | Weak-verb density (be/have/do/get/make as >50% of root verbs) |
| SLOP063 | Noun-pile syndrome (Kubernetes infrastructure cost optimization strategy; info) |
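To see what a passive-voice check is doing, here is a deliberately crude regex stand-in for SLOP060. The real rule uses spaCy's dependency parse, which is exactly why it exists: this heuristic only catches be-forms followed by an -ed/-en word, so irregular participles like "thrown" are missed entirely.

```python
import re

# Crude heuristic, NOT deslop's implementation: be-form + -ed/-en word.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)

def passive_ratio(sentences: list[str]) -> float:
    """Fraction of sentences containing a (heuristically detected) passive."""
    hits = sum(1 for s in sentences if PASSIVE.search(s))
    return hits / max(len(sentences), 1)
```

A dependency parse instead looks for passive-subject and passive-auxiliary relations, so it handles irregular participles and doesn't false-positive on adjectives that happen to end in -ed.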

Run `deslop --list-rules` for the live list.

What it does NOT do

  • It does not prove that text was AI-written. It detects patterns that AI overuses. Heavily edited AI text passes; well-written human prose can also occasionally trip a rule.
  • It does not call any LLM or remote API. Everything runs locally — spaCy's small English model (~50 MB) provides POS tagging and dependency parsing, and the rest is regex and statistics.
  • It is for self-policing your own output, not for accusing others.

If you want a probability score, use a perplexity-based detector. This is a lint, not a classifier.

Install

```
pip install deslop
```

This pulls in spaCy 3.7+ and the en_core_web_sm English model (~50 MB) as part of the install — no separate python -m spacy download step needed.

Use as a pre-commit hook

Add to .pre-commit-config.yaml:

```yaml
repos:
  - repo: https://github.com/adamcharnock/deslop
    rev: v0.1.0
    hooks:
      - id: deslop
```

The hook runs against staged Markdown / plain-text / RST files on every commit.

Use directly

```
deslop README.md docs/*.md
deslop --list-rules
echo "in today's fast-paced world we delve into things" | deslop --stdin
```

Exit code is 0 when clean, 1 when any blocking finding is reported. By default, only warning and error severities block — info findings are reported but pass. Adjust with --fail-on={info,warning,error}. Use --max-findings N to allow up to N blocking findings before failing.
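The severity gating described above reduces to a small decision function. This is a sketch of the documented behaviour, not deslop's internals (names are illustrative):

```python
SEVERITY = {"info": 0, "warning": 1, "error": 2}

def exit_code(findings: list[str], fail_on: str = "warning",
              max_findings: int = 0) -> int:
    """Return 1 when more than max_findings findings reach fail_on severity."""
    blocking = [f for f in findings if SEVERITY[f] >= SEVERITY[fail_on]]
    return 1 if len(blocking) > max_findings else 0
```

So `--fail-on=info` makes info findings blocking, and `--max-findings 1` tolerates a single blocking finding before the exit code flips to 1.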

Configure

Configuration lives in either a `.deslop.toml` at the repo root or a `[tool.deslop]` table in `pyproject.toml`:

```toml
[tool.deslop]
# Em-dash density: requires BOTH density and raw-count to fire, so two dashes
# in a short paragraph are stylistic, not flagged.
em_dash_per_500_words = 14
min_em_dashes = 6
exclamation_per_500_words = 2
sentence_cv_threshold = 0.35
min_sentences_for_uniformity = 8
overused_word_threshold = 3
min_words_for_density = 200
# Density / readability:
max_paragraph_words = 120
max_semicolons_per_paragraph = 1
flesch_reading_ease_floor = 40
disable = ["SLOP032", "SLOP040"]
```

Inline ignores

Per-line escape, mirroring how other linters do it:

```markdown
We delve into things. <!-- deslop: ignore -->
We delve into things. <!-- deslop: ignore=SLOP002 -->
```

How code blocks are handled

Fenced code blocks (``` and ~~~) and inline code spans (`) are blanked out before any rule runs, so prose checks don't fire on identifiers or sample strings.
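Blanking (replacing code characters with spaces rather than deleting them) preserves every later finding's line and column offsets. A sketch of the idea, assuming this offset-preserving approach; the regexes here are simplified and do not handle every Markdown edge case:

```python
import re

def blank_code(text: str) -> str:
    """Replace code spans with spaces so offsets of later findings survive."""
    def blank(m: re.Match) -> str:
        # Keep whitespace/newlines, space out everything else: length unchanged.
        return re.sub(r"\S", " ", m.group(0))

    # Fenced blocks (``` or ~~~), then inline spans.
    text = re.sub(r"`{3}.*?`{3}|~{3}.*?~{3}", blank, text, flags=re.DOTALL)
    text = re.sub(r"`[^`\n]+`", blank, text)
    return text
```

After blanking, every rule sees prose-only text of exactly the original length, so identifiers like `delve()` in a sample snippet never trip SLOP002.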

Caveats and known limits

  • Some flagged words are perfectly legitimate vocabulary. The density-based rule (SLOP003) only fires when a word appears repeatedly in the same document. The single-use list (SLOP002) is intentionally conservative.
  • Sentence segmentation uses a simple regex and can miscount with unusual punctuation. The uniformity rule defaults to info severity for that reason.
  • Regex-based detection is easy to evade. That is the point: this catches unedited slop, which is the failure mode you actually want to avoid when committing prose.

About

Catch AI writing patterns in prose before you commit. A spaCy-based lint with rules for cliché phrases, em-dash listicles, passive voice density, wall-of-text paragraphs, and more. Pre-commit-friendly, runs locally, no LLM in the loop.
