Skip to content

rxp017/lean-mcts-windows

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🧠 Lean-MCTS-Windows: Autonomous Formal Verification Engine

Bypassing OS constraints to build a programmatic bridge between Python and the Lean 4 proof assistant on native Windows.


🚀 Overview

This repository hosts a fully functional, local Monte Carlo Tree Search (MCTS) loop running over the Lean 4 formal theorem prover, engineered natively for Windows. It couples a programmatic feedback loop (utilizing LeanDojo) with a stateful model rotation strategy via the Gemini API to search for formal mathematical proofs.

Project Status: Paused. The core infrastructure, OS-level patches, interactive REPL backtracking, and search queue are fully built and verified. The mathematical logic scaling is open-sourced for the community to solve.


🏗️ System Architecture

graph TD
    A[MCTS Search Queue] -->|Select Node| B(State Expansion)
    B -->|Current Proof State| C[Gemini LLM Arsenal]
    C -->|Rotate Models/Bypass RPM| D{Generate 3 Tactics}
    D -->|Tactic 1| E[Lean 4 Compiler Physics Engine]
    D -->|Tactic 2| E
    D -->|Tactic 3| E
    E -->|ProofFinished| F[🏆 Success & Path Closed]
    E -->|TacticState| G[✅ Add new node to Queue]
    E -->|Rejected/Error| H[❌ Drop branch / Hermes Backtrack]
Loading

🛠️ Windows-Native Engineering & Achievements

Existing formal verification research tools are heavily Unix-centric. Porting these architectures to a native Windows environment requires solving complex system-level constraints:

1. 📂 Bypassing Windows File-Locking (WinError 32)

  • The Issue: Python's standard tempfile.NamedTemporaryFile retains an active write-lock on the file handle. On Windows, this lock blocks the spawned Lean compiler process from opening and reading the modified source code.
  • The Fix: We patched the Dojo lifecycle to explicitly .close() file handles immediately after writing the tactic state modifications, cleanly releasing locks before launching the compiler, and safely removing them in try...finally blocks.

2. ⚡ Patched PopenSpawn Process Lifecycles

  • The Issue: pexpect.spawn is unavailable on Windows, forcing the use of pexpect.popen_spawn.PopenSpawn. However, PopenSpawn lacks the .isalive() and .exitstatus attributes used by LeanDojo to manage compiler states, causing immediate crashes.
  • The Fix: Patched the process management interface to query the underlying process state directly using .poll() is None and retrieve exit codes via .returncode.

3. 🔠 Standard Stream Unicode Safety

  • The Issue: Windows console outputs default to legacy codepages (e.g. CP1252), causing fatal crashes when the Python engine attempts to print Lean’s formal logic symbols (such as the turnstile , implication , or complex field ).
  • The Fix: Reconfigured sys.stdout and sys.stderr streams to use utf-8 and monkey-patched tempfile.NamedTemporaryFile to enforce UTF-8 text writing.

4. 🔀 Hermes Backtracking Protocol

  • The Issue: Purely linear tactic insertion is fragile; if the prover gets stuck, it cannot recover.
  • The Fix: Implemented a stateful backtrack stack (hermes_loop). The engine allows manual or automated agent logic to issue undo commands, popping the active compiler state and safely reverting the proof depth to explore alternative logical branches.

5. 🔄 API Round-Robin Quota Management

  • The Issue: High-frequency tree search queries trigger fast rate-limiting (RPM/TPM caps) on free-tier LLM API endpoints.
  • The Fix: Integrated a round-robin rotation mechanism (AVAILABLE_MODELS) that dynamically cycles through Gemini models (gemini-3.5-flash, gemini-3.1-flash-lite, gemini-2.5-flash, etc.) to distribute request load.

🛑 The Brutal Reality: Why We Paused

While we successfully pointed this local framework at open problems like the Riemann Hypothesis, scaling a general-purpose LLM to solve frontier mathematics revealed several hard truths:

  1. Hallucination is Not "Solved" by compilers: While Lean 4 acts as an absolute logical bouncer (rejecting invalid moves), it cannot guide proof search strategy. General-purpose LLMs without specialized mathematical search priors tend to perform a random walk in a mathematically infinite search space.
  2. The "Sorry" Stub Illusion: Stubbing complex functions (like the Riemann Zeta function) with sorry creates an un-provable wall. If the logical foundations and definition structures of the mathematics are not fully defined in the environment, the LLM cannot "unfold" or reason about them.
  3. General LLMs vs. Specialized Policy Networks: High-tier mathematical verification (e.g., DeepMind's AlphaProof) requires training specialized policy and value networks on formal mathematical libraries to narrow search branching. API calls to generic chat models are insufficient for complex theorem search spaces.

🚀 Next Steps & Community Roadmap

To take this framework to the next level, we invite the community to fork this repository and build:

  1. MiniF2F / Lean Workbook Integration: Replace the Riemann Hypothesis target with the MiniF2F benchmark dataset. Test the engine's true search capabilities on a suite of solvable, self-contained olympiad-level math problems.
  2. Local vLLM Integration: Strip the cloud API round-robin and host a local specialized prover model (like DeepSeek-Prover or InternLM-Math) to run local, zero-cost, high-velocity search.
  3. Parallel Value Network Scoring: Implement a true Monte Carlo Tree Search that scores proof states using a value network to prune bad branches early.

📚 Reference Implementations & Literature

If you are developing in this space, study the state-of-the-art architectures:


📬 Contact & Contributions

Contributions, forks, and discussions are welcome. Let's build the future of automated reasoning.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors