
Digraph of Thoughts (DoT)

DoT is a high-performance distributed reasoning framework for Large Language Models. It extends the Graph of Thoughts paradigm into a directed acyclic graph (DAG) structure, enabling LLMs to solve complex multi-step problems by exploring parallel reasoning paths, merging insights across branches, and dynamically managing the search budget through a vectorized control policy.

DoT is architected for distributed execution at scale: it uses MPI (via mpi4py) for the hierarchical reasoning orchestration layer and Ray Serve with vLLM for high-throughput, GPU-accelerated model inference.
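
To make the graph structure concrete, the following is a minimal sketch of a thought DAG in which a merge node has multiple parents, which is how insights from parallel branches combine. ThoughtNode is a hypothetical stand-in, not the repo's DoTNode/DoTGraph API.

# Illustrative only: ThoughtNode is hypothetical, not the repo's DoTNode.
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    thought: str
    parents: list["ThoughtNode"] = field(default_factory=list)

left = ThoughtNode("sort the left half")
right = ThoughtNode("sort the right half")
merged = ThoughtNode("merge both sorted halves", parents=[left, right])
print(merged.thought, "<-", [p.thought for p in merged.parents])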

Key Achievements

DoT bridges the gap between algorithmic reasoning and bare-metal HPC orchestration, shifting the execution bottleneck away from synchronous CPU logic and back onto the physical accelerators.

  • Massive Throughput & Cost Reduction: Bare-metal evaluations demonstrate up to a 122.5× speedup over sequential Structured Reasoning Traversals (SRTs). Extrapolated to a 10,000-task heuristic search, DoT compresses the workload makespan from 116 days to 15.5 hours, reducing estimated cloud compute costs by 99%.
  • Elimination of Parallel Starvation: By decoupling state synchronization and deterministic tool execution from the tensor backend, DoT's asynchronous control plane hides high-latency operations behind continuous GPU compute, sustaining 94.0% median GPU pressure.
  • Deterministic Agentic Offloading: All deterministic operations and arithmetic are offloaded to an external Python sandbox (CodeSandbox). This prevents context-window saturation and hallucination loops, allowing the LLM to act purely as a stateless router for latent synthesis (see the sandbox sketch after this list).
  • Scale-Out MPI Orchestration: Proven scalability across multi-node, multi-GPU clusters. Dual-payload I/O isolation cuts dispatch bandwidth by 87.5%, while strict tail-latency preemption sustains 67.5% strong-scaling and 77.0% weak-scaling makespan efficiency on 16-node multi-hop topologies.
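
The offloading pattern above can be illustrated with plain subprocess isolation. The snippet below is a minimal sketch of the idea, not the repo's CodeSandbox interface: deterministic, model-generated code runs in a separate interpreter, and only the result re-enters the reasoning loop.

# Minimal sketch of sandboxed execution (not the CodeSandbox API).
import subprocess, sys

generated_code = "print(sorted([3, 1, 2]))"  # stand-in for LLM-emitted code
result = subprocess.run(
    [sys.executable, "-c", generated_code],
    capture_output=True, text=True, timeout=10,
)
print(result.stdout.strip())  # -> [1, 2, 3]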

Documentation

  • System Architecture Overview — The three-tier MPI hierarchy (GlobalOrchestrator → NodeCoordinator → MPIWorker), the Ray Serve gRPC serving layer, and end-to-end task lifecycle (a rank-layout sketch follows this list).
  • Performance & Throughput — Asynchronous pipelining, sequence-parallel decoding, KV-cache affinity routing, and NUMA topology binding.
  • Accuracy & Reasoning — The 5×5 vectorized controller, speculative ghost nodes, DAG convergence, and deterministic code sandbox verification.
  • Benchmarking & Profiling — Slurm dispatcher workflows, NUMA binding modes, Python cProfile analysis, and PyTorch vLLM tracing.
  • Configuration Reference — A complete breakdown of all config.yaml parameters, default values, and operational behaviors for tuning the framework.
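
As referenced in the architecture entry above, the rank layout of the three-tier hierarchy can be sketched with standard mpi4py primitives. This is a simplified reconstruction from the documentation, not the repo's actual bootstrap code:

# Simplified rank-role layout (illustrative reconstruction).
from mpi4py import MPI

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED)  # node-local communicator

if world.Get_rank() == 0:
    role = "GlobalOrchestrator"  # task dispatch and hedging
elif node.Get_rank() == 0:
    role = "NodeCoordinator"     # per-node worker pool
else:
    role = "MPIWorker"           # per-rank reasoning execution
print(f"world rank {world.Get_rank()}: {role}")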

Getting Started

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPUs (for vLLM inference)
  • An MPI implementation (e.g., OpenMPI 4.x+)

Installation

# Clone the repository
git clone https://github.com/eshama1/digraph-of-thoughts.git
cd digraph-of-thoughts

# Run the automated setup script (creates ENV/, installs all dependencies)
bash scripts/utils/setup_env.sh

# Activate the environment
source ENV/bin/activate
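
As a quick sanity check that MPI and CUDA are visible from the new environment (an illustrative snippet, assuming the setup script installed mpi4py and torch; not part of the repo):

# sanity_check.py (illustrative, not part of the repo)
from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")
print(f"CUDA available: {torch.cuda.is_available()} "
      f"({torch.cuda.device_count()} visible GPUs)")

# Run with: mpirun -n 2 python sanity_check.py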

Tip

If you already have a pre-configured Python environment (e.g., a shared conda or venv from another project), you can symlink it to ENV/ so the project scripts can use it directly:

ln -s /path/to/your/env ENV

Model Weights

Download model weights to the project root (e.g., ./Llama-3.3-70B-Instruct). The dispatcher uses rsync to synchronize these to $SCRATCH (node-local fast storage) at job startup to reduce multi-node cold-boot time.

Important

Large models (70B+) must be stored in the project root. The dispatcher automatically detects and syncs them. Do not place weights in results/ or ENV/.
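
For reference, the dispatcher's sync step is equivalent to an rsync from the project root to node-local storage. The hand-run sketch below assumes $SCRATCH is set and the weights live at ./Llama-3.3-70B-Instruct; the dispatcher normally does this automatically:

# Illustrative equivalent of the dispatcher's automatic weight sync.
import os, subprocess

scratch = os.environ["SCRATCH"]   # assumes node-local fast storage is exported
model = "Llama-3.3-70B-Instruct"  # weights downloaded to the project root
subprocess.run(
    ["rsync", "-a", f"./{model}/", f"{scratch}/{model}/"],
    check=True,
)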

Running Locally

For local debugging and small-scale verification, use the lightweight run_dot.py entry point. It runs the full DoT algorithm using a single-process subprocess sandbox — no MPI or Ray cluster required:

python scripts/core/run_dot.py --task sorting --test 032

For distributed execution across a Slurm cluster, see the Benchmarking & Profiling docs.

Repository Structure

digraph_of_thoughts/        # Core package
  algorithm/                # Reasoning algorithm (Scheduler, Controller, Graph, Sandbox)
    core/                   # AsyncPipelinedScheduler, DoTController, PromptManager
    graph/                  # DoTGraph (DAG) and DoTNode state machine
    agentic/                # CodeSandbox, RayExecutor, LocalExecutor
    utils/                  # Config, serialization, XML parsing, logging
  orchestration/            # Distributed execution layer
    core.py                 # DoTEngine — high-level entry point for single/multi-task runs
    mpi/                    # Three-tier MPI hierarchy
      orchestrator.py       # GlobalOrchestrator (Rank 0) — task dispatch and hedging
      coordinator.py        # NodeCoordinator (local Rank 0) — per-node worker pool
      worker.py             # MPIWorker — per-rank reasoning execution
      common.py             # Shared MPI tags, serialization, and persistence helpers
  model_serving/            # vLLM on Ray Serve
    deployment.py           # Ray Serve deployment lifecycle
    vllm_service.py         # gRPC streaming service with metrics collection
    vllm_client.py          # Async gRPC client (used by MPI workers)
    local_service.py        # Single-GPU AsyncLLMEngine backend (no Ray)
    openai_service.py       # OpenAI API compatibility layer
    mock_service.py         # CPU-only synthetic backend for control-plane profiling

scripts/
  core/                     # Primary execution entry points
    run_dot.py              # Lightweight local runner (subprocess sandbox)
    run_distributed_dot.py  # Full MPI+Ray distributed runner
  orchestration/            # Cluster lifecycle management
    deploy_model.sh         # Deploy vLLM on Ray Serve
    wait_for_model.py       # Poll until Ray Serve reaches RUNNING state
    wait_for_workers.py     # Poll until all MPI workers are ready
    shutdown_deployment.py  # Gracefully terminate Ray Serve and flush metrics
  analysis/                 # Post-run result processing and visualization
  benchmarking/             # Sweep scripts, ablations, microbenchmarks, scaling batches
  utils/                    # Setup, proto generation, config override helpers

examples/                   # Task-specific implementations (Sorting, BBEH, etc.)
docs/                       # Extended architectural documentation
benchmark_dispatcher.py     # Root-level Slurm dispatch orchestrator
default_config.yaml         # Default algorithm and serving configuration

Acknowledgments

This project uses evaluation datasets from the Graph of Thoughts (GoT) framework. We thank the authors for making these benchmarks openly available.

The baseline comparison scripts (IO, CoT, ToT, GoT) in scripts/benchmarking/GoT_test_scripts/ are adapted from the original GoT implementations.

Citation

If you use this framework or the included GoT-based examples in your research, please cite both our work and the original GoT paper:

@article{shama2026dot,
  title   = {{Digraph of Thoughts: Scalable Distributed Reasoning with Large Language Models}},
  author  = {Shama, Ethan and Grant, Ryan E.},
  year    = 2026,
  note    = {Under review at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '26)}
}
@article{besta2024got,
  title = {{Graph of Thoughts: Solving Elaborate Problems with Large Language Models}},
  author = {Besta, Maciej and Blach, Nils and Kubicek, Ales and Gerstenberger, Robert and
            Gianinazzi, Lukas and Gajda, Joanna and Lehmann, Tomasz and Podstawski, Micha{\l}
            and Niewiadomski, Hubert and Nyczyk, Piotr and Hoefler, Torsten},
  year = 2024,
  month = {Mar},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = 38,
  number = 16,
  pages = {17682--17690},
  publisher = {AAAI Press},
  doi = {10.1609/aaai.v38i16.29720},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/29720}
}

License

This project is licensed under a BSD-style license. See the LICENSE file for details.
