DoT is a high-performance distributed reasoning framework for Large Language Models. It extends the Graph of Thoughts paradigm into a directed acyclic graph (DAG) structure, enabling LLMs to solve complex multi-step problems by exploring parallel reasoning paths, merging insights across branches, and dynamically managing the search budget through a vectorized control policy.
DoT is architected for distributed execution at scale: it uses MPI (via mpi4py) for the hierarchical reasoning orchestration layer and Ray Serve with vLLM for high-throughput, GPU-accelerated model inference.
DoT bridges the gap between algorithmic reasoning and bare-metal HPC orchestration, shifting the execution bottleneck away from synchronous CPU logic and onto the physical accelerators themselves.
- Massive Throughput & Cost Reduction: Bare-metal evaluations demonstrate up to a 122.5× speedup over sequential Structured Reasoning Traversals (SRTs). Extrapolated to a 10,000-task heuristic search, DoT compresses the workload makespan from 116 days to 15.5 hours, reducing estimated cloud compute costs by 99%.
- Elimination of Parallel Starvation: By decoupling state synchronization and deterministic tool execution from the tensor backend, DoT's asynchronous control plane hides high-latency operations behind continuous GPU compute, sustaining 94.0% median GPU pressure.
- Deterministic Agentic Offloading: Requires all deterministic operations and math to be offloaded to an external Python sandbox (`CodeSandbox`). This prevents context-window saturation and hallucination loops, allowing the LLM to act purely as a stateless router for latent synthesis (see the sketch after this list).
- Scale-Out MPI Orchestration: Proven scalability across multi-node, multi-GPU clusters. Dual-payload I/O isolation cuts dispatch bandwidth by 87.5%, while strict tail-latency preemption sustains 67.5% strong-scaling and 77.0% weak-scaling makespan efficiency on 16-node multi-hop topologies.
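The offloading pattern itself is easy to picture. The sketch below is not the project's `CodeSandbox` (which adds stricter isolation, resource limits, and structured result parsing); it is a minimal stand-in showing the flow: the model emits a deterministic snippet, a subprocess executes it under a timeout, and only the short textual result re-enters the context.

```python
import subprocess
import sys

def run_in_sandbox(code: str, timeout_s: float = 5.0) -> str:
    """Execute model-emitted Python in an isolated subprocess.

    Minimal sketch only -- the real CodeSandbox is more defensive.
    Raises subprocess.TimeoutExpired if the snippet runs too long.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site-packages
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        return f"SANDBOX_ERROR: {proc.stderr.strip()}"
    return proc.stdout.strip()

# The LLM proposes code; the sandbox returns the deterministic answer,
# which is fed back into the reasoning graph as a verified result.
print(run_in_sandbox("print(sorted([9, 3, 7, 1]))"))  # [1, 3, 7, 9]
```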
- System Architecture Overview — The three-tier MPI hierarchy (GlobalOrchestrator → NodeCoordinator → MPIWorker), the Ray Serve gRPC serving layer, and end-to-end task lifecycle.
- Performance & Throughput — Asynchronous pipelining, sequence-parallel decoding, KV-cache affinity routing, and NUMA topology binding.
- Accuracy & Reasoning — The 5×5 vectorized controller, speculative ghost nodes, DAG convergence, and deterministic code sandbox verification (a toy controller sketch follows this list).
- Benchmarking & Profiling — Slurm dispatcher workflows, NUMA binding modes, Python `cProfile` analysis, and PyTorch vLLM tracing.
- Configuration Reference — A complete breakdown of all `config.yaml` parameters, default values, and operational behaviors for tuning the framework.
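To give a rough feel for what a vectorized control policy looks like, here is a toy version. The state and action names below are invented for illustration and do not match the real controller's schema; the point is that the policy scores every frontier node against every action with a single matrix gather instead of branching per node:

```python
import numpy as np

# Hypothetical labels -- the real controller's states/actions differ.
STATES = ["fresh", "expanded", "merged", "verified", "stalled"]
ACTIONS = ["expand", "merge", "verify", "prune", "halt"]

# One 5x5 score matrix: rows = node states, cols = actions.
policy = np.array([
    [0.9, 0.1, 0.2, 0.0, 0.0],   # fresh    -> mostly expand
    [0.2, 0.8, 0.4, 0.1, 0.0],   # expanded -> merge or verify
    [0.1, 0.1, 0.9, 0.2, 0.1],   # merged   -> verify
    [0.0, 0.0, 0.1, 0.3, 0.9],   # verified -> halt
    [0.1, 0.2, 0.1, 0.9, 0.3],   # stalled  -> prune
])

def choose_actions(node_states: np.ndarray) -> np.ndarray:
    """Pick one action per frontier node with a single vectorized lookup."""
    scores = policy[node_states]   # (n_nodes, 5) in one gather
    return scores.argmax(axis=1)   # best action per node

frontier = np.array([0, 1, 1, 4])  # four nodes, by state index
for s, a in zip(frontier, choose_actions(frontier)):
    print(f"{STATES[s]:>9} -> {ACTIONS[a]}")
```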
- Python 3.10+
- CUDA-compatible GPUs (for vLLM inference)
- An MPI implementation (e.g., OpenMPI 4.x+)
# Clone the repository
git clone https://github.com/eshama1/digraph-of-thoughts.git
cd digraph-of-thoughts
# Run the automated setup script (creates ENV/, installs all dependencies)
bash scripts/utils/setup_env.sh
# Activate the environment
source ENV/bin/activate
Tip
If you already have a pre-configured Python environment (e.g., a shared conda or venv from a different project), you can symlink it as ENV/ so the project scripts pick it up directly: `ln -s /path/to/your/env ENV`.
Download model weights to the project root (e.g., ./Llama-3.3-70B-Instruct). The dispatcher uses rsync to synchronize these to $SCRATCH (node-local fast storage) at job startup to reduce multi-node cold-boot time.
Important
Large models (70B+) must be stored in the project root. The dispatcher automatically detects and syncs them. Do not place weights in results/ or ENV/.
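Conceptually, the sync step reduces to an `rsync` mirror into `$SCRATCH`. A minimal sketch (the real dispatcher adds per-node fan-out and failure handling, and your cluster's scratch layout may differ):

```python
import os
import subprocess

def sync_weights(model_dir: str = "Llama-3.3-70B-Instruct") -> str:
    """Mirror model weights from the project root into node-local scratch.

    Sketch only: mirrors what the dispatcher does conceptually with rsync.
    """
    scratch = os.environ["SCRATCH"]  # node-local fast storage
    dest = os.path.join(scratch, model_dir)
    # -a preserves structure; --delete keeps the mirror exact across reruns.
    subprocess.run(
        ["rsync", "-a", "--delete", f"{model_dir}/", f"{dest}/"],
        check=True,
    )
    return dest
```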
For local debugging and small-scale verification, use the lightweight run_dot.py entry point. It runs the full DoT algorithm using a single-process subprocess sandbox — no MPI or Ray cluster required:
python scripts/core/run_dot.py --task sorting --test 032
For distributed execution across a Slurm cluster, see the Benchmarking & Profiling docs.
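In the distributed path, helper scripts such as `wait_for_model.py` block until the serving layer is live before any MPI worker issues a request. A generic version of that readiness pattern looks like this (the health URL is a placeholder; the actual script queries Ray Serve's deployment status directly):

```python
import time
import urllib.request

def wait_until_ready(url: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> None:
    """Poll a health endpoint until it answers, or raise after timeout_s.

    Placeholder pattern: the project's wait_for_model.py checks Ray Serve
    deployment state rather than a plain HTTP endpoint.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # server not up yet; keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"model endpoint {url} not ready after {timeout_s}s")

# wait_until_ready("http://head-node:8000/health")  # hypothetical URL
```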
digraph_of_thoughts/ # Core package
algorithm/ # Reasoning algorithm (Scheduler, Controller, Graph, Sandbox)
core/ # AsyncPipelinedScheduler, DoTController, PromptManager
graph/ # DoTGraph (DAG) and DoTNode state machine
agentic/ # CodeSandbox, RayExecutor, LocalExecutor
utils/ # Config, serialization, XML parsing, logging
orchestration/ # Distributed execution layer
core.py # DoTEngine — high-level entry point for single/multi-task runs
mpi/ # Three-tier MPI hierarchy
orchestrator.py # GlobalOrchestrator (Rank 0) — task dispatch and hedging
coordinator.py # NodeCoordinator (local Rank 0) — per-node worker pool
worker.py # MPIWorker — per-rank reasoning execution
common.py # Shared MPI tags, serialization, and persistence helpers
model_serving/ # vLLM on Ray Serve
deployment.py # Ray Serve deployment lifecycle
vllm_service.py # gRPC streaming service with metrics collection
vllm_client.py # Async gRPC client (used by MPI workers)
local_service.py # Single-GPU AsyncLLMEngine backend (no Ray)
openai_service.py # OpenAI API compatibility layer
mock_service.py # CPU-only synthetic backend for control-plane profiling
scripts/
core/ # Primary execution entry points
run_dot.py # Lightweight local runner (subprocess sandbox)
run_distributed_dot.py # Full MPI+Ray distributed runner
orchestration/ # Cluster lifecycle management
deploy_model.sh # Deploy vLLM on Ray Serve
wait_for_model.py # Poll until Ray Serve reaches RUNNING state
wait_for_workers.py # Poll until all MPI workers are ready
shutdown_deployment.py # Gracefully terminate Ray Serve and flush metrics
analysis/ # Post-run result processing and visualization
benchmarking/ # Sweep scripts, ablations, microbenchmarks, scaling batches
utils/ # Setup, proto generation, config override helpers
examples/ # Task-specific implementations (Sorting, BBEH, etc.)
docs/ # Extended architectural documentation
benchmark_dispatcher.py # Root-level Slurm dispatch orchestrator
default_config.yaml # Default algorithm and serving configuration
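All runtime behavior is driven by `default_config.yaml`; see the Configuration Reference for the full schema. For per-run tweaks without editing the file, a simple load-and-override step works (the keys in the usage comment are placeholders, not actual parameter names):

```python
import yaml  # requires PyYAML

def load_config(path: str = "default_config.yaml", **overrides) -> dict:
    """Load the YAML config and apply flat key overrides on top.

    The keys in the usage line below are placeholders; consult the
    Configuration Reference for the real parameter names.
    """
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.update(overrides)  # shallow merge: overrides win
    return cfg

# cfg = load_config(max_parallel_branches=8, sandbox_timeout_s=30)
```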
This project uses evaluation datasets from the Graph of Thoughts (GoT) framework. We thank the authors for making these benchmarks openly available.
The baseline comparison scripts (IO, CoT, ToT, GoT) in scripts/benchmarking/GoT_test_scripts/ are adapted from the original GoT implementations.
If you use this framework or the included GoT-based examples in your research, please cite both our work and the original GoT paper:
@article{shama2026dot,
title = {{Digraph of Thoughts: Scalable Distributed Reasoning with Large Language Models}},
author = {Shama, Ethan and Grant, Ryan E.},
year = 2026,
note = {Under review at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '26)}
}

@article{besta2024got,
title = {{Graph of Thoughts: Solving Elaborate Problems with Large Language Models}},
author = {Besta, Maciej and Blach, Nils and Kubicek, Ales and Gerstenberger, Robert and
Gianinazzi, Lukas and Gajda, Joanna and Lehmann, Tomasz and Podstawski, Micha{\l}
and Niewiadomski, Hubert and Nyczyk, Piotr and Hoefler, Torsten},
year = 2024,
month = {Mar},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = 38,
number = 16,
pages = {17682--17690},
publisher = {AAAI Press},
doi = {10.1609/aaai.v38i16.29720},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/29720}
}

This project is licensed under a BSD-style license. See the LICENSE file for details.