markitdown

A personal command-line wrapper around Microsoft's markitdown that adds recursive folder conversion, skip-if-up-to-date caching, OpenAI-powered image OCR, and Whisper audio transcription, all behind a single mdc command.

TL;DR

  • What: convert documents to Markdown from the terminal — one file or a whole tree.
  • How: thin CLI over Microsoft markitdown, plus optional OpenAI calls for image description and audio transcription.
  • Stack: Python ≥3.11, markitdown[all], openai, python-dotenv; packaged with hatchling, installed with uv tool.
  • Run: mdc report.pdf (writes report.pdf.md), mdc ./docs (writes ./docs.md/ mirroring the tree).

Overview

mdc is a single-purpose command-line utility for turning non-text files into Markdown. It delegates the actual parsing to Microsoft's markitdown library and layers on the ergonomics a CLI needs:

  • Single file or recursive tree. Point it at a file or a directory; directories are walked recursively and the output mirrors the source layout.
  • Skip-if-up-to-date. A target .md that is newer than its source is left alone unless --force is passed, so re-running over a large folder only reconverts what changed.
  • Batch-tolerant. In tree mode, a file that fails to convert is recorded as an error and the run continues; unsupported extensions are counted and skipped rather than aborting the batch.
  • Optional OpenAI features. --ocr wires markitdown to describe images via gpt-4o-mini; --audio transcribes audio files through the Whisper API (whisper-1). Plain conversions need no API key.
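The skip-if-up-to-date rule above can be sketched as a modification-time comparison. This is a minimal illustration, not the code in convert.py: the helper names is_up_to_date and needs_conversion are hypothetical, and treating equal timestamps as stale is an assumption.

```python
from pathlib import Path


def is_up_to_date(source: Path, target: Path) -> bool:
    """Return True when target exists and is strictly newer than source."""
    if not target.exists():
        return False
    # Assumption: equal mtimes count as stale, so the file is reconverted.
    return target.stat().st_mtime > source.stat().st_mtime


def needs_conversion(source: Path, target: Path, force: bool = False) -> bool:
    """Hypothetical helper: convert when forced, or when target is missing or stale."""
    return force or not is_up_to_date(source, target)
```

With a check like this, a second run over an unchanged folder would do no conversion work at all, and editing one source file would make only that file eligible again.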

The companion markitdown Claude skill calls mdc automatically when Claude Code hits a file it can't read directly.

Tech stack

  • Python ≥3.11 (.python-version pins 3.11), stdlib argparse CLI.
  • markitdown[all] — the underlying conversion engine.
  • openai — image description (gpt-4o-mini) and audio transcription (whisper-1).
  • python-dotenv — reads OPENAI_API_KEY from a fixed .env path.
  • Packaging: hatchling build backend; the mdc script entry point is declared in pyproject.toml. Dev/test uses pytest via a uv dependency group.
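As a rough idea of how the openai dependency could be used for transcription, the sketch below sends one file to the audio transcription endpoint with the whisper-1 model. It assumes the caller passes in an already-constructed OpenAI client; the function signature is illustrative and may not match transcribe_audio in convert.py.

```python
from pathlib import Path


def transcribe_audio(client, audio_path: Path, model: str = "whisper-1") -> str:
    """Send one audio file to the transcription endpoint and return its text.

    `client` is expected to be an openai.OpenAI instance (or anything with the
    same audio.transcriptions.create method); passing it in keeps this sketch
    free of API-key handling.
    """
    with audio_path.open("rb") as handle:
        result = client.audio.transcriptions.create(model=model, file=handle)
    return result.text
```

Taking the client as a parameter also makes the function easy to exercise with a stub in tests, without any network call.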

Getting started

Install

Install globally as a uv tool from the project directory:

uv tool install /Users/ericbaruch/Arik/dev/dev-tools/markitdown
# or, from inside the repo:
uv tool install .

This puts an mdc command on your PATH.

Reinstall gotcha. uv tool install --force . reuses cached builds and can silently ship stale code. To guarantee a fresh build after local edits:

uv tool uninstall markitdown-cli && uv tool install --reinstall --no-cache .

API keys

--ocr and --audio both require OPENAI_API_KEY. The key is read from a hardcoded path, /Users/ericbaruch/Arik/dev/.env (ENV_PATH in src/markitdown_cli/config.py), and falls back to the process environment if that file is absent. Plain conversions need no key. If a key is missing for a feature you requested, mdc exits with a clear message and a non-zero status.
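The lookup order described above (file first, then process environment) might look roughly like the following. This is a stdlib-only sketch: the real tool uses python-dotenv, so read_env_file is a simplified stand-in, load_api_key is a hypothetical name, and the behaviour when the file exists but lacks the key is an assumption.

```python
import os
from collections.abc import Mapping
from pathlib import Path


class MissingKeyError(RuntimeError):
    """Raised when a feature needs OPENAI_API_KEY but none is available."""


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; a simplified stand-in for python-dotenv."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_key(env_path: Path, environ: Mapping[str, str] = os.environ) -> str:
    """Prefer the .env file; fall back to the process environment."""
    key = read_env_file(env_path).get("OPENAI_API_KEY") or environ.get("OPENAI_API_KEY")
    if not key:
        raise MissingKeyError("OPENAI_API_KEY is required for --ocr and --audio")
    return key
```

A caller would catch MissingKeyError at the CLI boundary, print the message, and exit with a non-zero status.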

Usage

# Single file → writes report.pdf.md next to the source
mdc report.pdf

# Folder (recursive) → writes ./docs.md/ as a sibling, mirroring the tree
mdc ./docs

# Explicit output (file or directory, matching the input)
mdc report.pdf -o /tmp/out.md
mdc ./docs   -o /tmp/docs-md

# Force re-conversion even if the target is newer
mdc report.pdf --force

# Describe images via gpt-4o-mini (needs OPENAI_API_KEY)
mdc photo.png --ocr

# Transcribe audio via Whisper (needs OPENAI_API_KEY)
mdc recording.mp3 --audio

# One log line per file
mdc ./docs --verbose
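The default output names in the examples above follow one rule: append .md to the full source name. A small sketch of that rule, with hypothetical helper names; applying the same suffix rule to files inside a mirrored tree is an assumption, not something the README states.

```python
from pathlib import Path


def default_output(source: Path) -> Path:
    """report.pdf -> report.pdf.md, ./docs -> docs.md (as a sibling)."""
    return source.with_name(source.name + ".md")


def mirrored_target(source_root: Path, output_root: Path, file: Path) -> Path:
    """Map a file under source_root to the same relative spot under output_root.

    Assumption: files inside the tree get the same '<name>.md' suffix rule.
    """
    relative = file.relative_to(source_root)
    return output_root / relative.with_name(relative.name + ".md")
```

Keeping the original extension in the output name (report.pdf.md rather than report.md) avoids collisions when a folder holds report.pdf and report.docx side by side.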

Flags

| Flag | Description |
| --- | --- |
| input (positional) | File or directory to convert (required). |
| -o, --output | Output file (for a file input) or output directory (for a directory input). Defaults to <source>.md / <dir>.md/. |
| --force | Re-convert even if the target .md exists and is newer than the source. |
| --ocr | Use OpenAI gpt-4o-mini to describe images. Requires OPENAI_API_KEY. |
| --audio | Use the OpenAI Whisper API (whisper-1) to transcribe audio. Requires OPENAI_API_KEY. |
| --verbose | Print one log line per file (converted / skipped / unsupported / error). |

Exit codes

  • 0 — success (single-file conversion, or a tree with no errors).
  • 1 — a tree conversion completed but one or more files errored (errors are listed in the summary).
  • 2 — input not found, or a requested feature is missing its OPENAI_API_KEY.
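For scripts that call mdc, the exit codes listed above can be turned into readable messages. The wrapper below is a sketch: explain_exit_code and run_mdc are hypothetical helpers, and run_mdc assumes mdc is installed and on PATH.

```python
import subprocess

# Meanings taken from the exit-code list in this README.
EXIT_MEANINGS = {
    0: "success",
    1: "tree converted, but one or more files errored",
    2: "input not found, or OPENAI_API_KEY missing for a requested feature",
}


def explain_exit_code(code: int) -> str:
    """Translate an mdc exit status into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_mdc(*args: str) -> tuple[int, str]:
    """Run mdc with the given arguments and return (exit code, explanation)."""
    completed = subprocess.run(["mdc", *args], check=False)
    return completed.returncode, explain_exit_code(completed.returncode)
```

A nightly job could call run_mdc("./docs") and treat 1 as a warning worth logging while treating 2 as a configuration problem that needs fixing.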

Supported formats

.pdf · .docx .pptx .xlsx .xls · .html .htm · .csv .json .xml · .epub .zip .msg · .txt .rtf .odt · .jpg .jpeg .png .gif .bmp .tiff · .mp3 .wav .m4a .flac

(Extension matching is case-insensitive; the canonical set lives in src/markitdown_cli/formats.py.)
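Case-insensitive extension matching can be sketched by lowercasing the suffix before a set lookup. The extension lists below are copied from this section; the constant names and the exact shape of is_supported are assumptions and may differ from formats.py.

```python
from pathlib import Path

AUDIO_EXTENSIONS = frozenset({".mp3", ".wav", ".m4a", ".flac"})
IMAGE_EXTENSIONS = frozenset({".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff"})
DOCUMENT_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".html", ".htm",
    ".csv", ".json", ".xml", ".epub", ".zip", ".msg", ".txt", ".rtf", ".odt",
})
SUPPORTED_EXTENSIONS = DOCUMENT_EXTENSIONS | IMAGE_EXTENSIONS | AUDIO_EXTENSIONS


def is_supported(path: Path) -> bool:
    """True when the file's final suffix, lowercased, is in the supported set."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def is_audio(path: Path) -> bool:
    """Hypothetical helper for routing files to transcription."""
    return path.suffix.lower() in AUDIO_EXTENSIONS
```

Only the last suffix is inspected, so a name like archive.tar.gz would be judged by .gz and treated as unsupported.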

Architecture

src/markitdown_cli/
  __main__.py   argparse CLI, summary printing, exit-code logic (entry point: mdc)
  convert.py    convert_file / convert_tree, ConvertSummary, build_markitdown, transcribe_audio
  formats.py    SUPPORTED/AUDIO/IMAGE extension sets + is_supported()
  config.py     ENV_PATH, OPENAI_API_KEY loading, OpenAI client, MissingKeyError

  • convert_file handles one file: it short-circuits when the target is newer (unless force), routes audio files to transcribe_audio when --audio is set, and otherwise calls markitdown.
  • convert_tree walks the source recursively, mirrors the structure under the output root, and aggregates results into a ConvertSummary (converted / skipped / unsupported / errors).
  • build_markitdown constructs a plain MarkItDown instance, or — with --ocr — one wired to an OpenAI client for LLM image description.
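The batch-tolerant walk described for convert_tree can be sketched as follows. To keep the example self-contained, the per-file converter and the support check are passed in as callables, which is an assumption about structure rather than the real signature; the ConvertSummary field types and the '<name>.md' target naming are also assumed.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class ConvertSummary:
    converted: int = 0
    skipped: int = 0
    unsupported: int = 0
    errors: list[tuple[Path, str]] = field(default_factory=list)


def convert_tree(
    source_root: Path,
    output_root: Path,
    convert_one: Callable[[Path, Path], bool],
    is_supported: Callable[[Path], bool],
) -> ConvertSummary:
    """Walk source_root, mirror it under output_root, and tally outcomes.

    convert_one(source, target) returns True if it converted the file and
    False if it skipped it (for example because the target was up to date).
    """
    summary = ConvertSummary()
    for source in sorted(p for p in source_root.rglob("*") if p.is_file()):
        if not is_supported(source):
            summary.unsupported += 1
            continue
        relative = source.relative_to(source_root)
        target = output_root / relative.with_name(relative.name + ".md")
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            if convert_one(source, target):
                summary.converted += 1
            else:
                summary.skipped += 1
        except Exception as exc:  # record the failure and keep the batch going
            summary.errors.append((source, str(exc)))
    return summary
```

Catching the exception per file is what lets one corrupt document show up in the summary without stopping the rest of the run.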

Status

Working and used day-to-day. A pytest suite lives in tests/ (test_convert.py, test_config.py) — run it with uv run pytest. Dependabot is configured for weekly dependency updates (.github/dependabot.yml).

This is a personal, macOS-oriented tool: the .env path that supplies OPENAI_API_KEY (ENV_PATH in config.py) is hardcoded to the author's machine. Change ENV_PATH (or set OPENAI_API_KEY in your environment) if you adopt it elsewhere.

Uninstall

uv tool uninstall markitdown-cli