
feat: polish mode — refine an existing figure with style-guided suggestions #247

Merged
dippatel1994 merged 3 commits into main from feat/polish-mode
Jun 11, 2026

Conversation

@dippatel1994

Member

Fixes #238 (rollback half shipped in #243; this is polish mode).

paperbanana polish --input figure.png — bring your own figure:

  1. Suggest: a VLM audits the figure against the venue style guide (--venue; picks up guidelines synthesize output) and proposes ≤10 concrete improvements. Parsing is robust to numbered, bulleted, and fenced lists; a NO_SUGGESTIONS sentinel exits with the figure unchanged.
  2. Apply: true guided edit — GoogleImagenGen gained an optional images kwarg so Gemini edits the actual figure rather than regenerating from text. Providers without guided-edit support are rejected with an actionable error (capability detected by signature; contract documented in the base class). No silent fallbacks.
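
For reviewers, a minimal sketch of the kind of list parsing the Suggest step describes: numbered, bulleted, or fenced replies, a NO_SUGGESTIONS sentinel, and a cap of 10. This is illustrative only and is not the code in paperbanana/agents/polish.py; `parse_suggestions`, `SENTINEL`, and `MAX_SUGGESTIONS` are hypothetical names, and treating unmarked lines as continuations of the previous item is an assumption.

```python
import re

SENTINEL = "NO_SUGGESTIONS"   # hypothetical constant name
MAX_SUGGESTIONS = 10          # cap stated in the PR description
# Leading list markers: "1.", "2)", "-", "*", "•"
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_suggestions(raw: str) -> list:
    """Extract at most MAX_SUGGESTIONS list items from a VLM reply."""
    items = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("```"):
            continue  # skip blank lines and code-fence delimiters
        if stripped == SENTINEL:
            return []  # the model reported nothing to improve
        match = _MARKER.match(line)
        if match:
            # New list item: drop the marker, keep the text.
            items.append(line[match.end():].strip())
        elif items:
            # Assumption: an unmarked line continues the previous item.
            items[-1] += " " + stripped
        # Unmarked lines before the first item (preamble) are ignored.
    return items[:MAX_SUGGESTIONS]
```

An empty result, whether from the sentinel or from a reply with no list items, would let the caller exit without touching the figure.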

--iterations N repeats suggest→apply; --num-candidates fans out the apply step in parallel; budget guard + cost summary wired like generate.
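
The control flow above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the shipped implementation: `polish_loop`, `suggest`, and `apply` are hypothetical names, a thread pool stands in for whatever concurrency the CLI actually uses, and the budget guard and per-candidate output directories are omitted. Treating "first successful candidate is primary" as "lowest-index candidate that did not raise" is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def polish_loop(figure, suggest, apply, iterations=1, num_candidates=1):
    """Repeat suggest -> apply, fanning the apply step out in parallel."""
    current = figure
    for _ in range(iterations):
        suggestions = suggest(current)
        if not suggestions:
            break  # nothing left to improve; keep the current figure

        # Fan out: every candidate edits the same input with the same suggestions.
        with ThreadPoolExecutor(max_workers=num_candidates) as pool:
            futures = [
                pool.submit(apply, current, suggestions)
                for _ in range(num_candidates)
            ]
        # Leaving the with-block waits for all candidates to finish.

        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception:
                continue  # a failed candidate does not sink the others
        if not results:
            raise RuntimeError("all candidates failed in the apply step")

        # Assumption: the first successful candidate feeds the next iteration.
        current = results[0]
    return current
```

With `iterations=1` and `num_candidates=1` this reduces to a single suggest and apply pass.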

Design note: the issue assumed the refinement loop already passed images to image-gen — it doesn't (the loop conditions on images only via the Critic's VLM call), hence the small additive provider extension. 2K/4K upscaling is out of scope (provider-dependent, tracked separately if wanted).

26 new tests; suite at 853 passing.

dippatel1994 and others added 3 commits June 11, 2026 16:05
…stions

Adds `paperbanana polish --input figure.png`: a two-step flow where a VLM
audits the user-supplied figure against the venue style guide (--venue,
neurips default) and produces up to 10 concrete, actionable suggestions,
which are then applied to the original figure as a guided image edit
(the figure and the numbered suggestions both go to the image provider).

- PolishAgent (paperbanana/agents/polish.py): suggest() VLM step with
  robust list parsing (numbered/bulleted/fenced, NO_SUGGESTIONS sentinel,
  capped at 10) and apply() guided-edit step; prompts in prompts/polish/.
- Guided edits: GoogleImagenGen.generate gains an optional images kwarg
  (image-conditioned generation); callers detect support by signature.
  Providers without it are rejected with a clear error.
- CLI: --input (validated as a readable image), --venue, --output,
  --iterations (repeat suggest→apply on the result), --aspect-ratio,
  provider/model/budget/seed flags consistent with generate; suggestions
  printed to the console; cost tracked and reported with budget guard.
- Multi-candidate: --num-candidates fans the apply step out in parallel
  with per-candidate output dirs; first successful candidate is primary.
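
As an illustration of "callers detect support by signature", one way to express the check is with `inspect.signature`. The sketch below is an assumption about the shape of that check, not the code in this commit: `TextOnlyGen`, `GuidedEditGen`, `supports_guided_edit`, and `require_guided_edit` are hypothetical names, and the real provider classes and error type may differ.

```python
import inspect
from typing import List, Optional


class TextOnlyGen:
    """Hypothetical provider whose generate() takes only a prompt."""

    def generate(self, prompt: str) -> bytes:
        return b""


class GuidedEditGen:
    """Hypothetical provider whose generate() accepts an images kwarg."""

    def generate(self, prompt: str, images: Optional[List[bytes]] = None) -> bytes:
        return b""


def supports_guided_edit(provider) -> bool:
    """True if provider.generate declares an `images` parameter."""
    return "images" in inspect.signature(provider.generate).parameters


def require_guided_edit(provider) -> None:
    """Reject providers without guided-edit support; no silent fallback."""
    if not supports_guided_edit(provider):
        name = type(provider).__name__
        raise ValueError(
            f"{name} does not support guided edits (generate() has no "
            "'images' parameter); choose a provider that does."
        )
```

Checking the signature up front lets the CLI fail before any paid API call is made, which fits the "no silent fallbacks" stance in the PR description.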

Out of scope: 2K/4K upscaling (depends on provider support; separate
concern).

Fixes #238
dippatel1994 merged commit 41b8bfe into main on Jun 11, 2026
11 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature]: Polish mode — refine an existing figure + critic rollback

1 participant