Open
Changes from 20 commits
28 commits
a12e492
WIP: support mixed modality
CUHKSZzxy Apr 14, 2026
79b9953
fix mm processor kwargs, cleanup
CUHKSZzxy Apr 15, 2026
c12b02c
qwen3.5 mixed modality
CUHKSZzxy Apr 15, 2026
cd85dcf
interns1 pro mixed modality, fix kwargs
CUHKSZzxy Apr 15, 2026
dc8e388
fix generate, cleanup
CUHKSZzxy Apr 16, 2026
8d525fc
minor
CUHKSZzxy Apr 16, 2026
1e6a17d
Merge branch 'main' into mixed-modality
CUHKSZzxy Apr 16, 2026
20cd4ec
simplify
CUHKSZzxy Apr 16, 2026
71112f4
fix glm4.1v
CUHKSZzxy Apr 16, 2026
1994bfa
compatible with legacy preprocess, give up re-writing all ...
CUHKSZzxy Apr 16, 2026
f47d011
fix bugs
CUHKSZzxy Apr 16, 2026
1d40a76
minor
CUHKSZzxy Apr 16, 2026
b69fd8b
minor
CUHKSZzxy Apr 16, 2026
70d4178
minor
CUHKSZzxy Apr 16, 2026
006a8ca
fix ut
CUHKSZzxy Apr 17, 2026
7922cc3
fix qwen3vl moe
CUHKSZzxy Apr 17, 2026
2f992c9
allow modality-specific kwargs, add ut
CUHKSZzxy Apr 17, 2026
0787a8a
docs: add multi-modal input format reference (EN + ZH)
CUHKSZzxy Apr 17, 2026
8611115
docs: update video/audio URLs to official Qwen assets
CUHKSZzxy Apr 17, 2026
de923d7
docs: fix model name Qwen3.5-VL -> Qwen3.5
CUHKSZzxy Apr 17, 2026
52a46b7
fix: address PR #4531 review comments
CUHKSZzxy Apr 17, 2026
8844e26
Merge branch 'main' into mixed-modality
CUHKSZzxy Apr 17, 2026
3df1592
refactor: rename interns1_pro_ts.py to interns1_pro_time_series.py
CUHKSZzxy Apr 17, 2026
72de87b
docs: remove audio sections (not yet supported)
CUHKSZzxy Apr 22, 2026
a4ea4d7
refactor: extract preprocess helpers from VisionModel into preprocess…
CUHKSZzxy Apr 23, 2026
8ca4a89
refactor: move MultimodalSpecialTokens to constants.py, promote API d…
CUHKSZzxy Apr 24, 2026
cad03c5
minor
CUHKSZzxy Apr 24, 2026
765d352
minor
CUHKSZzxy Apr 24, 2026
6 changes: 6 additions & 0 deletions docs/en/multi_modal/index.rst
@@ -1,6 +1,12 @@
Vision-Language Models
=================================

.. toctree::
   :maxdepth: 2
   :caption: Guides

   multimodal_inputs.md

.. toctree::
   :maxdepth: 2
   :caption: Examples