Shaobin Zhuang
A third-year PhD student focusing on video & image generation and unified models for generation and understanding.
Rongke Consulting Center
Beijing, China
Hi! I am a fourth-year PhD student. My research mainly focuses on video and image generative super-resolution and deep learning architectures. Currently, I am also working at the Rongke Consulting Center in Beijing.
My goal is to push the boundaries of single-step generation and create more efficient, high-fidelity visual models.
When I am not running experiments or pushing code, you can probably find me assembling Lego sets to decompress.
selected publications
- UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^128 for Unified Multimodal Large Language Model, arXiv preprint arXiv:2602.14178, 2026
- BitDance: Scaling Autoregressive Generative Models with Binary Tokens, arXiv preprint arXiv:2602.14041, 2026
- Video-GPT via Next Clip Diffusion, ICLR, 2026
- Wetok: Powerful discrete tokenization for high-fidelity visual reconstruction, ICLR, 2026
- LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution, ICLR, 2026
- Get in video: Add anything you want to the video, arXiv preprint arXiv:2503.06268, 2025
- TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision, ICML, 2025
- WeGen: A Unified Model for Interactive Multimodal Generation as We Chat, CVPR, 2025
- V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents, CVPR, 2025
- MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration, AAAI, 2024
- TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration, NeurIPS, 2024
- Vlogger: Make Your Dream A Vlog, CVPR, 2024
- Seine: Short-to-long video diffusion model for generative transition and prediction, ICLR, 2024