publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2026

  1. UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^128 for Unified Multimodal Large Language Model
    Shaobin Zhuang*, Yuang Ai*, Jiaming Han*, and 8 more authors
    arXiv preprint arXiv:2602.14178, 2026
  2. BitDance: Scaling Autoregressive Generative Models with Binary Tokens
    Yuang Ai*, Jiaming Han*, Shaobin Zhuang*, and 7 more authors
    arXiv preprint arXiv:2602.14041, 2026
  3. Video-GPT via Next Clip Diffusion
    Shaobin Zhuang, Zhipeng Huang, Ying Zhang, and 6 more authors
    ICLR, 2026
  4. Wetok: Powerful discrete tokenization for high-fidelity visual reconstruction
    Shaobin Zhuang, Yiwei Guo, Canmiao Fu, and 6 more authors
    ICLR, 2026
  5. LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
    Xiaohui Li, Shaobin Zhuang, Shuo Cao, and 6 more authors
    ICLR, 2026

2025

  1. Get in video: Add anything you want to the video
    Shaobin Zhuang*, Zhipeng Huang*, Binxin Yang, and 7 more authors
    arXiv preprint arXiv:2503.06268, 2025
  2. TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
    Shaobin Zhuang*, Yiwei Guo*, Yanbo Ding*, and 7 more authors
    ICML, 2025
  3. WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
    Zhipeng Huang*, Shaobin Zhuang*, Canmiao Fu, and 7 more authors
    CVPR, 2025
  4. V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
    Zhengrong Yue, Shaobin Zhuang, Kunchang Li, and 2 more authors
    CVPR, 2025

2024

  1. MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
    Yanbo Ding, Shaobin Zhuang, Kunchang Li, and 3 more authors
    AAAI, 2024
  2. TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
    Yiwei Guo, Shaobin Zhuang, Kunchang Li, and 2 more authors
    NeurIPS, 2024
  3. Vlogger: Make Your Dream A Vlog
    Shaobin Zhuang, Kunchang Li, Xinyuan Chen, and 4 more authors
    CVPR, 2024
  4. Seine: Short-to-long video diffusion model for generative transition and prediction
    Xinyuan Chen*, Yaohui Wang*, Lingjun Zhang, and 7 more authors
    ICLR, 2024