About me

I am a final-year Ph.D. candidate in the Department of Computer Science at the University of Maryland, College Park (UMD), where I work as a research assistant with Dr. Tom Goldstein on topics in AI/ML safety.

My research focuses on developing trustworthy machine learning (AI/ML) systems, with work spanning model reliability, interpretability, and AI/ML safety. I have broad interests and experience in studying these problems across different ML systems, including vision-language models, large language models (LLMs), and diffusion models for image generation.

During my Ph.D., I have interned at Nvidia, Salesforce, and Google as a research intern, collaborating with many wonderful professors and researchers.

Before UMD, I received my bachelor’s degree in information security from the University of Science and Technology of China (USTC) in June 2019.

News

  • [09/2023] One paper accepted at NeurIPS. We studied a novel vulnerability of aligned language models from the perspective of data security.
  • [11/2022] In New Orleans attending NeurIPS, where I will present our work from Nvidia on prompt tuning for vision-language models. (Excited to attend my first in-person academic conference. I wish I had printed a bigger poster.)

Selected Publications

For the complete list of publications, please refer to my Google Scholar page.

  • On the Exploitability of Instruction Tuning
    M. Shu, J. Wang, C. Zhu, J. Geiping, C. Xiao, T. Goldstein
    To appear at NeurIPS 2023
    [Preprint] [Code]

  • On the Reliability of Watermarks for Large Language Models
    J. Kirchenbauer*, J. Geiping*, Y. Wen, M. Shu, K. Saifullah, K. Kong, K. Fernando, A. Saha, M. Goldblum, T. Goldstein
    Under review
    [Preprint] [Code]

  • Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
    M. Shu, W. Nie, D.A. Huang, Z. Yu, T. Goldstein, A. Anandkumar, C. Xiao
    NeurIPS 2022
    [Paper] [Code] [Project page]

  • Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability
    R. Levin*, M. Shu*, E. Borgnia*, F. Huang, M. Goldblum, T. Goldstein
    NeurIPS 2022
    [Paper] [Code]

  • The Close Relationship Between Contrastive Learning and Meta-Learning
    R. Ni*, M. Shu*, H. Souri, M. Goldblum, T. Goldstein
    ICLR 2022
    [Paper] [Code]

  • Encoding Robustness to Image Style via Adversarial Feature Perturbation
    M. Shu, Z. Wu, M. Goldblum, T. Goldstein
    NeurIPS 2021
    [Paper] [Code]

  • Adversarial Differentiable Data Augmentation for Autonomous Systems
    M. Shu, Y. Shen, M.C. Lin, T. Goldstein
    ICRA 2021
    [Paper] [Code]

  • Model-Agnostic Hierarchical Attention for 3D Object Detection
    M. Shu, L. Xue, R. Martín-Martín, C. Xiong, T. Goldstein, J.C. Niebles, R. Xu
    Under review
    [Preprint]

Services

Conference reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, IROS
Journal reviewer: IJCV

More about me (misc)

I enjoy doing yoga and meditation. I listen to classical music when focusing. I haven’t played many video games, but I had a great time playing The Legend of Zelda (BotW) and Stardew Valley.