Xuandong Zhao

E-mail: csxuandongzhao at gmail
Address: Goleta, CA 93106, USA

GitHub  /  LinkedIn  /  Twitter  /  Google Scholar

About


I am currently a Postdoctoral Researcher at UC Berkeley, affiliated with RDI and BAIR and working with Prof. Dawn Song. I earned my PhD in Computer Science from UC Santa Barbara, where I was advised by Prof. Yu-Xiang Wang and Prof. Lei Li. Before that, I received a Bachelor's degree in Computer Science from Zhejiang University. I have also interned at leading tech companies, including Alibaba, Microsoft, and Google.

My current research interests lie in Machine Learning, Natural Language Processing, and AI Safety, with a particular focus on Responsible and Reliable Generative AI. I am always open to collaborations. If you share similar interests or see potential synergies, please feel free to reach out via email!

Selected Research


Improving LLM Safety Alignment with Dual-Objective Optimization [Paper] [Code] [HuggingFace]
Xuandong Zhao*, Will Cai*, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song
arXiv, 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty [Paper] [Code]
Zhewei Kang*, Xuandong Zhao*, Dawn Song
arXiv, 2025

Reward Shaping to Mitigate Reward Hacking in RLHF [Paper] [Code]
Jiayi Fu*, Xuandong Zhao*, Chengyuan Yao, Heng Wang, Qi Han, Yanghua Xiao
arXiv, 2025

SoK: Watermarking for AI-Generated Content [Paper]
Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song
Proceedings of IEEE S&P 2025

An Undetectable Watermark for Generative Image Models [Paper] [Code]
Sam Gunn*, Xuandong Zhao*, Dawn Song
Proceedings of ICLR 2025

Permute-and-Flip: An Optimally Stable and Watermarkable Decoder for LLMs [Paper] [Code] [Slides]
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Proceedings of ICLR 2025

Invisible Image Watermarks Are Provably Removable Using Generative AI [Paper] [Code] [Video] [Media]
Xuandong Zhao*, Kexun Zhang*, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, Lei Li
Proceedings of NeurIPS 2024

Weak-to-Strong Jailbreaking on Large Language Models [Paper] [Code]
Xuandong Zhao*, Xianjun Yang*, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Yang Wang
ICML 2024 Next Generation of AI Safety Workshop

Provable Robust Watermarking for AI-Generated Text [Paper] [Code] [Video] [Demo]
Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang
Proceedings of ICLR 2024

Protecting Language Generation Models via Invisible Watermarking [Paper] [Code]
Xuandong Zhao, Yu-Xiang Wang, Lei Li
Proceedings of ICML 2023

Pre-trained Language Models Can be Fully Zero-Shot Learners [Paper] [Code] [Video] [Slides]
Xuandong Zhao, Siqi Ouyang, Zhiguo Yu, Ming Wu, Lei Li
Proceedings of ACL 2023, Oral

Provably Confidential Language Modelling [Paper] [Code] [Video]
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Proceedings of NAACL 2022, Oral

All Research


Improving LLM Safety Alignment with Dual-Objective Optimization [Paper] [Code] [HuggingFace]
Xuandong Zhao*, Will Cai*, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song
arXiv, 2025
Scalable Best-of-N Selection for Large Language Models via Self-Certainty [Paper] [Code]
Zhewei Kang*, Xuandong Zhao*, Dawn Song
arXiv, 2025
Reward Shaping to Mitigate Reward Hacking in RLHF [Paper] [Code]
Jiayi Fu*, Xuandong Zhao*, Chengyuan Yao, Heng Wang, Qi Han, Yanghua Xiao
arXiv, 2025
DIS-CO: Discovering Copyrighted Content in VLMs Training Data [Paper] [Code] [Website] [Dataset]
André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
arXiv, 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 [Paper]
Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Shreedhar Jangam, Jayanth Srinivasa, Gaowen Liu, Dawn Song, Xin Eric Wang
arXiv, 2025
Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs [Paper]
Yepeng Liu, Xuandong Zhao, Dawn Song, Yuheng Bu
arXiv, 2025
SoK: Watermarking for AI-Generated Content [Paper]
Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song
Proceedings of IEEE S&P 2025
An Undetectable Watermark for Generative Image Models [Paper] [Code]
Sam Gunn*, Xuandong Zhao*, Dawn Song
Proceedings of ICLR 2025
Permute-and-Flip: An Optimally Stable and Watermarkable Decoder for LLMs [Paper] [Code] [Slides]
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Proceedings of ICLR 2025
Multimodal Situational Safety [Paper] [Code] [Website] [Dataset]
Kaiwen Zhou*, Chengzhi Liu*, Xuandong Zhao, Anderson Compalas, Dawn Song, Xin Eric Wang
Proceedings of ICLR 2025; NeurIPS 2024 RBFM Workshop, Oral
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models [Paper] [Code] [Website]
Chejian Xu*, Jiawei Zhang*, Zhaorun Chen*, Chulin Xie*, Mintong Kang*, Yujin Potter*, Zhun Wang*, Zhuowen Yuan*, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li†, Dawn Song†
Proceedings of ICLR 2025
A Practical Examination of AI-Generated Text Detectors for Large Language Models [Paper] [Code]
Brian Tufts, Xuandong Zhao, Lei Li
Findings of NAACL 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification [Paper] [Code]
Yuchen Tian*, Weixiang Yan*, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song
Proceedings of AAAI 2025

Empowering Responsible Use of Large Language Models [Paper]
Xuandong Zhao
PhD Dissertation, 2024
PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [Paper] [Code]
Yuzhou Nie, Zhun Wang, Ye Yu, Xian Wu, Xuandong Zhao, Wenbo Guo, Dawn Song
arXiv, 2024
Efficiently Identifying Watermarked Segments in Mixed-Source Texts [Paper]
Xuandong Zhao*, Chenwen Liao*, Yu-Xiang Wang, Lei Li
NeurIPS 2024 Safe Generative AI Workshop
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World [Paper] [Code]
Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu, Xuandong Zhao
arXiv, 2024
Evaluating Durability: Benchmark Insights into Image and Text Watermarking [Paper] [Code] [Website]
Jielin Qiu*, William Han*, Xuandong Zhao, Shangbang Long, Christos Faloutsos, Lei Li
Journal of DMLR 2024
Watermarking for Large Language Models [Paper] [Website] [Video]
Xuandong Zhao, Yu-Xiang Wang, Lei Li
Tutorials of NeurIPS 2024, Tutorials of ACL 2024
Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature [Paper] [Code]
Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren
Proceedings of NeurIPS 2024
Invisible Image Watermarks Are Provably Removable Using Generative AI [Paper] [Code] [Video] [Media]
Xuandong Zhao*, Kexun Zhang*, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, Lei Li
Proceedings of NeurIPS 2024
Erasing the Invisible: A Stress-Test Challenge for Image Watermarks [Paper] [Website]
Mucong Ding*, Tahseen Rabbani*, Bang An*, Souradip Chakraborty, Chenghao Deng, Mehrdad Saberi, Yuxin Wen, Xuandong Zhao, Mo Zhou, Anirudh Satheesh, Mary-Anne Hartley, Lei Li, Yu-Xiang Wang, Vishal M. Patel, Soheil Feizi, Tom Goldstein, Furong Huang
Competitions of NeurIPS 2024
MarkLLM: An Open-Source Toolkit for LLM Watermarking [Paper] [Code] [Colab]
Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu
System Demonstrations of EMNLP 2024
A Survey on Detection of LLMs-Generated Content [Paper] [Code]
Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng
Findings of EMNLP 2024
Mapping the Increasing Use of LLMs in Scientific Papers [Paper] [Code]
Weixin Liang*, Yaohui Zhang*, Zhengxuan Wu*, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
Proceedings of COLM 2024
Weak-to-Strong Jailbreaking on Large Language Models [Paper] [Code]
Xuandong Zhao*, Xianjun Yang*, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Yang Wang
ICML 2024 Next Generation of AI Safety Workshop
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [Paper] [Code]
Weixin Liang*, Zachary Izzo*, Yaohui Zhang*, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou
Proceedings of ICML 2024, Oral; Best Presentation Runner-up Award at ICSSI 2024
DE-COP: Detecting Copyrighted Content in Language Models Training Data [Paper] [Code]
André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
Proceedings of ICML 2024; Best Scientific Paper Award at Portuguese Responsible AI Forum
GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick [Paper] [Code]
Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao
Proceedings of ACL 2024
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [Paper] [Code]
Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, William Yang Wang
Proceedings of ACL 2024, Oral
Provable Robust Watermarking for AI-Generated Text [Paper] [Code] [Video] [Demo]
Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang
Proceedings of ICLR 2024

Private Prediction Strikes Back! Private Kernelized Nearest Neighbors with Individual Renyi Filter [Paper] [Code]
Yuqing Zhu, Xuandong Zhao, Chuan Guo, Yu-Xiang Wang
Proceedings of UAI 2023, Spotlight
Protecting Language Generation Models via Invisible Watermarking [Paper] [Code]
Xuandong Zhao, Yu-Xiang Wang, Lei Li
Proceedings of ICML 2023
Pre-trained Language Models Can be Fully Zero-Shot Learners [Paper] [Code] [Video] [Slides]
Xuandong Zhao, Siqi Ouyang, Zhiguo Yu, Ming Wu, Lei Li
Proceedings of ACL 2023, Oral

Distillation-Resistant Watermarking for Model Protection in NLP [Paper] [Code] [Video] [Blog]
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Findings of EMNLP 2022
Provably Confidential Language Modelling [Paper] [Code] [Video]
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Proceedings of NAACL 2022, Oral
Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation [Paper] [Code] [Video] [Poster]
Xuandong Zhao, Zhiguo Yu, Ming Wu, Lei Li
Findings of ACL 2022

An Optimal Reduction of TV-Denoising to Adaptive Online Learning [Paper] [Code]
Dheeraj Baby, Xuandong Zhao, Yu-Xiang Wang
Proceedings of AISTATS 2021
A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning [Paper] [Code]
Xuandong Zhao, Jinbao Xue, Jin Yu, Xi Li, Hongxia Yang
arXiv, 2020
Predicting Alzheimer's Disease by Hierarchical Graph Convolution from Positron Emission Tomography Imaging [Paper] [Code]
Jiaming Guo*, Wei Qiu*, Xiang Li*, Xuandong Zhao, Ning Guo, Quanzheng Li
Proceedings of Big Data 2019
Multi-size Computer-aided Diagnosis of Positron Emission Tomography Images Using Graph Convolutional Networks [Paper] [Code]
Xuandong Zhao*, Xiang Li*, Ning Guo, Zhiling Zhou, Xiaxia Meng, Quanzheng Li
Proceedings of ISBI 2019

Education


UC Santa Barbara, USA

Ph.D. in Computer Science • Sept. 2019 - June 2024

Zhejiang University, China

B.E. in Computer Science • Sept. 2015 - June 2019, GPA: 3.96/4.00

Selected Honors & Awards


Rising Star in AI, KAUST, 2025

Rising Star in Adversarial Machine Learning, AdvML Workshop, 2024

Chancellor's Fellowship, UC Santa Barbara, 2019, 2021, 2023

He Zhijun Scholarship (Highest honor in ZJU CS department), 2019

Alibaba-Zhejiang News Scholarship, 2018

National Scholarship (Top 0.2% Nationwide), 2016

First Prize in Chinese Physics Olympiad (CPhO; Top 0.1% in Shanxi Province, China), 2014


Selected Recent Talks



Last update: Feb. 2025