🍼 About Me

I am a third-year Ph.D. student (in the combined Master's-Ph.D. program) at the School of Artificial Intelligence, Nanjing University. I am a member of the LAMDA Group (State Key Laboratory for Novel Software Technology), advised by Associate Professor Han-Jia Ye (叶翰嘉) and Professor De-Chuan Zhan (詹德川).

My research currently focuses on LLM RL Training and LLM Inference Routing.

🥟 Research & Publications

LLM RL Training

What I did: I study how to build an entirely new Value Model during RL, and how to learn stably under Off-policy Guidance.
V0: A Generalist Value Model for Any Policy at State Zero
Yi-Kai Zhang, Zhiyuan Yao, Hongyan Hao, Yueqing Sun, Qi Gu, Hui Su, Xunliang Cai, De-Chuan Zhan, Han-Jia Ye
V0.5: Generalist Value Model as a Prior for Sparse RL Rollouts
Yi-Kai Zhang, Yueqing Sun, Hongyan Hao, Qi Gu, Xunliang Cai, De-Chuan Zhan, Han-Jia Ye
LongCat-Flash-Thinking-2601 (Contributor)
Spot Me: Bridging the Intention-Execution Gap with Expert-Guided Reinforcement Fine-tuning
Yi-Kai Zhang, co-authors at SJTU, De-Chuan Zhan, Han-Jia Ye

LLM Inference Routing [Demo: http://lambda-router.org]

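What I did: I study how, at deployment time, to route instructions across open-source and closed-source, small and large models. Our work spans the entire development of Routing.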
Let the LLM Stick to Its Strengths: Learning to Route Economical LLM
Yi-Kai Zhang, Shiyin Lu, Qing-Guo Chen, Weihua Luo, De-Chuan Zhan, Han-Jia Ye.
NeurIPS 2025
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing
Yi-Kai Zhang, De-Chuan Zhan, Han-Jia Ye.
AAAI 2025
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Yi-Kai Zhang, Ting-Ji Huang, Yao-Xiang Ding, De-Chuan Zhan, Han-Jia Ye.
NeurIPS 2023 (Spotlight)

Other Related Applications

[Multimodal LLM Data Engine] ZooProbe: A Data Engine for Evaluating, Exploring, and Evolving Large-scale Training Data for Multimodal LLMs
Yi-Kai Zhang, Shiyin Lu, Qing-Guo Chen, De-Chuan Zhan, Han-Jia Ye.
ICLR 2025
[Multimodal LLM Architecture] Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye.
NeurIPS 2024
[Stable Training in CV] Learning Debiased Representations via Conditional Attribute Interpolation
Yi-Kai Zhang, Qi-Wei Wang, Han-Jia Ye, De-Chuan Zhan.
CVPR 2023

🧩 Internship Experience

Meituan - LongCat Life Agent Team
2025.10 - Present
Alibaba Tongyi - Agent Team
2025.08 - 2025.10
Xiaomi - MiMo-Embodied Team
2025.05 - 2025.08
Alibaba International - Ovis Multimodal LLM Team
2024.03 - 2025.05

🍚 Education Background

Nanjing University, School of Artificial Intelligence
Ph.D. in Computer Science and Technology (enrolled as a Master's student in 2021)
2023.09 - Present
Nanjing University, Department of Computer Science and Technology
B.Sc. in Computer Science and Technology (Minor in Math & Statistics)
2017.09 - 2021.07

🍰 Selected Awards

  • National Scholarship 2022

  • Nanjing University Outstanding Graduate Student Pacesetter 2023

  • Challenge Cup National Bronze Award - Team Leader 2023

  • Huawei Outstanding Contribution Award 2023

  • Scholarships from Industrial Bank, Bank of Jiangsu, and others (Multiple Years)

🍞 Service Work

  • President of the Graduate Student Union, School of Artificial Intelligence, Nanjing University

  • Standing Graduate Student Representative, Nanjing University

  • Captain of the School Table Tennis Team



GitHub  |  Email