About
I am a Member of Technical Staff at Physical Intelligence, where I build vision–language–action (VLA) and omni models — pushing multimodal foundation models beyond perception and reasoning toward action in the physical world.
Before Physical Intelligence, I was a Principal Applied Scientist at Amazon AGI, leading multimodal understanding for the Nova model family. Earlier, I was a Staff Research Scientist at ByteDance and a Senior Applied Scientist at AWS AI, where I shipped multimodal and video models to production.
I received my Ph.D. from Rutgers University in 2018 and my B.Eng. from the University of Electronic Science and Technology of China in 2013.
My research centers on multimodal understanding across domains, with a deep focus on video understanding and a strong bias toward real-world impact.
Recent Highlights
- Apr 2026 · π0.7 — A steerable generalist robot foundation model with emergent compositional capabilities across dexterous manipulation tasks and robot platforms. Blog
- CVPR 2026 · STORM — End-to-end referring multi-object tracking with a unified MLLM for grounding and tracking. Paper
- WACV 2026 — Compact video representations for efficient long-form video understanding in large multimodal models. Paper
- Nova 2 — Multimodal reasoning and generation foundation models. Technical Report
- Nova Multimodal Embedding — Multimodal embeddings for agentic RAG and semantic search across video, image, document, and audio. Technical Report
- Nova 1 / Nova 1 Premier — Amazon’s first-generation multimodal foundation models. Nova 1 · Nova 1 Premier
- ICCV 2025 · SemiVisBooster — Semi-supervised learning with text guidance for fine-grained classification. Paper
- WACV 2025 · GEXIA — Multi-grained video–language learning at scale. Paper
- WACV 2025 · Context-Aware AD — Audio descriptions with in-video contextual awareness. Paper
- NeurIPS 2024 · Video Token Merging — Efficient token reduction for long-form video understanding. Paper
- ECCV 2024 · Text-Guided Video MAE — Masked video pretraining guided by language supervision. Paper
See the publications page for the full list, or my Google Scholar.