About Me
I am a senior applied scientist at Amazon AGI (Artificial General Intelligence), working on foundation models and leading the video understanding thread.
Before joining AGI, I was a senior applied scientist at Amazon Prime Video. Prior to that, I was a senior research scientist at ByteDance/TikTok and Amazon AWS AI from 2018 to 2022, leading video-related research and products. I was also one of the major contributors to the open-source tools GluonCV and GluonMM.
I received my Ph.D. degree (2018) from Rutgers University, supervised by Prof. Ivan Marsic. I received my Bachelor’s degree (2013) from the University of Electronic Science and Technology of China.
News
- [2024] Nova Models: Check out our Nova model family. Tech report
- [2024] WACV 2025 publication: “GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning”. Paper
- [2024] WACV 2025 publication: “Now You See Me: Context-Aware Automatic Audio Description”. Paper
- [2024] NeurIPS 2024 publication: “Video token merging for long-form video understanding”. Paper
- [2024] ECCV 2024 publication: “Text-Guided Video Masked Autoencoder”. Paper
- [2023] ICCV 2023 publication: “Motion-Guided Masking for Spatiotemporal Representation Learning”. Paper
- [2023] ICCV 2023 publication: “MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation”. Paper
- [2023] CVPR 2023 publication: “Revisiting multimodal representation in contrastive learning: from patch and token embeddings to finite discrete tokens”. Paper
- [2023] ICASSP 2023 publication: “CAT: Causal Audio Transformer for Audio Classification”. Paper
- [2023] ICLR 2023 publication: “Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos”. Paper
- [2023] WACV 2023 publication: “Discrete Cosin TransFormer: Image Modeling From Frequency Domain”. Paper
- [2022] CVPR 2022 (Oral) publication: “TubeR: Tubelet Transformer for Video Action Detection”. Paper
- [2022] CVPR 2022 publication: “Id-Free Person Similarity Learning”. Paper
- [2022] CVPR 2022 (Oral) publication: “What to Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions”. Paper
- [2022] CVPR 2022 (Oral) publication: “Temporal Gradient Dropout: A Memory Efficient Strategy for Training Video Models”. Paper
- [2022] WACV 2022: Two papers accepted, NUTA and SSCAP.
- [2021] NeurIPS 2021 (Spotlight) publication: “Long Short-Term Transformer for Online Action Detection”. Paper
- [2021] GluonMM is now available. Link
- [2021] ICCV 2021 publication: “VidTr: Video Transformer Without Convolutions”. Paper
- [2021] ICCV 2021 publication: “Selective Feature Compression for Efficient Activity Recognition Inference”. Paper
- [2021] CVPR 2021 publication: “Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations”. Paper
- [2021] CVPR 2021 publication: “SiamMOT: Siamese Multi-Object Tracking”. Paper