About Me

I am a senior applied scientist at Amazon AGI, working on foundation models for video understanding.

Before AGI, I was a senior research scientist at ByteDance/TikTok and a senior applied scientist at Amazon AI, leading video-related research and products. I am also one of the major contributors to the open-source toolkits GluonCV and GluonMM.

I received my Ph.D. degree (2018) from Rutgers University, supervised by Prof. Ivan Marsic, and my bachelor’s degree (2013) from the University of Electronic Science and Technology of China.
  • [2023] ICCV 2023 publication: “Motion-Guided Masking for Spatiotemporal Representation Learning”. Paper
  • [2023] ICCV 2023 publication: “MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation”. Paper
  • [2023] CVPR 2023 publication: “Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens”. Paper
  • [2023] ICASSP 2023 publication: “CAT: Causal Audio Transformer for Audio Classification”. Paper
  • [2023] ICLR 2023 publication: “Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos”. Paper
  • [2023] WACV 2023 publication: “Discrete Cosin TransFormer: Image Modeling From Frequency Domain”. Paper
  • [2022] CVPR 2022 (Oral) publication: “TubeR: Tubelet Transformer for Video Action Detection”. Paper
  • [2022] CVPR 2022 publication: “Id-Free Person Similarity Learning”. Paper
  • [2022] CVPR 2022 (Oral) publication: “What to Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions”. Paper
  • [2022] CVPR 2022 (Oral) publication: “Temporal Gradient Dropout: A Memory Efficient Strategy for Training Video Models”. Paper
  • [2022] WACV 2022 publications: two papers accepted, NUTA and SSCAP.
  • [2021] NeurIPS 2021 (Spotlight) publication: “Long Short-Term Transformer for Online Action Detection”. Paper
  • [2021] GluonMM is now available. Link
  • [2021] ICCV 2021 publication: “VidTr: Video Transformer Without Convolutions”. Paper
  • [2021] ICCV 2021 publication: “Selective Feature Compression for Efficient Activity Recognition Inference”. Paper
  • [2021] CVPR 2021 publication: “Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations”. Paper
  • [2021] CVPR 2021 publication: “SiamMOT: Siamese Multi-Object Tracking”. Paper