* denotes equal contribution
Text-Guided Video Masked Autoencoder
David Fan, Jue Wang, Shuai Liao, Zhikang Zhang, Vimal Bhat, Xinyu Li.
ECCV 2024 Paper Link
Motion-Guided Masking for Spatiotemporal Representation Learning
David Fan, Jue Wang, Leo Liao, Yi Zhu, Vimal Bhat, Hector Santos, Xinyu Li.
ICCV 2023 Paper Link
MEGA: Multimodal alignment aggregation and distillation for cinematic video segmentation
Najmeh Sadoughi, Xinyu Li, Avijit Vajpayee, David Fan, Bing Shuai, Hector Santos-Villalobos, Vimal Bhat.
ICCV 2023 Paper Link
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Yuxiao Chen, Jianbo Yuan, Yu Tian, Shijie Geng, Xinyu Li, Ding Zhou, Dimitris N. Metaxas, Hongxia Yang.
CVPR 2023 Paper Link
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu, Hanlin Lu, Jianbo Yuan, Xinyu Li.
ICASSP 2023 Paper Link
Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos
David Fan, Deyu Yang, Xinyu Li, Vimal Bhat, Rohith MV.
ICLR 2023 Workshop Paper Link
Discrete Cosin TransFormer: Image Modeling From Frequency Domain
Xinyu Li*, Yanyi Zhang, Jianbo Yuan, Hanlin Lu, Yibo Zhu.
WACV 2023 Paper Link
TubeR: Tubelet Transformer for Video Action Detection
Jiaojiao Zhao*, Yanyi Zhang*, Xinyu Li*, Hao Chen, Bing Shuai, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Ivan Marsic, Cees G.M. Snoek, Joseph Tighe.
CVPR 2022 Oral Paper Link
Id-Free Person Similarity Learning
Bing Shuai, Xinyu Li, Kaustav Kundu, Joseph Tighe.
CVPR 2022 Paper Link
What to Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions
A S M Iftekhar, Hao Chen, Kaustav Kundu, Xinyu Li, Joseph Tighe, Davide Modolo.
CVPR 2022 Oral Paper Link
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia.
CVPR 2022 Oral Paper Link
NUTA: Non-uniform Temporal Aggregation for Action Recognition
Xinyu Li*, Chunhui Liu*, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe.
WACV 2022 Paper Link
SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation
Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes.
WACV 2022 Link
Long Short-Term Transformer for Online Action Detection
Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto.
NeurIPS 2021 Spotlight Paper Link
VidTr: Video Transformer Without Convolutions
Xinyu Li*, Yanyi Zhang*, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Ivan Marsic, Joseph Tighe.
ICCV 2021 Paper Link
Selective Feature Compression for Efficient Activity Recognition Inference
Chunhui Liu*, Xinyu Li*, Hao Chen, Joseph Tighe.
ICCV 2021 Paper Link
Video Contrastive Learning with Global Context
Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joseph Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li.
ICCV 2021 Workshop Paper Link
Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations
Yanyi Zhang, Xinyu Li*, Ivan Marsic.
CVPR 2021 Paper Link
SiamMOT: Siamese Multi-Object Tracking
Bing Shuai, Andrew G. Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe.
CVPR 2021 Paper Link
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, and Mu Li.
Pre-print
Directional temporal modeling for action recognition
Xinyu Li, Bing Shuai, and Joseph Tighe.
ECCV 2020 Paper Link
Application of Multi-Object Tracking with Siamese Track-RCNN to the Human in Events Dataset
Bing Shuai, Andrew G. Berneshawi, Manchen Wang, Chunhui Liu, Davide Modolo, Xinyu Li, Joseph Tighe.
ACM MM 2020 Paper Link
Speech Audio Super-Resolution for Speech Recognition
Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff.
INTERSPEECH 2019 Paper Link
Multi-stream Network With Temporal Attention For Environmental Sound Classification
Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff.
INTERSPEECH 2019 Paper Link
Mutual Correlation Attentive Factors In Dyadic Fusion Networks For Speech Emotion Recognition
Yue Gu, Xinyu Lyu, Weijia Sun, Weitian Li, Shuhong Chen, Xinyu Li, and Ivan Marsic.
ACM MM 2019 Paper Link
Human conversation analysis using attentive multimodal networks with hierarchical encoder-decoder
Yue Gu, Xinyu Li, Kaixiang Huang, Shiyu Fu, Kangning Yang, Shuhong Chen, Moliang Zhou, and Ivan Marsic.
ACM MM 2018 Paper Link
Hybrid Attention based Multimodal Network for Spoken Language Classification
Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, and Ivan Marsic.
COLING 2018 Paper Link
Region-based Activity Recognition Using Conditional GAN
Xinyu Li, Yanyi Zhang, Jianyu Zhang, Yueyang Chen, Huangcan Li, Ivan Marsic, and Randall S. Burd.
ACM MM 2017 Paper Link
Progress Estimation And Phase Detection For Sequential Processes
Xinyu Li, Yanyi Zhang, Jianyu Zhang, Moliang Zhou, Shuhong Chen, Yue Gu, Yueyang Chen, Ivan Marsic.
UBICOMP 2017 Paper Link
Deep Learning for RFID-based Activity Recognition
Xinyu Li, Yanyi Zhang, Ivan Marsic, Aleksandra Sarcevic, and Randall S. Burd.
SenSys 2016
Activity Recognition For Medical Teamwork Based on Passive RFID
Xinyu Li, Dongyang Yao, Xuechao Pan, Jonathan Johannaman, JaeWon Yang, Rachel Webman, Aleksandra Sarcevic, Ivan Marsic, and Randall S. Burd.
IEEE RFID 2016
A Novel Single Image Dehazing Method
Yanjing Yang, Zhizhong Fu, Xinyu Li, Chang Shu, and Xiaofeng Li.
ICCP 2013