publications
publications by category in reverse chronological order. generated by jekyll-scholar.
2024
- [CVPR] What, when, and where? – Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
2023
- [ICCV] EgoTV: Egocentric Task Verification from Natural Language Task Descriptions. In International Conference on Computer Vision (ICCV), 2023
- [ICCV] Pretrained Language Models as Visual Planners for Human Assistance. In International Conference on Computer Vision (ICCV) Workshop, 2023
- [WACV] PreViTS: Contrastive Pretraining with Video Tracking Supervision. In Winter Conference on Applications of Computer Vision, 2023
2022
- [CVPR] Everything at once - multi-modal fusion transformer for video retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
- [EMNLP] Weakly-supervised temporal article grounding. In Empirical Methods in Natural Language Processing findings (EMNLP), 2022
2021
- [EMNLP] Joint Multimedia Event Extraction from Video and Article. In Empirical Methods in Natural Language Processing findings (EMNLP), 2021
- [ICCV] Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. In International Conference on Computer Vision (ICCV), 2021
- [Interspeech] AVLnet: Learning audio-visual language representations from instructional videos. In Proceedings of Interspeech, 2021
- [Interspeech] Cascaded Multilingual Audio-Visual Learning from Videos. In Proceedings of Interspeech, 2021
- [NAACL] RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (NAACL), 2021
2020
- [AAAI] General Partial Label Learning via Dual Bipartite Graph Autoencoder. In AAAI Conference on Artificial Intelligence (AAAI), 2020
- [ACL] GAIA: A fine-grained multimedia knowledge extraction system. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL), 2020
2019
- [CVPR] Multi-level multimodal common semantic space for image-phrase grounding. In Computer Vision and Pattern Recognition (CVPR), 2019
2018
- [TAC] GAIA - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System. In TAC, 2018