publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

WACV

User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance

Mrinal Verghese, Brian Chen, Hamid Eghbalzadeh, and 2 more authors

In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

2024

CIKM

Personalized Video Summarization by Multimodal Video Understanding

Brian Chen, Xiangyuan Zhao, and Yingnan Zhu

In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM), 2024

CVPR

What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Brian Chen, Nina Shvetsova, Andrew Rouditchenko, and 6 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2023

ICCV

EgoTV: Egocentric Task Verification from Natural Language Task Descriptions

Rishi Hazra, Brian Chen, Akshara Rai, and 2 more authors

In International Conference on Computer Vision (ICCV), 2023

ICCV

Pretrained Language Models as Visual Planners for Human Assistance

Dhruvesh Patel, Hamid Eghbalzadeh, Brian Chen, and 4 more authors

In International Conference on Computer Vision (ICCV) Workshops, 2023

WACV

PreViTS: Contrastive Pretraining with Video Tracking Supervision

Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, and 2 more authors

In Winter Conference on Applications of Computer Vision (WACV), 2023

2022

CVPR

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

Nina Shvetsova, Brian Chen, Andrew Rouditchenko, and 6 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

EMNLP

Weakly-Supervised Temporal Article Grounding

Long Chen, Yulei Niu, Brian Chen, and 6 more authors

In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

2021

EMNLP

Joint Multimedia Event Extraction from Video and Article

Brian Chen, Xudong Lin, Christopher Thomas, and 5 more authors

In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

ICCV

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

Brian Chen, Andrew Rouditchenko, Kevin Duarte, and 3 more authors

In International Conference on Computer Vision (ICCV), 2021

Interspeech

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Andrew Rouditchenko, Angie Boggust, David Harwath, and 2 more authors

In Proceedings of Interspeech, 2021

Interspeech

Cascaded Multilingual Audio-Visual Learning from Videos

Andrew Rouditchenko, Angie Boggust, David Harwath, and 4 more authors

In Proceedings of Interspeech, 2021

NAACL

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System

Haoyang Wen, Ying Lin, Tuan Lai, and 4 more authors

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (NAACL), 2021

2020

AAAI

General Partial Label Learning via Dual Bipartite Graph Autoencoder

Brian Chen, Bo Wu, Alireza Zareian, and 2 more authors

In AAAI Conference on Artificial Intelligence (AAAI), 2020

ACL

GAIA: A Fine-grained Multimedia Knowledge Extraction System

Manling Li, Alireza Zareian, Ying Lin, and 8 more authors

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL), 2020

2019

CVPR

Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding

Hassan Akbari, Svebor Karaman, Surabhi Bhargava, and 3 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

2018

TAC

GAIA - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System

Tongtao Zhang, Ananya Subburathinam, ..., and 3 more authors

In Text Analysis Conference (TAC), 2018