Brian Chen
I am a Senior Researcher at Samsung Research America (SRA), working on video summarization and text-to-image editing/style transfer. Before joining SRA, I was a Visiting Researcher at Meta Reality Labs/FAIR Embodied AI. I received my Ph.D. from the Dept. of Computer Science, Columbia University, in the DVMM Lab, advised by Prof. Shih-Fu Chang.
My research interests focus on Computer Vision, Multimodal Learning, and Self-supervised Learning. In particular, I am interested in learning representations from videos.
At Meta, I worked on a project developing egocentric agents on Aria that understand everyday tasks specified in natural language and generate goal plans for further assistance. We leveraged the large language model (LLM) LLaMA2 together with video training for zero-shot goal planning; our model generates future plans conditioned on a given video and input goals.
During my Ph.D., I worked on the DARPA AIDA project, which focused on incorporating cross-domain knowledge (images, videos, text, and audio) into knowledge graph construction.
Since 2020, I have been working closely with IBM Research and MIT CSAIL on the Sight and Sound Project, which aims to learn representations from video and audio.
Prior to joining Columbia University, I earned my Bachelor's and Master's degrees from the Dept. of Computer Science and Information Engineering, National Taiwan University, in 2015 and 2017 respectively, advised by Prof. Shou-De Lin.
news
Oct 03, 2023 | Joined Samsung Research America as a Senior Researcher. |
---|---|
May 09, 2023 | Obtained Ph.D. from the Dept. of Computer Science, Columbia University, advised by Prof. Shih-Fu Chang. |
Oct 03, 2022 | Started as a Visiting Researcher at Meta Reality Labs/FAIR Embodied AI, collaborating with Ruta Desai, Rishi Hazra, and Tushar Nagarajan. |
Jun 01, 2021 | Started as a Research Intern at Salesforce Research under the guidance of Ramprasaath Selvaraju, Juan Carlos Niebles, and Nikhil Naik. |
Jun 01, 2020 | Started as a Research Intern at IBM Research under the guidance of Samuel Thomas, Brian Kingsbury, and Hilde Kuehne. |
Aug 01, 2019 | Started as a Research Intern at NTU under the guidance of Hanwang Zhang. |
selected publications
- CVPR: What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
- ICCV: EgoTV: Egocentric Task Verification from Natural Language Task Descriptions. In International Conference on Computer Vision (ICCV), 2023.
- WACV: PreViTS: Contrastive Pretraining with Video Tracking Supervision. In Winter Conference on Applications of Computer Vision (WACV), 2023.
- CVPR: Everything at Once: Multi-modal Fusion Transformer for Video Retrieval. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- EMNLP: Joint Multimedia Event Extraction from Video and Article. In Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021.
- ICCV: Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. In International Conference on Computer Vision (ICCV), 2021.
- AAAI: General Partial Label Learning via Dual Bipartite Graph Autoencoder. In AAAI Conference on Artificial Intelligence (AAAI), 2020.