Visual Basic Computer Language

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

Abstract: Multimodal large language models (MLLMs) have garnered widespread attention from researchers due to their remarkable understanding and generation capabilities in visual language tasks (e.g., ...

IEEE

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes

Abstract: Reference Audio-Visual Segmentation (Ref-AVS) aims to provide a pixel-wise scene understanding in Language-aided Audio-Visual Scenes (LAVS). This task requires the model to continuously ...
