IRIS Lab's (Prof. Hak Gu Kim) papers accepted to ICIP 2024

Administrator │ 2024-07-04



Two papers from the Immersive Reality & Intelligent Systems Lab (IRIS Lab) have been accepted to the IEEE International Conference on Image Processing (ICIP) 2024 and the ICIP Workshop 2024, respectively.


<ICIP 2024>


Title:  

Analyzing Visible Articulatory Movements in Speech Production for Speech-Driven 3D Facial Animation


Authors:

Hyung Kyu Kim, Sangmin Lee (UIUC), Hak Gu Kim


Abstract: 

Speech-driven 3D facial animation aims to generate realistic facial meshes from input speech signals. However, due to a limited understanding of visible articulatory movements, current state-of-the-art methods produce inaccurate lip and jaw movements. Traditional evaluation metrics, such as lip vertex error (LVE), often fail to reflect the quality of the visual results. Based on our observations, we reveal the problems with existing evaluation metrics and highlight the need for separate evaluation along each of the 3D axes. Comprehensive analysis shows that most recent methods struggle to precisely predict lip and jaw movements in 3D space.
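For readers unfamiliar with the metric discussed in the abstract, the sketch below illustrates, in a rough and non-authoritative way, the commonly used definition of lip vertex error (per-frame maximum L2 error over lip vertices, averaged over frames) alongside a hypothetical per-axis error of the kind the abstract argues for. This is not the paper's code; the function names, array shapes, and the `lip_idx` index set are assumptions made purely for illustration.

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """Lip vertex error (LVE) under its commonly used definition (assumption):
    per frame, take the maximum L2 distance over the lip vertices, then
    average over frames. pred, gt: (T, V, 3) predicted / ground-truth vertices."""
    diff = pred[:, lip_idx, :] - gt[:, lip_idx, :]      # (T, L, 3)
    per_vertex_l2 = np.linalg.norm(diff, axis=-1)       # (T, L)
    return per_vertex_l2.max(axis=-1).mean()

def per_axis_lip_error(pred, gt, lip_idx):
    """Hypothetical per-axis evaluation: mean absolute lip error reported
    separately for the x, y, and z axes, so an error along one axis cannot
    be masked by accuracy along the others."""
    diff = np.abs(pred[:, lip_idx, :] - gt[:, lip_idx, :])  # (T, L, 3)
    return diff.mean(axis=(0, 1))                            # (3,) per-axis error
```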



<ICIP Workshop 2024>


Title:  

Unveiling the Potential of Multimodal Large Language Models for Scene Text Segmentation via Semantic-Enhanced Features


Authors:

Ho Jun Kim*, Hyung Kyu Kim*, Sangmin Lee (UIUC), Hak Gu Kim (*equal contribution)


Abstract: 

Scene text segmentation aims to accurately identify text regions within a scene while disregarding non-textual elements such as background imagery or graphical elements. However, current text segmentation models often fail to accurately segment text regions due to complex background noise or varied font styles and sizes. To address this issue, it is essential to consider not only visual information but also the semantic information of text in scene text segmentation. To this end, we propose a novel semantic-aware scene text segmentation framework that incorporates multimodal large language models (MLLMs) to fuse visual, textual, and linguistic information. By leveraging semantic-enhanced features from multimodal LLMs, the scene text segmentation model can remove false positives that are visually confusing but not recognized as text. Both qualitative and quantitative evaluations demonstrate that multimodal LLMs improve scene text segmentation performance.
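The abstract describes fusing visual features with semantic information from a multimodal LLM so that visually confusing false positives can be suppressed. The snippet below is a purely hypothetical sketch of what such a fusion step could look like, not the proposed framework: the module name, feature dimensions, and cross-attention design are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SemanticEnhancedFusion(nn.Module):
    """Hypothetical sketch (assumption, not the paper's architecture):
    fuse visual features from a segmentation backbone with semantic token
    embeddings obtained from a multimodal LLM via cross-attention."""
    def __init__(self, vis_dim=256, sem_dim=4096, hid_dim=256):
        super().__init__()
        self.sem_proj = nn.Linear(sem_dim, hid_dim)                     # project MLLM tokens
        self.attn = nn.MultiheadAttention(hid_dim, num_heads=8, batch_first=True)
        self.head = nn.Conv2d(hid_dim, 1, kernel_size=1)                # text-mask logits

    def forward(self, vis_feat, sem_tokens):
        # vis_feat: (B, C, H, W) visual features; sem_tokens: (B, N, sem_dim) MLLM features
        b, c, h, w = vis_feat.shape
        q = vis_feat.flatten(2).transpose(1, 2)       # (B, H*W, C) queries from pixels
        kv = self.sem_proj(sem_tokens)                # (B, N, hid) semantic keys/values
        fused, _ = self.attn(q, kv, kv)               # pixels cross-attend to semantics
        fused = (q + fused).transpose(1, 2).reshape(b, c, h, w)
        return self.head(fused)                       # (B, 1, H, W) segmentation logits
```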


