Session: Utilization of Foundation Models and the Future
Moderator:

Isao Echizen, Professor, National Institute of Informatics, Japan
Speakers & Presentations

Shiqi Wang (Distinguished Lecturer, 2025-2026), Professor, Department of Computer Science, City University of Hong Kong, Hong Kong, China
Bio: Shiqi Wang is a Professor with the Department of Computer Science, City University of Hong Kong. He has contributed more than 70 technical proposals to the ISO/MPEG, ITU-T, and AVS standards, and has authored or coauthored more than 300 refereed journal articles and conference papers. His research interests include video compression, image/video quality assessment, and image/video search and analysis. He received Best Paper Awards from IEEE VCIP 2019, IEEE ICME 2019, IEEE Multimedia 2018, and PCM 2017, and a co-authored article received the Best Student Paper Award at IEEE ICIP 2018. He also received the IEEE Multimedia Rising Star Award at IEEE ICME 2021. He serves or has served as an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, IEEE Transactions on Image Processing, and IEEE Transactions on Cybernetics, and served as a Technical Program Co-Chair of IEEE ICME 2024.

Hanwei Zhu, Research Scientist, Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Nanyang Technological University, Singapore
Bio: Dr. Hanwei Zhu is a Research Scientist with the Alibaba-NTU Global e-Sustainability CorpLab (ANGEL) at Nanyang Technological University, Singapore. He earned his Ph.D. degree from City University of Hong Kong in 2025. His research interests include perceptual image processing, computational vision, and computational photography.
Presentation Title: Visual Quality Assessment Based on Large Vision-Language Models
Presenters: Shiqi Wang and Hanwei Zhu
Abstract: Large vision–language models (LVLMs) have recently exhibited significant potential in visual understanding tasks, yet systematically evaluating their image quality assessment (IQA) capabilities remains challenging. This talk introduces a unified approach to IQA that transitions from traditional scalar metrics to sophisticated reasoning-based evaluation. Specifically, we present three key innovations: (1) a Two-Alternative Forced Choice (2AFC) framework employing strategic pairing and maximum-a-posteriori inference for robust LVLM ranking; (2) an open-ended visual quality comparison task enabling detailed and context-aware model rationales; and (3) a novel no-reference IQA model that translates comparative judgments from LVLMs into continuous quality scores. Additionally, we introduce AgenticIQA, a modular, divide-and-conquer framework that combines LVLM reasoning with conventional IQA tools, coordinated by planning, execution, and summarization agents. Together, these contributions chart a path towards intelligent, interpretable, and adaptable visual quality assessment for the next generation of multimodal models.
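
To make the ranking-and-scoring idea concrete, below is a minimal sketch (not the presenters' actual method) of how pairwise 2AFC judgments elicited from an LVLM can be turned into continuous quality scores by maximum-a-posteriori estimation, here under an assumed Bradley-Terry model with a Gaussian prior. All function names and parameters are illustrative.

```python
# Hypothetical sketch: MAP estimation of latent image quality scores from
# pairwise 2AFC outcomes, assuming a Bradley-Terry likelihood and a
# zero-mean Gaussian prior on the scores. Illustrative only.
import numpy as np

def map_quality_scores(n_images, comparisons, sigma=1.0, lr=0.05, n_iters=2000):
    """comparisons: list of (winner, loser) index pairs, e.g. from LVLM 2AFC prompts."""
    q = np.zeros(n_images)  # latent quality scores, initialized at the prior mean
    for _ in range(n_iters):
        grad = -q / sigma**2  # gradient of the log Gaussian prior
        for w, l in comparisons:
            # Bradley-Terry: P(w preferred over l) = sigmoid(q_w - q_l)
            p = 1.0 / (1.0 + np.exp(-(q[w] - q[l])))
            grad[w] += 1.0 - p  # gradient of the log-likelihood term
            grad[l] -= 1.0 - p
        q += lr * grad  # gradient ascent on the log-posterior
    return q

# Toy usage: image 0 wins most comparisons, so it receives the highest score.
scores = map_quality_scores(3, [(0, 1), (0, 2), (1, 2), (0, 1)])
print(scores.argsort()[::-1])  # ranking, best first
```

A Thurstone-style likelihood or a richer pairing strategy (as in the strategic-pairing framework above) would slot into the same MAP objective; only the likelihood term changes.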

Koki Wataoka, Lead, Responsible AI Team, SB Intuitions, Japan
Bio: Koki Wataoka leads the Responsible AI Team in the Data & Safety Department of the R&D Headquarters at SB Intuitions, Japan, where he oversees research and development to advance the safety of LLMs and VLMs. He earned his master’s degree from the Graduate School of System Informatics at Kobe University in 2021. That same year, he joined LINE Corporation (now LINE Yahoo!), focusing on the reliability and safety of large-scale language models. In 2023, he moved to SB Intuitions, where he continues to drive responsible AI initiatives and strengthen the safety of next-generation AI systems.

Huy Hong Nguyen, Researcher, SB Intuitions, Japan
Bio: Huy H. Nguyen is a researcher at SB Intuitions, a SoftBank Group company. He is also a visiting associate professor at the National Institute of Informatics (NII), Japan. His research focuses on improving the safety, security, and privacy of LLMs and VLMs, as well as the generation and detection of synthetic media. His future research vision includes extending these efforts to safeguard artificial general intelligence (AGI). He earned his Ph.D. from The Graduate University for Advanced Studies (SOKENDAI) in collaboration with NII in 2022.
Presentation Title: Foundation Models as Guardrails: LLM- and VLM-Based Approaches to Safety and Alignment
Presenters: Koki Wataoka and Huy Hong Nguyen
Abstract: The growing deployment of large language models (LLMs) and vision-language models (VLMs) raises urgent concerns about safety and alignment. While alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) improve model behavior, they are not sufficient to prevent harmful outputs. This talk reviews recent approaches that use foundation models themselves as guardrails: systems that monitor or filter inputs and outputs for safety. We cover LLM-based moderation, neural classifiers, and multimodal safety filters, highlighting both academic advances and industry tools. We also discuss empirical evaluation methods such as red teaming and adversarial prompting. Finally, we outline open challenges in robustness, interpretability, and policy adaptation, pointing to key directions for building trustworthy guardrails for generative AI.
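
As a concrete illustration of the guardrail pattern discussed above, the following is a minimal, hypothetical sketch (not any specific system from the talk): a foundation model is wrapped with input and output moderation steps, each backed by a safety judge that could itself be an LLM-based moderator or a neural classifier. All names are illustrative.

```python
# Hypothetical guardrail wrapper: moderate the prompt before generation and
# the response after generation. The safety judge is pluggable; in practice
# it might be an LLM prompted with a safety policy or a trained classifier.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardrailedModel:
    generate: Callable[[str], str]    # the underlying LLM or VLM text interface
    is_unsafe: Callable[[str], bool]  # safety judge (LLM moderator or classifier)
    refusal: str = "I can't help with that request."

    def __call__(self, prompt: str) -> str:
        if self.is_unsafe(prompt):        # pre-generation input moderation
            return self.refusal
        response = self.generate(prompt)
        if self.is_unsafe(response):      # post-generation output moderation
            return self.refusal
        return response

# Toy usage with stand-in components; a real deployment would call actual
# moderation and generation models instead of these lambdas.
model = GuardrailedModel(
    generate=lambda p: f"Echo: {p}",
    is_unsafe=lambda text: "bioweapon" in text.lower(),
)
print(model("How do I bake bread?"))  # passes both checks
```

Separating the judge from the generator is what makes red teaming and policy adaptation tractable: the moderation component can be evaluated and updated independently of the underlying model.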