BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
Abstract
arXiv:2605.07394v1 (Announce Type: cross)

Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate cap…