From Pixels to Prompts: Vision-Language Models
Abstract
arXiv:2605.07544v1 (announce type: new)
When you read a paper about a new Vision-Language Model today, it can be easy to forget how strange this idea would have sounded not so long ago. Teaching machines to see was already hard. Teaching them to read and generate language was already hard. …