Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala
Summary
arXiv:2601.14958v3. Abstract: The performance of Language Models (LMs) on low-resource, morphologically rich languages like Sinhala remains largely unexplored, particularly regarding script variation in digital communication. Sinhala exhibits script duality, with Unicode…