arXiv cs.AI, by the Synapse Flow editorial team

Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs

Summary

arXiv:2512.09874v2. Correctly parsing mathematical formulas from PDFs is critical for training large language models and building scientific knowledge bases from academic literature, yet existing benchmarks either exclude formulas entirely or lack semantically-…
