Evaluating Large Language Models in Scientific Discovery
Summary
arXiv:2512.15567v2 (announce type: replace)
Abstract: Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive…