CV
Summary
Software Engineering student at Fudan University working on NLP, trustworthy evaluation of large language models, medical benchmarks, and scientific intelligence.
Education
- Software Engineering, Fudan University (Present)
Skills
Research Areas
- Natural Language Processing
- Trustworthy LLM Evaluation
- Medical NLP
- Scientific Intelligence
Publications
- OpenNovelty: An Open-domain Benchmark for Evaluating the Open-ended Novelty of Language Models (arXiv, 2026). A benchmark for evaluating whether language models can assess the novelty of open-ended research ideas and claims.
- LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models (arXiv, 2025). A dynamic evaluation framework for robust, contamination-resistant, and fair assessment of large language models.
- LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation (Findings of EMNLP 2025). A physician-validated real-world clinical benchmark for evaluating medical LLMs across diverse medical scenarios.
Interests
- Research Interests: robust and fair evaluation of language models, medical benchmarks with expert validation, open-ended novelty assessment