The OpenEval Project

The OpenEval Project provides transparent, systematic evaluation of large language model (LLM) performance in scientific peer review. We compare LLM-generated reviews against traditional peer reviews to assess how accurately, consistently, and reliably LLMs identify claims and evaluate scientific evidence. Explore our dataset of manuscripts below; click any paper to view a detailed claim-by-claim comparison between the LLM and peer reviewer assessments.

Summary counters: Papers Evaluated, Claims Extracted, OpenEval Reviews, Peer Reviews, Comparisons Made

Processed Manuscripts
