Discussion about this post

Raphael Troncy

Thanks for this blog post! You may be interested in this recent paper we published at the EvalLLM 2025 workshop:

Sarra Gharsallah, Adele Robaldo, Mariia Tokareva, Giovanni Gatti Pinheiro, Ilyana Guendouz, Raphael Troncy, Paolo Papotti and Pietro Michiardi. Can We Trust the Judges? Validation of Factuality Evaluation Methods via Answer Perturbation. In Workshop on Evaluation Generative Models (LLM) and Challenges (EvalLLM), co-located with TALN, Marseille (France), 2025.

Read our full blog post: https://giovannigatti.github.io/trutheval/

Watch a YouTube explainer: https://www.youtube.com/watch?v=f0XJkMuyZlM

Play with our open source library: https://github.com/GiovanniGatti/trutheval/
