TruthfulQA

Link

About

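TruthfulQA is a benchmark that measures whether a language model gives truthful answers to questions, and in particular how far it mimics human falsehoods. Its 817 questions span 38 categories, including health, law, finance, and politics, and are written so that some humans would answer them falsely because of a false belief or misconception. To score well, a model must avoid repeating the false answers it may have learned from imitating human text. The benchmark also includes data for fine-tuning GPT-3 into automated evaluators of answer truthfulness and informativeness, and a set of resources for understanding and analyzing the results.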