LLM-as-a-Judge
Using a language model as an automated evaluator of another model's outputs: methodology, calibration with human judgement, known biases, …