Philipp Schmid 6/28/2024

Evaluating Open LLMs with MixEval: The Closest Benchmark to LMSYS Chatbot Arena


The article discusses MixEval, a benchmark for evaluating open Large Language Models (LLMs) that combines real-world user queries with existing benchmarks. It highlights MixEval's 0.96 ranking correlation with LMSYS Chatbot Arena, its low cost (about $0.6 per run), and features such as dynamic updates and fair grading. It also covers a forked version with enhancements for local model evaluation and Hugging Face integration.
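The 0.96 figure refers to how closely MixEval's ordering of models matches Chatbot Arena's. As a rough illustration (not taken from the article), the sketch below computes a Spearman ranking correlation over made-up scores; the model names, MixEval accuracies, and Arena Elo ratings are all placeholders.

```python
# Sketch: what a "ranking correlation" between a benchmark and Chatbot Arena means.
# All numbers below are hypothetical placeholders, not real MixEval or Arena results.
from scipy.stats import spearmanr

models = ["model-a", "model-b", "model-c", "model-d", "model-e"]
mixeval_scores = [0.62, 0.55, 0.71, 0.48, 0.66]  # hypothetical MixEval accuracies
arena_elo = [1180, 1120, 1250, 1050, 1210]        # hypothetical Arena Elo ratings

# Spearman's rho compares the rankings induced by the two score lists;
# a value near 1.0 means both evaluations order the models almost identically.
rho, p_value = spearmanr(mixeval_scores, arena_elo)
print(f"Spearman ranking correlation: {rho:.2f} (p={p_value:.3f})")
```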
