Google says that its most advanced thinking model yet outperforms Claude and ChatGPT on Humanity's Last Exam and other key benchmarks.
Hugging Face has launched Community Evals, a feature that enables benchmark datasets on the Hub to host their own ...