Problem Difficulty CodeChef

13h

I tested GPT-5.4, and the answers were really good - just not always what I asked

Here's where GPT-5.4 Thinking begins to really shine. When I asked GPT-5.2, "Do you think social media has improved or worsened communication in society?" I got back a two-line answer. Both thoughts ...

eWeek

Gemini Beats Claude, GPT in Google’s First Android AI Coding Benchmark

Google’s new Android Bench ranks the top AI models for Android coding, with Gemini 3.1 Pro Preview leading Claude Opus 4.6 and GPT-5.2-Codex.

News9Live on MSN

Claude Opus 4.6 detects AI test, writes code to unlock hidden answers

Anthropic researchers say Claude Opus 4.6 showed unusual behaviour during a BrowseComp evaluation. The model suspected it was ...

10h

Claude AI discovered 22 Firefox flaws. Here's how many it figured out how to exploit.

"Claude Opus 4.6 discovered 22 vulnerabilities over the course of two weeks. Of these, Mozilla assigned 14 as high-severity vulnerabilities —almost a fifth of all high-severity Firefox vulnerabilities ...
