Math Benchmarking - Search News

FrontierMath Benchmark Exposes AI Struggles in Advanced Math

eSpeaks’ Corey Noles talks with Rob Israch, President of Tipalti, about what it means to lead with Global-First Finance and how companies can build scalable, compliant operations in an increasingly ...

The BackDash on MSN

Computer that can't do math, experts slam AI's calculation capabilities after Apple findings reveal flaws even in advanced models

Experts slam the calculation capabilities of AI after Apple findings reveal flaws even in advanced models. The post “Computer that can’t do math”, Experts slam AI’s ...

VentureBeat

New open-source math model Light-R1-32B surpasses equivalent DeepSeek performance with only $1000 in training costs

Researchers have introduced Light-R1-32B, a new open-source AI model optimized to solve advanced math problems. It is now available on Hugging Face under a permissive Apache 2.0 license — free for ...

The National Law Review

ORCA Benchmark Shows That AI Frequently Fumbles Everyday Math

KRAKÓW, MAŁOPOLSKA, POLAND, November 7, 2025 /EINPresswire.com/ -- Omni Calculator has introduced the ORCA (Omni Research on Calculation in AI) Benchmark, a new ...

Hosted on MSN

AI is actually bad at math, ORCA shows

ORCA benchmark trips up ChatGPT-5, Gemini 2.5 Flash, Claude Sonnet 4.5, Grok 4, and DeepSeek V3.2. In the world of George Orwell's 1984, two and two make five. And large language models are not much ...

Yahoo Finance

OpenAI GPT score on FrontierMath Benchmark by June 30?

This market will resolve to "Yes" if any OpenAI GPT model achieves the listed score or greater on the FrontierMath Exam by June 30, 2026, 11:59 PM ET. Otherwise, the market will resolve to "No". This ...

Yahoo Finance

Google Gemini score on FrontierMath Benchmark by June 30?

This market will resolve to "Yes" if any Google Gemini model achieves the listed score or greater on the FrontierMath Exam by June 30, 2026, 11:59 PM ET. Otherwise, the market will resolve to "No".

TechSpot

Move over math and reasoning, it's time to benchmark AI using Super Mario Bros.

The big picture: Benchmarking AI remains a thorny issue, with companies often accused of cherry-picking flattering results while burying less favorable ones. Instead of fixating on math and logic ...
