Designed to improve robots’ reasoning, the Rho-alpha vision-language-action model marks Microsoft’s offering in the growing field of physical AI.
Agentic Vision combines visual reasoning with code execution to ground answers in visual evidence, delivering a 5% to 10% ...
On the Humanity’s Last Exam (HLE) benchmark, Kimi K2.5 scored 50.2% (with tools), surpassing OpenAI’s GPT-5.2 (xhigh) and ...
Physical AI marks a transition from robots as programmed tools to robots as adaptable collaborators. That transition will ...
Google has introduced Agentic Vision for Gemini 3 Flash, a new capability that improves how the model understands and ...
Chinese AI startup DeepSeek on Tuesday released a research paper and open-sourced its latest optical character recognition ...
Agentic Vision is a new capability for Gemini 3 Flash that makes image-related tasks more accurate by “grounding answers in visual evidence.” ...
Chinese large-language model (LLM) start-ups including DeepSeek and Moonshot AI have rapidly open-sourced their latest models ...
The agent acquires a vocabulary of neuro-symbolic concepts for objects, relations, and actions, represented through a ...
Microsoft has announced Rho-alpha, a new robotics AI model derived from its Phi vision-language series, aimed at helping ...
Robbyant, an embodied AI company within Ant Group, today announced the open-source release of LingBot-World, a world model that achieves industry-leading performance in video quality, dynamic fidelity ...
Poetiq was launched last year by former Google DeepMind researchers Shumeet Baluja and Ian Fischer. Baluja established the Alphabet unit’s computer vision group, while Fischer helped develop some of ...