What are ‘Computer-Use Agents’? From Web to OS—A Technical Explainer
TL;DR: Computer-use agents are VLM-driven UI agents that act like users on unmodified software. Baselines on OSWorld started at 12.24% (human 72.36%); Claude Sonnet 4.5 now reports 61.4%. Gemini 2.5…
Microsoft Research Releases Skala: a Deep-Learning Exchange–Correlation Functional Targeting Hybrid-Level Accuracy at Semi-Local Cost
TL;DR: Skala is a deep-learning exchange–correlation functional for Kohn–Sham Density Functional Theory (DFT) that targets hybrid-level accuracy at semi-local cost, reporting MAE ≈ 1.06 kcal/mol on W4-17 (0.85 on the…
Tiny Recursive Model (TRM): A Tiny 7M Model that Surpass DeepSeek-R1, Gemini 2.5 pro, and o3-mini at Reasoning on both ARG-AGI 1 and ARC-AGI 2
Can an iterative draft–revise solver that repeatedly updates a latent scratchpad outperform far larger autoregressive LLMs on ARC-AGI? Samsung SAIT (Montreal) has released Tiny Recursive Model (TRM)—a two-layer, ~7M-parameter recursive…
RA3: Mid-Training with Temporal Action Abstractions for Faster Reinforcement Learning (RL) Post-Training in Code LLMs
TL;DR: A new research from Apple, formalizes what “mid-training” should do before reinforcement learning RL post-training and introduces RA3 (Reasoning as Action Abstractions)—an EM-style procedure that learns temporally consistent latent…
Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning RL for Modular, Tool-Using AI Agents
TL;DR: AgentFlow is a trainable agent framework with four modules—Planner, Executor, Verifier, Generator—coordinated by an explicit memory and toolset. The planner is optimized in the loop with a new on-policy…
Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios
How do you audit frontier LLMs for misaligned behavior in realistic multi-turn, tool-use settings—at scale and beyond coarse aggregate scores? Anthropic released Petri (Parallel Exploration Tool for Risky Interactions), an…
Model Context Protocol (MCP) vs Function Calling vs OpenAPI Tools — When to Use Each?
MCP (Model Context Protocol): Open, transport-agnostic protocol that standardizes discovery and invocation of tools/resources across hosts and servers. Best for portable, multi-tool, multi-runtime systems. Function…
Google AI Introduces Gemini 2.5 ‘Computer Use’ (Preview): A Browser-Control Model to Power AI Agents to Interact with User Interfaces
Which of your browser workflows would you delegate today if an agent could plan and execute predefined UI actions? Google AI introduces Gemini 2.5 Computer Use, a specialized variant of…
Meta AI Open-Sources OpenZL: A Format-Aware Compression Framework with a Universal Decoder
How much compression ratio and throughput would you recover by training a format-aware graph compressor and shipping only a self-describing graph to a universal decoder?…
An Intelligent Conversational Machine Learning Pipeline Integrating LangChain Agents and XGBoost for Automated Data Science Workflows
In this tutorial, we combine the analytical power of XGBoost with the conversational intelligence of LangChain. We build an end-to-end pipeline that can generate synthetic datasets, train an XGBoost model,…