This AI Paper Introduces PLAN-AND-ACT: A Modular Framework for Long-Horizon Planning in Web-Based Language Agents
Large language models are powering a new wave of digital agents to handle sophisticated web-based tasks. These agents are expected to interpret user instructions, navigate interfaces, and execute complex commands…
Language Models Reinforce Dialect Discrimination – The Berkeley Artificial Intelligence Research Blog
Sample language model responses to different varieties of English and native speaker reactions. ChatGPT does amazingly well at communicating with people in English. But whose English? Only 15% of ChatGPT…
DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
Artificial intelligence (AI) has made significant strides in recent years, yet challenges persist in achieving efficient, cost-effective, and high-performance models. Developing large language models (LLMs) often requires substantial computational resources…
Virtual Personas for Language Models via an Anthology of Backstories – The Berkeley Artificial Intelligence Research Blog
We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience. What…
Qwen Releases the Qwen2.5-VL-32B-Instruct: A 32B Parameter VLM that Surpasses Qwen2.5-VL-72B and Other Models like GPT-4o Mini
In the evolving field of artificial intelligence, vision-language models (VLMs) have become essential tools, enabling machines to interpret and generate insights from both visual and textual data. Despite advancements, challenges…
A 100-AV Highway Deployment – The Berkeley Artificial Intelligence Research Blog
We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to…
TokenSet: A Dynamic Set-Based Framework for Semantic-Aware Visual Representation
Visual generation frameworks follow a two-stage approach: first compressing visual signals into latent representations and then modeling the low-dimensional distributions. However, conventional tokenization methods apply uniform spatial compression ratios regardless…
This AI Paper from NVIDIA Introduces Cosmos-Reason1: A Multimodal Model for Physical Common Sense and Embodied Reasoning
Artificial intelligence systems designed for physical settings require more than just perceptual abilities—they must also reason about objects, actions, and consequences in dynamic, real-world environments. These systems must understand spatial…
Lyra: A Computationally Efficient Subquadratic Architecture for Biological Sequence Modeling
Deep learning architectures like CNNs and Transformers have significantly advanced biological sequence modeling by capturing local and long-range dependencies. However, their application in biological contexts is constrained by high computational…
Achieving Critical Reliability in Instruction-Following with LLMs: How to Achieve AI Customer Service That’s 100% Reliable
Ensuring reliable instruction-following in LLMs remains a critical challenge. This is particularly important in customer-facing applications, where mistakes can be costly. Traditional prompt engineering techniques fail to deliver consistent results.…