Microsoft AI Introduces Belief State Transformer (BST): Enhancing Goal-Conditioned Sequence Modeling with Bidirectional Context

Transformer models have reshaped language modeling by enabling large-scale text generation with emergent properties. However, they struggle with tasks that require extensive planning. Researchers have explored modifications to architecture, objectives,…
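For intuition, here is a toy sketch (not Microsoft's implementation) of the objective BST is built around: a forward encoder reads the prefix, a backward encoder reads the suffix that encodes the goal, and their combined "belief state" predicts both the next token after the prefix and the previous token before the suffix. All module choices and sizes below are illustrative.

```python
# Toy belief-state model: forward encoding of the prefix plus backward
# encoding of the suffix jointly predict next and previous tokens.
import torch
import torch.nn as nn

class ToyBeliefStateModel(nn.Module):
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.fwd = nn.GRU(d, d, batch_first=True)   # stand-in for a causal transformer
        self.bwd = nn.GRU(d, d, batch_first=True)   # runs over the reversed suffix
        self.next_head = nn.Linear(2 * d, vocab)    # token after the prefix
        self.prev_head = nn.Linear(2 * d, vocab)    # token before the suffix

    def forward(self, prefix, suffix):
        _, f = self.fwd(self.emb(prefix))            # forward belief over the prefix
        _, b = self.bwd(self.emb(suffix.flip(1)))    # backward belief over the suffix
        state = torch.cat([f[-1], b[-1]], dim=-1)    # joint belief state
        return self.next_head(state), self.prev_head(state)

model = ToyBeliefStateModel()
prefix = torch.randint(0, 1000, (2, 5))   # observed tokens
suffix = torch.randint(0, 1000, (2, 5))   # goal tokens
next_logits, prev_logits = model(prefix, suffix)
print(next_logits.shape, prev_logits.shape)  # torch.Size([2, 1000]) twice
```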

Researchers from AMLab and CuspAI Introduce Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

Deep learning faces difficulties when applied to large physical systems on irregular grids, especially when interactions occur over long distances or at multiple scales. Handling these complexities becomes harder as…
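As a rough illustration of the idea (not the Erwin codebase), the sketch below partitions an irregular point cloud with a simple ball-tree-style recursive split and restricts self-attention to points inside the same leaf ball, which keeps the attention cost roughly linear in the number of points. Names and sizes are illustrative.

```python
import torch
import torch.nn as nn

def ball_partition(pos, leaf_size=8):
    """Recursively bisect points along their widest axis; return leaf index sets."""
    stack, leaves = [torch.arange(pos.shape[0])], []
    while stack:
        ids = stack.pop()
        if ids.numel() <= leaf_size:
            leaves.append(ids)
            continue
        p = pos[ids]
        axis = (p.max(0).values - p.min(0).values).argmax()  # widest extent
        order = p[:, axis].argsort()
        mid = ids.numel() // 2
        stack += [ids[order[:mid]], ids[order[mid:]]]
    return leaves

class BallAttention(nn.Module):
    def __init__(self, d=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, feats, leaves):
        out = feats.clone()
        for ids in leaves:                   # attention only within each ball
            x = feats[ids].unsqueeze(0)
            out[ids] = self.attn(x, x, x)[0].squeeze(0)
        return out

pos = torch.rand(100, 3)                     # irregular 3D point cloud
feats = torch.rand(100, 32)
layer = BallAttention()
print(layer(feats, ball_partition(pos)).shape)   # torch.Size([100, 32])
```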

Alibaba Researchers Propose START: A Novel Tool-Integrated Long CoT Reasoning LLM that Significantly Enhances Reasoning Capabilities by Leveraging External Tools

Large language models have made significant strides in understanding and generating human-like text. Yet, when it comes to complex reasoning tasks—especially those that require multi-step calculations or logical analysis—they often…
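The pattern START exemplifies, tool-integrated reasoning, can be sketched as a generate-execute-append loop: the model interleaves chain-of-thought text with code blocks, an executor runs each block, and the output is fed back before generation resumes. The mock generate() below stands in for a real LLM, and the <code>/<output> markers are assumptions, not START's actual format.

```python
import io, contextlib

def run_python(code):
    """Execute a tool call and capture stdout as the observation."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def generate(context):
    # Stand-in for an LLM: emit one tool call, then a final answer.
    if "<output>" not in context:
        return "Let me compute this.\n<code>print(17 * 23)</code>"
    return "So the answer is 391."

def reason(question, max_turns=4):
    context = question
    for _ in range(max_turns):
        step = generate(context)
        context += "\n" + step
        if "<code>" in step:                  # tool call detected
            code = step.split("<code>")[1].split("</code>")[0]
            context += f"\n<output>{run_python(code)}</output>"
        else:
            return context
    return context

print(reason("What is 17 * 23?"))
```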

Q-Filters: A Training-Free AI Method for Efficient KV Cache Compression

Large Language Models (LLMs) have advanced significantly thanks to the Transformer architecture, with recent models like Gemini 1.5 Pro, Claude 3, GPT-4, and Llama 3.1 demonstrating the ability to process hundreds of thousands of tokens.…
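In spirit, Q-Filters scores cached keys against a direction derived from query statistics and evicts the weakest entries, with no retraining. The sketch below is a hedged illustration of that eviction step: the filter vector here is random purely for demonstration, whereas the paper computes it from the model's queries.

```python
import torch

def compress_kv(keys, values, q_filter, keep=64):
    """keys/values: (seq, d). Keep the `keep` entries most aligned with q_filter."""
    scores = keys @ q_filter                  # projection onto the filter direction
    top = scores.topk(min(keep, keys.shape[0])).indices.sort().values  # keep order
    return keys[top], values[top]

d, seq = 128, 1024
keys, values = torch.randn(seq, d), torch.randn(seq, d)
q_filter = torch.randn(d)
q_filter /= q_filter.norm()   # stand-in; the paper derives this from query statistics
k2, v2 = compress_kv(keys, values, q_filter)
print(k2.shape, v2.shape)     # torch.Size([64, 128]) twice
```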

A Coding Guide to Sentiment Analysis of Customer Reviews Using IBM’s Open Source AI Model Granite-3B and Hugging Face Transformers

In this tutorial, we will walk through how to perform sentiment analysis on text data using IBM’s open-source Granite 3B model integrated with Hugging Face Transformers. Sentiment analysis, a…
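A minimal sketch of the tutorial's recipe: load an instruction-tuned Granite checkpoint through the Transformers pipeline and prompt it to label each review. The model ID below is an assumption; substitute the Granite 3B variant the tutorial uses.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-3.0-2b-instruct",  # assumed ID; swap in your 3B variant
    device_map="auto",
)

reviews = [
    "The product arrived quickly and works perfectly.",
    "Terrible support, I want a refund.",
]
for review in reviews:
    prompt = (
        "Classify the sentiment of this customer review as Positive, "
        f"Negative, or Neutral.\nReview: {review}\nSentiment:"
    )
    out = generator(prompt, max_new_tokens=5, do_sample=False)
    print(review, "->", out[0]["generated_text"][len(prompt):].strip())
```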

CASS: Injecting Object-Level Context for Advanced Open-Vocabulary Semantic Segmentation

This paper was just accepted at CVPR 2025. In short, CASS is an elegant solution to object-level context in open-world segmentation. It outperforms several training-free approaches and even surpasses…
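CASS itself is not reproduced here, but the training-free baseline this line of work builds on can be sketched: match CLIP patch embeddings against text embeddings of candidate class names and take a per-patch argmax as a coarse open-vocabulary segmentation. Projecting patch tokens through visual_projection is a common simplification, not an official API usage.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

classes = ["a cat", "a dog", "the background"]
image = Image.new("RGB", (224, 224))            # stand-in image

inputs = proc(text=classes, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    vis = model.vision_model(pixel_values=inputs["pixel_values"])
    patches = model.visual_projection(vis.last_hidden_state[:, 1:])  # drop CLS token

patches = patches / patches.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
labels = (patches @ text_emb.T).argmax(-1).reshape(14, 14)  # 224/16 patch grid
print(labels)
```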

Starter Guide for Running Large Language Models (LLMs)

Running large language models (LLMs) presents significant challenges due to their hardware demands, but numerous options exist to make these powerful tools accessible. Today’s landscape offers several approaches – from…
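One approach such guides typically cover, sketched below: load an open checkpoint in 4-bit via bitsandbytes so a 7B model fits on a single consumer GPU. The model ID is just an example; any causal LM on the Hub loads the same way.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"   # example checkpoint
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

inputs = tok("Explain KV caching in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))
```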

AMD Releases Instella: A Series of Fully Open-Source State-of-the-Art 3B-Parameter Language Models

In today’s rapidly evolving digital landscape, the need for accessible, efficient language models is increasingly evident. Traditional large-scale models have advanced natural language understanding and generation considerably, yet they often…

Meta AI Introduces Brain2Qwerty: Advancing Non-Invasive Sentence Decoding with MEG and Deep Learning

Neuroprosthetic devices have significantly advanced brain-computer interfaces (BCIs), enabling communication for individuals with speech or motor impairments due to conditions like anarthria, ALS, or severe paralysis. These devices decode neural…

Alibaba Released Babel: An Open Multilingual Large Language Model (LLM) Serving Over 90% of Global Speakers

Most existing LLMs prioritize languages with abundant training resources, such as English, French, and German, while widely spoken but underrepresented languages like Hindi, Bengali, and Urdu receive comparatively less attention.…