Shanghai AI Lab Releases OREAL-7B and OREAL-32B: Advancing Mathematical Reasoning with Outcome Reward-Based Reinforcement Learning
Mathematical reasoning remains a difficult area for artificial intelligence (AI) due to the complexity of problem-solving and the need for structured, logical thinking. While large language models (LLMs) have made…
This AI Paper Explores Long Chain-of-Thought Reasoning: Enhancing Large Language Models with Reinforcement Learning and Supervised Fine-Tuning
Large language models (LLMs) have demonstrated proficiency in solving complex problems across mathematics, scientific research, and software engineering. Chain-of-thought (CoT) prompting is pivotal in guiding models through intermediate reasoning steps…
LLMDet: How Large Language Models Enhance Open-Vocabulary Object Detection
Open-vocabulary object detection (OVD) aims to detect arbitrary objects with user-provided text labels. Although recent progress has enhanced zero-shot detection ability, current techniques handicap themselves with three important challenges. They…
Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning
Text-to-speech (TTS) technology has made significant strides in recent years, but challenges remain in creating natural, expressive, and high-fidelity speech synthesis. Many TTS systems struggle to replicate the nuances of…
Google DeepMind Introduces AlphaGeometry2: A Significant Upgrade to AlphaGeometry Surpassing the Average Gold Medalist in Solving Olympiad Geometry
The International Mathematical Olympiad (IMO) is a globally recognized competition that challenges high school students with complex mathematical problems. Among its four categories, geometry stands out as the most consistent…
Efficient Alignment of Large Language Models Using Token-Level Reward Guidance with GenARM
Large language models (LLMs) must align with human preferences like helpfulness and harmlessness, but traditional alignment methods require costly retraining and struggle with dynamic or conflicting preferences. Test-time alignment approaches…
Adaptive Inference Budget Management in Large Language Models through Constrained Policy Optimization
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, particularly in mathematical problem-solving and coding applications. Research has shown a strong correlation between the length of reasoning…
Tutorial to Fine-Tuning Mistral 7B with QLoRA Using Axolotl for Efficient LLM Training
In this tutorial, we demonstrate the workflow for fine-tuning Mistral 7B using QLoRA with Axolotl, showing how to manage limited GPU resources while customizing the model for new tasks. We’ll…
This AI Paper Introduces MaAS (Multi-agent Architecture Search): A New Machine Learning Framework that Optimizes Multi-Agent Systems
Large language models (LLMs) are the foundation for multi-agent systems, allowing multiple AI agents to collaborate, communicate, and solve problems. These agents use LLMs to understand tasks, generate responses, and…
BARE: A Synthetic Data Generation AI Method that Combines the Diversity of Base Models with the Quality of Instruct-Tuned Models
As the need for high-quality training data grows, synthetic data generation has become essential for improving LLM performance. Instruction-tuned models are commonly used for this task, but they often struggle…













