
AdaReasoner

Adaptive Reasoning Enables More Flexible Thinking

Project Overview

AdaReasoner is a novel, plug-and-play framework designed to automatically configure reasoning strategies for Large Language Models (LLMs). Unlike traditional prompting methods that use static configurations, AdaReasoner dynamically selects optimal parameters (like temperature and number of reasoning steps), adapting to different tasks for improved performance and robustness.

Motivation

Why AdaReasoner?

Large language models have shown remarkable reasoning abilities, but their success often hinges on getting just the right settings—like how many steps to take, whether to think step-by-step, or how much randomness to allow. The problem? These settings are usually fixed, and what works for one task might completely fail on another. AdaReasoner was created to tackle this issue head-on: instead of relying on a one-size-fits-all approach, it learns to adapt the reasoning strategy to each specific question, making the model more flexible, reliable, and effective across a wide range of tasks.

Key Contributions

Main Contributions
AdaReasoner is an LLM-agnostic plugin that automates adaptive reasoning configurations for tasks requiring diverse types of thinking, implemented as a reinforcement learning framework with a factorized action space.
Its training is data-efficient yet scalable, requiring only a small number of samples per task, aided by a Boltzmann exploration mechanism (a minimal sketch follows this list).
Extensive evaluations on diverse tasks show that AdaReasoner outperforms standard Chain-of-Thought (CoT) and other baselines, and sustains strong out-of-distribution (OOD) performance.
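
To make the Boltzmann exploration mentioned above concrete, here is a minimal, self-contained sketch of softmax-based sampling over a small set of candidate configurations. The variable names and the four-configuration example are illustrative assumptions, not AdaReasoner's actual implementation.

```python
import numpy as np

def boltzmann_sample(q_values: np.ndarray, tau: float = 1.0) -> int:
    """Sample a configuration index with probability proportional to exp(Q / tau).

    Higher tau spreads probability (more exploration); lower tau concentrates
    it on the highest-value configuration (more exploitation).
    """
    logits = q_values / tau
    logits = logits - logits.max()      # subtract max for numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum()
    return int(np.random.choice(len(q_values), p=probs))

# Example: estimated rewards for four candidate reasoning configurations.
q = np.array([0.2, 0.5, 0.1, 0.4])
chosen = boltzmann_sample(q, tau=0.5)
```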

Core Idea

Adaptive Reasoning

AdaReasoner introduces adaptive reasoning through a model-agnostic, reinforcement learning (RL)-based policy system. Its main innovations include:

Factorized Action Space: Separates reasoning parameters (e.g., temperature, steps) to simplify and speed up policy learning.
Targeted Exploration Strategy: Guides exploration towards the most influential configurations.
Reward Model: Trained to predict the quality of reasoning outcomes, enabling efficient training with only a few examples.

Figure: Overview of AdaReasoner's architecture and workflow
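
A minimal sketch of the factorized-action-space idea described above: each reasoning dimension is sampled from its own small candidate set, so the policy learns one preference vector per factor rather than one per joint combination. The candidate values and field names below are assumptions for illustration; the exact dimensions used by AdaReasoner are described in the paper.

```python
import random
from dataclasses import dataclass

# Illustrative candidate sets; AdaReasoner's actual factors/values may differ.
TEMPERATURES = [0.0, 0.3, 0.7, 1.0]
NUM_STEPS = [2, 4, 6, 8]
INSTRUCTIONS = ["think step by step", "answer concisely", "consider alternatives"]

@dataclass
class ReasoningConfig:
    temperature: float
    num_steps: int
    instruction: str

def sample_config(policy_probs: dict[str, list[float]]) -> ReasoningConfig:
    """Sample one value per factor from its own categorical distribution.

    Because each factor has its own head, the policy only maintains
    len(TEMPERATURES) + len(NUM_STEPS) + len(INSTRUCTIONS) preferences
    instead of one preference per joint combination.
    """
    t = random.choices(TEMPERATURES, weights=policy_probs["temperature"])[0]
    s = random.choices(NUM_STEPS, weights=policy_probs["num_steps"])[0]
    i = random.choices(INSTRUCTIONS, weights=policy_probs["instruction"])[0]
    return ReasoningConfig(t, s, i)

# Uniform (untrained) policy over each factor.
uniform = {
    "temperature": [1.0] * len(TEMPERATURES),
    "num_steps": [1.0] * len(NUM_STEPS),
    "instruction": [1.0] * len(INSTRUCTIONS),
}
config = sample_config(uniform)
```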

Technical Highlights

RL Framework

Learns to select the best configuration for each prompt using reward signals from a pre-trained model.
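
One way such reward-driven selection could be trained is a REINFORCE-style update of softmax preferences, sketched below. This is an illustrative stand-in rather than AdaReasoner's exact training procedure, and the `reward` argument is assumed to be the score produced by the pretrained reward model.

```python
import numpy as np

def reinforce_update(prefs: np.ndarray, action: int, reward: float,
                     baseline: float, lr: float = 0.1) -> np.ndarray:
    """One policy-gradient step on softmax preferences over configurations.

    If the reward-model score for the chosen configuration beats the running
    baseline, its preference (and hence probability) increases; otherwise it
    decreases.
    """
    shifted = prefs - prefs.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0          # gradient of log softmax at the chosen action
    return prefs + lr * (reward - baseline) * grad_log_pi
```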

Sample Efficiency

Achieves strong performance with minimal supervision and data.

Compatibility

Can be plugged into any LLM without modification.
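
A minimal sketch of this plug-and-play usage: an adapter picks a configuration per question and forwards it as a prompt prefix plus decoding parameters. All names here (`adaptive_answer`, `llm_call`, `select_config`) are hypothetical, and `ReasoningConfig` refers to the dataclass from the earlier factorized-action-space sketch; this is not AdaReasoner's actual API.

```python
from typing import Callable

def adaptive_answer(question: str,
                    llm_call: Callable[[str, float], str],
                    select_config: Callable[[str], "ReasoningConfig"]) -> str:
    """Answer one question with a per-question reasoning configuration.

    `llm_call(prompt, temperature)` can be any backend (API or local model);
    the adapter only shapes the prompt and the decoding parameters, so the
    underlying LLM needs no modification.
    """
    cfg = select_config(question)        # e.g. the factorized policy sketched earlier
    prompt = (f"{cfg.instruction} Use about {cfg.num_steps} reasoning steps.\n\n"
              f"Question: {question}")
    return llm_call(prompt, cfg.temperature)
```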

Experimental Results

AdaReasoner was tested across six different LLMs and various reasoning tasks. Key findings include:

Improved Accuracy: Outperforms static prompting strategies on knowledge-intensive tasks.
Robustness: Maintains strong performance on out-of-distribution and few-shot scenarios.
Efficiency: Learns effective policies with limited computational resources.

šŸ† Main Results (GPT-4o)

Method            Metaphor   TruthfulQA   MMLU (Math)   LogiQA   Average
CoT               50.40      78.40        76.04         70.00    68.71
Think Short       61.00      64.81        68.52         70.81    66.28
ToT               48.25      74.29        86.11         73.90    70.91
Best-of-N         52.60      79.41        83.41         72.37    71.95
Auto-CoT          62.33      83.09        72.15         71.71    72.32
In-context CoT    53.98      77.04        83.63         80.04    74.42
AdaReasoner       71.56      81.30        86.49         82.31    80.42

Team & Acknowledgements

Authors
Xiangqi Wang¹, Yue Huang¹, Yanbo Wang², Xiaonan Luo¹, Kehan Guo¹, Yujun Zhou¹, Xiangliang Zhang¹

* Equal Contribution

1 University of Notre Dame
2 Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)

Corresponding Author: xzhang33@nd.edu (Xiangliang Zhang)
