AdaBoosting Text Prompts for Vision-Language Models

Jin, Seokhee; Sung, Changhwan; Mun, Sunung; Kim, Hoyoung; Ok, Jungseul


Seokhee Jin¹,*, Changhwan Sung²,*, Sunung Mun², Hoyoung Kim³, Jungseul Ok²,†

KT Corporation¹
POSTECH²
National AI Research Lab³
*Equal contribution. †Corresponding author.

Abstract

The classification accuracy of pretrained Vision-Language Models (VLMs) relies on the quality of the text prompts. Handcrafted templates and Large Language Model (LLM)-generated descriptions not only make predictions more interpretable, but also enable reuse of the same prompts across heterogeneous VLMs. Recent works construct task-adapted text prompts with a small number of labeled images. However, existing few-shot text prompting methods do not explicitly focus on misclassified examples during prompt construction, leading to only marginal improvements even as more shots become available. To fully exploit few-shot supervision, we propose Text Prompt Boosting (TPB), an AdaBoost-inspired framework that treats each text-prompt-based classifier as a weak learner and sequentially aggregates them into a strong ensemble by explicitly targeting hard, misclassified examples. Extensive experiments show that TPB preserves task-intrinsic, model-agnostic cues in text space, enabling robust cross-model transfer. Across eleven classification benchmarks, TPB improves accuracy on the source model and preserves shot-driven gains when transferred to larger, more capable VLMs, where existing methods struggle to sustain such improvements.

Method Overview

Text Prompt Boosting treats each prompt-based classifier as a weak learner. At every boosting round, TPB reweights misclassified few-shot examples, selects a new prompt bank with Greedy Prompt Composition, and aggregates all weak classifiers into a final strong classifier.

Figure: Overview of the Text Prompt Boosting framework. TPB repeatedly focuses on hard examples and builds a strong natural-language prompt ensemble.
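
The boosting loop described in this section can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes image features and candidate prompt embeddings are already available as L2-normalized NumPy arrays, uses the standard multi-class AdaBoost (SAMME) weight update, and replaces Greedy Prompt Composition with a simple hypothetical per-class greedy search (`greedy_prompt_bank`). All function names and array shapes are illustrative.

```python
import numpy as np

def weak_predict(img_feats, class_protos):
    """Predict the class whose prompt embedding is most similar to each image."""
    return np.argmax(img_feats @ class_protos.T, axis=1)

def greedy_prompt_bank(img_feats, labels, weights, cand_embs):
    """Hypothetical stand-in for Greedy Prompt Composition (an assumption).

    cand_embs has shape (num_classes, num_candidates, dim). Visiting classes
    in order, keep the candidate prompt that minimizes the weighted error.
    """
    num_classes, num_cands, _ = cand_embs.shape
    chosen = np.zeros(num_classes, dtype=int)  # start from candidate 0 everywhere
    for c in range(num_classes):
        best_err, best_j = np.inf, 0
        for j in range(num_cands):
            trial = chosen.copy()
            trial[c] = j
            protos = cand_embs[np.arange(num_classes), trial]
            err = np.sum(weights * (weak_predict(img_feats, protos) != labels))
            if err < best_err:
                best_err, best_j = err, j
        chosen[c] = best_j
    return chosen

def boost_prompts(img_feats, labels, cand_embs, rounds=5):
    """Build a weighted ensemble of prompt-based weak classifiers (SAMME)."""
    n = len(labels)
    num_classes = cand_embs.shape[0]
    weights = np.full(n, 1.0 / n)  # uniform example weights at the start
    ensemble = []                  # list of (alpha, chosen prompt indices)
    for _ in range(rounds):
        chosen = greedy_prompt_bank(img_feats, labels, weights, cand_embs)
        protos = cand_embs[np.arange(num_classes), chosen]
        miss = weak_predict(img_feats, protos) != labels
        err = np.clip(np.sum(weights * miss), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(num_classes - 1)
        if alpha <= 0:  # weak learner is no better than chance, so stop
            break
        ensemble.append((alpha, chosen))
        weights = weights * np.exp(alpha * miss)  # upweight misclassified examples
        weights /= weights.sum()
    return ensemble

def strong_predict(img_feats, cand_embs, ensemble):
    """Aggregate the weak classifiers by alpha-weighted voting."""
    num_classes = cand_embs.shape[0]
    votes = np.zeros((len(img_feats), num_classes))
    for alpha, chosen in ensemble:
        protos = cand_embs[np.arange(num_classes), chosen]
        preds = weak_predict(img_feats, protos)
        votes[np.arange(len(preds)), preds] += alpha
    return np.argmax(votes, axis=1)
```

In this sketch each round stores only a scalar weight and a set of prompt indices, so the ensemble stays expressed in terms of text prompts rather than model-specific parameters.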

Shot Scalability

Unlike prior text-prompting methods that quickly saturate, TPB continues to benefit from additional few-shot supervision.

Transfer Robustness

Because TPB keeps adaptation in natural-language prompt space, the learned prompt ensemble can be re-embedded and transferred to larger heterogeneous VLMs without model-specific tuning.
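
The re-embedding step can be sketched as follows. This is an illustration under stated assumptions, not the released code: the ensemble is stored as plain prompt strings with one scalar weight per weak classifier, `encode_text` is a hypothetical callable standing in for the target VLM's text encoder, and the boosting weights learned on the source model are assumed to be reused unchanged.

```python
import numpy as np

def transfer_predict(image_feats, text_ensemble, encode_text):
    """Alpha-weighted vote of text-prompt classifiers under a target VLM.

    image_feats: (n, dim) array from the target VLM's image encoder.
    text_ensemble: list of (alpha, prompts), where prompts[c] is the prompt
        string for class c. Only strings and scalars are stored, so nothing
        is tied to the source model's embedding space.
    encode_text: hypothetical callable mapping a list of strings to a
        (len, dim) array; it stands in for the target VLM's text encoder.
    """
    num_classes = len(text_ensemble[0][1])
    votes = np.zeros((len(image_feats), num_classes))
    for alpha, prompts in text_ensemble:
        # Re-embed the same prompt strings with the target model's encoder.
        protos = np.asarray(encode_text(prompts), dtype=float)
        protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        preds = np.argmax(image_feats @ protos.T, axis=1)
        votes[np.arange(len(preds)), preds] += alpha
    return np.argmax(votes, axis=1)
```

Swapping in a different model then amounts to passing that model's image features and text encoder; the prompt strings and weights are left untouched.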
