Abstract

Retrieval-Augmented Generation (RAG) grounds language models in external evidence, but common exact-match and judge-based evaluations often cannot tell whether a model actually used the retrieved context or simply recalled the answer from parametric memory. This preprint introduces Normalized Context Utilization (NCU), a length-normalized token log-probability metric that compares zero-shot, oracle, noisy, and adversarial conditions to quantify contextual information gain. Across question-answering domains, open models from 1.5B to 72B parameters, and a proprietary commercial API, the study finds severe diminishing returns from model scale for strict factual extraction: small language models can match larger systems in context utilization while showing stronger context adherence and lower latency.
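
The idea behind a length-normalized log-probability comparison can be illustrated with a minimal sketch. This is not the paper's NCU definition, whose exact normalization is not given in the abstract. It only assumes that the per-token log-probabilities of the reference answer are available under each prompting condition, and it reports the change in mean per-token log-probability relative to zero-shot. The function names, the condition keys, and the numbers are hypothetical.

```python
from typing import Dict, Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized score: mean per-token log-probability of the answer."""
    if not token_logprobs:
        raise ValueError("answer must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def context_gains(answer_logprobs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Gain of each context condition over the zero-shot baseline.

    Assumption (not from the paper): gain is the difference in mean
    per-token log-probability of the same reference answer.
    """
    baseline = mean_logprob(answer_logprobs["zero_shot"])
    return {
        condition: mean_logprob(logprobs) - baseline
        for condition, logprobs in answer_logprobs.items()
        if condition != "zero_shot"
    }


# Made-up log-probabilities for a two-token reference answer.
example = {
    "zero_shot": [-2.0, -4.0],    # no retrieved context
    "oracle": [-0.5, -1.5],       # gold passage supplied
    "noisy": [-1.5, -2.5],        # gold passage mixed with distractors
    "adversarial": [-3.0, -5.0],  # misleading passage supplied
}
gains = context_gains(example)
```

In this toy case the oracle context raises the answer's mean log-probability, the noisy context raises it by less, and the adversarial context lowers it. A gain near zero under the oracle condition would suggest the model was already answering from parametric memory.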

Full Paper

Quantifying Prior Dominance in RAG Systems