PhilSci Archive

Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks

Freiesleben, Timo (2026) Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks. In: UNSPECIFIED.

Construct_Validity_PSA.pdf - Accepted Version
Available under License Creative Commons Attribution No Derivatives.


Abstract

Recent work in machine learning increasingly attributes human-like capabilities, such as reasoning, to large language models (LLMs) on the basis of benchmark performance. This paper examines that practice through the lens of construct validity, understood as the challenge of linking theoretical capabilities to their empirical measurement. It compares three influential frameworks (the nomological, inferential, and causal accounts) and argues that the nomological account provides the most suitable foundation for LLM capability research: it avoids the strong ontological assumptions of the causal account while offering a more substantive framework for defining construct meaning than the inferential account.


Item Type: Conference or Workshop Item (UNSPECIFIED)
Creators: Freiesleben, Timo (Email: timo.freiesleben@lmu.de; ORCID: 0000-0003-1338-3293)
Keywords: Construct Validity; Measurement; Benchmarking; Machine Learning; Large Language Models
Subjects: Specific Sciences > Artificial Intelligence
Specific Sciences > Artificial Intelligence > Machine Learning
Specific Sciences > Psychology
Depositing User: Dr. Timo Freiesleben
Date Deposited: 29 May 2026 12:40
Last Modified: 29 May 2026 12:40
Item ID: 29807
Date: 29 May 2026
URI: https://philsci-archive.pitt.edu/id/eprint/29807
