Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks

Freiesleben, Timo (2026) Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks. In: UNSPECIFIED.

Construct_Validity_PSA.pdf - Accepted Version
Available under License Creative Commons Attribution No Derivatives.

Abstract

Recent work in machine learning increasingly attributes human-like capabilities, such as reasoning, to large language models (LLMs) on the basis of benchmark performance. This paper examines that practice through the lens of construct validity, understood as the challenge of linking theoretical capabilities to their empirical measurement. It compares three influential frameworks (the nomological, inferential, and causal accounts) and argues that the nomological account provides the most suitable foundation for LLM capability research. The nomological account avoids the strong ontological assumptions of the causal account while offering a more substantive framework for defining construct meaning than the inferential account.


Item Type:

Conference or Workshop Item (UNSPECIFIED)

Creators:

Creators	Email	ORCID
Freiesleben, Timo	timo.freiesleben@lmu.de	0000-0003-1338-3293

Keywords:

Construct Validity; Measurement; Benchmarking; Machine Learning; Large Language Models

Subjects:

Specific Sciences > Artificial Intelligence
Specific Sciences > Artificial Intelligence > Machine Learning
Specific Sciences > Psychology

Depositing User:

Dr. Timo Freiesleben

Date Deposited:

29 May 2026 12:40

Last Modified:

29 May 2026 12:40

Item ID:

29807

Date:

29 May 2026

URI:

https://philsci-archive.pitt.edu/id/eprint/29807
