Measuring What AI Systems Might Do: Towards A Measurement Science in AI

Voudouris, Konstantinos and Thalmann, Mirko and Kipnis, Alex and Hernández-Orallo, José and Schulz, Eric (2026) Measuring What AI Systems Might Do: Towards A Measurement Science in AI. [Preprint]

Abstract

Scientists, policy-makers, business leaders, and members of the public care about what modern artificial intelligence systems are disposed to do. Yet terms such as capabilities, propensities, skills, values, and abilities are routinely used interchangeably and conflated with observable performance, with AI evaluation practices rarely specifying what quantity they purport to measure. We argue that capabilities and propensities are dispositional properties: stable features of systems characterised by counterfactual relationships between contextual conditions and behavioural outputs. Measuring a disposition requires (i) hypothesising which contextual properties are causally relevant, (ii) independently operationalising and measuring those properties, and (iii) empirically mapping how variation in those properties affects the probability of the behaviour. Dominant approaches to AI evaluation, from benchmark averages to data-driven latent-variable models such as Item Response Theory, bypass these steps entirely. Building on ideas from philosophy of science, measurement theory, and cognitive science, we develop a principled account of AI capabilities and propensities as dispositions, show why prevailing evaluation practices fail to measure them, and outline what disposition-respecting, scientifically defensible AI evaluation would require.
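
As a rough illustration of step (iii), the sketch below maps variation in one contextual property onto the probability of a behaviour. It is not taken from the preprint: the property name `demand`, the trial counts, and the choice of a logistic curve are all assumptions made for illustration, and the data are invented. The sketch presumes that steps (i) and (ii) have already been carried out, so that `demand` was hypothesised as causally relevant and measured independently of the system's responses.

```python
import math

# Hypothetical trials: (demand level, outcome), where outcome is 1 if the
# behaviour occurred. The demand levels are assumed to have been measured
# independently of the system's responses (step ii). All counts are invented.
successes_per_level = {1: 9, 2: 8, 3: 5, 4: 2, 5: 1}
TRIALS_PER_LEVEL = 10
trials = [
    (level, 1 if i < wins else 0)
    for level, wins in successes_per_level.items()
    for i in range(TRIALS_PER_LEVEL)
]


def success_rate_by_level(data):
    """Empirical probability of the behaviour at each demand level."""
    totals, wins = {}, {}
    for level, outcome in data:
        totals[level] = totals.get(level, 0) + 1
        wins[level] = wins.get(level, 0) + outcome
    return {level: wins[level] / totals[level] for level in totals}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.1, steps=5000):
    """Fit P(behaviour | demand) = sigmoid(w0 + w1 * demand) by gradient
    ascent on the mean log-likelihood. The logistic form is an assumption."""
    w0, w1 = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            err = y - sigmoid(w0 + w1 * x)
            g0 += err
            g1 += err * x
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1


rates = success_rate_by_level(trials)
w0, w1 = fit_logistic(trials)


def predicted_probability(demand):
    """Counterfactual estimate: probability of the behaviour at a given demand."""
    return sigmoid(w0 + w1 * demand)


if __name__ == "__main__":
    for level in sorted(rates):
        print(level, rates[level], round(predicted_probability(level), 3))
```

The fitted curve, rather than a single average score, is what lets one ask counterfactual questions such as how likely the behaviour would be at a demand level that was not tested. Unlike a purely data-driven latent-variable fit, the predictor here is a property operationalised in advance.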

Item Type:

Preprint

Creators:

Creators	Email	ORCID
Voudouris, Konstantinos	kv301@srcf.net	0000-0001-8453-3557
Thalmann, Mirko
Kipnis, Alex
Hernández-Orallo, José
Schulz, Eric

Keywords:

AI Evaluation, Capabilities, Propensities, Measurement, Dispositions

Subjects:

Specific Sciences > Artificial Intelligence
General Issues > Experimentation
General Issues > Theory/Observation

Depositing User:

Dr Konstantinos Voudouris

Date Deposited:

13 Feb 2026 13:36

Last Modified:

13 Feb 2026 13:36

Item ID:

28232

Date:

February 2026

URI:

https://philsci-archive.pitt.edu/id/eprint/28232
