PhilSci Archive

Interpretability (Propositional and Mechanistic) Needs Behavior

Friedman, Daniel and Duede, Eamon (2026) Interpretability (Propositional and Mechanistic) Needs Behavior. [Preprint]


Abstract

Prevailing approaches to interpreting large language models (LLMs) risk addressing the field's central questions at the wrong level of analysis. As LLMs develop, researchers have turned to "under-the-hood" methods to investigate whether LLMs possess states analogous to beliefs, desires, or intentions. These methods typically map internal representations, feature directions, or neural circuits onto folk-psychological categories. We argue that these methods are too fragile to deliver the results we need.
Instead, interpretability ought to be reoriented toward and grounded in what we call "ecologically sensitive behavioral analysis," which integrates mechanistic tools with severe testing of behavioral signatures across ecological contexts, aligning interpretability efforts with emerging evaluation sciences for AI. We outline a multi-disciplinary trajectory for developing interpretability methods that are both philosophically coherent and practically relevant for the deployment of increasingly capable LLMs.



Item Type: Preprint
Creators:
Creators | Email | ORCID
Friedman, Daniel | dcfriedm@purdue.edu |
Duede, Eamon | eduede@purdue.edu | 0000-0002-3592-0478
Keywords: large language models, mechanistic interpretability, propositional interpretability, ecologically sensitive behavioral analysis, folk psychology, mental state attribution, LLM alignment, AI evaluation, confabulation, behavioral science, philosophy of mind, ecological validity
Subjects: Specific Sciences > Cognitive Science
Specific Sciences > Artificial Intelligence
Specific Sciences > Neuroscience
Specific Sciences > Psychology
General Issues > Technology
Depositing User: Dr. Eamon Duede
Date Deposited: 08 May 2026 11:44
Last Modified: 08 May 2026 11:44
Item ID: 29538
Date: 5 May 2026
URI: https://philsci-archive.pitt.edu/id/eprint/29538
