Friedman, Daniel and Duede, Eamon (2026) Interpretability (Propositional and Mechanistic) Needs Behavior. [Preprint]
Abstract
Prevailing approaches to interpreting large language models (LLMs) risk addressing the field's central questions at the wrong level of analysis. As LLMs develop, researchers have turned to "under-the-hood" methods to investigate whether LLMs possess states analogous to beliefs, desires, or intentions. These methods typically map internal representations, feature directions, or neural circuits onto folk-psychological categories. We argue that these methods are too fragile to reap the results we need.
Instead, interpretability ought to be reoriented toward and grounded in what we call "ecologically sensitive behavioral analyses," which integrate mechanistic tools with severe testing of behavioral signatures across ecological contexts, aligning interpretability efforts with emerging evaluation sciences for AI. We outline a multi-disciplinary trajectory for developing interpretability methods that are both philosophically coherent and practically relevant for the deployment of increasingly capable LLMs.
| Item Type: | Preprint |
|---|---|
| Creators: | Friedman, Daniel; Duede, Eamon |
| Keywords: | large language models, mechanistic interpretability, propositional interpretability, ecologically sensitive behavioral analysis, folk psychology, mental state attribution, LLM alignment, AI evaluation, confabulation, behavioral science, philosophy of mind, ecological validity |
| Subjects: | Specific Sciences > Cognitive Science; Specific Sciences > Artificial Intelligence; Specific Sciences > Neuroscience; Specific Sciences > Psychology; General Issues > Technology |
| Depositing User: | Dr. Eamon Duede |
| Date Deposited: | 08 May 2026 11:44 |
| Last Modified: | 08 May 2026 11:44 |
| Item ID: | 29538 |
| Date: | 5 May 2026 |
| URI: | https://philsci-archive.pitt.edu/id/eprint/29538 |