PhilSci Archive

Explaining AI Through Mechanistic Interpretability

Kästner, Lena and Crook, Barnaby (2023) Explaining AI Through Mechanistic Interpretability. [Preprint]


Abstract

Recent work in explainable artificial intelligence (XAI) attempts to render opaque AI systems understandable through a divide-and-conquer strategy. However, this fails to illuminate how trained AI systems work as a whole. Precisely this kind of functional understanding is needed, though, to satisfy important societal desiderata such as safety. To remedy this situation, we argue, AI researchers should seek mechanistic interpretability, viz. apply coordinated discovery strategies familiar from the life sciences to uncover the functional organisation of complex AI systems. Additionally, theorists should account for the unique costs and benefits of such strategies in their portrayals of XAI research.



Item Type: Preprint
Creators: Kästner, Lena (email: mail@lenakaestner.de, ORCID: 0000-0002-8747-6911); Crook, Barnaby
Keywords: AI, ANN, deep learning, discovery, explanation, mechanistic interpretability, XAI
Subjects: General Issues > Data
Specific Sciences > Cognitive Science > Computation
Specific Sciences > Computer Science
General Issues > Explanation
General Issues > Models and Idealization
Specific Sciences > Neuroscience
General Issues > Philosophers of Science
Depositing User: Dr. Lena Kästner
Date Deposited: 08 Nov 2023 18:37
Last Modified: 08 Nov 2023 18:37
Item ID: 22747
Date: 3 November 2023
URI: https://philsci-archive.pitt.edu/id/eprint/22747
