PhilSci Archive

A Mechanistic Explanatory Strategy for XAI

Rabiza, Marcin (2024) A Mechanistic Explanatory Strategy for XAI. [Preprint]

This is the latest version of this item.

Abstract

Despite significant advancements in XAI, scholars continue to note a persistent lack of robust conceptual foundations and integration with broader discourse on scientific explanation. In response, emerging XAI research increasingly draws on explanatory strategies from various scientific disciplines and the philosophy of science to address these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in AI explainability within a broader philosophical context. According to the mechanistic approach, explaining opaque AI systems involves identifying the mechanisms underlying decision-making processes. For deep neural networks, this means discerning functionally relevant components — such as neurons, layers, circuits, or activation patterns — and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle case studies from image recognition and language modeling align this theoretical framework with recent research from OpenAI and Anthropic. The findings suggest that pursuing mechanistic explanations can uncover elements that traditional explainability techniques may overlook, ultimately contributing to more thoroughly explainable AI.


Item Type: Preprint
Creators: Rabiza, Marcin (email: marcin.rabiza@gssr.edu.pl; ORCID: 0000-0001-6217-6149)
Additional Information: Forthcoming in Müller, V. C., Dung, L., Löhr, G., & Rumana, A. (Eds.). Philosophy of Artificial Intelligence: The State of the Art, Synthese Library, Springer Nature. Please cite the published version.
Keywords: black box problem, explainable artificial intelligence (XAI), explainability, interpretability, mechanisms, mechanistic explanation, mechanistic interpretability, new mechanism
Subjects: Specific Sciences > Artificial Intelligence > AI and Ethics
General Issues > Causation
Specific Sciences > Cognitive Science > Computation
Specific Sciences > Computer Science
Specific Sciences > Artificial Intelligence
General Issues > Explanation
Specific Sciences > Artificial Intelligence > Machine Learning
General Issues > Models and Idealization
Depositing User: Mr. Marcin Rabiza
Date Deposited: 25 Mar 2025 13:41
Last Modified: 25 Mar 2025 13:41
Item ID: 24949
Date: 2 November 2024
URI: https://philsci-archive.pitt.edu/id/eprint/24949
