Rabiza, Marcin (2024). A Mechanistic Explanatory Strategy for XAI. [Preprint]
Abstract
Despite significant advancements in XAI, scholars note a persistent lack of solid conceptual foundations and integration with broader discourse on scientific explanation. In response, emerging XAI research draws on explanatory strategies from various sciences and philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent advancements in AI explainability within a broader philosophical context. According to the mechanistic approach, the explanation of opaque AI systems involves identifying mechanisms that drive decision-making. For deep neural networks, this means discerning functionally relevant components — such as neurons, layers, circuits, or activation patterns — and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle case studies from image recognition and language modeling align these theoretical approaches with the latest research from AI labs like OpenAI and Anthropic. The paper suggests that such a systematic approach to studying model organization can reveal elements that individual explainability techniques might miss, fostering more thoroughly explainable AI.
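To illustrate the kind of decomposition-and-localization experiment the abstract alludes to, here is a minimal sketch, not taken from the paper itself: a single hidden unit in a toy PyTorch network is ablated via a forward hook, and the resulting change in the model's output serves as a crude proxy for that unit's functional relevance. The model architecture, input data, and relevance score are all illustrative assumptions.

```python
# Minimal sketch of decomposition/localization via single-unit ablation.
# Assumptions (not from the paper): a toy MLP, random inputs, and mean
# output change as a crude proxy for a unit's functional relevance.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "opaque" model: a 2-layer MLP standing in for a deep network.
model = nn.Sequential(
    nn.Linear(16, 32),   # hidden layer whose units we probe
    nn.ReLU(),
    nn.Linear(32, 4),    # output layer (e.g., 4 classes)
)
model.eval()

x = torch.randn(8, 16)            # a small batch of illustrative inputs
with torch.no_grad():
    baseline = model(x)           # unablated output

def make_ablation_hook(unit: int):
    """Return a forward hook that zeroes one hidden unit's pre-activation
    (and hence its activation after the ReLU)."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, unit] = 0.0     # ablate (lesion) the chosen unit
        return output
    return hook

# Localization: ablate each hidden unit in turn and score its effect.
scores = []
for unit in range(32):
    handle = model[0].register_forward_hook(make_ablation_hook(unit))
    with torch.no_grad():
        ablated = model(x)
    handle.remove()
    # Mean absolute output change as the relevance proxy.
    scores.append((baseline - ablated).abs().mean().item())

top = max(range(32), key=lambda u: scores[u])
print(f"Most functionally relevant hidden unit (by this proxy): {top}")
```

Mechanistic interpretability work applies analogous ablation and activation-patching experiments to circuits in real trained models; the sketch above only illustrates the underlying logic of treating a component's causal contribution as evidence of its functional role.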
Item Type: Preprint
Creators: Rabiza, Marcin
Additional Information: Forthcoming in Müller, V. C., Dung, L., Löhr, G., & Rumana, A. (Eds.), Philosophy of Artificial Intelligence: The State of the Art, Synthese Library, Springer Nature. Please cite the published version.
Keywords: black box problem, explainable artificial intelligence (XAI), explainability, interpretability, mechanisms, mechanistic explanation, mechanistic interpretability, new mechanism
Subjects: Specific Sciences > Artificial Intelligence > AI and Ethics; General Issues > Causation; Specific Sciences > Cognitive Science > Computation; Specific Sciences > Computer Science; Specific Sciences > Artificial Intelligence; General Issues > Explanation; Specific Sciences > Artificial Intelligence > Machine Learning; General Issues > Models and Idealization
Depositing User: Mr. Marcin Rabiza
Date Deposited: 24 Mar 2025 13:37
Last Modified: 24 Mar 2025 13:37
Item ID: 24939
Date: 2 November 2024
URI: https://philsci-archive.pitt.edu/id/eprint/24939