Rushing, Bruce (2024) AI Safety Collides with the Overattribution Bias. [Preprint]
Text: rushing_aisafetycollides_manuscript.pdf (379kB)
Abstract
The field of Artificial Intelligence (AI) safety evaluations aims to test AI behavior for problematic capabilities like deception. However, some scientists have cautioned against using behavior to infer general cognitive abilities because of the human tendency to overattribute cognition to everything. They recommend adopting a heuristic to avoid these errors, which states that behavior provides no evidence for cognitive capabilities unless some theoretical feature is present to justify that inference. We make that heuristic precise in terms of our credences' conditional independencies between behavior, cognitive capabilities, and the presence or absence of theoretical features. When made precise, the heuristic absurdly entails that failure at a behavioral task supports the presence of a theoretical feature. This is because the heuristic imposes inductive dependencies that conflict with our best causal models of cognition. Weakening the heuristic to allow only weak evidence between behavior and cognitive abilities leads to similar problems. Consequently, we suggest abandoning the heuristic and instead updating those causal models in light of the behavior observed when testing AIs for troublesome cognitive abilities.
Item Type: Preprint
Creators: Rushing, Bruce
Keywords: AI safety, philosophy of machine learning, inductive bias, philosophy of AI, large language models
Subjects: Specific Sciences > Cognitive Science; Specific Sciences > Psychology > Comparative Psychology and Ethology; Specific Sciences > Artificial Intelligence; General Issues > Evidence
Depositing User: Dr Bruce Rushing
Date Deposited: 26 Jun 2024 17:19
Last Modified: 26 Jun 2024 17:19
Item ID: 23623
Date: 1 March 2024
URI: https://philsci-archive.pitt.edu/id/eprint/23623