Neth, Sven (2025) Off-Switching Not Guaranteed. [Preprint]
Text: no-off-switch.pdf (342kB)
Abstract
Hadfield-Menell et al. (2017) propose the Off-Switch Game, a model of Human-AI cooperation in which AI agents always defer to humans because they are uncertain about our preferences. I explain two reasons why AI agents might not defer. First, AI agents might not value learning. Second, even if AI agents value learning, they might not be certain to learn our actual preferences.
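The deference argument turns on the value of information: for a robot facing a rational human, the expected payoff of deferring, E[max(U, 0)], weakly dominates both acting, E[U], and switching off, 0. Below is a minimal numerical sketch of that comparison, using an illustrative normal prior and a hypothetical error-prone human (both assumptions for illustration, not values from the paper) to show how deference can stop being optimal when the robot is not certain to learn our actual preferences:

```python
import numpy as np

# A minimal sketch of the Off-Switch Game (Hadfield-Menell et al. 2017):
# the robot is uncertain about the human's utility U for its proposed
# action. It can act directly (payoff U), switch itself off (payoff 0),
# or defer to the human, who (if rational) permits the action iff U >= 0.

rng = np.random.default_rng(0)
u = rng.normal(loc=0.5, scale=1.0, size=100_000)  # assumed prior over U

ev_act = u.mean()                    # act without asking: E[U]
ev_off = 0.0                         # switch off: payoff 0
ev_defer = np.maximum(u, 0).mean()   # defer to rational human: E[max(U, 0)]

# E[max(U, 0)] >= max(E[U], 0), so a robot that values this information
# weakly prefers to defer.
print(f"act {ev_act:.3f} | off {ev_off:.3f} | defer {ev_defer:.3f}")

# But if deference does not track the human's actual preference (here, a
# hypothetical human who errs with probability p_err), deferring can lose:
p_err = 0.3
ev_noisy = ((1 - p_err) * np.maximum(u, 0) + p_err * np.minimum(u, 0)).mean()
print(f"defer to error-prone human: {ev_noisy:.3f}")  # below E[U] here
```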
| Field | Value |
|---|---|
| Item Type | Preprint |
| Creators | Neth, Sven |
| Additional Information | Forthcoming in Philosophical Studies |
| Keywords | Artificial Intelligence, Decision Theory, Value of Information, Off-Switch Game, Cooperative Inverse Reinforcement Learning |
| Subjects | Specific Sciences > Artificial Intelligence > AI and Ethics; Specific Sciences > Artificial Intelligence; General Issues > Decision Theory; General Issues > Game Theory; Specific Sciences > Probability/Statistics |
| Depositing User | Sven Neth |
| Date Deposited | 14 Feb 2025 14:26 |
| Last Modified | 14 Feb 2025 14:26 |
| Item ID | 24740 |
| Date | 12 February 2025 |
| URI | https://philsci-archive.pitt.edu/id/eprint/24740 |