PhilSci Archive

Off-Switching Not Guaranteed

Neth, Sven (2025) Off-Switching Not Guaranteed. [Preprint]


Abstract

Hadfield-Menell et al. (2017) propose the Off-Switch Game, a model of Human-AI cooperation in which AI agents always defer to humans because they are uncertain about our preferences. I explain two reasons why AI agents might not defer. First, AI agents might not value learning. Second, even if AI agents value learning, they might not be certain to learn our actual preferences.
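The deference result the abstract refers to can be illustrated with a minimal sketch of the Off-Switch Game. This is not code from the paper: the belief representation, function names, and the example numbers are all illustrative, and the sketch assumes a discrete belief over the utility of the AI's proposed action plus a perfectly rational human overseer who permits the action exactly when its utility is positive.

```python
# Minimal sketch of the Off-Switch Game (Hadfield-Menell et al. 2017).
# Assumptions: the AI holds a discrete belief over the utility u of its
# proposed action, and the human overseer is perfectly rational, allowing
# the action iff u > 0. All names are illustrative, not from the paper.

def expected_utilities(belief):
    """belief: list of (probability, utility) pairs summing to 1."""
    act = sum(p * u for p, u in belief)               # act now, bypassing the human
    off = 0.0                                         # switch self off
    defer = sum(p * max(u, 0.0) for p, u in belief)   # human blocks cases with u < 0
    return act, off, defer

# Example: the AI is unsure whether its action helps (+1) or harms (-1).
belief = [(0.6, 1.0), (0.4, -1.0)]
act, off, defer = expected_utilities(belief)

# Under these assumptions, deference weakly dominates:
# E[max(u, 0)] >= max(E[u], 0).
assert defer >= max(act, off)
```

The inequality in the last line is what drives deference in the original model: a rational overseer filters out exactly the bad outcomes, so consulting the human is never worse in expectation. The paper's two objections correspond to relaxing these assumptions, i.e. the agent may not assign value to this information, or the human's verdict may not track u.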



Item Type: Preprint
Creators: Neth, Sven (sven.neth@pitt.edu, ORCID: 0000-0003-4275-7581)
Additional Information: Forthcoming in Philosophical Studies
Keywords: Artificial Intelligence, Decision Theory, Value of Information, Off-Switch Game, Cooperative Inverse Reinforcement Learning
Subjects: Specific Sciences > Artificial Intelligence > AI and Ethics
Specific Sciences > Artificial Intelligence
General Issues > Decision Theory
General Issues > Game Theory
Specific Sciences > Probability/Statistics
Depositing User: Sven Neth
Date Deposited: 14 Feb 2025 14:26
Last Modified: 14 Feb 2025 14:26
Item ID: 24740
Date: 12 February 2025
URI: https://philsci-archive.pitt.edu/id/eprint/24740
