PhilSci Archive

Off-Switching Not Guaranteed

Neth, Sven (2025) Off-Switching Not Guaranteed. [Preprint]


Abstract

Hadfield-Menell et al. (2017) propose the Off-Switch Game, a model of Human-AI cooperation in which AI agents always defer to humans because they are uncertain about our preferences. I explain two reasons why AI agents might not defer. First, AI agents might not value learning. Second, even if AI agents value learning, they might not be certain to learn our actual preferences.
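The deference result the abstract refers to can be illustrated with a minimal sketch of the Off-Switch Game. This is not code from the paper: the belief representation, function names, and the example numbers are all illustrative, and the sketch assumes a discrete belief over the utility of the AI's proposed action plus a perfectly rational human overseer who permits the action exactly when its utility is positive.

```python
# Minimal sketch of the Off-Switch Game (Hadfield-Menell et al. 2017).
# Assumptions: the AI holds a discrete belief over the utility u of its
# proposed action, and the human overseer is perfectly rational, allowing
# the action iff u > 0. All names are illustrative, not from the paper.

def expected_utilities(belief):
    """belief: list of (probability, utility) pairs summing to 1."""
    act = sum(p * u for p, u in belief)               # act now, bypassing the human
    off = 0.0                                         # switch self off
    defer = sum(p * max(u, 0.0) for p, u in belief)   # human blocks cases with u < 0
    return act, off, defer

# Example: the AI is unsure whether its action helps (+1) or harms (-1).
belief = [(0.6, 1.0), (0.4, -1.0)]
act, off, defer = expected_utilities(belief)

# Under these assumptions, deference weakly dominates:
# E[max(u, 0)] >= max(E[u], 0).
assert defer >= max(act, off)
```

The inequality in the last line is what drives deference in the original model: a rational overseer filters out exactly the bad outcomes, so consulting the human is never worse in expectation. The paper's two objections correspond to relaxing these assumptions, i.e. the agent may not assign value to this information, or the human's verdict may not track u.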



Item Type: Preprint
Creators: Neth, Sven (sven.neth@pitt.edu, ORCID: 0000-0003-4275-7581)
Additional Information: Forthcoming in Philosophical Studies
Keywords: Artificial Intelligence, Decision Theory, Value of Information, Off-Switch Game, Cooperative Inverse Reinforcement Learning
Subjects: Specific Sciences > Artificial Intelligence > AI and Ethics
Specific Sciences > Artificial Intelligence
General Issues > Decision Theory
General Issues > Game Theory
Specific Sciences > Probability/Statistics
Depositing User: Sven Neth
Date Deposited: 14 Feb 2025 14:26
Last Modified: 14 Feb 2025 14:26
Item ID: 24740
Date: 12 February 2025
URI: https://philsci-archive.pitt.edu/id/eprint/24740
