PhilSci Archive

Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence

de Lima Prestes, José Augusto (2025) Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence. [Preprint]

Simulated_Selfhood_in_LLMs__Preprint_Version.pdf (336kB)

Abstract

Large Language Models (LLMs) increasingly produce outputs that resemble introspection, including self-reference, epistemic modulation, and claims about internal states. This study investigates whether such behaviors display consistent patterns across repeated prompts or merely reflect surface-level generative artifacts. We evaluated five open-weight, stateless LLMs using a structured battery of 21 introspective prompts, each repeated ten times, yielding 1,050 completions. These outputs were analyzed across three behavioral dimensions: surface-level similarity (via token overlap), semantic coherence (via sentence embeddings), and inferential consistency (via natural language inference). Although some models demonstrate localized thematic stability, especially in identity- and consciousness-related prompts, none sustain diachronic coherence. High rates of contradiction are observed, often arising from tensions between mechanistic disclaimers and anthropomorphic phrasing. We introduce the concept of pseudo-consciousness to describe structured but non-experiential self-referential output. Drawing on Dennett's intentional stance, our analysis avoids ontological claims and instead focuses on behavioral regularities. The study contributes a reproducible framework for evaluating simulated introspection in LLMs and offers a graded taxonomy for classifying self-referential output. Our findings have implications for interpretability, alignment, and user perception, highlighting the need for caution in attributing mental states to stateless generative systems based solely on linguistic fluency.
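The abstract's first behavioral dimension, surface-level similarity via token overlap, can be illustrated with a minimal sketch. The paper does not specify its exact overlap metric, so this example assumes Jaccard similarity over lowercased whitespace tokens as one plausible choice; the semantic-coherence and NLI stages would additionally require an embedding model and an entailment classifier, which are omitted here.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two completions
    (assumed metric; the paper only says 'token overlap')."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def surface_stability(completions: list[str]) -> float:
    """Mean pairwise overlap across repeated completions of one prompt,
    mirroring the repeated-prompt design (10 runs per prompt in the study)."""
    pairs = list(combinations(completions, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Toy completions for a single introspective prompt: two mechanistic
# disclaimers and one anthropomorphic response (illustrative only).
runs = [
    "I am a language model and do not have subjective experiences",
    "As a language model I do not have subjective experiences",
    "I feel curious about the world and my own thoughts",
]
print(round(surface_stability(runs), 3))
```

Low mean overlap across repetitions of the same prompt would indicate unstable surface behavior; in the study this score is one of three dimensions, alongside embedding-based semantic coherence and NLI-based contradiction detection.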



Item Type: Preprint
Creators: de Lima Prestes, José Augusto (contato@joseprestes.com, ORCID: 0000-0001-8686-5360)
Keywords: Large Language Models; Introspective Simulation; Pseudo-consciousness; Self-reference; Behavioral Evaluation; Epistemic Modulation; AI Alignment; Philosophy of Cognitive Science; Philosophy of Artificial Intelligence; Explainability
Subjects: Specific Sciences > Computation/Information > Classical
Specific Sciences > Artificial Intelligence > AI and Ethics
Specific Sciences > Artificial Intelligence > Classical AI
Specific Sciences > Cognitive Science > Computation
Specific Sciences > Cognitive Science > Concepts and Representations
Specific Sciences > Cognitive Science > Consciousness
General Issues > Explanation
General Issues > Models and Idealization
Depositing User: José Augusto de Lima Prestes
Date Deposited: 02 Apr 2025 15:13
Last Modified: 02 Apr 2025 15:13
Item ID: 24988
Date: 1 April 2025
URI: https://philsci-archive.pitt.edu/id/eprint/24988
