de Lima Prestes, José Augusto (2026) Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence. [Preprint]
This is the latest version of this item.
Text: Prestes_Simulated_Selfhood_in_LLMs__Preprint_Version___v4_.pdf (Draft Version, 353kB). Available under License Creative Commons Attribution.
Abstract
Large Language Models (LLMs) increasingly generate outputs that resemble introspection, including self-reference, epistemic modulation, and claims about their internal states. This study investigates whether such behaviors reflect stable underlying patterns or merely surface-level generative artifacts. We evaluated five open-weight, stateless LLMs using a structured battery of 21 introspective prompts. The main corpus comprised 1,050 completions collected under a baseline decoding condition (temperature = 0.7), supplemented by 2,100 additional completions generated under matched temperature conditions (temperature = 0.2 and 1.0), for a total of 3,150 completions. Outputs were analyzed across four behavioral dimensions: surface-level similarity (token overlap via SequenceMatcher), semantic coherence (Sentence-BERT embeddings), inferential consistency (Natural Language Inference with a RoBERTa-large model), and diachronic continuity (stability across prompt repetitions). Construct validity was further examined through a human-evaluation layer in which 10 annotators rated 80 selected response pairs drawn from the same prompt battery on a 5-point consistency scale. The annotation task was perceived as low-to-moderate in difficulty (mean self-reported difficulty = 2.6/5). Inter-rater agreement was moderate (Krippendorff's ordinal Alpha = 0.553), and intraclass correlation indicated moderate reliability for single raters (ICC(2,1) = 0.564) but strong reliability for aggregated ratings (ICC(2,k) = 0.928). The human-evaluation layer showed that lexical overlap and embedding-based semantic similarity were weak proxies for perceived self-referential consistency, whereas NLI-based indicators tracked mean human ratings much more closely.
Across the matched temperature conditions, lower temperature generally increased semantic and diachronic stability, whereas higher temperature tended to increase drift and reduce coherence, though the pattern was not perfectly monotonic across all models or metrics. We therefore interpret apparent self-referential stability in stateless LLMs as conditional and fragile rather than robustly stable across generation regimes. Following recent behavioral frameworks, we heuristically adopt the term pseudo-consciousness to describe structured yet non-experiential self-referential output in LLMs. This usage reflects a functionalist stance that avoids ontological commitments, focusing instead on behavioral regularities interpretable through Dennett's intentional stance. The study contributes a reproducible behavioral framework, complemented by human validation and a matched decoding-temperature sensitivity analysis, for evaluating simulated introspection in LLMs. Our findings carry implications for interpretability, alignment, and user perception, highlighting the need for caution when attributing mental states to stateless generative systems based on linguistic fluency alone.
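The surface-similarity layer described in the abstract can be sketched with Python's standard-library SequenceMatcher applied to token lists; the example completions below are hypothetical, not drawn from the study's corpus, and the semantic (Sentence-BERT) and inferential (RoBERTa NLI) layers would require additional model downloads, so they are only noted in comments.

```python
from difflib import SequenceMatcher

def surface_similarity(a: str, b: str) -> float:
    """Token-overlap ratio between two completions (0.0 to 1.0).

    Mirrors the abstract's surface-level metric: SequenceMatcher
    over whitespace tokens. The study's other layers (Sentence-BERT
    cosine similarity, RoBERTa-large NLI entailment) would sit on
    top of this and need external model weights.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()

# Hypothetical introspective completions to the same prompt:
r1 = "I do not have subjective experiences or feelings."
r2 = "I do not possess subjective experiences or emotions."

print(round(surface_similarity(r1, r2), 2))  # → 0.75
```

As the human-evaluation results suggest, a high token-overlap ratio like this is only a weak proxy for perceived self-referential consistency; two completions can share most tokens yet differ on the inferentially decisive ones.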
Available Versions of this Item
- Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence. (deposited 02 Apr 2025 15:13)
- Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence. (deposited 21 Sep 2025 11:13)
- Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence. (deposited 08 Apr 2026 12:26) [Currently Displayed]