Zhang, Jianqiu (2024) What is Lacking in Sora and V-JEPA’s World Models? -A Philosophical Analysis of Video AIs Through the Theory of Productive Imagination. [Preprint]
Text: SoraVJEPA.pdf (1MB). Available under License Creative Commons Attribution.
Abstract
Sora from OpenAI has shown exceptional performance, yet it faces scrutiny over whether its technological prowess equates to an authentic comprehension of reality. Critics contend that it lacks a foundational grasp of the world, a deficiency V-JEPA from Meta aims to amend with its joint embedding approach. This debate is vital for steering the future direction of Artificial General Intelligence (AGI). We enrich this debate by developing a theory of productive imagination, based on Kantian philosophy, that generates a coherent world model. We identify three indispensable components of a coherent world model capable of genuine world understanding: representations of isolated objects, an a priori law of change across space and time, and Kantian categories. Our analysis reveals that Sora is limited because it overlooks the a priori law of change and Kantian categories, flaws that cannot be rectified by scaling up training. V-JEPA learns the context-dependent aspect of the a priori law of change, yet it fails to fully comprehend Kantian categories and to incorporate experience. We therefore conclude that neither system currently achieves a comprehensive world understanding. Nevertheless, each system has developed components essential to advancing an integrated AI productive imagination-understanding engine. Finally, we propose an innovative training framework for an AI productive imagination-understanding engine, centered on a joint embedding system designed to transform disordered perceptual input into a structured, coherent world model. Our philosophical analysis pinpoints critical challenges within contemporary video AI technologies and outlines a pathway toward an AI system capable of genuine world understanding, one that can be applied to reasoning and planning in the future.
Item Type: | Preprint |
---|---|
Creators: | Zhang, Jianqiu |
Keywords: | Video generative AIs, Video AIs, Kant Philosophy, Productive Imagination, World Models, Real World Models, Sora, V-JEPA, World Simulators, Artificial General Intelligence, AGI |
Subjects: | Specific Sciences > Computer Science; Specific Sciences > Artificial Intelligence; Specific Sciences > Artificial Intelligence > Machine Learning; Specific Sciences > Cognitive Science > Perception |
Depositing User: | Jianqiu Zhang |
Date Deposited: | 16 May 2024 11:00 |
Last Modified: | 16 May 2024 11:00 |
Item ID: | 23434 |
Date: | 15 May 2024 |
URI: | https://philsci-archive.pitt.edu/id/eprint/23434 |