PhilSci Archive

The Agnostic Structure of Data Science Methods

Napoletani, Domenico and Panza, Marco and Struppa, Daniele (2018) The Agnostic Structure of Data Science Methods.

This is the latest version of this item.

TheAgnosticStructure.pdf - Updated Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

In this paper we argue that data science is a coherent approach to empirical problems that, in its most general form, does not build understanding about phenomena. We start by exploring the broad structure of mathematization methods in data science, organized around the belief that if enough and sufficiently diverse data are collected regarding a certain phenomenon, it is possible to answer all relevant questions about it. We call this belief 'the microarray paradigm' and the approach to empirical phenomena based on it 'agnostic science'. Not all computational methods dealing with large data sets are properly within the domain of agnostic science, and we give an example of an algorithm, PageRank, that relies on large data processing, but such that the significance of its output is readily intelligible. Within the new type of mathematization at work in agnostic science, mathematical methods are not selected because of any particular relevance for a problem at hand. Rather, mathematical methods are applied to a specific problem only on the basis of their ability to reorganize the data for further analysis and the intrinsic richness of their mathematical structure. We refer to this type of mathematization as 'forcing'. We then show that optimization methods are used in data science by forcing them on problems. This is particularly significant since virtually all methods of data science can be reinterpreted as types of optimization methods. In particular, we argue that deep learning neural networks are best understood within the context of forcing optimality. We finally explore the broader question of the appropriateness of data science methods in solving problems. We argue that this question should not be interpreted as a search for a correspondence between phenomena and specific solutions found by data science methods. Rather, it is the internal structure of data science methods that is open to forms of understanding.
As an example, we offer an analysis of ensemble methods, where distinct data science methods are combined in the search for the solution of a problem, and we speculate on the general structure of the data sets that are most appropriate for such methods.
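The abstract cites PageRank as a large-data algorithm whose output is readily intelligible. The sketch below shows one common way to see that. In the textbook power-iteration formulation, each page's score is the long-run probability that a "random surfer" is on that page. This is a minimal sketch under stated assumptions, not the authors' code. The function name `pagerank`, the damping factor 0.85, the uniform redistribution of score from dangling nodes, and the toy graph are all hypothetical choices made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a small directed graph (illustrative sketch).

    links maps each node to the list of nodes it links to. Nodes with no
    out-links (dangling nodes) spread their score uniformly over all nodes,
    which is one common convention, assumed here.
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Probability mass currently sitting on dangling nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Teleportation term plus the redistributed dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its score equally along its out-links.
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

On this hypothetical four-page graph, page C gets the highest score because every other page links to it. Page D receives only the teleportation share (1 - d)/n = 0.0375, because nothing links to D. The scores form a probability distribution, so each number has a direct reading in terms of the link structure. That is the kind of intelligibility the abstract contrasts with agnostic methods.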



Item Type: Conference or Workshop Item
Creators:
Napoletani, Domenico
Panza, Marco (email: panzam10@gmail.com; ORCID: 0000-0003-4131-7103)
Struppa, Daniele
Additional Information: To appear in Lato Sensu, the journal of the Société de philosophie des sciences.
Keywords: Data Analysis, Agnostic Sciences, Machine Learning
Subjects: General Issues > Data
Specific Sciences > Mathematics > Applicability
Depositing User: Marco Panza
Date Deposited: 11 Feb 2021 15:34
Last Modified: 11 Feb 2021 15:34
Item ID: 18707
Date: November 2018
URI: https://philsci-archive.pitt.edu/id/eprint/18707
