Problems in AI, their roots in philosophy, and implications for science and society

Published 22 Jul 2024 in cs.AI, cs.ET, and cs.HC | (2407.15671v1)

Abstract: AI is one of today's most relevant emergent technologies. In view thereof, this paper proposes that more attention should be paid to the philosophical aspects of AI technology and its use. It is argued that this deficit is generally combined with philosophical misconceptions about the growth of knowledge. To identify these misconceptions, reference is made to the ideas of the philosopher of science Karl Popper and the physicist David Deutsch. The works of both thinkers aim against mistaken theories of knowledge, such as inductivism, empiricism, and instrumentalism. This paper shows that these theories bear similarities to how current AI technology operates. It also shows that these theories are very much alive in the (public) discourse on AI, often called Bayesianism. In line with Popper and Deutsch, it is proposed that all these theories are based on mistaken philosophies of knowledge. This includes an analysis of the implications of these mistaken philosophies for the use of AI in science and society, including some of the likely problem situations that will arise. This paper finally provides a realistic outlook on AGI and three propositions on A(G)I and philosophy (i.e., epistemology).

Citations (1)

Summary

  • The paper critiques current AI methodologies by exposing the reliance on inductivism, empiricism, and Bayesian inference, arguing these hinder true scientific explanation.
  • It demonstrates how flawed philosophical doctrines misdirect AI’s capabilities, limiting its role in generating explanatory and innovative scientific knowledge.
  • The study urges policymakers to adopt rigorous epistemological standards to ensure AI functions as an instrumental support for human inquiry.

Detailed Critique of "Problems in AI, their roots in philosophy, and implications for science and society"

The paper "Problems in AI, their roots in philosophy, and implications for science and society" by M.J. Velthoven and E.J. Marcus examines the philosophical shortcomings surrounding AI technologies with a particular focus on epistemological issues as highlighted by Karl Popper and David Deutsch. By anchoring its arguments in critical rationalism, the paper challenges the philosophical foundations upon which many current AI methodologies operate.

Philosophical Examination of AI Technologies

The authors assert that current AI methodologies often align with mistaken philosophical doctrines such as inductivism, empiricism, and instrumentalism. These doctrines, critiqued by Popper and Deutsch, are argued to parallel how AI technologies attempt to induce general rules from specific data instances, an approach the authors deem incapable of generating genuine knowledge.

Popper's falsification principle, which holds that scientific theories can never be definitively proven but only refuted, serves as the key theoretical underpinning of the critique. This principle runs counter to the foundational assumptions of many AI algorithms, which adjust belief probabilities in light of data instances, much as inductivist approaches do. The paper posits that this methodological reliance casts AI in a role the authors term 'explanationless': unable to generate the kind of new knowledge that human creativity and scientific inquiry strive for.

AI, Bayesianism, and Instrumentalism

A significant portion of the critique focuses on Bayesianism, which is used frequently in AI for probabilistic analysis and decision-making. Positioning Bayesian inference as a modern cousin of inductivism, the paper argues that AI systems assign and update probability distributions in a manner that mirrors what Popper and Deutsch regard as defective models of knowledge growth.
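To make concrete the kind of belief-probability updating the paper critiques, here is a minimal illustrative sketch of Bayesian updating. The coin-bias hypotheses and their likelihoods are assumptions chosen for illustration, not an example from the paper:

```python
# Minimal illustration of Bayesian belief updating, the inductivist-style
# procedure the paper critiques: each observation mechanically shifts
# probability mass between fixed hypotheses without producing an explanation.

def bayesian_update(priors, likelihoods, observation):
    """Return the posterior P(h | observation) for each hypothesis h."""
    unnormalized = {h: priors[h] * likelihoods[h](observation) for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two hypotheses about a coin: fair vs. heads-biased (illustrative values).
priors = {"fair": 0.5, "biased": 0.5}
likelihoods = {
    "fair":   lambda obs: 0.5,
    "biased": lambda obs: 0.8 if obs == "H" else 0.2,
}

posterior = dict(priors)
for obs in "HHHH":  # observe four heads in a row
    posterior = bayesian_update(posterior, likelihoods, obs)

print(posterior)  # probability mass has shifted toward "biased"
```

Note that the hypothesis space is fixed in advance: the procedure reweights existing candidates but never conjectures a new explanatory theory, which is precisely the limitation the authors emphasize.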

Moreover, the authors argue that instrumentalism, which values results over understanding, manifests in AI development practices. In contexts ranging from quantum theory to applied AI, a focus on predictive modeling without deeper explanation illustrates an 'explanationless' stance, contrary to the explanatory scientific inquiry that Popperian epistemology emphasizes.

Implications for Science and Society

The paper illuminates the risks posed by uncritical acceptance of AI technologies based on these flawed epistemologies. By treating AI as a mere tool rather than a creator of knowledge, the authors caution against misinterpretations in scientific and policy frameworks. They urge policymakers to recognize the instrumental nature of AI—focusing on its role as an aid, not a replacement, for human intellect—and to implement practices ensuring AI application aligns with a deep understanding of its philosophical limitations.

Challenges of Achieving AGI

In examining AGI, the authors argue that current trends in AI development, underpinned by mistaken philosophical premises, mislead expectations about reaching AGI. They claim that without breakthroughs in philosophical understanding, particularly regarding how explanations are generated and critiqued, AGI remains unattainable. This, they argue, necessitates a reevaluation of the philosophical foundations of AI research, and it challenges the view that sheer increases in computational power and data availability will suffice.

Conclusion

The paper offers a critical perspective on AI technologies through the lens of epistemological analysis. By aligning AI's functioning with flawed philosophical methods, the authors underscore the importance of philosophical awareness in the development, application, and governance of AI technologies. The authors propose that maintaining human oversight and responsibility is crucial and that AI’s true potential lies in its role as an instrumental support—rather than a replacement—in the expansion of human knowledge and society.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, consolidated list of what remains missing, uncertain, or unexplored in the paper, framed to be concrete and actionable for future research.

  • Operational definition gap: The paper does not define “explanation,” “new knowledge,” or “creativity” in precise, testable terms; a formal taxonomy and measurable criteria for explanatory knowledge are needed.
  • Empirical evidence gap: Claims that current AI cannot generate explanations or novel knowledge are not supported by systematic empirical studies; benchmarked investigations (e.g., symbolic regression, scientific discovery systems, mechanistic hypothesis generation) are needed to test these claims.
  • Formalization gap: The Popperian conjecture–criticism cycle is not formalized computationally; research is needed to specify algorithms, data structures, and workflows that implement conjecture generation and automated criticism.
  • Bayesianism vs. inductivism conflation: The paper treats Bayesian methods as essentially inductivist without a rigorous comparative analysis; a formal treatment distinguishing (or equating) Bayesian epistemology with inductivism in AI practice is required.
  • Probability claim untested: The assertion that additional observations do not increase the probability of universal statements (Popper’s stance) is not reconciled with modern Bayesian statistics; mathematical proofs or counterexamples and simulations should be provided.
  • Out-of-distribution (OOD) generalization gap: The critique relies on OOD failure but does not engage with contemporary OOD, causal, and invariance literature; systematic assessment of whether these methods address the inductivist limitations is needed.
  • Causal representation learning gap: No evaluation of whether causal discovery, SCMs, invariant prediction, or counterfactual reasoning can advance toward explanatory AI; targeted studies should test their explanatory and falsification capacities.
  • Interpretability and mechanistic analysis gap: The paper asserts “black box” opacity without analyzing whether mechanistic interpretability, concept extraction, or causal attribution can yield genuine explanations; comparative studies across interpretability methods are needed.
  • Hybrid human–AI workflows gap: The proposal that humans must remain at the creative steering wheel is not operationalized; experimental protocols for human-in-the-loop conjecture/critique pipelines and their efficacy in generating explanatory knowledge are needed.
  • Creativity measurement gap: No metric or protocol is offered to assess whether AI outputs (alone or with human scaffolding) are novel, explanatory, or creative; develop standardized benchmarks (e.g., novelty, usefulness, mechanistic adequacy) and evaluation panels.
  • AGI requirement gap: The suggested necessity of a “philosophical breakthrough” for AGI is not translated into concrete research milestones or algorithmic components; articulate a research program with specific hypotheses, tasks, and success criteria.
  • AGI progress metrics gap: The claim that current AI progress moves away from AGI lacks measurable indicators; propose quantitative and qualitative metrics to track movement toward or away from universal explanatory capability.
  • Alternative epistemologies gap: The paper omits engagement with abductive inference, inference to the best explanation (IBE), Lakatosian research programs, error statistics (Mayo), pragmatism, and non-Western epistemologies; comparative analyses are needed to test AI’s alignment with these frameworks.
  • Counterexample analysis gap: No systematic review of domains where AI has allegedly discovered mechanisms or laws (e.g., SINDy, automated theorem proving, materials discovery) and whether these meet the paper’s criteria for explanation; curate and analyze such cases.
  • Policy guardrails gap: Recommendations for governance (accountability, explainability, non-discrimination) are high-level; design concrete mechanisms (auditing protocols, documentation standards, causal impact assessments, appeal processes) and test them in real deployments.
  • Legal cross-jurisdiction gap: The tax case is specific and anecdotal; conduct comparative legal analyses across jurisdictions and domains (healthcare, credit, criminal justice) with empirical audits of AI-driven discrimination and accountability failures.
  • Evidence base gap for claims of “instrumentalism dominance”: The assertion that instrumentalism dominates AI publishing and practice is not empirically substantiated; perform meta-analyses of publication criteria, citation practices, and evaluation standards.
  • Dataset bias mitigation gap: The paper diagnoses bias (e.g., scanner effects) without proposing or testing technical mitigations (re-weighting, debiasing, domain adaptation, causal adjustment); develop, implement, and evaluate mitigation pipelines.
  • Explainability obligations gap: The tension between black-box models and legal explainability is noted but not resolved; propose legally compliant explanation artifacts (counterfactuals, causal narratives, model cards) and validate with regulators and courts.
  • Measurement of harm gap: Discrimination and false dichotomies are discussed qualitatively; implement quantitative fairness/harm metrics and longitudinal monitoring frameworks applicable to public-sector AI.
  • Experimental design gap: No experiments are proposed to test Popperian/Deutschian claims within AI systems; design controlled studies where AI is tasked with conjecture generation and falsification under resource constraints and compare with human baselines.
  • LLM capability gap: The paper does not examine whether modern LLMs exhibit proto-explanatory behaviors (e.g., chain-of-thought, tool use, program synthesis) or whether these can be harnessed for explanation; empirical evaluations and scaffolding strategies are needed.
  • Definition and detection of “moving away from AGI” gap: The claim is metaphoric; specify detectable properties (e.g., overfitting to distributions, lack of counterfactual competence, failure to propose testable hypotheses) and measure them across model families.
  • Public discourse transparency gap: The call for explicit philosophical positions in AGI debates is not accompanied by templates or reporting standards; propose a “philosophy statement” framework for AI/AGI projects and conferences, and study its impact on discourse quality.
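The "probability claim untested" gap above can be made concrete with a small simulation. In a Bayesian treatment, whether observations raise the probability of a universal statement depends entirely on its prior: with any nonzero prior the posterior climbs, while with a zero prior (one reading of Popper's stance) no number of confirming observations helps. All numbers and the alternative hypothesis's likelihood below are assumptions for illustration:

```python
# Illustrative simulation of the Bayesian treatment of a universal statement
# ("all swans are white"). The competing hypothesis predicts a white swan
# with probability p_white_if_not; all parameters are assumed for illustration.

def posterior_universal(prior_universal, n_white_swans, p_white_if_not=0.9):
    """P(all swans are white | n white swans observed).

    The universal hypothesis assigns likelihood 1 to each white swan;
    the alternative assigns p_white_if_not.
    """
    weight_universal = prior_universal * 1.0 ** n_white_swans
    weight_alternative = (1 - prior_universal) * p_white_if_not ** n_white_swans
    return weight_universal / (weight_universal + weight_alternative)

print(posterior_universal(0.5, 0))    # no data: posterior equals the prior
print(posterior_universal(0.5, 50))   # nonzero prior: approaches 1
print(posterior_universal(0.0, 50))   # zero prior: stays exactly 0
```

A rigorous reconciliation would still need to address Popper's measure-theoretic argument that universal statements over infinite domains warrant zero prior probability, which this finite two-hypothesis sketch deliberately sidesteps.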

Practical Applications

Immediate Applications

The following items can be deployed now to align AI practice with the paper’s core insights (AI as instrument; human accountability; falsification over induction; vigilance about out‑of‑distribution risks and spurious correlations).

Industry

  • Human‑in‑the‑loop decision gates for high‑stakes AI
    • Sector: healthcare, finance, HR, tax/accounting, legal tech
    • Tools/workflows: approval workflows where model outputs require human critique and sign‑off; RACI matrices assigning responsibility; audit trails capturing rationale
    • Dependencies/assumptions: trained personnel; management buy‑in; process tooling (e.g., workflow software); acceptance of slower throughput in exchange for safety
  • Falsification‑first model evaluation suites
    • Sector: software/ML platforms, healthcare diagnostics, credit scoring, risk analytics
    • Tools/workflows: counterexample and stress‑test libraries, adversarial red‑teaming, “OOD probes” that deliberately violate learned correlations (e.g., images of dogs indoors to break a spurious “green grass = dog” heuristic)
    • Dependencies/assumptions: test data curation budget; domain experts to design critical tests; tolerance for discovering uncomfortable failure modes
  • Bias and spurious correlation audits prior to deployment
    • Sector: healthcare (scanner confounding), finance (demographic imbalances), retail (contextual spurious cues), computer vision
    • Tools/products: dataset cards; model cards with “known spurious cues”; bias dashboards; instrumentation to track feature correlations driving predictions
    • Dependencies/assumptions: access to raw data; legal ability to collect demographic metadata for fairness assessment; interpretability tooling
  • Out‑of‑distribution (OOD) monitoring and safe‑fallbacks
    • Sector: robotics, autonomous systems, fintech fraud detection, content moderation
    • Tools/workflows: OOD detectors; uncertainty thresholds; fail‑safe modes that escalate to humans; runtime data drift monitors
    • Dependencies/assumptions: reliable OOD/uncertainty estimation; clearly defined escalation protocols; operator capacity
  • Editorial “critique loops” for generative AI content creation
    • Sector: media, marketing, software documentation, education materials
    • Tools/workflows: structured rubrics for critique (coherence, originality, explanation quality), version control of human edits, prompt libraries focused on hypothesis and argumentation rather than mere style
    • Dependencies/assumptions: staff trained in epistemic critique; acceptance that AI drafts are starting points, not final outputs
  • Procurement standards that treat AI as a tool and assign human accountability
    • Sector: enterprise software, public sector software acquisition
    • Tools/workflows: contract clauses requiring human explainability of model‑influenced decisions; vendor deliverables (model cards, data provenance, failure mode documentation)
    • Dependencies/assumptions: legal counsel alignment; vendor cooperation; standardized documentation templates
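Two of the items above, falsification-first evaluation and OOD monitoring with safe fallbacks, can be sketched in a few lines. The z-score test, the threshold, and the escalation label are illustrative assumptions, not prescriptions from the paper:

```python
# Hedged sketch of an OOD guard with a human-escalation fallback. An input
# far from the training distribution is not answered by the model; it is
# escalated to a human, per the paper's "AI as instrument" stance.
# The scoring statistic and 3-sigma threshold are illustrative choices.

from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class OODGuard:
    train_scores: list        # e.g., a scalar summary of inputs seen in training
    z_threshold: float = 3.0  # flag inputs more than 3 sigma from training

    def is_out_of_distribution(self, score: float) -> bool:
        mu, sigma = mean(self.train_scores), pstdev(self.train_scores)
        return abs(score - mu) > self.z_threshold * sigma

def decide(guard: OODGuard, score: float, model_prediction: str) -> str:
    """Pass the model's prediction through only for in-distribution inputs;
    otherwise escalate to a human reviewer (the safe-fallback workflow)."""
    if guard.is_out_of_distribution(score):
        return "ESCALATE_TO_HUMAN"
    return model_prediction

guard = OODGuard(train_scores=[0.9, 1.0, 1.1, 1.0, 0.95, 1.05])
print(decide(guard, 1.02, "approve"))  # in-distribution: model output passes
print(decide(guard, 5.00, "approve"))  # far from training data: escalated
```

In production one would replace the scalar z-score with a proper OOD detector (density estimates, ensemble disagreement, conformal methods), but the control flow, model output gated by a distribution check with human escalation, is the point of the sketch.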

Academia

  • Curriculum updates embedding Popperian falsification and epistemology into AI/DS programs
    • Sector: education (computer science, data science, medicine, public policy)
    • Tools/workflows: modules on theory‑laden observation, criticism, and explanation; case studies (e.g., scanner confounding in medical AI); assignments that prioritize refutation exercises
    • Dependencies/assumptions: faculty capacity; curricular governance; teaching materials
  • Publication and peer‑review norms beyond “SOTA metrics”
    • Sector: AI research
    • Tools/workflows: mandatory “Explanation and Failure Analysis” sections; preregistered critical tests; replication packages including stress‑tests not just in‑distribution benchmarks
    • Dependencies/assumptions: journal and conference policy changes; reviewer guidance; author compliance
  • Research workflows using AI as instrument for hypothesis generation, with human explanation and critique
    • Sector: scientific discovery across domains
    • Tools/workflows: LLM‑assisted literature synthesis followed by human theory formulation and falsification planning; “counterfactual challenge” generators to probe proposed theories
    • Dependencies/assumptions: researcher training; data access; careful separation of pattern finding vs explanatory theory building

Policy and Governance

  • Explicit accountability and explainability requirements for algorithm‑influenced public decisions
    • Sector: tax authorities, social services, law enforcement, healthcare payers
    • Tools/workflows: policies that ban “the model says so” rationales; requirement to document human reasoning; decision justification templates
    • Dependencies/assumptions: statutory authority; staff training; defensible records management
  • Anti‑discrimination safeguards in algorithmic selection systems
    • Sector: taxation (audit selection), social benefits, hiring
    • Tools/workflows: direct and indirect discrimination checks; feature review to eliminate proxies (e.g., travel patterns as nationality proxies); periodic fairness audits with corrective actions
    • Dependencies/assumptions: legal clarity on protected attributes; privacy constraints; independent audit capacity
  • AI risk registers including inductivist/Bayesian pitfalls and OOD risks
    • Sector: all public agencies using AI
    • Tools/workflows: standardized risk taxonomy (spurious correlations, false dichotomies, instrumentalism without explanation); mitigation plans; governance boards with philosophical expertise
    • Dependencies/assumptions: cross‑disciplinary participation; resourcing; integration with existing risk management frameworks
  • Funding and communication standards for AGI claims
    • Sector: research funding, national AI strategies
    • Tools/workflows: proposals must state underlying philosophy of knowledge; clear milestones tied to explanatory/critique capabilities (not data scale alone)
    • Dependencies/assumptions: policy consensus; evaluation expertise; political will

Daily Life

  • Personal practices to treat AI outputs as suggestions, not truths
    • Sector: everyday productivity, health information seeking
    • Tools/workflows: “second‑source verification” checklists; note‑taking templates capturing why an answer seems plausible, and what would falsify it; using AI to surface options, humans to choose
    • Dependencies/assumptions: digital literacy; time to verify; access to alternative sources
  • Creative workflows where humans retain the “creative steering wheel”
    • Sector: writing, art, coding
    • Tools/workflows: deliberate human ideation before prompting; iterative critique rounds; maintaining a log of human decisions and explanations behind edits
    • Dependencies/assumptions: discipline to avoid over‑reliance; version control tools; openness to slower but higher‑quality output

Long‑Term Applications

The following items require further research, scaling, or development, often contingent on philosophical advances in explanation and criticism (as emphasized by Deutsch and Popper).

Industry

  • Explanation‑centric AI architectures that generate explicit hypotheses and invite refutation
    • Sector: scientific software, healthcare diagnostics, compliance analytics
    • Tools/products: “hypothesis engines” that output candidate explanatory models alongside uncertainty; integrated falsification planners; interfaces for human critique and model revision
    • Dependencies/assumptions: breakthroughs in representing and evaluating explanations; new benchmarks; regulatory acceptance
  • Production “critique agents” and counterexample generators
    • Sector: ML platforms, enterprise AI
    • Tools/products: autonomous agents that propose counterfactuals to break spurious correlations; continuous integration of falsification tests; responsibility ledgers recording human decisions
    • Dependencies/assumptions: scalable test generation; guardrails to avoid harmful probes; cultural readiness to confront failures
  • Sector‑specific OOD sentinel services
    • Sector: robotics/autonomy, healthcare, finance
    • Tools/products: managed services monitoring domain shift and triggering safe fallback modes; certification programs for OOD resilience
    • Dependencies/assumptions: robust OOD metrics; standardization; liability frameworks

Academia

  • Formal metrics and evaluation frameworks for “explanation quality” and “criticism effectiveness”
    • Sector: AI and philosophy of science
    • Tools/workflows: benchmarks that score explanatory adequacy, testability, and refutability; shared datasets for explanation evaluation
    • Dependencies/assumptions: theoretical consensus; community adoption; tooling to measure abstract qualities
  • Cross‑disciplinary institutes for “Philosophy‑in‑AI”
    • Sector: research organizations
    • Tools/workflows: sustained collaboration between philosophers, AI researchers, domain scientists; longitudinal studies of explanation‑driven discovery
    • Dependencies/assumptions: funding; talent pipelines; shared agendas

Policy and Governance

  • National “falsification testbeds” for high‑stakes public AI
    • Sector: taxation, social services, public health
    • Tools/workflows: sandbox environments to stress‑test models against adversarial and OOD scenarios before deployment; public reporting of failure analyses
    • Dependencies/assumptions: legislative support; data sharing agreements; independent oversight bodies
  • AGI governance frameworks tied to philosophical thresholds
    • Sector: national and international policy
    • Tools/workflows: “AGI readiness indices” that require demonstrable explanatory and criticism capabilities; staged regulatory regimes that evolve with genuine capability signals
    • Dependencies/assumptions: clarity on AGI definition; international coordination; avoidance of hype‑driven policymaking

Daily Life

  • Widely adopted epistemic literacy programs
    • Sector: public education, workplace training
    • Tools/workflows: modules teaching theory‑laden observation, falsification, and the limits of pattern matching; practical exercises using consumer AI tools
    • Dependencies/assumptions: curriculum reform; educator training; accessible materials
  • Consumer tools that scaffold explanation‑seeking behavior
    • Sector: productivity apps, personal knowledge management
    • Tools/products: “why notebooks” embedded in AI assistants prompting users to articulate explanations and possible refutations before acting; default settings that nudge verification
    • Dependencies/assumptions: product development; user adoption; UX that balances friction with benefit

Assumptions and Dependencies Common Across Applications

  • Access to representative, well‑documented datasets; willingness to collect fairness‑relevant metadata within legal constraints.
  • Availability of interpretability, OOD detection, and red‑teaming tools; acceptance of performance trade‑offs when prioritizing explainability and safety.
  • Organizational culture that values explanation and criticism over purely instrumental “it works” metrics.
  • Legal and regulatory frameworks (e.g., EU AI Act) that support human accountability and documentation without mandating infeasible technical transparency.
  • Philosophical and methodological advances to formalize explanation generation and criticism in AI systems for long‑term goals.
