The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries

Published 1 Jan 2026 in cs.IR and cs.AI (arXiv:2601.00912v1)

Abstract: When someone asks ChatGPT to recommend a project management tool, which products show up in the response? And more importantly for startup founders: will their newly launched product ever appear? This research set out to answer these questions. I randomly selected 112 startups from the top 500 products featured on the 2025 Product Hunt leaderboard and tested them with a total of 2,240 queries across two different LLMs: ChatGPT (gpt-4o-mini) and Perplexity (sonar with web search). The results were striking. When users asked about products by name, both LLMs recognized them almost perfectly: 99.4% for ChatGPT and 94.3% for Perplexity. But when users asked discovery-style questions like "What are the best AI tools launched this year?" the success rates collapsed to 3.32% and 8.29% respectively. That's a gap of 30-to-1 for ChatGPT. Perhaps the most surprising finding was that Generative Engine Optimization (GEO), the practice of optimizing website content for AI visibility, showed no correlation with actual discovery rates. Products with high GEO scores were no more likely to appear in organic queries than products with low scores. What did matter? For Perplexity, traditional SEO signals like referring domains (r = +0.319, p < 0.001) and Product Hunt ranking (r = -0.286, p = 0.002) predicted visibility. After cleaning the Reddit data for false positives, community presence also emerged as significant (r = +0.395, p = 0.002). The practical takeaway is counterintuitive: don't optimize for AI discovery directly. Instead, build the SEO foundation first and LLM visibility will follow.

Summary

  • The paper demonstrates that while direct queries yield near-perfect recognition, organic discovery queries show a dramatic visibility drop for startups.
  • The paper uses empirical analysis of 112 startups and 2,240 queries to link traditional SEO and community signals with improved discoverability in web-search LLMs.
  • The paper finds that GEO alone is ineffective without established authority, underscoring the need for conventional SEO and community engagement.

The Discovery Gap: LLM Discoverability of Product Hunt Startups

Introduction

The paper "The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries" (2601.00912) presents an empirical analysis of how discoverable newly launched startups are in generative AI search systems, comparing two LLMs: ChatGPT (knowledge-cutoff) and Perplexity (web-search-augmented). It centers on the difference between direct recognition of a named product and organic inclusion in broader discovery-oriented queries, and draws out implications for SEO, community strategy, and the emerging field of Generative Engine Optimization (GEO).

Methodology

The study examines 112 startups randomly selected from the top 500 products on the 2025 Product Hunt leaderboard, each with at least 200 upvotes and an accessible English-language web presence. A total of 2,240 queries were submitted across two contrasting LLM architectures, covering both direct queries that name the product (e.g., "What is [ProductName]?") and discovery queries that do not (e.g., "What are the best AI tools launched this year?"):

  • ChatGPT (gpt-4o-mini): A static, knowledge-cutoff model.
  • Perplexity (sonar with web search): A search-augmented LLM integrating real-time web data.
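The core measurement contrasts the two query types: a direct query names the product, a discovery query never does, and a product "appears" when its name shows up in the model's answer. The sketch below illustrates that setup under stated assumptions. The template lists, the whole-word matching rule, and the helper names (build_queries, mentions, discovery_rate) are hypothetical, not the paper's published pipeline, and "Acme Board" is a made-up product.

```python
import re

# Hypothetical query templates: the paper quotes one example of each type,
# not its full query set.
DIRECT_TEMPLATES = ["What is {name}?"]
DISCOVERY_TEMPLATES = ["What are the best AI tools launched this year?"]


def build_queries(name: str) -> dict:
    """Direct queries name the product; discovery queries never do."""
    return {
        "direct": [t.format(name=name) for t in DIRECT_TEMPLATES],
        "discovery": list(DISCOVERY_TEMPLATES),
    }


def mentions(response: str, name: str) -> bool:
    """Case-insensitive whole-word match of the product name in a response."""
    pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def discovery_rate(responses: list, name: str) -> float:
    """Share of discovery responses (0.0 to 1.0) that mention the product."""
    if not responses:
        return 0.0
    return sum(mentions(r, name) for r in responses) / len(responses)


if __name__ == "__main__":
    # "Acme Board" and these responses are made up for illustration.
    replies = [
        "Popular picks include Notion, Linear and Acme Board.",
        "Try Trello or Asana for project management.",
    ]
    rate = discovery_rate(replies, "Acme Board")
    print(rate, rate > 0)  # rate > 0 means "recommended at least once"
```

Plain substring matching would be simpler, but whole-word matching avoids counting "Acme Boards" or a longer product name as a hit; real pipelines would also need alias handling, which this sketch omits.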

The study tracks several candidate predictors of discoverability: Product Hunt engagement metrics, technical SEO indicators, community signals (particularly Reddit), and a composite GEO score built on six dimensions (statistics, citations, technical terms, authoritative style, structured data, and content depth). The GEO score is computed through automated content analysis.
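A composite score of this kind can be sketched as a weighted average of per-dimension subscores. The summary does not give the scale or weighting, so this sketch assumes each dimension is scored in [0, 1] by some upstream content analyzer and combined with equal weights onto a 0 to 100 scale; the function name geo_score and the dimension identifiers are illustrative only.

```python
from typing import Dict, Optional

# The six dimensions named in the paper; the identifiers are our own.
GEO_DIMENSIONS = (
    "statistics",
    "citations",
    "technical_terms",
    "authoritative_style",
    "structured_data",
    "content_depth",
)


def geo_score(subscores: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-dimension scores (each clamped to [0, 1]) into a 0-100 composite.

    Equal weights are an assumption; pass `weights` to try other schemes.
    """
    missing = [d for d in GEO_DIMENSIONS if d not in subscores]
    if missing:
        raise ValueError(f"missing GEO dimensions: {missing}")
    if weights is None:
        weights = {d: 1.0 for d in GEO_DIMENSIONS}
    total = sum(weights[d] for d in GEO_DIMENSIONS)
    combined = sum(weights[d] * min(max(subscores[d], 0.0), 1.0) for d in GEO_DIMENSIONS)
    return 100.0 * combined / total


if __name__ == "__main__":
    page = {d: 0.5 for d in GEO_DIMENSIONS}
    page["structured_data"] = 1.0  # e.g. the page ships structured-data markup
    print(round(geo_score(page), 1))  # 58.3
```

Clamping keeps one runaway dimension from dominating the composite; the paper's headline result is that this kind of on-page score, however it is weighted, did not track organic discovery.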

Key Findings

The Visibility Gap

The study confirms a pronounced "discovery gap." Direct queries elicited near-perfect recognition: 99.4% for ChatGPT and 94.3% for Perplexity. For discovery-style questions that do not name the product, however, inclusion rates collapsed to 3.32% (ChatGPT) and 8.29% (Perplexity). A new Product Hunt startup is therefore roughly 30 times less likely to surface in a general query than to be recognized by name on ChatGPT (99.4 / 3.32 ≈ 30), and roughly 11 times less likely on Perplexity (94.3 / 8.29 ≈ 11). Only 5.4% of startups appeared in even one ChatGPT discovery response, versus 27.7% for Perplexity. These results point to a severe discoverability barrier for emerging entities.
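The gap figures are simple ratios of the reported rates, and the "ever recommended" shares can be converted back into approximate startup counts. The snippet below reproduces that arithmetic from the numbers quoted above; the startup counts it prints are implied by the percentages and the sample of 112, not figures stated in the summary.

```python
# Rates (%) and sample size as reported in the paper.
N_STARTUPS = 112
RATES = {
    "ChatGPT": {"direct": 99.4, "discovery": 3.32, "ever_pct": 5.4},
    "Perplexity": {"direct": 94.3, "discovery": 8.29, "ever_pct": 27.7},
}


def gap_ratio(direct_pct: float, discovery_pct: float) -> float:
    """How many times more likely recognition by name is than organic inclusion."""
    return direct_pct / discovery_pct


def startups_from_share(share_pct: float, n: int = N_STARTUPS) -> int:
    """Convert a reported percentage of startups back to a whole count."""
    return round(share_pct / 100 * n)


for model, r in RATES.items():
    ratio = gap_ratio(r["direct"], r["discovery"])
    count = startups_from_share(r["ever_pct"])
    print(f"{model}: gap {ratio:.1f}x, {count} of {N_STARTUPS} startups ever recommended")
# ChatGPT: gap 29.9x, 6 of 112 startups ever recommended
# Perplexity: gap 11.4x, 31 of 112 startups ever recommended
```

As a cross-check, 6 / 112 ≈ 5.4% and 31 / 112 ≈ 27.7%, so the implied counts round back to the reported shares.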

Impact of Product, SEO, Community, and GEO Signals

Product Hunt Signals: For the web-augmented Perplexity model, Product of the Day (POTD) rank, upvotes, and rating correlated significantly with discovery visibility (POTD rank: r = -0.286, p = 0.002; upvotes: r = +0.225, p = 0.017; rating: r = +0.187, p = 0.048). The rank correlation is negative because a smaller rank number denotes a better placement. No such correlations were found for ChatGPT, as expected for a model with a static training set.

Technical SEO: For Perplexity, the number of referring domains and the dofollow ratio were significant predictors (referring domains: r = +0.319, p < 0.001; dofollow ratio: r = +0.238, p = 0.012). This suggests that traditional SEO remains a key lever for LLM visibility wherever the model retrieves from the live web.
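The reported correlations can be sanity-checked against the sample size. The sketch below computes a Pearson r and an approximate two-sided p-value using the Fisher z-transform, a standard large-sample approximation swapped in here for illustration; an exact test would use the t distribution with n − 2 degrees of freedom (as `scipy.stats.pearsonr` does). It assumes each correlation was computed over all 112 startups, which the summary does not state explicitly. Under that assumption the approximate p-values land close to the reported ones.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples of size >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Correlations reported for Perplexity; n = 112 is an assumption (all startups).
    reported = {
        "referring domains": (0.319, "< 0.001"),
        "POTD rank": (-0.286, "0.002"),
        "dofollow ratio": (0.238, "0.012"),
        "upvotes": (0.225, "0.017"),
        "rating": (0.187, "0.048"),
    }
    for signal, (r, p_paper) in reported.items():
        p_approx = fisher_p_value(r, 112)
        print(f"{signal}: r = {r:+.3f}, approx p = {p_approx:.4f}, paper p = {p_paper}")
```

Small discrepancies (for example around the dofollow ratio) are expected from the approximation or from a few startups missing a given metric.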

Community Presence: After removing false-positive entity matches (Reddit mentions of generic product names that actually referred to something else), Reddit presence, measured as the number of unique subreddits and of mentions, emerged as a strong predictor (r = +0.405, p = 0.001 for unique subreddits). Hacker News and GitHub signals were not significant.
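Cleaning false positives matters most for products with generic names, where a raw keyword search picks up unrelated posts. The summary does not describe the exact cleaning rule, so the filter below is a hypothetical heuristic: a post counts only if it contains the product's domain, or names the product alongside at least one context term. The product "Orbit", its domain, and the context terms are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class RedditPost:
    subreddit: str
    text: str


def is_true_mention(post, name, domain, context_terms):
    """Hypothetical heuristic: accept a post only if it contains the product's
    domain, or names the product alongside at least one context term."""
    text = post.text.lower()
    if domain.lower() in text:
        return True
    pattern = r"(?<!\w)" + re.escape(name.lower()) + r"(?!\w)"
    if re.search(pattern, text) is None:
        return False
    return any(term.lower() in text for term in context_terms)


def reddit_signals(posts, name, domain, context_terms):
    """Count cleaned mentions and the unique subreddits they come from."""
    kept = [p for p in posts if is_true_mention(p, name, domain, context_terms)]
    return {
        "mentions": len(kept),
        "unique_subreddits": len({p.subreddit.lower() for p in kept}),
    }


if __name__ == "__main__":
    # "Orbit" is a made-up product with a deliberately generic name.
    posts = [
        RedditPost("startups", "Has anyone tried Orbit? It launched on Product Hunt last week."),
        RedditPost("space", "The orbit of the Moon is elliptical."),
        RedditPost("productivity", "orbit-app.example saved me hours"),
    ]
    print(reddit_signals(posts, "Orbit", "orbit-app.example", {"product hunt", "launched"}))
    # {'mentions': 2, 'unique_subreddits': 2}
```

Without the filter, the astronomy post would inflate both counts, which is exactly the kind of noise that can hide a real community signal.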

GEO Optimization: Contrary to claims in recent literature, the study found no significant correlation between the composite GEO score and organic discovery rates in either LLM (ChatGPT: r = -0.108; Perplexity: r = -0.102; both non-significant). This contradicts the assertion that content-level optimization alone can drive generative-engine discoverability for new, unestablished entities.

Architecture Divide

A fundamental divide is documented between knowledge-cutoff and web-search-augmented LLMs:

  • ChatGPT: No tested predictor was significantly associated with discoverability. Organic inclusion appears largely stochastic, driven mainly by the authority bias in the static training data and by whether a product falls within the model's training-data window.
  • Perplexity: Product Hunt, SEO, and community signals significantly predicted inclusion, mirroring established search-ranking mechanisms.

Whether founders can devise actionable discovery strategies at all therefore depends on the LLM's architecture.

Theoretical and Practical Implications

Foundational Barriers for Startups

For emerging startups, especially newly launched ones that are not yet widely cited, the results indicate that LLMs trained on static corpora present a substantial barrier to organic discovery that founders have little practical ability to overcome. This reinforces "authority concentration," in which entrenched entities keep their visibility advantage. Even aggressive on-page GEO appears ineffective unless an initial discoverability threshold has first been reached through conventional authority-building.

Strategic Recommendations

  • Prioritize SEO Authority: Backlink building and conventional SEO appear to be prerequisites for LLM visibility in search-augmented architectures.
  • Leverage Community Channels: Genuine engagement in high-signal communities such as Reddit was associated with higher discoverability once false-positive mentions were filtered out of the measurement.
  • Deprioritize GEO as an Initial Strategy: Without an underlying citation base and authority, GEO efforts have little practical effect on organic discovery.
  • Focus on Web-Search LLMs: Optimizing for web-augmented models offers actionable, measurable benefits, whereas knowledge-cutoff LLMs should not be a primary channel for early-stage discoverability.

Research Horizons

The findings suggest several directions for further research: cross-model analysis beyond ChatGPT and Perplexity, longitudinal tracking of how entity discoverability evolves, and better methods for extracting community signals. Identifying the authority threshold above which GEO becomes effective is a high-value open problem. The results also bear on LLM alignment, particularly the risk of reinforcing systemic biases toward established entities.

Conclusion

This paper establishes the existence and scale of the discovery gap for startups in organic LLM queries. Direct queries confirm that the models know these products exist, but the path to organic inclusion is blocked by authority gaps, weak link profiles, and limited community presence. Traditional SEO and genuine community engagement were the strongest observed predictors for web-search-augmented models, while knowledge-cutoff LLMs proved structurally resistant to both current SEO and GEO tactics. These findings recalibrate expectations for "AI-first" marketing: foundational SEO and community building remain prerequisites in the generative paradigm. Future research should address the lifecycle of discoverability, finer-grained architectural contrasts, and more dynamic methods for mapping community mentions to entities.

Authors (1)
