Understanding the rules governing LLM discovery responses

Determine the decision-making rules that large language models such as ChatGPT (gpt-4o-mini) and Perplexity (sonar with web search) apply when generating synthesized responses to product discovery queries. Specifically, characterize the criteria and mechanisms that lead to the inclusion or exclusion of particular products in these recommendations.

Background

The paper contrasts traditional search engines, whose ranking rules are well understood through SEO practice and algorithms such as PageRank, with LLMs, which generate synthesized responses rather than lists of links. The author highlights that while SEO ranking factors are known, the analogous "rules" LLMs use to decide what to include in discovery-style responses remain poorly understood.

This gap in understanding is central to the study's motivation: startups see a dramatic difference between being recognized in direct queries about them and being surfaced in open-ended discovery queries. Clarifying the rules that govern inclusion would help explain this observed visibility gap and inform optimization strategies for LLM-based discovery.

References

"But LLMs work differently. They don't return ranked lists of links; they generate synthesized responses. The rules are different and we don't fully understand them yet."

The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries (2601.00912 - Sharma, 1 Jan 2026), Section 1.1 (The Problem), Page 3