
The Science of AI Recommendations: What Academic Research Tells Us

Review of key academic studies on LLM citation behavior, factual recall, recommendation bias, and the citation gap for underrepresented sources.

Eren Çöp
February 12, 2026

Key Takeaways

  • LLM factual recall accuracy correlates with how frequently and consistently information appears across authoritative training sources
  • Popularity bias means well-known brands receive disproportionately more citations, and more accurate ones — a self-reinforcing cycle
  • Position bias in list generation gives first-mentioned brands compounding advantages in user attention
  • RAG systems favor content that directly answers queries with structured, quantitative, and authoritative information
  • The citation gap systematically underrepresents non-English sources, SMBs, newer entrants, and niche specialists
  • Contradictory information across sources increases hallucination risk — consistency is critical for citation accuracy
  • Research-backed GEO strategies build more sustainable advantages than tactical approaches


Introduction

Beneath the tactical advice around generative engine optimization (GEO) lies a growing body of academic research on LLM behavior that offers deeper insight into why AI models recommend what they do.

Theme 1: Factual Recall and Knowledge Boundaries

Model confidence correlates with how frequently and consistently information appears across training sources. Facts repeated in many authoritative sources are recalled accurately; facts that appear in only a few sources carry a higher risk of hallucination. This creates a self-reinforcing cycle in which well-known brands receive more accurate citations.
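As a toy illustration of why cross-source consistency matters — the function, metric, and example values below are hypothetical, not drawn from the research itself — one can measure how strongly sources agree on a single fact:

```python
from collections import Counter

def consistency_score(claims):
    """Fraction of sources agreeing with the most common version of a fact.

    `claims` is a list of values for the same fact (e.g. a founding year)
    as stated by different sources. Low agreement, like a small number of
    sources, suggests higher hallucination risk for that fact.
    """
    if not claims:
        return 0.0
    most_common_count = Counter(claims).most_common(1)[0][1]
    return most_common_count / len(claims)

# A fact stated identically across sources: low risk.
print(consistency_score([2015, 2015, 2015, 2015]))  # 1.0
# Contradictory sources: the model may blend them into a wrong composite.
print(consistency_score([2015, 2013, 2015, 2016]))  # 0.5
```

In this sketch, a brand whose founding year is stated identically everywhere scores 1.0, while one with contradictory dates scores 0.5 — mirroring the finding that inconsistency, not just obscurity, drives hallucination.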

Theme 2: Recommendation Bias and Popularity Effects

LLMs exhibit popularity bias: they disproportionately recommend entities that appear more frequently in their training data. Position bias in list generation means items generated first are recommended more confidently and capture more user attention.
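The self-reinforcing cycle can be sketched as a Pólya-urn-style simulation — a toy model of "rich get richer" dynamics, not a claim about any real training pipeline; the brand names and counts are illustrative:

```python
import random

def simulate_rich_get_richer(counts, steps, seed=0):
    """Each recommendation is drawn proportionally to current mention
    counts, and every recommendation adds another mention, so the
    already-popular brand keeps getting recommended."""
    rng = random.Random(seed)
    counts = dict(counts)
    for _ in range(steps):
        brands = list(counts)
        weights = [counts[b] for b in brands]
        pick = rng.choices(brands, weights=weights, k=1)[0]
        counts[pick] += 1
    return counts

start = {"well_known": 80, "challenger": 20}
end = simulate_rich_get_richer(start, steps=1000)
share = end["well_known"] / sum(end.values())
print(f"well-known brand's share of mentions: {share:.2f}")
```

Running this, the well-known brand collects roughly four times as many new recommendations as the challenger, simply because it started with more mentions — the structural disadvantage the research describes.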

Theme 3: Citation Attribution Patterns

RAG systems favor sources that:

  • Answer queries directly in opening paragraphs
  • Present information in structured formats
  • Include specific quantitative claims
  • Come from high-authority domains
  • Appear earlier in retrieved document lists
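These selection patterns can be sketched as a toy scoring heuristic. The field names, weights, and documents below are illustrative assumptions, not the mechanics of any real RAG system:

```python
def rag_citation_score(doc):
    """Toy heuristic mirroring the patterns above: direct answers,
    structure, quantitative claims, domain authority, and early
    retrieval rank all increase the chance of being cited."""
    score = 0.0
    if doc["answers_query_directly"]:
        score += 3.0
    if doc["structured"]:
        score += 2.0
    if doc["has_quantitative_claims"]:
        score += 2.0
    score += 2.0 * doc["domain_authority"]  # authority in [0.0, 1.0]
    score -= 0.5 * doc["retrieval_rank"]    # earlier rank = smaller penalty
    return score

docs = [
    {"name": "direct_answer_page", "answers_query_directly": True,
     "structured": True, "has_quantitative_claims": True,
     "domain_authority": 0.9, "retrieval_rank": 1},
    {"name": "buried_answer_page", "answers_query_directly": False,
     "structured": False, "has_quantitative_claims": False,
     "domain_authority": 0.9, "retrieval_rank": 0},
]
best = max(docs, key=rag_citation_score)
print(best["name"])  # direct_answer_page
```

Note that in this sketch the page that answers directly wins despite a worse retrieval rank and equal domain authority — content format, not just authority, decides the citation.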

Theme 4: The Citation Gap

Systematic underrepresentation affects:

  • Non-English sources — English-language bias persists even for non-English queries
  • SMBs — cited less often than their quality and relevance warrant
  • Newer entrants — invisible in parametric responses for anything after the training cutoff
  • Niche specialists — generalists are favored over deep niche experts

Implications for GEO Strategy

Consistent information across sources reduces hallucination risk. Structured, direct-answer content aligns with how RAG systems select sources to cite. Brands below the visibility threshold must invest disproportionately in coverage and consistency to break through.

Conclusion

Research-backed GEO builds more sustainable citation advantages than tactical tricks.

Frequently Asked Questions

What is popularity bias in AI recommendations?

Popularity bias is the tendency of LLMs to disproportionately recommend entities (brands, products) that appear more frequently in their training data. This creates a self-reinforcing cycle where well-known brands get cited more often, leading to even greater visibility.

Why do AI models hallucinate about brands?

AI hallucination occurs when models encounter inconsistent or sparse information in their training data. Brands with contradictory information across sources or those that appear in only a few sources face higher hallucination risk — the model may blend conflicting details into incorrect composite descriptions.

What is the citation gap in AI search?

The citation gap is the systematic underrepresentation of certain sources in AI recommendations. It particularly affects non-English sources, small and medium businesses, newer market entrants, and niche specialists who may have deep expertise but narrower digital footprints.

How do RAG systems select which sources to cite?

RAG systems favor sources that answer queries directly in opening paragraphs, present structured and extractable information, include specific quantitative claims, come from high-authority domains, and appear earlier in retrieved document lists.

