
llms.txt Explained: Complete 2026 Guide

llms.txt is a proposed standard for a text file served at the root of your site that tells AI agents where to find your most important content in a clean, structured format. It sits alongside robots.txt and sitemap.xml as one of the files some AI crawlers check early.

As of early 2026, adoption stands at roughly 10% of the top 300,000 domains. It's not yet a ratified standard, but it's cheap to publish and signals that your site is machine-readable.

Key takeaways

  • llms.txt is a proposed, not yet ratified, convention originally introduced by Jeremy Howard in 2024.
  • Adoption is around 10% of the top 300k domains as of 2026, concentrated in developer-tool documentation, per research from SE Ranking.
  • Some AI systems consume it (parts of the Claude and LangChain ecosystems); ChatGPT currently ignores it in favor of standard robots.txt and sitemap signals.
  • Best treated as a low-cost signal: publish one, maintain it, don't overinvest.
  • Separately from llms.txt, allow AI crawlers in robots.txt — blocking them removes you from AI search entirely.

llms.txt vs robots.txt vs llms-full.txt

| File | Purpose | Format | Consumer |
|---|---|---|---|
| robots.txt | Control crawler access | Directives | All crawlers |
| sitemap.xml | List all indexable URLs | XML | Search engines |
| llms.txt | Curated index of key content | Markdown | LLM agents |
| llms-full.txt | Full content dump | Markdown | LLM agents |

robots.txt says "you can/can't crawl this." sitemap.xml says "here are all my URLs." llms.txt says "here are my best, canonical URLs, already described in a way an LLM can consume."

Current adoption: ~10%

Recent crawls of the top 300,000 domains show:

  • ~10% have a valid llms.txt
  • ~3% have llms-full.txt in addition
  • Adoption is concentrated in developer-tool and SaaS documentation sites (where Mintlify, ReadMe, and other doc platforms generate these files automatically)

Enterprise content sites lag significantly. If you publish one, you're still early.

Does it actually work?

Honest answer: it depends on which LLM.

  • Respected: Anthropic's Claude and some LangChain/RAG pipelines explicitly check for llms.txt as a retrieval hint.
  • Ignored: OpenAI's ChatGPT crawlers don't officially check llms.txt; they rely on standard robots.txt, sitemaps, and content extraction.
  • Unknown: Perplexity's and Google Gemini's behavior is not publicly documented.

The practical stance: publish llms.txt as a low-cost signal. It won't dominate citation outcomes, but it adds a cleanly-indexed entry point that some AI systems will consume.

Anatomy of a great llms.txt

Here is an example that follows the structure Jeremy Howard proposed:

```markdown
# GeoDaddy

> GeoDaddy is an open-source tool that analyzes your site's visibility in
> AI search engines like ChatGPT, Perplexity, Gemini, and Claude. It
> runs 22 checks and returns a scored report with fix recommendations.

## Core documentation

- [What is GEO?](https://geodaddy.dev/docs/what-is-geo): Introduction to Generative Engine Optimization.
- [CLI Usage](https://geodaddy.dev/docs/cli): Install and run the CLI in CI/CD.
- [MCP Server](https://geodaddy.dev/docs/mcp): Run GEO analysis from Claude or Cursor.
- [Checks Reference](https://geodaddy.dev/docs/checks): All 22 checks in detail.

## Guides

- [GEO vs SEO](https://geodaddy.dev/guides/geo-vs-seo): How AI and traditional search differ.
- [How to rank in ChatGPT](https://geodaddy.dev/guides/chatgpt-seo): 9 citation signals.
- [Perplexity SEO](https://geodaddy.dev/guides/perplexity-seo): The 7 citation signals.
- [llms.txt guide](https://geodaddy.dev/guides/llms-txt): This page.
- [AI search visibility checklist](https://geodaddy.dev/guides/ai-search-visibility-checklist): 22-signal pillar.

## Optional

- [GitHub](https://github.com/borabiricik/geodaddy-cli): Source code.
```

Key elements:

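  1. # Title: one H1 with the site or project name.
  2. > Description: a blockquote summary.
  3. ## Section: grouped link sections, such as "Core", "Docs", or "Guides". A section named "Optional" has a special meaning in the proposal: agents can skip its links when they need a shorter context.
  4. Bulleted links in the form - [Name](URL): description.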

Keep it under 4 KB. This is an index, not a sitemap.
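The key elements above are regular enough to parse with a few lines of code, which is handy for linting your own file. The sketch below is TypeScript and hypothetical throughout: parseLlmsTxt and its types are illustrative names, and it assumes exactly the layout described here (one "# " title, "> " summary lines, "## " sections, "- [name](url): description" links), ignoring any other prose.

```typescript
// Hypothetical parser for the llms.txt layout described above. Assumes:
// one "# " title, "> " summary lines before the first section, and "## "
// sections containing "- [name](url): description" links. Other lines are ignored.

interface LlmsLink {
  name: string;
  url: string;
  description?: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

interface LlmsDoc {
  title: string | null;
  summary: string;
  sections: LlmsSection[];
}

// "- [name](url)" with an optional ": description" tail.
const LINK_LINE = /^[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;

function parseLlmsTxt(text: string): LlmsDoc {
  const doc: LlmsDoc = { title: null, summary: "", sections: [] };
  const summaryLines: string[] = [];
  let current: LlmsSection | null = null;

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      // Each H2 opens a new link section.
      current = { heading: line.slice(3).trim(), links: [] };
      doc.sections.push(current);
    } else if (line.startsWith("# ") && doc.title === null) {
      doc.title = line.slice(2).trim();
    } else if (line.startsWith(">") && current === null) {
      // Blockquote lines before the first section form the summary.
      summaryLines.push(line.replace(/^>\s?/, ""));
    } else if (current !== null) {
      const m = LINK_LINE.exec(line);
      if (m !== null) {
        current.links.push({ name: m[1], url: m[2], description: m[3] || undefined });
      }
    }
  }

  doc.summary = summaryLines.join(" ").trim();
  return doc;
}
```

Counting doc.sections and their links also gives you a quick way to check the file stays an index (roughly 10-30 links) rather than a sitemap.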

7 best practices

1. Serve from the root

Must be at https://yourdomain.com/llms.txt. Not a subdirectory, not a subdomain.

2. Use Content-Type: text/plain (or text/markdown)

Don't serve it as text/html. Agents expect the raw Markdown file, not an HTML page.
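To verify the header your server or CDN actually sends, compare only the media type and ignore parameters such as charset. A minimal sketch; isAcceptableLlmsContentType is a hypothetical helper that accepts the two types named above:

```typescript
// Hypothetical check: is a Content-Type header value one of the types
// recommended above? Parameters such as "; charset=utf-8" are ignored, and
// media types are compared case-insensitively.
const LLMS_TXT_TYPES = new Set<string>(["text/plain", "text/markdown"]);

function isAcceptableLlmsContentType(header: string | null): boolean {
  if (header === null) return false; // no header at all is also a failure
  // Keep only the "type/subtype" part before any ";" parameters.
  const mediaType = header.split(";")[0].trim().toLowerCase();
  return LLMS_TXT_TYPES.has(mediaType);
}
```

Pass it the Content-Type response header, for example the value of response.headers.get("content-type") after a fetch. An application/octet-stream answer from a CDN fails this check.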

3. Use absolute URLs

Every link should be fully qualified (https://yourdomain.com/...). Relative URLs can't be resolved once the file is cached or copied outside your domain, for example into an LLM's context.
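Relative links are easy to spot mechanically, because new URL() without a base throws for anything that isn't absolute. A sketch, assuming links use the Markdown [text](target) syntax shown earlier; findNonAbsoluteLinks is a hypothetical name, and it also flags absolute non-HTTP targets such as mailto:.

```typescript
// Hypothetical lint: returns every link target in an llms.txt body that is
// not an absolute http(s) URL. Assumes Markdown "[text](target)" links.
function findNonAbsoluteLinks(llmsTxt: string): string[] {
  const linkTarget = /\]\(([^)\s]+)\)/g;
  const offenders: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = linkTarget.exec(llmsTxt)) !== null) {
    const target = m[1];
    let absoluteHttp = false;
    try {
      const url = new URL(target);
      absoluteHttp = url.protocol === "https:" || url.protocol === "http:";
    } catch {
      // new URL() with no base throws on relative references like "/docs/cli".
      absoluteHttp = false;
    }
    if (!absoluteHttp) offenders.push(target);
  }
  return offenders;
}
```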

4. Avoid redirects

llms.txt should return 200 OK, not 301/302. Some crawlers drop content served through redirects.
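A redirect check has to disable automatic redirect following; otherwise fetch() quietly lands on the final URL and reports 200. The sketch below keeps the decision logic separate from the network call; checkLlmsStatus and ResponseLike are hypothetical names. It assumes Node.js, where fetch(url, { redirect: "manual" }) returns the 3xx response itself (browsers return an opaque response with status 0 instead).

```typescript
// Minimal response shape; a fetch() Response satisfies it.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

type StatusVerdict = { ok: true } | { ok: false; reason: string };

// Hypothetical check: /llms.txt should answer 200 itself. The response must
// come from a request made with { redirect: "manual" }, otherwise redirects
// are followed automatically and never reach this function.
function checkLlmsStatus(res: ResponseLike): StatusVerdict {
  if (res.status === 200) return { ok: true };
  if (res.status >= 300 && res.status < 400) {
    const target = res.headers.get("location") ?? "(no Location header)";
    return { ok: false, reason: `redirects (${res.status}) to ${target}` };
  }
  return { ok: false, reason: `unexpected status ${res.status}` };
}
```

Usage in Node.js: checkLlmsStatus(await fetch("https://yourdomain.com/llms.txt", { redirect: "manual" })).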

5. Keep it canonical

Every URL in llms.txt should be the canonical version — no duplicate content, no query parameters, no session IDs.
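Part of this rule can be linted. The sketch below flags URLs with a query string or a path-embedded session ID (the Java servlet ;jsessionid= style); canonicalIssues and tryParseUrl are hypothetical helpers. It cannot confirm that a URL matches the page's rel="canonical" tag, which requires fetching each page.

```typescript
// Returns null instead of throwing when href is not an absolute URL.
function tryParseUrl(href: string): URL | null {
  try {
    return new URL(href);
  } catch {
    return null;
  }
}

// Hypothetical lint for canonical URLs: flags query strings and
// servlet-style session IDs. Does not check rel="canonical" tags.
function canonicalIssues(href: string): string[] {
  const url = tryParseUrl(href);
  if (url === null) return ["not an absolute URL"];
  const issues: string[] = [];
  // Tracking parameters, ?sessionid=..., sort orders, etc. all create duplicate URLs.
  if (url.search !== "") issues.push(`query string ${url.search}`);
  // Servlet-style session IDs live in the path, e.g. /cart;jsessionid=ABC123.
  if (/;jsessionid=/i.test(url.pathname)) issues.push("session ID in path");
  return issues;
}
```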

6. No authentication required

AI agents don't sign in. Public URLs only.

7. Update when content structure changes

Don't let llms.txt drift. If you restructure URLs, update the file. An llms.txt full of broken links hurts more than having no file at all.
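One way to catch drift is to diff the links in llms.txt against the URLs your site currently publishes. The sketch assumes your sitemap.xml is the source of truth for live URLs and uses a regex instead of a real XML parser; extractSitemapLocs and findStaleLinks are hypothetical names. The comparison is exact, so a link that differs from the sitemap only by a trailing slash is flagged too, which is usually what you want for canonical URLs.

```typescript
// Pulls <loc> values out of a sitemap.xml string. A regex is enough for a
// sketch; a real tool should use an XML parser and follow sitemap index files.
function extractSitemapLocs(xml: string): string[] {
  const locs: string[] = [];
  const loc = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = loc.exec(xml)) !== null) {
    locs.push(m[1].replace(/&amp;/g, "&")); // undo the one common XML escape in URLs
  }
  return locs;
}

// Hypothetical drift check: links listed in llms.txt that the site no longer publishes.
function findStaleLinks(llmsUrls: string[], liveUrls: string[]): string[] {
  const live = new Set(liveUrls);
  return llmsUrls.filter((u) => !live.has(u));
}
```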

Common mistakes

  • CDN caching conflicts: some CDN configs serve llms.txt as application/octet-stream. Set an explicit content type.
  • Exhaustive inventories: llms.txt isn't a sitemap. List the 10-30 most important pages, not everything.
  • Auto-generated cruft: some doc platforms generate bloated llms.txt files. Review and trim them.
  • Redirects from /llms.txt: the file must return 200 directly.
  • Blocking in robots.txt: make sure Disallow: / rules don't accidentally block /llms.txt.
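The last mistake can be checked before deploying. Below is a simplified robots.txt evaluator modeled on RFC 9309 (the Robots Exclusion Protocol): it picks the group for a crawler's user-agent token, falling back to *, and the longest matching Allow or Disallow path wins, with Allow winning ties. It is a sketch with hypothetical names (parseRobotsGroups, isPathAllowed) and does not support * or $ wildcards inside paths, so treat its answer as a first pass rather than a guarantee of how any particular crawler behaves.

```typescript
// Simplified robots.txt evaluator modeled on RFC 9309. Hypothetical sketch:
// no "*" or "$" wildcard support in paths; user-agent matching is exact
// (case-insensitive) on the crawler's product token, with "*" as fallback.

interface RobotsRule {
  allow: boolean;
  path: string;
}

function parseRobotsGroups(robotsTxt: string): Map<string, RobotsRule[]> {
  const groups = new Map<string, RobotsRule[]>();
  let agents: string[] = [];
  let lastLineWasAgent = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; a new run starts a new group.
      if (!lastLineWasAgent) agents = [];
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      lastLineWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      lastLineWasAgent = false;
      if (value === "") continue; // an empty rule value matches nothing
      for (const agent of agents) {
        groups.get(agent)!.push({ allow: field === "allow", path: value });
      }
    }
  }
  return groups;
}

function isPathAllowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const groups = parseRobotsGroups(robotsTxt);
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];

  // Longest matching path wins; on a tie, Allow beats Disallow.
  let best: RobotsRule | null = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  // No matching rule means the path is allowed.
  return best === null ? true : best.allow;
}
```

For example, isPathAllowed(robotsTxt, "GPTBot", "/llms.txt") returns false for a file containing only User-agent: * and Disallow: /, and true once Allow: /llms.txt is added to that group.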

Templates by site type

SaaS / product

```markdown
# {Product Name}

> {One-sentence product description}

## Product

- [Homepage](...)
- [Features](...)
- [Pricing](...)

## Documentation

- [Getting Started](...)
- [API Reference](...)
- [Guides](...)

## Company

- [About](...)
- [Blog](...)
```

Docs-heavy project

```markdown
# {Project Name}

> {Project description}

## Core concepts

- [Introduction](...)
- [Architecture](...)

## API

- [Reference](...)
- [SDK](...)

## Guides

- [Quickstart](...)
- [Advanced](...)
```

Blog / content site

```markdown
# {Site Name}

> {Site purpose}

## Featured content

- [Best-of article](...)
- [Guide](...)

## Categories

- [Category index](...)
```

Generate llms.txt automatically

Several static site generators and doc platforms generate llms.txt automatically:

  • Mintlify — built-in, enabled by default
  • ReadMe — available via config
  • Docusaurus — via docusaurus-plugin-llms
  • Nextra — via nextra-llmstxt
  • Custom Next.js — create src/app/llms.txt/route.ts that returns the markdown

For a custom Next.js site, a route handler is a few lines:

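```typescript
// src/app/llms.txt/route.ts
// App Router route handler: serves the file at /llms.txt with a text content type.
export function GET() {
  const content = `# Your Site
...
`
  return new Response(content, {
    // Explicit charset so CDNs and agents don't have to guess the encoding.
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  })
}
```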
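If the list of links lives in code, a small renderer keeps the output consistent with the structure described earlier instead of hand-editing one long template string. A sketch; renderLlmsTxt and its types are hypothetical and not part of any Next.js API:

```typescript
// Hypothetical data model for generating llms.txt; not a Next.js API.
interface LinkEntry {
  name: string;
  url: string; // should be an absolute URL
  note?: string;
}

interface LinkSection {
  heading: string;
  links: LinkEntry[];
}

// Renders an H1 title, a blockquote summary, then one "## " section per group.
function renderLlmsTxt(title: string, summary: string, sections: LinkSection[]): string {
  const lines: string[] = [`# ${title}`, ""];
  // Every summary line gets its own "> " prefix so it stays one blockquote.
  for (const s of summary.split("\n")) lines.push(`> ${s}`);
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.name}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Inside the route handler, the content constant could then be built with renderLlmsTxt(...) rather than written out literally.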

How GeoDaddy checks llms.txt

The GeoDaddy playground validates:

  • llms.txt presence at root
  • Correct content type
  • Valid markdown structure
  • Absolute URLs
  • Non-redirecting response

A missing llms.txt is a minor-severity issue (2 points); a file that is present but malformed loses half of those points.

Bottom line

Publishing llms.txt in 2026:

  • Takes 20 minutes
  • Costs nothing
  • Might improve citations in some AI systems and is likely neutral in others
  • Signals machine-readability

Publish one, keep it maintained, and don't overinvest. The other signals in the AI search visibility checklist all offer a higher ROI than llms.txt alone.

References