llms.txt is a proposed standard file served at the root of a website (like robots.txt) that lists curated links to your most important content in a format optimized for LLM consumption. It helps AI agents find canonical content without crawling your entire site.

Is llms.txt an official standard?

Not yet. llms.txt was proposed by Jeremy Howard in 2024 and is an emerging convention, not a ratified standard. As of early 2026, adoption sits at around 10% of the top 300k domains and is growing.

Does llms.txt actually work?

Current evidence is mixed. Some LLM crawlers consume llms.txt; others ignore it entirely. It's low-cost to publish and signals machine-readability, so the practical answer is "yes, publish one" — while acknowledging it's not yet a required or universally respected standard.

What's the difference between llms.txt and llms-full.txt?

llms.txt is a concise index of links with descriptions (~1-2 KB). llms-full.txt is a full-content dump of your documentation in markdown format (potentially much larger). Both are optional; llms.txt is more common.

Should I block or allow AI crawlers in robots.txt?

Separate from llms.txt, you control AI crawler access in robots.txt. Most content publishers should allow GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and similar bots: blocking them can remove your content from AI search results and cuts off a growing distribution channel. The sensible default is to allow them, block only for specific content-licensing or competitive reasons, and review the decision annually.
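As a rough illustration of that allow-by-default policy, the sketch below generates robots.txt groups for the crawlers named above. The helper name `buildRobotsTxt` and the `/licensed/` path are hypothetical, chosen only for the example; adapt the bot list and paths to your own licensing decisions.

```typescript
// Hypothetical helper: builds robots.txt text that explicitly allows a list of AI crawlers.
// The user-agent tokens are the ones named in the text above.
const AI_CRAWLERS: string[] = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"];

function buildRobotsTxt(allowedBots: string[], disallowedPaths: string[] = []): string {
  const groups = allowedBots.map((bot) => {
    // One group per crawler: allow everything, then carve out specific paths.
    const lines = [`User-agent: ${bot}`, "Allow: /"];
    for (const path of disallowedPaths) {
      lines.push(`Disallow: ${path}`);
    }
    return lines.join("\n");
  });
  // Groups are separated by a blank line.
  return groups.join("\n\n") + "\n";
}

// "/licensed/" is an assumed example of content you might keep out of AI crawls.
console.log(buildRobotsTxt(AI_CRAWLERS, ["/licensed/"]));
```

Because the more specific `Disallow` path is longer than `Allow: /`, crawlers that follow the standard longest-match rule should skip only the carved-out paths.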

llms.txt Explained: Complete 2026 Guide

llms.txt is a proposed standard for a text file served at the root of your site that tells AI agents where to find your most important content in a clean, structured format. It sits alongside robots.txt and sitemap.xml as a file that some AI crawlers check early.

As of early 2026, adoption is ~10% of the top 300k domains. It's not yet a ratified standard, but it's cheap to publish and signals that your site is machine-readable.

Key takeaways

llms.txt is a proposed, not yet ratified, convention introduced by Jeremy Howard in 2024.
Adoption is around 10% of the top 300k domains as of early 2026, concentrated in developer-tool documentation, per research from SE Ranking.
Some AI systems consume it (parts of the Claude and LangChain ecosystem); ChatGPT currently ignores it in favor of standard robots.txt and sitemap signals.
Best treated as a low-cost signal: publish one, maintain it, don't overinvest.
Separately from llms.txt, allow AI crawlers in robots.txt: blocking them can remove you from AI search results.

llms.txt vs robots.txt vs llms-full.txt

| File | Purpose | Format | Consumer |
| --- | --- | --- | --- |
| `robots.txt` | Control crawler access | Directives | All crawlers |
| `sitemap.xml` | List all indexable URLs | XML | Search engines |
| `llms.txt` | Curated index of key content | Markdown | LLM agents |
| `llms-full.txt` | Full content dump | Markdown | LLM agents |

robots.txt says "you can/can't crawl this." sitemap.xml says "here are all my URLs." llms.txt says "here are my best, canonical URLs, already described in a way an LLM can consume."

Current adoption: ~10%

Recent crawls of the top 300,000 domains show:

~10% have a valid llms.txt
~3% have llms-full.txt in addition
Adoption is concentrated in developer-tool and SaaS documentation sites (where Mintlify, ReadMe, and other doc platforms generate these files automatically)

Enterprise content sites lag significantly. If you publish one, you're still early.

Does it actually work?

Honest answer: it depends on which LLM.

Respected: parts of the Claude and LangChain ecosystem, including some RAG pipelines, check for llms.txt as a retrieval hint.
Ignored: OpenAI's ChatGPT crawlers don't officially check llms.txt; they rely on standard robots.txt, sitemap, and content extraction.
Unclear: the behavior of Perplexity and Google Gemini is not publicly documented.

The practical stance: publish llms.txt as a low-cost signal. It won't dominate citation outcomes, but it adds a cleanly indexed entry point that some AI systems will consume.

Anatomy of a great llms.txt

The structure Jeremy Howard proposed:

# GeoDaddy
 
> GeoDaddy is an open-source tool that analyzes your site's visibility in
> AI search engines like ChatGPT, Perplexity, Gemini, and Claude. It
> runs 22 checks and returns a scored report with fix recommendations.
 
## Core documentation
 
- [What is GEO?](https://geodaddy.dev/docs/what-is-geo): Introduction to Generative Engine Optimization.
- [CLI Usage](https://geodaddy.dev/docs/cli): Install and run the CLI in CI/CD.
- [MCP Server](https://geodaddy.dev/docs/mcp): Run GEO analysis from Claude or Cursor.
- [Checks Reference](https://geodaddy.dev/docs/checks): All 22 checks in detail.
 
## Guides
 
- [GEO vs SEO](https://geodaddy.dev/guides/geo-vs-seo): How AI and traditional search differ.
- [How to rank in ChatGPT](https://geodaddy.dev/guides/chatgpt-seo): 9 citation signals.
- [Perplexity SEO](https://geodaddy.dev/guides/perplexity-seo): The 7 citation signals.
- [llms.txt guide](https://geodaddy.dev/guides/llms-txt): This page.
- [AI search visibility checklist](https://geodaddy.dev/guides/ai-search-visibility-checklist): 22-signal pillar.
 
## Optional
 
- [GitHub](https://github.com/borabiricik/geodaddy-cli): Source code.

Key elements:

`# Title`: one H1 with the site/project name.
`> Description`: a blockquote summary.
`## Section`: grouped link sections. Conventionally "Core", "Docs", "Guides", "Optional".
Bulleted links in `- [Title](URL): description` format.

Keep it under 4 KB. This is an index, not a sitemap.
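The structure above can be sketched as a small renderer that turns structured data into llms.txt text and reports its size. The type names, `renderLlmsTxt`, `byteLength`, and the example.com URLs are all hypothetical; this is one possible way to keep the file consistent, not part of the proposal itself.

```typescript
// Hypothetical data model mirroring the elements listed above.
interface LlmsLink { title: string; url: string; description?: string }
interface LlmsSection { heading: string; links: LlmsLink[] }
interface LlmsDoc { title: string; summary: string; sections: LlmsSection[] }

// Renders: H1 title, blockquote summary, then one H2 per section with bulleted links.
function renderLlmsTxt(doc: LlmsDoc): string {
  const parts: string[] = [`# ${doc.title}`, `> ${doc.summary}`];
  for (const section of doc.sections) {
    const links = section.links.map((link) =>
      link.description
        ? `- [${link.title}](${link.url}): ${link.description}`
        : `- [${link.title}](${link.url})`
    );
    parts.push(`## ${section.heading}`, links.join("\n"));
  }
  // Blocks are separated by a blank line; the file ends with a newline.
  return parts.join("\n\n") + "\n";
}

// Size in bytes of the UTF-8 encoded text, for the "under 4 KB" guideline.
function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

const example = renderLlmsTxt({
  title: "Example",
  summary: "A demo site used only for illustration.",
  sections: [
    {
      heading: "Docs",
      links: [{ title: "Intro", url: "https://example.com/intro", description: "Start here." }],
    },
  ],
});
console.log(example);
console.log(`${byteLength(example)} bytes`);
```

Generating the file from data like this makes it easier to regenerate when pages move, and the byte count gives a quick check against the size guideline.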

7 best practices

1. Serve from the root

Must be at https://yourdomain.com/llms.txt. Not a subdirectory, not a subdomain.

2. Use `Content-Type: text/plain` (or `text/markdown`)

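Don't serve it as text/html. AI agents expect plain text or markdown.

3. Use absolute URLs

Write every link as a full https:// URL, as in the example above.

4. Avoid redirects

/llms.txt should return 200 directly rather than redirecting.

5. Keep it canonical

Link to the canonical URL of each page.

6. No authentication required

The file must be readable without a login.

7. Update when content structure changes

Revise the file when pages are added, moved, or removed.

Common mistakes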

CDN caching conflicts — some CDN configs serve llms.txt as application/octet-stream. Set explicit content type.
Exhaustive inventories — llms.txt isn't a sitemap. List the 10-30 most important pages, not everything.
Auto-generated cruft — some doc platforms generate bloated llms.txt. Review and trim.
Redirects from /llms.txt — must return 200 directly.
Blocking in robots.txt — make sure Disallow: / rules don't accidentally block /llms.txt.
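The serving rules above (a direct 200, no redirect, a text content type) can be checked with a short script. This is a minimal sketch: `findServingProblems` and `checkLlmsTxt` are hypothetical names, and it assumes a runtime with the standard `fetch` API (for example Node 18+ or a browser).

```typescript
// The three response properties the serving rules care about.
interface ServingResult { status: number; contentType: string | null; redirected: boolean }

// Pure check: returns a list of human-readable problems, empty when the response looks fine.
function findServingProblems(result: ServingResult): string[] {
  const problems: string[] = [];
  if (result.redirected) {
    problems.push("redirected: /llms.txt should return 200 directly");
  }
  if (result.status !== 200) {
    problems.push(`status ${result.status}: expected 200`);
  }
  // Ignore parameters such as "; charset=utf-8" when comparing the media type.
  const mediaType = (result.contentType ?? "").split(";")[0].trim().toLowerCase();
  if (mediaType !== "text/plain" && mediaType !== "text/markdown") {
    problems.push(`content type "${mediaType || "missing"}": expected text/plain or text/markdown`);
  }
  return problems;
}

// Fetches /llms.txt from the given origin and applies the checks above.
// fetch follows redirects by default and reports them via response.redirected.
async function checkLlmsTxt(origin: string): Promise<string[]> {
  const response = await fetch(new URL("/llms.txt", origin));
  return findServingProblems({
    status: response.status,
    contentType: response.headers.get("content-type"),
    redirected: response.redirected,
  });
}
```

Calling `checkLlmsTxt("https://yourdomain.com")` after a deploy would surface a CDN that rewrites the content type or a redirect added by a trailing-slash rule.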

Templates by site type

SaaS / product

# {Product Name}
 
> {One-sentence product description}
 
## Product
 
- [Homepage](...)
- [Features](...)
- [Pricing](...)
 
## Documentation
 
- [Getting Started](...)
- [API Reference](...)
- [Guides](...)
 
## Company
 
- [About](...)
- [Blog](...)

Docs-heavy project

# {Project Name}
 
> {Project description}
 
## Core concepts
 
- [Introduction](...)
- [Architecture](...)
 
## API
 
- [Reference](...)
- [SDK](...)
 
## Guides
 
- [Quickstart](...)
- [Advanced](...)

Blog / content site

# {Site Name}
 
> {Site purpose}
 
## Featured content
 
- [Best-of article](...)
- [Guide](...)
 
## Categories
 
- [Category index](...)

Generate llms.txt automatically

Several static site generators and doc platforms generate llms.txt automatically:

Mintlify — built-in, enabled by default
ReadMe — available via config
Docusaurus — via docusaurus-plugin-llms
Nextra — via nextra-llmstxt
Custom Next.js — create src/app/llms.txt/route.ts that returns the markdown

For a custom Next.js site, a route handler is a few lines:

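// src/app/llms.txt/route.ts
// Next.js App Router route handler: answers GET /llms.txt directly with a 200.
export function GET(): Response {
  // Inline the markdown here, or build it from your content source.
  const content = `# Your Site
...
`
  return new Response(content, {
    // Explicit text content type so the file is never served as text/html.
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  })
}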

How GeoDaddy checks llms.txt

The GeoDaddy playground validates:

llms.txt presence at root
Correct content type
Valid markdown structure
Absolute URLs
Non-redirecting response

A missing llms.txt is a minor-severity finding (2 points); a file that is present but malformed loses half of those points.
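To make the structure and absolute-URL checks concrete, here is a rough sketch of what such validation could look like. It is not GeoDaddy's actual implementation; `validateLlmsTxt` and its rules are assumptions based only on the checks listed above.

```typescript
// Hypothetical validator: returns a list of issues, empty when the text passes.
function validateLlmsTxt(text: string): string[] {
  const issues: string[] = [];
  const lines = text.split(/\r?\n/);

  // The first non-empty line should be the single H1 title.
  const firstContent = lines.find((line) => line.trim() !== "") ?? "";
  if (!/^# \S/.test(firstContent)) {
    issues.push("first non-empty line should be an H1 title");
  }
  if (lines.filter((line) => /^# \S/.test(line)).length > 1) {
    issues.push("more than one H1");
  }

  // A blockquote summary is expected somewhere in the file.
  if (!lines.some((line) => line.startsWith("> "))) {
    issues.push("missing blockquote summary");
  }

  // Every markdown link target should be an absolute http(s) URL.
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(text)) !== null) {
    const url = match[1];
    if (!/^https?:\/\//.test(url)) {
      issues.push(`relative or non-http link: ${url}`);
    }
  }
  return issues;
}

const sample = "# Example\n\n> Summary.\n\n## Docs\n\n- [Intro](/intro): Start here.\n";
console.log(validateLlmsTxt(sample));
```

The sample deliberately uses a relative link, so the validator reports one issue for `/intro`.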

Bottom line

Publishing llms.txt in 2026:

Takes 20 minutes
Costs nothing
Might help citation in some AI systems, likely neutral in others
Signals machine-readability

Publish one, keep it maintained, and don't overinvest. The other signals in the 22-signal AI search visibility checklist are all higher-ROI than llms.txt alone.

AI search visibility checklist — all 22 signals
How to rank in ChatGPT — ChatGPT-specific tactics
Perplexity SEO guide — Perplexity citation signals
What is GEO? — the foundational concept
Checks reference — every GeoDaddy check

References

llms.txt — the original proposal by Jeremy Howard — specification and rationale.
SE Ranking: The state of llms.txt in 2026 — adoption data.
Search Engine Land: llms.txt meets the standard — analysis of impact.
Mintlify: The value of llms.txt — hype or real? — honest assessment.
CrawlerOptic: llms.txt best practices — implementation pitfalls.
Anthropic: Controlling web crawlers — how Claude interacts with web content.
