llms.txt Explained: Complete 2026 Guide
llms.txt is a proposed standard for a Markdown file served at the root of your site that tells AI agents where to find your most important content in a clean, structured format. Like robots.txt and sitemap.xml, it lives at a predictable root-level path, so crawlers and agents that support it can find it without exploring the site first.
As of early 2026, adoption sits at roughly 10% of the top 300,000 domains. It's not yet a ratified standard, but it's cheap to publish and signals that your site is machine-readable.
Key takeaways
- llms.txt is a proposed, not yet ratified, convention originally introduced by Jeremy Howard in 2024.
- Adoption is around 10% of the top 300,000 domains as of 2026, concentrated in developer-tool documentation, per research from SE Ranking.
- Some AI systems consume it (parts of the Claude + LangChain ecosystem); ChatGPT currently ignores it in favor of standard robots.txt and sitemap signals.
- Best treated as a low-cost signal: publish one, maintain it, don't overinvest.
- Separately from llms.txt, allow AI crawlers in robots.txt: blocking them can keep your pages out of AI search results entirely.
llms.txt vs robots.txt vs llms-full.txt
| File | Purpose | Format | Consumer |
|---|---|---|---|
| robots.txt | Control crawler access | Directives | All crawlers |
| sitemap.xml | List all indexable URLs | XML | Search engines |
| llms.txt | Curated index of key content | Markdown | LLM agents |
| llms-full.txt | Full content dump | Markdown | LLM agents |
robots.txt says "you can/can't crawl this." sitemap.xml says "here are all my URLs." llms.txt says "here are my best, canonical URLs, already described in a way an LLM can consume."
Current adoption: ~10%
Recent crawls of the top 300,000 domains show:
- ~10% have a valid `llms.txt`
- ~3% also have `llms-full.txt`
- Adoption is concentrated in developer-tool and SaaS documentation sites (where Mintlify, ReadMe, and other doc platforms generate these files automatically)
Enterprise content sites lag significantly. If you publish one, you're still early.
Does it actually work?
Honest answer: it depends on which LLM.
- Respected: Anthropic's Claude and some LangChain/RAG pipelines explicitly check for llms.txt as a retrieval hint.
- Ignored: OpenAI's ChatGPT crawlers don't officially check llms.txt; they rely on standard robots.txt, sitemap, and content extraction.
- Unclear: Perplexity's and Google Gemini's handling of llms.txt isn't publicly documented.
The practical stance: publish llms.txt as a low-cost signal. It won't dominate citation outcomes, but it adds a cleanly indexed entry point that some AI systems will consume.
Anatomy of a great llms.txt
The structure Jeremy Howard proposed:
```markdown
# GeoDaddy
> GeoDaddy is an open-source tool that analyzes your site's visibility in
> AI search engines like ChatGPT, Perplexity, Gemini, and Claude. It
> runs 22 checks and returns a scored report with fix recommendations.
## Core documentation
- [What is GEO?](https://geodaddy.dev/docs/what-is-geo): Introduction to Generative Engine Optimization.
- [CLI Usage](https://geodaddy.dev/docs/cli): Install and run the CLI in CI/CD.
- [MCP Server](https://geodaddy.dev/docs/mcp): Run GEO analysis from Claude or Cursor.
- [Checks Reference](https://geodaddy.dev/docs/checks): All 22 checks in detail.
## Guides
- [GEO vs SEO](https://geodaddy.dev/guides/geo-vs-seo): How AI and traditional search differ.
- [How to rank in ChatGPT](https://geodaddy.dev/guides/chatgpt-seo): 9 citation signals.
- [Perplexity SEO](https://geodaddy.dev/guides/perplexity-seo): The 7 citation signals.
- [llms.txt guide](https://geodaddy.dev/guides/llms-txt): This page.
- [AI search visibility checklist](https://geodaddy.dev/guides/ai-search-visibility-checklist): 22-signal pillar.
## Optional
- [GitHub](https://github.com/borabiricik/geodaddy-cli): Source code.
```
Key elements:
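- `# Title`: one H1 with the site or project name.
- `> Description`: a blockquote summary.
- `## Section`: grouped link sections, conventionally "Core", "Docs", "Guides", and "Optional". In the original proposal only "Optional" has a special meaning: its links can be skipped when a shorter context is needed.
- Bulleted links in `[name](url): description` format.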
Keep it under 4 KB. This is an index, not a sitemap.
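To check a file against the layout above, a simplified lint could look like the sketch below. `structureProblems` is a hypothetical helper, not part of GeoDaddy or any published tool, and it encodes this guide's conventions rather than the full proposal: the original proposal requires only the H1, while the summary and sections are optional there.

```typescript
// Hypothetical lint for the llms.txt layout described in this guide.
// Stricter than the proposal, which only requires the H1 title.
function structureProblems(text: string): string[] {
  const problems: string[] = [];
  const lines = text
    .split(/\r?\n/)
    .map((l) => l.trimEnd())
    .filter((l) => l !== "");
  const isH1 = (l: string) => /^# \S/.test(l);

  // One H1 with the site or project name, and it must come first.
  const h1Count = lines.filter(isH1).length;
  if (h1Count !== 1) problems.push(`expected 1 H1 title, found ${h1Count}`);
  if (!isH1(lines[0] ?? "")) problems.push("first line must be the H1 title");

  // A blockquote summary somewhere in the file.
  if (!lines.some((l) => l.startsWith("> "))) problems.push("missing > summary blockquote");

  // Bullets inside H2 sections must be "- [name](url)" plus an optional ": description".
  let inSection = false;
  for (const line of lines) {
    if (/^## \S/.test(line)) {
      inSection = true;
    } else if (
      inSection &&
      line.startsWith("- ") &&
      !/^- \[[^\]]+\]\([^)\s]+\)(: .+)?$/.test(line)
    ) {
      problems.push(`malformed link line: ${line}`);
    }
  }
  return problems;
}
```

An empty array means the file matches the layout; each string names one departure from it.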
7 best practices
1. Serve from the root
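Serve it at https://yourdomain.com/llms.txt, not in a subdirectory or on a subdomain. The proposal permits an optional subpath, but the root path is the one clients can discover without being told where to look.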
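A minimal sketch of how a client would derive the location from any page URL, assuming it follows the same per-host convention as robots.txt. `llmsTxtUrl` is a hypothetical name. Resolving the absolute path `/llms.txt` against a page URL keeps only the scheme, host, and port, which is why a file placed in a subdirectory is not where such a client looks.

```typescript
// Resolve the root llms.txt location for any page on a site.
// An absolute path ("/llms.txt") replaces the page's path, query, and fragment.
function llmsTxtUrl(pageUrl: string): string {
  return new URL("/llms.txt", pageUrl).toString();
}

console.log(llmsTxtUrl("https://yourdomain.com/docs/cli?tab=install#ci"));
// prints "https://yourdomain.com/llms.txt"
```

Note that a subdomain is its own host: under this logic, `https://docs.yourdomain.com/...` resolves to `https://docs.yourdomain.com/llms.txt`, not to the file on the main domain.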
2. Use Content-Type: text/plain (or text/markdown)
Don't serve it as text/html. Agents expect the raw Markdown, not a rendered web page.
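Checking this means comparing only the media type and ignoring parameters such as `charset`. The sketch below assumes the two types named above are the only acceptable ones; `hasAcceptableContentType` is a hypothetical helper that you would feed the value of `response.headers.get("content-type")`.

```typescript
// Media types this guide recommends for llms.txt.
const ACCEPTED_TYPES = new Set(["text/plain", "text/markdown"]);

// Compare only the media type: drop parameters like "; charset=utf-8"
// and normalize case, since header values are case-insensitive here.
function hasAcceptableContentType(contentType: string | null): boolean {
  const [type = ""] = (contentType ?? "").split(";");
  return ACCEPTED_TYPES.has(type.trim().toLowerCase());
}

console.log(hasAcceptableContentType("text/plain; charset=utf-8")); // true
console.log(hasAcceptableContentType("text/html")); // false
console.log(hasAcceptableContentType("application/octet-stream")); // false
```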
3. Use absolute URLs
Every link should be fully qualified (https://yourdomain.com/...). Relative URLs can't be resolved once the file is cached or copied away from your domain, for example when an agent pastes it into a model's context.
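One way to catch relative links is to extract every Markdown link target and try to parse it without a base URL, which fails for relative references. This is a sketch with a hypothetical helper name, `findNonAbsoluteLinks`; it also flags non-web schemes such as `mailto:`, which is an assumption about what belongs in the file.

```typescript
// Return every Markdown link target that is not an absolute http(s) URL,
// e.g. "/docs/cli" or "docs/cli.html".
function findNonAbsoluteLinks(markdown: string): string[] {
  // Matches [label](target) and captures the target (no spaces allowed).
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const offenders: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(markdown)) !== null) {
    const target = match[1] ?? "";
    let absolute = false;
    try {
      // Without a base argument, new URL() throws on relative references.
      const { protocol } = new URL(target);
      absolute = protocol === "https:" || protocol === "http:";
    } catch {
      absolute = false;
    }
    if (!absolute) offenders.push(target);
  }
  return offenders;
}
```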
4. Avoid redirects
llms.txt should return 200 OK, not 301/302. Some crawlers drop content served through redirects.
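To verify the 200 OK requirement, the request must not follow redirects automatically, or a 301 would be hidden behind the final 200. The sketch assumes a server-side runtime such as Node 18+, where `redirect: "manual"` returns the 3xx response itself; browsers instead return an opaque response with status 0. `classifyStatus` and `checkLlmsTxtStatus` are hypothetical names.

```typescript
type Verdict = "ok" | "redirect" | "error";

// Only a direct 200 passes; any 3xx means the file sits behind a redirect.
function classifyStatus(status: number): Verdict {
  if (status === 200) return "ok";
  if (status >= 300 && status < 400) return "redirect";
  return "error";
}

// Request /llms.txt on the given origin without following redirects
// (server-side fetch, e.g. Node 18+).
async function checkLlmsTxtStatus(origin: string): Promise<Verdict> {
  const res = await fetch(new URL("/llms.txt", origin), { redirect: "manual" });
  return classifyStatus(res.status);
}
```

Usage: `await checkLlmsTxtStatus("https://yourdomain.com")` should resolve to `"ok"` for a correctly served file.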
5. Keep it canonical
Every URL in llms.txt should be the canonical version — no duplicate content, no query parameters, no session IDs.
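A URL-only pass can catch the most common non-canonical patterns named above. This sketch does not fetch pages or read their `<link rel="canonical">` tags, and the session-ID check covers only the Java servlet `;jsessionid=` form as an example; `canonicalIssues` is a hypothetical helper.

```typescript
// Flag URL patterns that usually indicate a non-canonical variant.
// Inspects the URL string only; it does not fetch the page.
function canonicalIssues(url: string): string[] {
  const parsed = new URL(url);
  const issues: string[] = [];
  // Query parameters (tracking tags, sort orders, session tokens).
  if (parsed.search !== "") issues.push(`query string "${parsed.search}"`);
  // Path-embedded session IDs, e.g. Java servlet ";jsessionid=" URLs.
  if (/;jsessionid=/i.test(parsed.pathname)) issues.push("session ID in path");
  return issues;
}
```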
6. No authentication required
AI agents don't sign in. Public URLs only.
7. Update when content structure changes
Don't let llms.txt drift. If you restructure URLs, update the file. A file full of broken links does more harm than having no llms.txt at all.
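A link check you can run in CI after each restructure might look like this sketch. The fetch function is passed in so tests can use a stub; in production you could pass the global `fetch`. Retrying with GET when a server answers HEAD with 405 is a design choice, not a requirement; `findBrokenLinks` and `Fetcher` are hypothetical names.

```typescript
// Minimal fetch-like signature so a stub can stand in for the real fetch.
type Fetcher = (url: string, init: { method: string }) => Promise<{ status: number }>;

// Report every URL that doesn't answer with a 2xx status.
async function findBrokenLinks(urls: string[], fetcher: Fetcher): Promise<string[]> {
  const results = await Promise.all(
    urls.map(async (url) => {
      try {
        let res = await fetcher(url, { method: "HEAD" });
        // Some servers reject HEAD; retry with GET before calling the link broken.
        if (res.status === 405) res = await fetcher(url, { method: "GET" });
        return res.status >= 200 && res.status < 300 ? null : `${url} (HTTP ${res.status})`;
      } catch {
        return `${url} (network error)`;
      }
    }),
  );
  return results.filter((r): r is string => r !== null);
}
```

With the global fetch, redirects are followed by default, so a moved page that still resolves counts as reachable here.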
Common mistakes
- CDN caching conflicts: some CDN configurations serve llms.txt as `application/octet-stream`. Set the content type explicitly.
- Exhaustive inventories: llms.txt isn't a sitemap. List the 10-30 most important pages, not everything.
- Auto-generated cruft: some doc platforms generate bloated llms.txt files. Review and trim.
- Redirects from `/llms.txt`: the file must return 200 directly.
- Blocking in robots.txt: make sure `Disallow: /` rules don't accidentally block `/llms.txt`.
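The size advice from this guide (under 4 KB, at most about 30 links) is easy to enforce in CI. This sketch treats 4 KB as 4,096 UTF-8 bytes, which is an assumption; both thresholds are editorial, not limits from the proposal, and `sizeWarnings` is a hypothetical helper.

```typescript
// Editorial thresholds from this guide, not limits set by the llms.txt proposal.
const MAX_BYTES = 4 * 1024; // assumes "4 KB" means 4,096 bytes
const MAX_LINKS = 30;

function sizeWarnings(text: string): string[] {
  const warnings: string[] = [];
  // TextEncoder yields UTF-8 bytes, i.e. what is actually sent over the wire.
  const bytes = new TextEncoder().encode(text).length;
  if (bytes > MAX_BYTES) warnings.push(`file is ${bytes} bytes (limit ${MAX_BYTES})`);
  // Count Markdown links of the form [label](target).
  const links = (text.match(/\[[^\]]*\]\([^)\s]+\)/g) ?? []).length;
  if (links > MAX_LINKS) warnings.push(`${links} links (aim for ${MAX_LINKS} or fewer)`);
  return warnings;
}
```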
Templates by site type
SaaS / product
```markdown
# {Product Name}
> {One-sentence product description}
## Product
- [Homepage](...)
- [Features](...)
- [Pricing](...)
## Documentation
- [Getting Started](...)
- [API Reference](...)
- [Guides](...)
## Company
- [About](...)
- [Blog](...)
```
Docs-heavy project
```markdown
# {Project Name}
> {Project description}
## Core concepts
- [Introduction](...)
- [Architecture](...)
## API
- [Reference](...)
- [SDK](...)
## Guides
- [Quickstart](...)
- [Advanced](...)
```
Blog / content site
```markdown
# {Site Name}
> {Site purpose}
## Featured content
- [Best-of article](...)
- [Guide](...)
## Categories
- [Category index](...)
```
Generate llms.txt automatically
Several static site generators and doc platforms generate llms.txt automatically:
- Mintlify: built-in, enabled by default
- ReadMe: available via config
- Docusaurus: via `docusaurus-plugin-llms`
- Nextra: via `nextra-llmstxt`
- Custom Next.js: create `src/app/llms.txt/route.ts` that returns the markdown
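If you keep your key pages as data (a CMS query, a route list), a small renderer can emit the H1, blockquote, and H2-section layout from it. The `LlmsTxt` shape and `renderLlmsTxt` function below are hypothetical, not an API of any of the platforms listed; the sketch assumes a single-line summary, since each line of a multi-line blockquote would need its own `> ` prefix.

```typescript
// Hypothetical data shape for a site's key pages.
interface LlmsLink { title: string; url: string; description?: string }
interface LlmsSection { heading: string; links: LlmsLink[] }
interface LlmsTxt { name: string; summary: string; sections: LlmsSection[] }

// Render the H1 / blockquote / H2-sections layout. Assumes a one-line summary.
function renderLlmsTxt(doc: LlmsTxt): string {
  const out: string[] = [`# ${doc.name}`, "", `> ${doc.summary}`];
  for (const section of doc.sections) {
    out.push("", `## ${section.heading}`);
    for (const link of section.links) {
      const suffix = link.description ? `: ${link.description}` : "";
      out.push(`- [${link.title}](${link.url})${suffix}`);
    }
  }
  return out.join("\n") + "\n";
}
```

The returned string can be passed straight to `new Response(...)` in a route handler.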
For a custom Next.js site, a route handler is a few lines:
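```typescript
// src/app/llms.txt/route.ts
// Next.js App Router route handler that serves /llms.txt as plain text.
export function GET() {
  const content = `# Your Site
...
`
  return new Response(content, {
    // An explicit charset keeps non-ASCII titles from being mis-decoded.
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  })
}
```
How GeoDaddy checks llms.txt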
The GeoDaddy playground validates:
- `llms.txt` presence at root
- Correct content type
- Valid markdown structure
- Absolute URLs
- Non-redirecting response
A missing llms.txt is a minor-severity finding worth 2 points; a file that is present but malformed loses half of them.
Bottom line
Publishing llms.txt in 2026:
- Takes 20 minutes
- Costs nothing
- Might help citation in some AI systems, likely neutral in others
- Signals machine-readability
Publish one, keep it maintained, and don't overinvest. Every other signal in the 22-signal AI search visibility checklist offers a higher ROI than llms.txt alone.
Related reading
- AI search visibility checklist — all 22 signals
- How to rank in ChatGPT — ChatGPT-specific tactics
- Perplexity SEO guide — Perplexity citation signals
- What is GEO? — the foundational concept
- Checks reference — every GeoDaddy check
References
- llms.txt — the original proposal by Jeremy Howard — specification and rationale.
- SE Ranking: The state of llms.txt in 2026 — adoption data.
- Search Engine Land: llms.txt meets the standard — analysis of impact.
- Mintlify: The value of llms.txt — hype or real? — honest assessment.
- CrawlerOptic: llms.txt best practices — implementation pitfalls.
- Anthropic: Controlling web crawlers — how Claude interacts with web content.