Key takeaways
- ChatGPT and other AI models pull from content they can actually parse -- messy JavaScript-heavy pages, thin content, and blocked crawlers all get ignored
- Being visible in AI answers requires a different approach than traditional SEO, though the two overlap more than you'd expect
- Technical fixes like llms.txt, structured data, and semantic HTML make your content machine-readable
- Content strategy matters just as much -- AI models cite authoritative, specific, question-answering content
- You can't fix what you can't measure: tracking your AI visibility is the only way to know if your changes are working
You spent hours writing a blog post. It's well-researched, properly formatted, and ranking on page two of Google. Then you ask ChatGPT the exact question your article answers -- and it cites a Reddit thread from 2021 and a Medium post that's half as detailed as yours.
That's genuinely frustrating. And it's happening to a lot of people right now.
The good news: there are real, fixable reasons why this happens. AI models aren't random. They have preferences -- for structure, for authority signals, for content that's easy to parse. Once you understand what they're looking for, you can start giving it to them.
Why AI models skip your site in the first place
Before jumping to fixes, it helps to understand the actual problem. ChatGPT (and Perplexity, Claude, Gemini, etc.) don't answer from a continuously updated index of the web the way Google does. Their knowledge comes from training data and, for models with web search enabled, from live search results they pull at query time.
This means there are two separate problems you might be facing:
Problem 1: You're not in the training data. Newer sites, low-authority sites, and sites with thin content often didn't make it into the datasets used to train these models. There's not much you can do retroactively about this, but you can start building the signals that get you included in future training runs.
Problem 2: You're not being cited in real-time search. When ChatGPT uses web search (which it does by default for many queries), it pulls live results and synthesizes them. If your page doesn't rank well in Bing or Google for the relevant query, it won't show up in those results, and it won't get cited.
Most people are dealing with a combination of both. Here's how to address each.
Fix 1: Make your content actually machine-readable
This is the most overlooked piece. A lot of websites are technically functional for human visitors but nearly unreadable for AI crawlers. Heavy JavaScript rendering, pop-ups that block content, paywalls, and messy HTML all make it harder for AI systems to extract useful information from your pages.
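One rough way to see the problem is to count how many words are present in the raw HTML before any JavaScript runs. The sketch below uses only Python's standard library; the function name `visible_word_count` and the idea of treating a low count as a warning sign are illustrative assumptions, not a documented rule for how any AI crawler behaves.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(raw_html: str) -> int:
    """Count the words a parser sees without executing any JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)
```

Run it against the HTML your server returns (for example, what `curl` gives you). A page that reads fine in a browser but yields only a handful of words here is probably rendering its content client-side.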
Add an llms.txt file
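This is a plain text file, written in Markdown, that you place at yoursite.com/llms.txt. It tells AI systems what your site is about and where to find your most important content -- think of it as a map for LLMs, closer in spirit to sitemap.xml than to robots.txt, which only controls crawler access. It's a proposed convention rather than a formal standard, so treat it as a low-cost signal, not a guarantee.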
A basic version looks like this:
# Your Site Name
> A brief description of what your site covers and who it's for.
## Key pages
- [Your best article title](/blog/article-slug): One-line description
- [Another key page](/about): What it covers
It doesn't need to be elaborate. Clear and accurate beats comprehensive and vague.
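If you'd rather generate the file than maintain it by hand, a small script is enough. This is a minimal sketch that mirrors the layout shown above; the function name `build_llms_txt` and the example page list are hypothetical, and where you write the output depends on how your site is deployed.

```python
def build_llms_txt(site_name, summary, pages):
    """Render an llms.txt body from (title, path, description) tuples."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Key pages", ""]
    for title, path, description in pages:
        lines.append(f"- [{title}]({path}): {description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pages; replace with your own most important content.
    print(build_llms_txt(
        "Your Site Name",
        "A brief description of what your site covers and who it's for.",
        [
            ("Your best article title", "/blog/article-slug", "One-line description"),
            ("Another key page", "/about", "What it covers"),
        ],
    ))
```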
Use semantic HTML properly
If your entire site is built with <div> tags, AI parsers have a much harder time understanding what's a heading, what's body content, what's navigation, and what's a sidebar. Semantic HTML elements like <article>, <main>, <nav>, <aside>, <header>, and <footer> give structure that machines can actually interpret.
Heading hierarchy matters too. Use one <h1> per page, then <h2> for main sections and <h3> for subsections. Choose each level based on what the heading means, not on how it looks.
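The heading rules are easy to check automatically. Here's a rough sketch using Python's built-in HTML parser; `audit_headings` is a hypothetical helper, and it only checks the two rules mentioned here (a single <h1>, no skipped levels), not everything an accessibility audit would cover.

```python
import re
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level of every <h1> to <h6> tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        match = re.fullmatch(r"h([1-6])", tag)
        if match:
            self.levels.append(int(match.group(1)))


def audit_headings(raw_html):
    """Return a list of human-readable heading problems (empty if none)."""
    collector = HeadingCollector()
    collector.feed(raw_html)
    collector.close()

    problems = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")
    # Going deeper by more than one level at a time skips a level.
    for previous, current in zip(collector.levels, collector.levels[1:]):
        if current > previous + 1:
            problems.append(f"<h{previous}> is followed by <h{current}>, skipping a level")
    return problems
```

An empty list means the page passes both checks; anything else tells you which jump to fix.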
Add structured data (JSON-LD)
Structured data is metadata you embed in your page (usually in the <head>, though JSON-LD is also valid in the <body>) that explicitly tells machines what type of content they're looking at. For a blog post:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Title",
  "author": {
    "@type": "Person",
    "name": "Your Name"
  },
  "datePublished": "2026-01-15",
  "description": "A one-sentence summary of what this article covers."
}
</script>
This helps AI models understand authorship, publication date, and content type -- all signals that feed into whether your content gets cited and how it gets attributed.
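Hand-writing this block for every post gets old quickly, so it's worth generating it from the metadata you already have. The sketch below builds the same Article markup shown above; `article_json_ld` is a hypothetical helper name, and how you inject the result into your templates depends on your CMS or static site generator.

```python
import json


def article_json_ld(headline, author_name, date_published, description):
    """Return a <script> tag containing schema.org Article JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2026-01-15"
        "description": description,
    }
    # Escape "<" so text such as "</script>" in a title cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Using `json.dumps` rather than string formatting means quotes and special characters in titles are escaped correctly, which is the most common way hand-written JSON-LD breaks.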

Fix 2: Check that AI crawlers can actually reach you
Even if your content is perfectly structured, it won't get cited if AI crawlers are blocked from accessing it. This is surprisingly common -- many sites accidentally block legitimate AI crawlers through overly aggressive robots.txt rules or Cloudflare bot protection settings.
The main crawlers to allow:
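- GPTBot (OpenAI/ChatGPT)
- ClaudeBot (Anthropic)
- PerplexityBot (Perplexity)
- GoogleOther (Google's general-purpose crawler for products and research outside Search)
- Google-Extended (a robots.txt token, not a separate crawler, that controls whether Google may use your content for its Gemini models)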
Check your robots.txt file and make sure none of these are disallowed. If you're using Cloudflare or another CDN with bot management, verify that these user agents aren't being blocked or challenged.
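You can run this check with Python's standard `urllib.robotparser` module instead of reading the rules by eye. This is a sketch under a few assumptions: the crawler list mirrors the one above (using `Google-Extended`, the token name Google documents), `blocked_crawlers` is a hypothetical helper, and it takes the robots.txt contents as a string so you can fetch the file however you like. It says nothing about CDN or firewall rules, which you still need to check separately.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the list above; adjust to the crawlers you care about.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "GoogleOther", "Google-Extended")


def blocked_crawlers(robots_txt, url, user_agents=AI_CRAWLERS):
    """Return the user agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in user_agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks one AI crawler by accident.
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_crawlers(sample, "https://example.com/blog/post"))
```

With the sample rules above, the script prints `['GPTBot']`, which is the crawler you'd want to unblock.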
You can also check your server logs to see if these crawlers are visiting your site at all -- and if they're hitting errors when they do. Tools like Promptwatch include AI crawler log analysis that shows you exactly which crawlers are hitting your pages, what errors they encounter, and how often they return.
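If you want a quick look before reaching for a tool, a short script over your access log covers the basics. The sketch below assumes the common "combined" log format used by default in Nginx and Apache; the regular expression and the `summarize_ai_crawler_hits` name are illustrative, so adjust them if your log layout differs. Google-Extended is left out on purpose because it is a robots.txt token and does not appear as a user agent in logs.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent header.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "GoogleOther")

# Matches: "METHOD /path HTTP/x" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_ai_crawler_hits(log_lines):
    """Count requests and error responses (status >= 400) per AI crawler."""
    hits = Counter()
    errors = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Not a request line in the expected format.
        agent = match.group("agent").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
                if int(match.group("status")) >= 400:
                    errors[token] += 1
                break
    return hits, errors
```

Pass it an open file (`summarize_ai_crawler_hits(open("access.log"))`) and compare the two counters: a crawler with zero hits may be blocked upstream, and one with many errors is reaching you but failing to read your pages.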

Fix 3: Build the authority signals AI models trust
AI models don't just cite any page that answers a question. They favor pages that have authority signals -- the same signals that help with traditional SEO, plus some additional ones specific to AI.
Get cited on authoritative third-party sites
One of the strongest signals for AI visibility is being mentioned on sites that AI models already trust: Wikipedia, major publications, industry directories, Reddit threads with high engagement, YouTube videos from credible creators. When AI models see your brand or content referenced across multiple authoritative sources, they're more likely to include you in their answers.
This means PR, link building, and community participation aren't just SEO tactics anymore -- they directly feed your AI visibility.
Build topical authority
AI models tend to cite sources that cover a topic comprehensively, not just one article that happens to answer a specific question. If you have 20 well-written articles on a narrow topic, you're more likely to get cited than a site with one great article surrounded by unrelated content.
Pick a niche and go deep. Cover the main questions, the follow-up questions, the edge cases, and the comparisons. The more thoroughly you cover a topic, the more likely AI models are to treat your site as an authoritative source on it.
Answer specific questions directly
AI models are trying to answer questions. They favor content that answers questions clearly and directly -- ideally in the first few paragraphs, not buried after 500 words of preamble.
Structure your content around questions. Use FAQ sections. Put the answer at the top, then explain it. This isn't just good for AI visibility; it's good writing.
Fix 4: Align your content with what people actually ask AI
There's a meaningful difference between what people type into Google and what they ask AI assistants. Google queries tend to be short and keyword-focused ("best project management tool"). AI queries tend to be conversational and specific ("what's the best project management tool for a remote team of 10 that already uses Slack?").
If your content is optimized purely for short-tail keywords, it may not match the longer, more specific prompts that AI models are responding to. You need to understand the actual prompts people are using -- not just the keywords.
This is where prompt intelligence tools become genuinely useful. Rather than guessing which questions to answer, you can see the specific prompts that are driving AI responses in your category, which ones have high volume, and which ones your competitors are already being cited for but you're not.

Fix 5: Track your AI visibility so you know what's working
Here's the frustrating part of AI optimization: you can't just check your Google Search Console and call it done. AI citations don't show up in traditional analytics. If ChatGPT mentions your site in an answer, you won't see a referral in GA4 unless the user actually clicks through -- and many never do.
This means you need a different way to measure progress. The basic approach is manual: regularly ask AI models the questions your content answers and see if you show up. It's tedious but it works as a starting point.
The more systematic approach is to use a dedicated AI visibility tracking tool. These platforms monitor how often your brand and content appear in AI responses across different models, track which pages are being cited, and show you how your visibility changes over time.
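Even the manual approach benefits from a little bookkeeping. If you save the answers you get back as text, a few lines of Python can turn them into a number you can track over time. This is only a sketch: `mention_rate` is a hypothetical helper, the responses are whatever you pasted or exported yourself, and matching on the domain name will miss answers that mention your brand without linking to it.

```python
import re


def mention_rate(responses, domain):
    """Return the share of saved AI answers that mention the given domain."""
    if not responses:
        return 0.0
    # Require a boundary on both sides so "notexample.com" does not count
    # as a mention of "example.com".
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(domain) + r"(?![\w-])",
        re.IGNORECASE,
    )
    mentioned = sum(1 for text in responses if pattern.search(text))
    return mentioned / len(responses)
```

Run the same set of questions every few weeks and log the result. The absolute number matters less than whether it moves after you ship changes.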

Tools worth knowing about
| Tool | Best for | Key strength |
|---|---|---|
| Promptwatch | Full AI visibility + content optimization | Crawler logs, content gap analysis, content generation |
| Peec AI | Monitoring AI brand mentions | Clean dashboard, smart suggestions |
| Otterly.AI | Budget-friendly monitoring | Affordable entry point |
| Ahrefs Brand Radar | Teams already using Ahrefs | Integrated with existing SEO workflow |
| Semrush AI Visibility | Teams already using Semrush | Broad platform coverage |
| LLM Pulse | Small teams getting started | Simple, lightweight |

The main thing to look for in any of these tools: does it just show you data, or does it help you act on it? A dashboard that tells you "your visibility score is 23" isn't useful if it doesn't tell you why, and what to do about it.
The content gap problem
One thing most people miss: it's not enough to know that you're not showing up. You need to know which specific prompts you're not showing up for, and what content would change that.
This is the hardest part to do manually. You'd need to identify all the prompts relevant to your category, run them through multiple AI models, analyze the responses, compare them to your existing content, and figure out what's missing. That's a significant research project.
Content gap analysis tools automate this. They show you the prompts your competitors are being cited for that you're not, and what topics your site needs to cover to compete. Promptwatch's Answer Gap Analysis does exactly this -- it maps your content against actual AI responses and surfaces the specific gaps you need to fill.
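At its core, the comparison is a set difference: for each prompt, who gets cited, and are you among them? The sketch below shows that step only. It assumes you have already collected, by hand or with a tool, a mapping from each prompt to the domains cited in the answer; `answer_gaps` and the example domains are hypothetical, and this is not a description of how any particular product implements its analysis.

```python
def answer_gaps(citations_by_prompt, your_domain):
    """Return prompts where other domains are cited but yours is not."""
    target = your_domain.lower()
    gaps = {}
    for prompt, cited_domains in citations_by_prompt.items():
        cited = {domain.lower() for domain in cited_domains}
        # Skip prompts with no citations at all: there is nobody to study.
        if cited and target not in cited:
            gaps[prompt] = sorted(cited)
    return gaps


if __name__ == "__main__":
    # Hypothetical data gathered from manual tests.
    observed = {
        "best project management tool for a remote team": ["rival.example", "other.example"],
        "how to write an llms.txt file": ["yoursite.example", "rival.example"],
    }
    for prompt, competitors in answer_gaps(observed, "yoursite.example").items():
        print(f"{prompt}: cited instead -> {', '.join(competitors)}")
```

Each entry in the result is both a topic to cover and a list of pages worth reading to see what earned the citation.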
A realistic timeline
Getting your site cited more in AI answers isn't instant. Here's roughly what to expect:
- Technical fixes (llms.txt, structured data, crawler access): these can be done in a day or two, and AI crawlers may pick up the changes within a few weeks
- Content improvements (better structure, question-focused writing): ongoing, but you may see movement within 1-3 months
- Authority building (third-party mentions, topical depth): this takes months to years, but it compounds
The sites that win in AI search are the ones that started building these signals early. The good news is that most of your competitors haven't started yet.
Where to start
If you're not sure where to begin, here's a simple order of operations:
1. Check your robots.txt and make sure AI crawlers aren't blocked
2. Add an llms.txt file to your site root
3. Add JSON-LD structured data to your key pages
4. Audit your HTML structure -- are you using semantic elements and proper heading hierarchy?
5. Run a manual test: ask ChatGPT, Perplexity, and Gemini the questions your content answers. See who shows up instead of you.
6. Set up some form of ongoing tracking so you can measure progress
The manual test in step 5 is particularly useful because it tells you exactly who you're competing against in AI answers -- and you can study those pages to understand what they're doing that you're not.
Getting cited in AI answers is increasingly where discovery happens. Search behavior is shifting, and the sites that adapt now will have a real advantage over those that wait.
