A roofer in Tampa called me in March. He'd been #1 on Google for "metal roofing tampa" for two years and his phone had basically stopped ringing. His son had typed the same query into ChatGPT, gotten back three competitors and a Reddit thread, and asked his dad why their company wasn't in there. That conversation is happening in living rooms all over the country right now, and most SMB owners don't even know it's happening until their phone goes quiet.

So let's talk about it. I've spent the last 18 months auditing more than 600 small-business sites specifically for AI-search visibility, and the pattern is depressingly consistent: people who are perfectly fine on Google are getting absolutely smoked inside ChatGPT, Claude, and Gemini. The reasons are technical, fixable, and almost always free. This piece is the working playbook I use with clients — what to do, in what order, and what to skip even if a confident LinkedIn post tells you to do it.

First, what "showing up in ChatGPT" actually means

There are two completely different things people mean when they say "I want to show up in ChatGPT," and you need to understand both or you'll waste your time on the wrong work.

The first is live web answers. When a user asks ChatGPT something current — "best CRM for a four-person agency in 2026," "what's the refund policy at brand X," "who does emergency drain cleaning in Austin on Sundays" — ChatGPT sends a real-time crawler out to the open web, pulls a handful of pages, reads them, and writes you an answer with citations. The citations link out. That's the lane most businesses can actually win, and it can pay off in days, not months.

The second is baked-in knowledge. When a user asks something general or evergreen — "what's the standard contract structure for a SaaS reseller," "who makes the best mechanical keyboards" — ChatGPT answers from memory, without doing a web search. That memory came from a training run that happened months ago. Getting into that layer takes longer and depends on the rest of the web talking about you, not just what you publish on your own site.

You optimize for both with the same work. But you measure them differently, and the timelines are different, and pretending otherwise will frustrate you.

Meet OpenAI's three crawlers

OpenAI runs three separate web agents and most people don't know they exist, let alone the difference between them. Here's the cheat sheet.

GPTBot is the training crawler. It scrapes the open web to build datasets that future versions of GPT will be trained on. When GPTBot reads your site today, you might show up in GPT‑6 next year. Long horizon, slow feedback, big upside.

OAI-SearchBot is the search-index crawler for ChatGPT Search. OpenAI quietly maintains its own search index (partly augmented by Bing, more on that in a minute), and OAI-SearchBot is what fills it. When you "appear in ChatGPT" with a clickable citation, OAI-SearchBot is the bot that put you there.

ChatGPT-User is the on-demand fetcher. When a ChatGPT user gives the model a URL to read, or when ChatGPT decides it needs to grab a specific page right now to answer a specific question, this is the user-agent that shows up in your logs. It runs per conversation. If your site only ever gets ChatGPT-User hits and never OAI-SearchBot, that means ChatGPT can read you but is not indexing you for general queries. Worth knowing.

If you block any of these, you cut off that lane. Block OAI-SearchBot and you will not appear in ChatGPT's citations, full stop. Block GPTBot and you opt out of future training. Block ChatGPT-User and individual users can't even paste your URL into a chat and get a summary. People do this by accident all the time — usually a security plugin auto-blocks anything that looks like a bot.

The robots.txt mistake nobody talks about

Here is the single most common cause of "my site isn't in ChatGPT" that I see in audits. The site owner thinks their robots.txt is permissive because it says:

User-agent: *
Allow: /

That's fine for most crawlers. But Cloudflare, Wordfence, certain Shopify apps, and a handful of CDN edge configurations never consult robots.txt. They apply their own blocklist of bots they consider "AI scrapers" and refuse to serve content to them regardless of what your robots.txt says. Cloudflare in particular shipped a one-click AI-bot block in mid-2024 and in 2025 began blocking AI crawlers by default on newly added domains, and a lot of small businesses got swept up in it without ever knowing.

So the fix isn't just to write a permissive robots.txt. It's to explicitly allow every AI agent by name, and then go check your CDN / firewall / security plugin to make sure none of them are silently overriding you. Here's the robots.txt I deploy on client sites:

# Standard crawlers
User-agent: *
Allow: /

# AI search and training — explicitly welcome
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

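User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

# Older Anthropic tokens, kept in case legacy rules still match them
User-agent: Claude-Web
Allow: /

User-agent: anthropic-ai
Allow: /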

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: CCBot
Allow: /

User-agent: Applebot-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Then I open Cloudflare and turn off the AI-bot block. Then I check WP Engine or whatever host they're on for any AI bot toggle. Then I test by hitting the page with a fake User-Agent: GPTBot header from curl and making sure I get a real 200 with the actual HTML, not a "we're checking your browser" interstitial.
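The same check I run with curl can be scripted so you can repeat it across several agents. What follows is a minimal sketch, not a definitive tool: the function names and the list of challenge phrases are my own illustrative assumptions, and a request that merely spoofs a bot's User-Agent from your own IP will not always be treated the same way as the real crawler, because some firewalls also verify source addresses.

```python
import urllib.error
import urllib.request

# Phrases that commonly appear on bot-challenge pages. This list is an
# illustrative assumption, not an exhaustive or official set.
CHALLENGE_MARKERS = (
    "checking your browser",
    "just a moment",
    "verify you are human",
    "attention required",
)


def looks_blocked(status: int, body: str) -> bool:
    """Return True when a response looks like a block or challenge page."""
    if status != 200:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_as(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch url with the given User-Agent and return (status, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error responses such as 403 or 503 still carry a body worth inspecting.
        return err.code, err.read().decode("utf-8", "replace")


def check_agents(url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user-agent token to True if the page appears blocked for it."""
    results = {}
    for agent in agents:
        status, body = fetch_as(url, agent)
        results[agent] = looks_blocked(status, body)
    return results
```

Calling check_agents("https://yourdomain.com/", ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]) returns a dictionary of agent names to True or False. Any True is worth chasing down in your CDN or security plugin settings.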

This step alone fixes maybe one in three of the sites I audit. It's the cheapest, highest-leverage thing you can do today.
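For the robots.txt half of this step, Python's standard library includes a robots.txt parser, so you can confirm that the file says what you think it says for each agent. This is a sketch under stated assumptions: the sample file below is deliberately broken (it disallows ChatGPT-User) to show what a failure looks like, the helper name is hypothetical, and the parser's matching rules may differ in small ways from how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Deliberately flawed sample: everything is allowed except ChatGPT-User.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Disallow: /
"""

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the agents that this robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(SAMPLE_ROBOTS_TXT, AI_AGENTS, "https://yourdomain.com/"))
```

To run it against a live site instead of a pasted string, RobotFileParser also offers set_url() followed by read(). Keep in mind this only tells you what robots.txt says. It says nothing about a firewall that blocks the request before robots.txt ever matters.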

The content shape ChatGPT keeps quoting

Once the bots can read you, the next question is whether your content is worth quoting. And I want to say something a little uncomfortable here: most SEO content is unquotable on purpose. We were trained for a decade to write long, padded, keyword-stuffed introductions because Google rewarded depth signals. ChatGPT does the opposite. ChatGPT doesn't care how long your article is — it cares whether it can find one or two clean sentences that answer the user's question without burning a hundred tokens to summarize them.

Here is the pattern that works, and I mean this literally. I have client sites where I rewrote the intro paragraphs to follow it and the citation rate tripled inside six weeks:

  1. H1 is the question your customer is actually typing. Not your branded version. The actual phrase. If they search "how much does drain cleaning cost in Austin," your H1 is "How much does drain cleaning cost in Austin?" — not "Affordable Austin Drain Solutions."
  2. First paragraph is a 2–3 sentence direct answer. Lead with the answer. Lead with a number if you can. "Emergency drain cleaning in Austin typically runs $175 to $450, with most jobs landing around $250. Saturday and Sunday calls are usually the same flat rate at honest shops; predatory ones charge a weekend premium."
  3. Then go deep. Now you can have your 1,500-word explanation of why prices vary, what to look for, when to call a plumber vs. snake it yourself. ChatGPT will skip most of it. Humans who got the answer they needed and want context will read it. Win-win.
  4. Close with a real FAQ block. 6–10 questions, each answered in 2–4 sentences, wrapped in FAQPage JSON-LD that mirrors the visible markup. This is the single most lifted format in generative search. Period.

If you do nothing else from this article, do this. The reason it works isn't magic — it's that ChatGPT is fundamentally a quoting machine, and you're making yourself easier to quote.
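The FAQ block in step 4 is the one piece here that benefits from being generated rather than hand-typed, because the JSON-LD has to stay in sync with the visible questions. Here is a minimal sketch of one way to build it from a list of question and answer pairs. The function names are hypothetical, and the structure follows the schema.org FAQPage, Question, and Answer types as I understand them, so validate the output with a structured-data testing tool before you ship it.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD text from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def faq_script_tag(pairs: list[tuple[str, str]]) -> str:
    """Wrap the JSON-LD in a script tag that can be placed in the page."""
    # Escape "</" so an answer containing "</script>" cannot end the tag early.
    payload = faq_jsonld(pairs).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


FAQ = [
    (
        "How much does drain cleaning cost in Austin?",
        "Emergency drain cleaning in Austin typically runs $175 to $450, "
        "with most jobs landing around $250.",
    ),
]

print(faq_script_tag(FAQ))
```

Feed the same list of pairs to whatever renders the visible FAQ on the page. One source for both is the easiest way to keep the markup and the text identical.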

Why the third-party stuff matters more than your site

Here's the part nobody wants to hear. You can build the cleanest, most quotable site in your category and still lose, because ChatGPT weights what other trusted sources say about you almost as heavily as what you say about yourself.

Why? Because models are trained to be cautious about taking a brand's word for itself. "We are the leading provider" doesn't move the needle. But if Reddit has a thread where someone says "I called these guys and they were great," and a local newspaper has a profile on the founder, and a podcast guest dropped your name in passing, and a Wikipedia article on the broader industry includes you in a list, that's a different story. Now the model has four independent corroborations. It will quote you.

So a real chunk of "AI SEO" is actually offsite work that looks a lot like old-school PR and community building. Specifically:

  • Be useful on Reddit. Not as a marketer. As a human in your industry who answers people's questions in good faith. Two posts a week, real answers, no link drops. ChatGPT loves Reddit because the conversation format is dense with question-answer pairs and the moderation makes signal/noise tolerable.
  • Get into roundups. Email three writers in your space who do "best of" lists. Offer a 60-second hot take. One in ten will quote you.
  • Make sure your Wikipedia presence is accurate. Not promotional. Accurate. If you're notable enough for an article, it should be factually correct. If you're in an industry list, you should be in it.
  • Get on a podcast or two. Doesn't have to be Joe Rogan. A niche industry podcast with 500 listeners produces a transcript that gets indexed, archived, and quoted. I had a B2B SaaS client land in Gemini's answer for a specific procurement question because the founder did a 45-minute interview on a procurement podcast in 2023. The transcript was the citation source.
  • Publish guest pieces. One real article on a respected industry publication does more than five months of your own blog. The model already trusts that domain.

This work is slow and unsexy. There's no growth-hack version. But it's the bottleneck for most businesses I work with, and there's no way around it.

The Bing connection (don't skip this)

ChatGPT Search uses OpenAI's own index, but parts of it are augmented by Bing. This means if you are not in Bing's index, you've cut your visibility surface in ChatGPT. And honestly — most small businesses I audit are not in Bing's index. They submitted to Google Search Console years ago and never set up Bing Webmaster Tools, because who uses Bing, right?

The right move:

  1. Create a Bing Webmaster Tools account if you don't have one (you can import from GSC, takes five minutes).
  2. Submit your sitemap.
  3. Verify your top pages are indexed.
  4. Use the URL Inspection tool the same way you do in GSC.

This is one of those "you'd be embarrassed if you knew" wins. Half the SEO industry forgot Bing existed. Meanwhile it's quietly powering an unknown share of ChatGPT answers.

Schema that actually moves the needle

I'm going to be picky here because most "add schema" advice is too vague to be useful. Of the hundreds of schema.org types, three groups actually matter for ChatGPT visibility:

FAQPage: already mentioned, single most lifted format. Six to ten Q&A pairs per page, JSON-LD in the head, visible markup on the page that matches exactly. If they don't match, Google will flag it and you'll lose the rich result.

HowTo: when your page is genuinely step-by-step. Don't fake it. If you have a real "how to clean a tankless water heater" page with seven steps, mark it up with HowTo and let the model lift the steps cleanly.

Organization + Person (author): these don't get directly quoted, but they help the model verify who is speaking. An anonymous blog post with no author is treated as lower-trust than one with a real byline, a credentials block, and a link to a real About page. Models lean on E-E-A-T (experience, expertise, authoritativeness, trust) almost as hard as Google does.

What I don't bother with: BreadcrumbList (helpful for Google rich results, irrelevant for ChatGPT), Article on every blog post (nice to have, won't change your visibility), Product schema (only matters for e-commerce specifically, and you should be doing it anyway).
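The FAQPage rule above, that the JSON-LD has to match the visible text exactly, is easy to check mechanically once you have both sides in hand. The sketch below assumes you have already extracted the visible question and answer pairs from the page (how you do that depends on your templates) and that the JSON-LD is a single FAQPage object with a mainEntity list. The function names are hypothetical.

```python
import json


def normalize(text: str) -> str:
    """Collapse whitespace so line breaks and double spaces are not mismatches."""
    return " ".join(text.split())


def faq_mismatches(jsonld: str, visible: list[tuple[str, str]]) -> list[str]:
    """Return questions that differ between the JSON-LD and the visible FAQ."""
    data = json.loads(jsonld)
    marked_up = {
        normalize(item["name"]): normalize(item["acceptedAnswer"]["text"])
        for item in data.get("mainEntity", [])
    }
    problems = []
    # Visible questions that are missing from the markup or answered differently.
    for question, answer in visible:
        if marked_up.get(normalize(question)) != normalize(answer):
            problems.append(question)
    # Marked-up questions that never appear on the page.
    visible_questions = {normalize(question) for question, _ in visible}
    problems.extend(q for q in marked_up if q not in visible_questions)
    return problems
```

An empty list means the two sides agree after whitespace is normalized. Anything else is a question to fix before you publish.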

The Cloudflare gotcha (a follow-up)

I want to come back to Cloudflare because it bites people so consistently. In July 2024, Cloudflare announced a one-click AI-bot blocking feature, and in July 2025 it began blocking AI crawlers by default on newly added domains. The blocklist is curated by Cloudflare and includes GPTBot, ClaudeBot, CCBot, and PerplexityBot, which is basically everyone you want to allow.

If your site is behind Cloudflare:

  1. Log in to the Cloudflare dashboard.
  2. Pick your zone.
  3. Go to Security → Bots → AI Bots (or in newer UI, Security → Bots and look for AI scrapers).
  4. If "Block AI bots" is enabled, decide. For most businesses, turn it off.
  5. If you want to be granular, allow the AI bots you care about and block the ones you don't.

This is a five-minute check. Do it today. I've seen it tank otherwise well-optimized sites for an entire quarter before anyone noticed.

How to know if you're actually showing up

Tracking is the hardest part of this whole game right now, because none of the AI engines give you a "search console" yet. Here's what I do:

Read your server logs. If you're on a managed host, you may need to enable raw log access. Look for hits from OAI-SearchBot and ChatGPT-User. The first is OpenAI indexing you; the second is ChatGPT fetching your page live, either because a user pasted the URL or because the model needed the page to answer a question. The second is the immediate signal: if you see ChatGPT-User hits on a specific URL, that URL is being pulled into real conversations, which is the closest thing to a citation log you can get right now.

Manual queries. Once a week, open ChatGPT (with web search on) and run 10 queries you should rank for. Note the citations. If you're not in them, note who is. That's your gap analysis.

The free analyzer on this site. Plug in your URL and look at the four-axis score. If LLM Visibility is below 70, you have access or content issues. If Helpfulness is below 70, you're hiding the answer. If Trust is low, you don't have author/schema/llms.txt signals.

Bing Webmaster Tools. Because parts of ChatGPT Search use Bing, your Bing impressions are a halfway-decent proxy for ChatGPT discoverability. Imperfect, but useful.
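The server-log check is the one worth automating. Here is a rough sketch that counts hits per agent and URL, assuming your host writes the common "combined" access-log format. The sample lines, IP addresses, and user-agent strings are made up for illustration, and real user-agent strings can be spoofed, so treat the counts as approximate.

```python
import re
from collections import Counter

AI_AGENTS = ("OAI-SearchBot", "ChatGPT-User", "GPTBot")

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines; the last one is an ordinary browser and is ignored.
SAMPLE_LOG = [
    '203.0.113.5 - - [21/May/2026:10:00:00 +0000] "GET /drain-cleaning-cost HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '203.0.113.6 - - [21/May/2026:10:05:00 +0000] "GET /drain-cleaning-cost HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '203.0.113.7 - - [21/May/2026:11:00:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.9 - - [21/May/2026:11:30:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
]


def ai_hits(lines):
    """Count requests per (agent, path) for the AI user-agents listed above."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        user_agent = match.group("agent")
        for agent in AI_AGENTS:
            if agent in user_agent:
                hits[(agent, match.group("path"))] += 1
                break
    return hits


for (agent, path), count in ai_hits(SAMPLE_LOG).most_common():
    print(f"{count:>4}  {agent:<14} {path}")
```

To run it on a real file, pass an open file handle in place of SAMPLE_LOG. The URLs with the most ChatGPT-User hits are the ones to protect and expand first.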

A 30-day plan that actually moves the needle

I'm going to give you the exact order I'd run this in if you handed me your site tomorrow. No fluff. No "depending on your goals."

Week one — access. Audit robots.txt. Allow all the AI agents by name. Check Cloudflare / CDN / security plugin for silent bot blocking. Fix any blocks. Set up Bing Webmaster Tools and submit your sitemap. Publish an llms.txt at the root of your domain that lists your top 8–10 pages with one-sentence summaries.

Week two — content shape. Pick your five highest-intent URLs. Rewrite the H1 to match the actual question. Rewrite the first paragraph to be a 2–3 sentence direct answer. Add a 6–10 question FAQ at the bottom of each, marked up with FAQPage JSON-LD. Add author bios with real credentials.

Week three — third-party. Make a list of 10 places your name should appear and doesn't (Reddit threads, industry roundups, podcast episodes, Wikipedia article on your industry). Write one real comment on Reddit, email two roundup authors, pitch one podcast. Don't try to do everything at once. Do it once a week, forever.

Week four — measure and iterate. Check server logs for AI bot hits. Run 10 manual ChatGPT queries you should rank for. Note where you're missing. Pick the two highest-value gaps and write content specifically aimed at them next.

That's it. There is no magic. There is no growth hack. People who tell you they can get you into ChatGPT in seven days for $2,000 are selling you nothing.
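If the llms.txt from week one is the part you've never done before, this is roughly what I mean. The sketch below writes a small Markdown file in the shape the llms.txt proposal describes: a title, a one-line summary, and a list of links with short descriptions. llms.txt is a community proposal rather than a standard, the business name and URLs here are placeholders, and the function name is hypothetical.

```python
def build_llms_txt(site_name: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Build llms.txt text from (title, url, one-sentence description) tuples."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Key pages", ""]
    for title, url, description in pages:
        lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"


# Placeholder content; swap in your own top 8 to 10 pages.
PAGES = [
    (
        "Drain cleaning cost",
        "https://yourdomain.com/drain-cleaning-cost",
        "Typical price ranges for emergency drain cleaning in Austin.",
    ),
    (
        "About us",
        "https://yourdomain.com/about",
        "Who we are, licensing, and service area.",
    ),
]

print(build_llms_txt("Example Plumbing", "Emergency drain cleaning in Austin.", PAGES))
```

Save the output as llms.txt at the root of your domain, next to robots.txt, and update it whenever your top pages change.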

A closing rant about "GEO experts"

One last thing. The phrase "GEO" (generative engine optimization) is being thrown around a lot right now, mostly by people who two years ago were calling themselves "TikTok growth specialists" and three years before that were "blockchain consultants." Be careful out there. The work I described above is not new. It's careful, basic, technical SEO plus old-school PR plus a few new file formats. Anyone telling you it requires a $5,000 monthly retainer to do a robots.txt audit is full of it.

The honest version of this work, when done by a real practitioner, is not glamorous. It is fixing a robots.txt file. It is rewriting an intro paragraph. It is convincing a client to do a podcast interview. It is checking Cloudflare. It is the kind of careful, boring, compounding work that most people don't want to do, which is why most sites are losing this race.

You don't have to be one of them. The free analyzer at the top of this site will tell you where you stand in about 30 seconds. The fixes are mostly things you can do this afternoon. The third-party work takes longer but it's the kind of thing you can build a real, durable business on.

Now go fix your robots.txt.


This piece will be kept up to date as the major models change their crawlers, schema preferences, and ranking behavior. Last updated May 21, 2026.