Track AI-search traffic when GA4 calls it 'Direct'

AI-search traffic from ChatGPT and Perplexity often lands in GA4 as Direct. Four signals — channel groups, form questions, server logs, citation tracking — pull it back into view.

Gabriel Espinheira

AI search sent 1.13 billion referral visits to the top 1,000 websites in June 2025 — and most of them landed in GA4 as "Direct." If you're a European founder running a content engine, that means the channel working hardest for you right now is also the one your dashboard can't see.

Here's the fix in one line: build a four-signal measurement stack — a custom GA4 channel group, one self-reported form question, a server-log scan, and a citation tracker. Together they recover roughly 70 to 85% of the AI-search traffic currently misclassified as "Direct."

The four signals matter because no single one is enough. GA4 alone misses every session whose referrer was stripped. A "how did you hear about us?" question alone is noisy. Server logs see crawlers but not humans. Citation trackers see answers but not clicks. Run all four and the picture sharpens fast.

TL;DR

AI-search traffic from ChatGPT, Perplexity, Claude, and Gemini lands in GA4 as "Direct" the moment a referrer is stripped — which is most of the time. To measure it, build a custom GA4 channel group, add one form question, scan server logs for AI bot activity, and pair them with a citation tracker. Four signals together recover the hidden traffic.

Why GA4 buries AI-search traffic as "Direct"

GA4 categorises a session using the HTTP referrer that arrives with the request. If no recognisable referrer is present, the session is bucketed as "Direct" — the same channel GA4 uses for someone typing your URL into the address bar by hand. AI search platforms strip that header more often than they keep it, so most genuine AI referrals look identical to a manual URL lookup.
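To make the fallback concrete, here is a toy model of that decision. It is a deliberate simplification and not GA4's actual logic; the hostname set and the function name are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical hostnames for illustration; not GA4's internal list.
AI_HOSTS = {"chatgpt.com", "www.perplexity.ai"}


def toy_channel(referrer: Optional[str]) -> str:
    """Return a channel label from the Referer header alone (toy model)."""
    if not referrer:
        # No header at all: indistinguishable from a typed-in URL.
        return "Direct"
    host = urlparse(referrer).hostname or ""
    if host in AI_HOSTS:
        return "AI Search"
    return "Referral"
```

The point of the sketch is the first branch: once the header is gone, nothing downstream can tell an AI citation click from a manual visit.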

Five things cause the referrer to disappear:

  • The free version of ChatGPT. Free-tier users don't pass referrer data. They click a citation, the visit lands, GA4 sees nothing.

  • The ChatGPT mobile app, iOS and Android. Citations open in an in-app browser. The browser doesn't send a referrer back to your site.

  • Perplexity and Claude on mobile. Same in-app browser pattern. Same outcome.

  • Google AI Mode. It lives inside google.com and serves answers without ever sending a separate referrer. Even when a citation is shown, the click signal arrives as "Direct" or as a Google session with no useful detail.

  • Copy-paste citations. Someone reads a ChatGPT answer, copies your URL, pastes it into a fresh tab. No referrer ever existed.

The scale matters. Conductor's 2025 industry data shows ChatGPT now accounts for 87.4% of all AI referral traffic, and Chartbeat's November 2025 measurement showed Google search referrals down 33% globally year-on-year. The traffic is moving. The dashboards haven't caught up.

The four signals you need to measure AI-search traffic

You don't fix this with one perfect tool. You fix it by stacking four imperfect signals that triangulate the same truth from four different angles.

Signal 1 — A custom GA4 channel group for AI search

This is the quickest win and it takes about 30 minutes.

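In GA4: Admin → Data display → Channel groups (listed under Data Settings in older versions of the interface) → create a custom channel group. Add a rule called "AI Search." Set the condition so Session source matches regex, using a pattern that covers the AI referrer hostnames you want to capture, such as chatgpt.com.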
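One possible pattern, sketched and checked in Python before you paste it into GA4. The hostname list is an assumption, not a definitive inventory, so extend it with whatever sources show up in your own reports. The leading and trailing `.*` are there in case the condition requires the pattern to cover the whole source value.

```python
import re

# Assumed hostname list, for illustration only.
AI_SOURCE_PATTERN = (
    r".*(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai"
    r"|gemini\.google\.com|copilot\.microsoft\.com).*"
)

ai_source = re.compile(AI_SOURCE_PATTERN)


def is_ai_source(session_source: str) -> bool:
    """True if the whole source value is covered by the pattern."""
    # fullmatch mimics a condition that must match the entire value.
    return ai_source.fullmatch(session_source) is not None
```

Testing the pattern against a handful of real source values from your Traffic acquisition report is a cheap way to catch a typo before the rule goes live.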

Move the rule above "Referral" and "Organic Search" so AI sessions are claimed first. Apply it to your standard reports as the default channel grouping.

This catches everything that does arrive with a recognisable referrer — roughly half to two-thirds of true AI traffic, depending on your audience mix. One reason it works better in 2026 than it did in 2024: ChatGPT now appends utm_source=chatgpt.com to most citation links it shows on the web product, a change OpenAI rolled out in June 2025. Those visits classify cleanly out of the box.

What this signal misses: anything from a mobile app, anything from Google AI Mode, and anything where the user copy-pasted your URL.

Signal 2 — One self-reported form question

Add a single field to your enquiry form, your demo request, and any trial signup: "How did you hear about us?" Leave it as free text. Don't pre-fill anything. Don't add helper text suggesting answers. Place it near the end of the form so it doesn't interrupt the conversion.

This is the only signal that captures the most valuable form of AI visibility — the lead who saw your name in a ChatGPT answer and didn't click. They went to your site directly. They booked a call. They typed "ChatGPT recommended you" in your form field.

Self-reported attribution is now treated as a primary signal by mature B2B teams, not a vanity metric. Completion rates run above 95% when the field is placed at the end and labelled simply. Roughly one in five answers will be noise — typos, "Google" defaults from people who can't remember, blanks filled in to clear the form. Discount that bucket honestly and read the rest.

For European founders without paid tools, this is the highest-leverage 30 minutes you can spend this week. One form field. One read every Monday.
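If the Monday read becomes tedious, a rough first-pass tagger can pre-sort the answers. This is a minimal sketch: the keyword list and the three labels are assumptions, and a human should still read anything it tags.

```python
import re
from collections import Counter

# Keyword list is an assumption; tune it to the wording your leads use.
AI_KEYWORDS = re.compile(
    r"chat\s?gpt|perplexity|claude|gemini|copilot|ai overview|\bai\b",
    re.IGNORECASE,
)


def tag_answer(answer: str) -> str:
    """Label one free-text answer as 'ai', 'blank', or 'other'."""
    text = answer.strip()
    if not text:
        return "blank"
    if AI_KEYWORDS.search(text):
        return "ai"
    return "other"


def weekly_tally(answers) -> Counter:
    """Count how many answers fall under each label."""
    return Counter(tag_answer(a) for a in answers)
```

Keyword matching will mislabel some answers (a referral from a person named Claude, for instance), which is one more reason to treat the tally as a starting point rather than a final count.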

Signal 3 — Server logs for AI bot activity

Your server logs see what GA4 never will: every request to your site, including the ones from AI crawlers. The ones to track:

  • GPTBot — OpenAI's training crawler.

  • ChatGPT-User — OpenAI's real-time fetcher when a live conversation needs to read your page.

  • OAI-SearchBot — OpenAI's search-index crawler.

  • PerplexityBot and Perplexity-User — same pattern, different vendor.

  • ClaudeBot and anthropic-ai — Anthropic's crawlers.

  • GoogleOther, Google's generic crawler for product teams outside Search, separate from Googlebot.

Two questions to answer from the logs:

  • Which pages are being crawled most? That's the corpus the AI engines are learning from about you. If the answer surprises you, your editorial priorities are out of sync with what AI search thinks you're known for.

  • What's your crawl-to-refer ratio? Compare the number of AI-bot hits on a page to the number of human visits that page gets from any AI channel. High crawl, low referral means the bots are reading the page but not citing it in answers. That's a content gap, not a tracking gap.
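The crawl-to-refer ratio is simple arithmetic. A minimal sketch, assuming you already have per-page counts of AI-bot hits and of human visits from AI channels (the function name and the handling of zero visits are my own choices):

```python
def crawl_to_refer(bot_hits: int, ai_visits: int) -> float:
    """AI-bot hits per human visit from an AI channel for one page."""
    if ai_visits == 0:
        # Crawled but never referred: report infinity; no data at all: 0.
        return float("inf") if bot_hits else 0.0
    return bot_hits / ai_visits
```

A page with 120 bot hits and 4 AI-channel visits scores 30.0. Pages that sort to the top of this ratio are the ones being read by the bots without being cited.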

You don't need new infrastructure. Most cloud hosts — Cloudflare, Vercel, Netlify, Fastly — expose log data directly. A weekly grep for the bot user-agents above, piped through sort and uniq -c, gives you the report you need in ten minutes.
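If you'd rather script the scan than chain shell commands, here is a rough Python counterpart. It assumes your host exports logs in the common "combined" access-log layout, which may not hold for every provider, and the file name in the usage comment is hypothetical.

```python
import re
from collections import Counter

# User-agent tokens taken from the list above.
AI_BOTS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot",
    "Perplexity-User", "ClaudeBot", "anthropic-ai", "GoogleOther",
)

# Assumed layout: ... "METHOD /path HTTP/x" status bytes "referer" "user-agent"
LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_bot_hits(lines) -> Counter:
    """Count hits per (bot, path) for lines whose user-agent names an AI bot."""
    hits = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # skip lines that don't fit the assumed layout
        user_agent = match.group("ua")
        for bot in AI_BOTS:
            if bot in user_agent:
                hits[(bot, match.group("path"))] += 1
                break
    return hits


# Usage (hypothetical file name):
# with open("access.log") as log:
#     for (bot, path), n in count_bot_hits(log).most_common(5):
#         print(n, bot, path)
```

Counting by bot and path together answers the first question above directly: the top rows are the pages the AI engines read most.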

Signal 4 — Citation tracking for AI search engines

The fourth signal answers a question the first three can't: when ChatGPT cites someone for your category, is it you or your competitor?

Tools like ZipTie, Profound, Otterly, and the AI toolkit inside Semrush query ChatGPT, Perplexity, Claude, and Gemini at scale for the queries that matter to your business. They report which queries cite you, which cite competitors, and which mention you in the answer without a click-through.

Start small. Pick ten to thirty queries that map to your top three blog clusters and your two highest-intent service pages. Track them weekly. Add new queries as you publish new content.

This signal costs money — vendors charge anything from a small monthly fee to a senior-hire-sized budget depending on tool and scale. Skip it until the first three signals have proven the channel exists for you. Then add it once you need to measure share of voice rather than just confirm presence.

A 30-minute weekly measurement loop

The four signals stop being useful the moment they become a full-time job. Run them on a fixed cadence and walk away.

Monday morning, 20 minutes:

  • Open the GA4 AI Search channel. Compare last week's sessions, top landing pages, and any new referring sources to the week before.

  • Read the self-reported form answers from last week's leads. Tag anything mentioning ChatGPT, Perplexity, Claude, Gemini, AI Overviews, or "an AI tool."

  • grep server logs for the AI bot user-agents. Note the top five crawled pages and any new bots that have appeared.

Friday afternoon, 10 minutes:

  • Open the citation tracker if you're using one. Note queries gained, queries lost, and any competitor citations on your priority queries.

Feed the findings into next week's content plan. AI-cited pages get more depth. AI-crawled-but-not-cited pages get a direct-answer rewrite at the top. Pages that show up in self-reported leads get an internal-link sweep so adjacent pages benefit.

What the four-signal stack proves — and what it doesn't

The stack proves three things: that AI search is sending you real human traffic, which pages it lands on, and what queries you're cited for. Triangulated together, the four signals are honest enough to make editorial decisions on.

What they won't give you: a perfect funnel down to two decimal places.

  • Self-reported data is roughly 80% clean. The other 20% is noise.

  • Server logs see crawlers, not humans. A spike in crawls doesn't always become traffic.

  • Channel groups still miss in-app browser sessions and Google AI Mode.

  • Citation tools query AI engines on a sample, not on every real user prompt.

That's fine. The point of the stack isn't precision — it's directional honesty. You'll know whether AI search is working for you. You'll know which content is doing the work. You'll know which queries to write the next post against. That's enough to act on every week.

The one number worth taking seriously: Microsoft Clarity's 2026 dataset across more than 1,200 publishers showed LLM-referred visitors converting at 1.66% for sign-ups versus 0.15% for organic search visitors — roughly eleven times higher. The volume is small. The intent is not. If you're not measuring it, you're under-spending on the channel that converts hardest.

FAQ

Will GA4 fix this on its own?

Unlikely soon. GA4 classifies sessions based on what arrives in the HTTP request. As long as AI platforms strip referrers, mobile apps open citations in in-app browsers, and Google AI Mode lives inside google.com without a separate referrer, the signal GA4 needs simply isn't there. The fix has to come from your side, not Google's.

Do I need a paid citation tool to get started?

No. A custom GA4 channel group plus one form field recovers most of the measurement gap and costs nothing. Paid citation tools add the fourth signal, share of voice across AI answers, but you should prove the channel is working for you before you spend a euro. Three weeks of the first two signals will tell you whether the paid tool is worth adding.

What about the utm_source=chatgpt.com parameter?

ChatGPT began appending that tag to citation links from its web product in June 2025. GA4 picks it up automatically as a referral; no setup needed. The catch: it only covers the desktop web product. Mobile app sessions, free-tier sessions, and citations from Perplexity, Claude, Gemini, and Copilot still arrive without anything similar.
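If you want to confirm from your own landing URLs that tagged visits are arriving, pulling the parameter out takes a few lines. A minimal sketch; the function name is an assumption and the URL in the note below is a placeholder.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def utm_source(landing_url: str) -> Optional[str]:
    """Return the first utm_source value in a URL, or None if absent."""
    query = parse_qs(urlparse(landing_url).query)
    values = query.get("utm_source")
    return values[0] if values else None
```

For `https://example.com/post?utm_source=chatgpt.com` this returns `chatgpt.com`; for a URL with no tag it returns `None`, which is what most mobile and copy-paste visits will look like.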

Is AI-search traffic actually worth tracking if the volume is still small?

Yes. The volume is rising (AI referrals to the top 1,000 sites grew 357% year-on-year by June 2025), and the sign-up conversion rate in the Clarity data is roughly eleven times that of organic search. Small channel, premium leads. Measuring it now is what lets you grow it deliberately rather than by accident.

Should I block AI crawlers if they're not sending traffic?

Not yet. The crawl-to-refer ratio is your diagnostic, not your verdict. Pages getting crawled heavily without producing citations need a direct-answer rewrite at the top of the page — not a robots.txt block. Block crawlers only when you're certain a specific bot offers nothing in return, and even then start with one bot, not a category.

You can't optimise what you can't see

AI search is sending European founders the highest-intent traffic they've had in years, and most of it is sitting in a "Direct" bucket inside GA4, indistinguishable from someone typing your URL by hand. Build the four signals. Run the loop on Mondays. Give it three weeks.

If you want a senior partner to wire up GA4, your forms, your server logs, and your citation tracker in one go — and to publish the content that feeds them — that's a SharpHaw subscription. Book a 30-min call and we'll walk through what's already hiding in your "Direct" bucket.

Plan. Build. Iterate.

Ready to start?

Book a 30-minute call. We'll dig into what's working, what isn't, and what the first move should be. No fluff, no pressure. If it makes sense to work together, we'll make it happen.
