How to Track Referral Traffic from ChatGPT, Claude, and Perplexity

Most AI chatbots either send no referrer at all or use inconsistent hostnames, so a large share of AI-driven traffic shows up in GA4 as Direct. To track it, build a custom channel group with a referrer regex matching chatgpt.com, claude.ai, perplexity.ai, gemini.google.com, and copilot.microsoft.com, and treat the result as a floor on real volume, not a complete count.

TL;DR

GA4 added a native “AI Assistant” channel on 13 May 2026. It covers ChatGPT, Gemini, and Claude when a referrer is present. That is a narrow slice.

Cloudflare reported in 2025 that some AI vendors crawl pages far more often than they send clicks back (the crawl-to-click ratio was described as tens of thousands to one for at least one vendor). Confirm the current figure against Cloudflare’s primary write-up. Either way the crawl-to-click gap is real and asymmetric across vendors.

Among ChatGPT, Claude, and Perplexity, only Perplexity consistently passes a Referer header. ChatGPT does it sometimes. Claude rarely does.

Build a custom channel group so AI traffic does not get buried inside Referral or Direct. The regex matters more than the label.

Validate with DebugView and a real click from each tool. If a test click does not show up, either your regex is wrong or the tool stripped the referrer.

llms.txt is fine to ship but no major vendor has committed to honouring it. Do not expect it to move the needle on referral volume in 2026.

Why this is harder than it should be

Three problems compound.

The first is that most AI products do not send a Referer header on outbound clicks. Independent practitioner audits consistently find that a majority of AI-driven visits arrive with no referrer and get bucketed as Direct. Mobile apps are the worst offenders. Native iOS and Android clients almost never set a referrer, and a meaningful share of ChatGPT and Claude usage happens on phones.

The second is that the AI vendors are not consistent about which hostname users land on. OpenAI runs both chatgpt.com and the older chat.openai.com. Microsoft Copilot lives at copilot.microsoft.com but historic Bing Chat traffic still trickles in from bing.com/chat. Google’s Gemini shows up under gemini.google.com and occasionally the legacy bard.google.com. If your regex only matches one variant, you will undercount.

The third is structural. Until 13 May 2026, GA4 had no default channel for AI assistants. Traffic that did carry a referrer got dumped into the generic Referral bucket alongside random blogroll links. Google’s new AI Assistant channel fixes part of this, but it only triggers on referrers Google’s classifier recognises, and Google has been conservative about which vendors qualify.

Add it up and you get an attribution gap that is structurally worse than what we saw with social or search in the 2010s. You can close part of the gap with good engineering. You cannot close all of it.

What referrer hostname each AI tool actually sends

This is the table you wish someone had given you on day one. These are the referrer hostnames you should expect to see in your raw logs when a user clicks a citation from each product on the desktop web. Mobile apps are noted separately.

Product	Referrer hostname(s)	Sends Referer?	Notes
ChatGPT (web, browse / search mode)	`chatgpt.com`, occasionally `chat.openai.com`	Often	Free-tier search results are the most reliable. Logged-in browse mode varies.
ChatGPT iOS / Android app	None	Almost never	Clicks open in an in-app browser without passing the Referer header. Lands as Direct.
ChatGPT Atlas (desktop browser)	`chatgpt.com`	Yes	Treat the same as regular ChatGPT web for tracking purposes.
Claude.ai	`claude.ai`	Rarely	Citations frequently strip the referrer. Expect most Claude-driven clicks to look like Direct.
Perplexity	`perplexity.ai`, `www.perplexity.ai`	Reliably	Cited links carry the referrer. Tracking accuracy lands above 90% in practice.
Perplexity Comet (browser)	`perplexity.ai`	Yes	Behaves like a normal browser session with referrer set when the user clicks a citation.
Google Gemini	`gemini.google.com`, legacy `bard.google.com`	Sometimes	Behaviour varies by surface (gemini.google.com vs Workspace embedded vs mobile).
Google AI Overviews / AI Mode	`google.com` (organic search hostname)	Yes, but indistinguishable from regular organic	You cannot separate AI Overview clicks from blue-link clicks in GA4. Search Console aggregates them into Web search.
Microsoft Copilot	`copilot.microsoft.com`, legacy `bing.com/chat`	Yes	Copilot started reliably passing referrer headers in 2024.
You.com	`you.com`	Yes	Low volume for most sites but easy to capture.
Meta AI	`meta.ai`	Mixed	Embedded surfaces (WhatsApp, Instagram) usually do not pass a referrer.
Grok / x.ai	`grok.com`, `x.ai`	Sometimes	Newer product, still inconsistent.
DeepSeek	`deepseek.com`	Sometimes	Worth including in your regex if you serve international audiences.

One worth calling out: Google AI Overviews are invisible at the channel level. When someone clicks a citation inside an AI Overview, the referrer is google.com with the same parameters as any organic click. GA4 sees Organic Search. Search Console will show the impression and click rolled into Web. Google has begun adding AI Mode breakouts in Search Console for some queries, but you cannot filter the GA4 channel for AI Overviews specifically.

How to detect AI traffic in GA4

You have two options. Use the new native AI Assistant channel that Google rolled out on 13 May 2026, or build your own custom channel group. In practice you want both. The native channel is a free signal that needs no maintenance. The custom group covers more vendors, lets you control the regex, and survives Google quietly changing its classifier.

The native AI Assistant channel

As of 13 May 2026, GA4 automatically assigns the medium ai-assistant and the campaign label (ai-assistant) when a session’s referrer matches a recognised AI tool. Those sessions land under the new “AI Assistant” channel in Default Channel Group reports. No configuration required. The downside: Google’s list is short and not public in full. ChatGPT, Gemini, and Claude are explicitly named. Perplexity, Copilot, and others are not confirmed at the time of writing.

The custom channel group (recommended)

Open GA4 Admin, then Data display, then Channel groups, then Create new channel group. Add a rule above Referral. Match condition: “Session source matches regex” or “Session source / medium matches regex” depending on whether you want the rule to fire on any source carrying a known AI hostname.

Use this pattern:

^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|www\.perplexity\.ai|claude\.ai|gemini\.google\.com|bard\.google\.com|copilot\.microsoft\.com|deepseek\.com|grok\.com|x\.ai|meta\.ai|you\.com|poe\.com|character\.ai)$

Avoid matching bare openai.com, anthropic.com, google.com, or bing.com. Each one serves general traffic that is not AI-chat related: API documentation pages, marketing pages, ordinary organic search. If you need to capture Copilot’s legacy bing.com/chat traffic, do it with a path-level filter on page_referrer rather than putting it in this hostname regex (GA4’s session source is hostname-only, so a path-based pattern here would never match).
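Before pasting the pattern into GA4, it can help to run it against hostnames you expect to match and hostnames you expect to reject. The following is a minimal Python sketch, assuming Python's `re` module behaves like GA4's matcher for a pattern this simple (plain alternation and anchors). The function name and the sample hostnames are illustrative, not part of any GA4 API.

```python
import re

# Pattern copied from the text above. Python's re stands in for GA4's
# regex engine here, which is an assumption that should hold for
# simple alternation with ^ and $ anchors.
AI_SOURCE_PATTERN = re.compile(
    r"^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|www\.perplexity\.ai|"
    r"claude\.ai|gemini\.google\.com|bard\.google\.com|copilot\.microsoft\.com|"
    r"deepseek\.com|grok\.com|x\.ai|meta\.ai|you\.com|poe\.com|character\.ai)$"
)


def is_ai_source(hostname: str) -> bool:
    """Return True when the hostname exactly matches a listed AI tool."""
    return AI_SOURCE_PATTERN.fullmatch(hostname.strip().lower()) is not None


# Illustrative hostnames: the first list should match, the second should not.
SHOULD_MATCH = ["chatgpt.com", "www.perplexity.ai", "copilot.microsoft.com"]
SHOULD_REJECT = ["openai.com", "platform.openai.com", "google.com", "bing.com", "notx.ai"]

for host in SHOULD_MATCH + SHOULD_REJECT:
    print(f"{host:25} {'AI' if is_ai_source(host) else 'not AI'}")
```

The anchors do the real work: without `^` and `$`, a hostname such as `notx.ai` would match the `x\.ai` alternative.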

Name the channel “AI Assistants” (or whatever your taxonomy demands). Order matters: this rule must sit above the default Referral rule so AI sessions get pulled out before they fall through into the generic bucket.

If you want subchannels per vendor (highly recommended for any team doing serious AI attribution), build separate rules for each product and put them in priority order. That lets you compare ChatGPT vs Perplexity vs Claude vs Copilot directly in any standard report. Per Google’s documentation, custom channel groups apply retroactively: a new or edited rule reclassifies historical sessions in reports as well as future ones. What a rule cannot do is recover sessions that arrived with no referrer, because there is no stored source for it to match.

What about Looker Studio and BigQuery?

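If you export GA4 to BigQuery, the cleanest answer is a derived ai_source dimension computed off collected_traffic_source.manual_source (the event’s utm_source, when one is present), the page_referrer event parameter, and traffic_source.source. Keep in mind that traffic_source records the source that first acquired the user, not the current session’s source, so use it as a fallback rather than the primary signal. That gives you an analyst-friendly field that does not depend on GA4’s UI configuration drifting. Custom channel groups can be lost during a GA4 property migration or an admin change, so as a general precaution treat the BigQuery view as the source of truth and GA4 reporting as a convenience layer.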
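The derivation logic for such a field can be sketched in a few lines. This is a Python illustration rather than production SQL, assuming each exported row has already been flattened into a `manual_source` value and a `page_referrer` URL. The hostname-to-label map and the function name are hypothetical, so swap in your own taxonomy.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical hostname -> label map; extend it to match your own taxonomy.
AI_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "claude.ai": "claude",
    "perplexity.ai": "perplexity",
    "www.perplexity.ai": "perplexity",
    "gemini.google.com": "gemini",
    "copilot.microsoft.com": "copilot",
}
AI_LABELS = set(AI_HOSTS.values())


def derive_ai_source(manual_source: Optional[str], page_referrer: Optional[str]) -> Optional[str]:
    """Return an AI vendor label for one flattened event row, or None."""
    # 1. Prefer an explicit UTM source, whether it is a label or a hostname.
    if manual_source:
        src = manual_source.strip().lower()
        if src in AI_LABELS:
            return src
        if src in AI_HOSTS:
            return AI_HOSTS[src]
    # 2. Otherwise fall back to the hostname of the referrer URL.
    if page_referrer:
        host = (urlparse(page_referrer).hostname or "").lower()
        return AI_HOSTS.get(host)
    return None


# Illustrative rows, not real export data.
rows = [
    {"manual_source": None, "page_referrer": "https://chatgpt.com/"},
    {"manual_source": "perplexity", "page_referrer": None},
    {"manual_source": None, "page_referrer": "https://www.google.com/search?q=x"},
]
for row in rows:
    print(derive_ai_source(row["manual_source"], row["page_referrer"]))
```

Checking the UTM value before the referrer is a design choice: an explicit tag survives contexts that strip the Referer header, so it is the stronger signal when both exist.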

How to verify it is actually working

This is the step everyone skips and then complains six weeks later that their numbers look wrong.

Open DebugView in GA4 on a staging session. From each AI tool you care about, open a fresh prompt, ask it a question that should return a citation pointing at your site, click the citation, and watch the event stream. You are looking for a session_start with the right source and medium. If the click did not produce a session source matching your regex, your regex is wrong, the tool stripped the referrer, or the link opened in a context that does not pass referrers (in-app browser, private window with strict tracking protection, redirect through a wrapper).

Repeat for at least: ChatGPT web, ChatGPT mobile, Claude.ai, Perplexity, Copilot, Gemini. Write down what each one sends. That table is your ground truth.

Then check the GA4 Real-time report and the Explorations panel after 24 hours. If your daily AI-channel session count is zero or very close to it, something is broken at the property level (often a recent change to the data stream or a tag manager rule that fires on a different domain).

Sanity check. A B2B SaaS site with strong content typically sees AI Assistant sessions in the low single digits as a percentage of total traffic in 2026. If you are seeing 30%, you are probably matching too aggressively (a hostname like openai.com can pick up API documentation referrers). If you are seeing zero, you are matching too tightly or your tag is broken.

Attributing conversions when the referrer is unreliable

Referrer-based detection is necessary but not sufficient. To attribute conversions you need at least one of: a UTM parameter on the inbound link, a server-side log of the source URL, or a fingerprint that ties first-touch to a known AI session.

Three concrete tactics.

UTM your own outbound links from places AI tools quote. If you publish a documentation site, a comparison page, or a tools directory, append UTMs to outbound links so when an AI summarises your page and uses the same link, the destination site captures the source. This does not help your own attribution. It helps reciprocal partners and gets you noticed when they share data.

Self-tag inbound links from your llms.txt and other AI-targeted surfaces. Some teams add a ?utm_source=llms-txt&utm_medium=ai&utm_campaign=docs suffix to the URLs they list in their llms.txt and sitemap variants intended for AI crawlers. The theory is that when a model regurgitates the URL verbatim, the click that follows carries the UTM. In practice this works inconsistently because some models strip query strings before quoting. It is not zero value but do not bet attribution on it.

Use server-side logging to capture the raw Referer header. GA4’s UI hides referrer noise. Your edge logs do not. A CDN log pipeline (Cloudflare Logpush, Fastly, AWS CloudFront) gives you the exact referrer string for every request. That lets you spot new AI hostnames as they appear, validate your regex, and reconcile against GA4 reporting. GA4 does not export IP addresses, so if you have BigQuery, join GA4 events to CDN logs on landing-page URL and request timestamp (plus the GA client ID, if your edge logs capture the _ga cookie). That recovers a non-trivial slice of AI traffic that GA4 marks as Direct.
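As a rough illustration of the server-side tactic, the sketch below counts referrer hostnames in newline-delimited JSON log records. It assumes each record carries the referrer in a field named `ClientRequestReferer` (the name used in Cloudflare's HTTP request logs, exposed here as a parameter so other pipelines can override it). The sample records and the known-host set are made up for the example.

```python
import json
from collections import Counter
from urllib.parse import urlparse

# Illustrative subset of AI hostnames; keep it in sync with your GA4 regex.
KNOWN_AI_HOSTS = {
    "chatgpt.com", "chat.openai.com", "claude.ai", "perplexity.ai",
    "www.perplexity.ai", "gemini.google.com", "copilot.microsoft.com",
}


def referrer_hosts(ndjson_lines, referer_field="ClientRequestReferer"):
    """Count referrer hostnames across newline-delimited JSON log records."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        referer = json.loads(line).get(referer_field) or ""
        host = (urlparse(referer).hostname or "").lower()
        counts[host or "(none)"] += 1
    return counts


# Made-up sample records standing in for a real log file.
sample = [
    '{"ClientRequestReferer": "https://www.perplexity.ai/search/abc"}',
    '{"ClientRequestReferer": "https://chatgpt.com/"}',
    '{"ClientRequestReferer": ""}',
    '{"ClientRequestReferer": "https://chatgpt.com/"}',
]
counts = referrer_hosts(sample)
ai_counts = {h: n for h, n in counts.items() if h in KNOWN_AI_HOSTS}
other_counts = {h: n for h, n in counts.items() if h not in KNOWN_AI_HOSTS}
print("AI referrers:", ai_counts)
print("Everything else:", other_counts)
```

Reviewing the "everything else" bucket periodically is how new AI hostnames get noticed before they are added to the regex.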

The llms.txt and crawler-allowlist play

llms.txt is a community proposal from Jeremy Howard (Answer.AI) that lives at llmstxt.org. It defines a Markdown file at the root of your domain that lists the URLs and short descriptions you want LLMs to read first. Think of it as a hand-curated table of contents for models.

The honest status as of mid-2026: no major AI vendor has publicly committed to honouring llms.txt in their production retrieval pipelines. A large-scale crawl attributed to SE Ranking (reported around late 2025, on the order of hundreds of thousands of domains) found no measurable lift in citations after implementation, and a Search Engine Land analysis reported that nearly all of the sites it looked at saw no traffic change. Verify both against the primary write-ups before quoting exact figures. Adoption is still low, on the order of a tenth of surveyed sites.

Should you ship one anyway? If you publish documentation, yes. The file is cheap to maintain (it is just Markdown), Mintlify and similar tools generate it automatically, and the downside is zero. If you run a marketing site or blog, the value is purely speculative. Ship it when the cost is low. Do not block other work on it.

A minimal example:

# Acme
> One-line description of what your product does.

## Docs
- [Quickstart](https://example.com/docs/quickstart): Get a new user productive in 10 minutes.
- [Concepts](https://example.com/docs/concepts): Mental model and core terminology.
- [API reference](https://example.com/docs/api): REST and GraphQL endpoints.

## Guides
- [Onboarding a team](https://example.com/guides/onboarding): A 30-day rollout plan.

Replace Acme and the URLs with your own. The Markdown header is the product or brand. The blockquote is the elevator pitch. Each section lists a small number of canonical entry points that you would want an LLM to read first when answering questions about your product.
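If the list of entry points already lives in a config file or CMS, the file can be generated rather than hand-edited. Here is a minimal Python sketch that renders the same layout as the example above; the function name and the input shape (a mapping of section name to link tuples) are assumptions made for illustration.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt body.

    sections maps a section name to a list of
    (link text, url, description) tuples.
    """
    lines = [f"# {title}", f"> {summary}", ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        for text, url, description in links:
            lines.append(f"- [{text}]({url}): {description}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


llms_txt = render_llms_txt(
    "Acme",
    "One-line description of what your product does.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart",
             "Get a new user productive in 10 minutes."),
        ],
        "Guides": [
            ("Onboarding a team", "https://example.com/guides/onboarding",
             "A 30-day rollout plan."),
        ],
    },
)
print(llms_txt)
```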

While you are at it, decide your robots.txt policy for AI crawlers. Anthropic runs three separate bots: ClaudeBot for model training, Claude-User for user-initiated fetches, and Claude-SearchBot for search indexing (the older claude-web agent name still appears in some guides). OpenAI runs three too: GPTBot, ChatGPT-User, and OAI-SearchBot. Perplexity declares PerplexityBot and Perplexity-User, although Cloudflare has reported that it used undeclared crawlers when its declared ones were blocked (confirm the current state of that dispute against Cloudflare’s own write-up, since both sides have responded publicly).

A reasonable default for a site that wants citations but does not want to feed training data:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

This blocks training, allows indexing for search-style retrieval, and allows live user-triggered fetches. Reasonable people disagree on whether to allow training crawlers. Either choice is defensible.
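A policy like this is easy to get subtly wrong, so it is worth checking mechanically. The sketch below uses Python's standard `urllib.robotparser` to confirm which agents a policy blocks. It embeds a shortened copy of the rules and a placeholder URL, and it reflects how Python's parser reads the file, which may differ in edge cases from how each vendor's crawler does.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the policy, embedded so the check is self-contained.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str = "https://example.com/docs/quickstart") -> bool:
    """Return True if the parsed policy lets this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


for agent in ["GPTBot", "ClaudeBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]:
    print(f"{agent:15} {'allowed' if allowed(agent) else 'blocked'}")
```

Agents with no matching group and no `*` group are treated as allowed, which is why an unlisted crawler passes this check.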

Tools that track AI citations and when they are worth paying for

Server logs and GA4 tell you who clicked. They do not tell you whether an AI tool is even citing you in the first place. For that you need a citation-tracking product that runs prompts against the major chatbots on a schedule and parses the cited URLs.

Pricing for these tools moves quickly. The table below describes what each does and the rough tier; confirm current pricing on each vendor’s page before you commit.

Tool	What it does	Pricing model	Best for
Profound	Tracks brand mentions and citations across 10+ AI engines including ChatGPT, Gemini, Claude, Perplexity, Copilot, Grok, DeepSeek. Conversation Explorer surfaces real prompt patterns.	Entry tier for a single engine, enterprise pricing for full coverage (confirm current pricing)	Large teams that need broad engine coverage and prompt analytics.
AthenaHQ	Tracks AI-generated answers for brand mentions, citations, and competitor visibility. Credit-based.	Custom (contact vendor)	Mid-market teams comparing AI visibility against competitors.
Otterly.AI	Brand mentions and citations across Google AI Overviews, ChatGPT, Perplexity, AI Mode, Gemini, Copilot. Prompt monitoring and GEO audits.	Tiered by number of monitored prompts (confirm current pricing)	Small teams or solo operators getting started.
BrandRank.AI	Audits how AI engines portray your brand. Brand Vulnerability module flags misinformation and narrative drift.	Custom	Brand teams concerned about reputation in AI answers.

For most B2B SaaS marketing teams, Otterly’s mid prompt-count tier is a sensible starting point. Buy a year, monitor your top 50 prompts, and use the data to inform content priorities. Move up to Profound or AthenaHQ when you have a dedicated GEO/AEO function and need engine coverage that goes beyond the top five.

What AI Overview traffic looks like (and does not look like)

Google AI Overviews are a special case. When a user clicks a citation inside an AI Overview, the referring URL is a regular google.com search results page. GA4 records that as Organic Search. There is no separate channel because, from the browser’s perspective, nothing about the click is different from a normal blue link.

What you can do: open Search Console, switch to the Performance report, and look at impression-to-click ratios for queries where you know AI Overviews are showing. If your impressions are up significantly and your clicks are flat or down, you are seeing the AI Overview effect. Published industry studies have measured CTR reductions of roughly 30 percent or more for keywords with AI Overviews present, and Seer Interactive has reported organic CTR for affected queries falling to a fraction of historical norms. Treat any specific figure as directional, since rates depend heavily on query type and brand familiarity.
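The impressions-up, clicks-flat pattern can be checked per query with simple arithmetic. A minimal Python sketch, assuming you have exported impressions and clicks for the same query over two comparable periods; the 20% growth threshold and the sample numbers are arbitrary illustrations, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate as a fraction; 0.0 when there are no impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0


def looks_like_ai_overview_effect(before, after, min_impression_growth=0.20):
    """Flag a query whose impressions grew while clicks stayed flat or fell.

    The 20% default threshold is an illustrative assumption.
    """
    if before.impressions == 0:
        return False
    growth = (after.impressions - before.impressions) / before.impressions
    return growth >= min_impression_growth and after.clicks <= before.clicks


# Made-up numbers for one query across two periods.
before = QueryStats(impressions=10_000, clicks=400)
after = QueryStats(impressions=14_000, clicks=380)
print(f"CTR before: {before.ctr:.2%}, after: {after.ctr:.2%}")
print("Possible AI Overview effect:", looks_like_ai_overview_effect(before, after))
```

A flag here is only a prompt to look at the live results page for that query; seasonality and ranking changes produce the same shape.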

A note on Search Console data quality: there have been reports that Google over-reported impressions over a stretch of roughly the last year (one account dates the window from mid-2025 into early 2026). Confirm the exact dates and scope against Google’s own Search Central announcement before you rely on them. If the reports hold, close to a year of year-over-year comparisons sits on a shaky foundation, so treat 2025 Search Console trend data with extra scepticism when you build dashboards.

False positives and how to validate

The most common mistakes:

Matching openai.com at the root. That domain also serves API docs, the platform dashboard, and unrelated marketing pages. A click from a developer reading API docs is not an AI assistant referral. Either drop openai.com from your regex or scope it to specific paths.

Matching google.com for Gemini. Tempting and wrong. google.com is organic search. Use gemini.google.com and bard.google.com only.

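Treating bing.com as Copilot. Most bing.com traffic is plain Bing organic search. Matching the path bing.com/chat only works when the full referrer URL is sent, and most browsers now send just the origin on cross-site clicks by default, so relying on copilot.microsoft.com is the safer choice.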

Counting bot traffic. Crawlers such as PerplexityBot or ClaudeBot appear in raw server logs as ordinary page requests. GA4 excludes known bot traffic by default, but server-side analytics platforms may not. Check that the user agent on your suspected AI sessions is a real browser, not a crawler.

To validate cleanly, take a representative day, pull every session GA4 attributed to the AI Assistant channel (or your custom equivalent), and spot-check the referrer string in raw logs. If 10 random sessions all look plausible, your rule is solid. If three of them are noise, tighten the regex.
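The spot-check can be partly automated before a human looks at the sample. The Python sketch below assumes each session has been joined to its raw log entry and reduced to a referrer hostname and a user-agent string. The field names, the host set, and the crawler tokens are illustrative assumptions.

```python
import random

# Illustrative subset of AI hostnames and crawler user-agent tokens.
KNOWN_AI_HOSTS = {"chatgpt.com", "claude.ai", "perplexity.ai", "www.perplexity.ai"}
BOT_TOKENS = ("gptbot", "oai-searchbot", "chatgpt-user", "claudebot", "perplexitybot")


def is_plausible(session) -> bool:
    """A session is plausible if its referrer is a known AI host and its
    user agent does not look like a declared crawler."""
    host_ok = session["referrer_host"].lower() in KNOWN_AI_HOSTS
    user_agent = session["user_agent"].lower()
    return host_ok and not any(token in user_agent for token in BOT_TOKENS)


def spot_check(sessions, sample_size=10, seed=0):
    """Sample sessions and return (noise count, sample size)."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    sample = rng.sample(sessions, min(sample_size, len(sessions)))
    noise = [s for s in sample if not is_plausible(s)]
    return len(noise), len(sample)


# Made-up sessions: one real click, one bare openai.com referrer, one crawler.
sessions = [
    {"referrer_host": "chatgpt.com", "user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1.15"},
    {"referrer_host": "openai.com", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/126.0"},
    {"referrer_host": "perplexity.ai", "user_agent": "Mozilla/5.0 (compatible; PerplexityBot/1.0)"},
]
noise, total = spot_check(sessions)
print(f"{noise} of {total} sampled sessions look like noise")
```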

A worked example: an AI-channel taxonomy

This is where governance matters. If you let every marketer define “AI traffic” their own way you will end up with three different definitions in three different dashboards. Pick one taxonomy, document it, and enforce it.

A clean way to model this, and the approach we recommend when teams build a taxonomy in Terminus, the marketing taxonomy governance platform, is to treat AI traffic as a top-level medium with a controlled set of source values, one per major chatbot. The taxonomy looks like this:

utm_medium	utm_source	Matches referrer hostname	Notes
`ai`	`chatgpt`	`chatgpt.com`, `chat.openai.com`	Includes ChatGPT Atlas browser traffic.
`ai`	`claude`	`claude.ai`	Most Claude clicks lack a referrer. Track with citation tools too.
`ai`	`perplexity`	`perplexity.ai`, `www.perplexity.ai`	Most reliable AI tool for referrer-based tracking.
`ai`	`copilot`	`copilot.microsoft.com`, `bing.com/chat`	Microsoft Copilot, formerly Bing Chat.
`ai`	`gemini`	`gemini.google.com`, `bard.google.com`	Google Gemini direct (not AI Overviews).
`ai`	`meta-ai`	`meta.ai`	Standalone Meta AI surface, not WhatsApp embedded.
`ai`	`grok`	`grok.com`, `x.ai`	xAI’s chatbot.
`ai`	`you`	`you.com`	Low volume but easy to capture.
`ai`	`deepseek`	`deepseek.com`	Useful for international audiences.

Two governance rules that make this work in practice. First, the source value list is closed. New entries get added via a documented change process, not by whoever ran the GA4 setup that quarter. Second, the medium value ai is reserved for chatbot referrals only. AI-assisted ad placements (think Google Performance Max with AI features) use paid or cpc as the medium with the platform as the source. That keeps the AI channel clean and comparable across reporting periods.
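A closed source list is only useful if something enforces it. As one possible approach, the Python sketch below lints a tagged URL against the taxonomy table above. The function name and the exact wording of the violations are hypothetical, and the check covers only the two rules shown in the code.

```python
from typing import List
from urllib.parse import parse_qs, urlparse

AI_MEDIUM = "ai"
# Closed list of source values, copied from the taxonomy table.
ALLOWED_AI_SOURCES = frozenset({
    "chatgpt", "claude", "perplexity", "copilot", "gemini",
    "meta-ai", "grok", "you", "deepseek",
})


def validate_utm(url: str) -> List[str]:
    """Return taxonomy violations for a tagged URL (empty list if clean)."""
    params = parse_qs(urlparse(url).query)
    medium = params.get("utm_medium", [""])[0]
    source = params.get("utm_source", [""])[0]
    problems = []
    # Rule 1: the medium must be spelled exactly "ai", lowercase.
    if medium.lower() == AI_MEDIUM and medium != AI_MEDIUM:
        problems.append(f"utm_medium '{medium}' should be lowercase '{AI_MEDIUM}'")
    # Rule 2: medium "ai" is reserved for sources on the closed list.
    if medium.lower() == AI_MEDIUM and source not in ALLOWED_AI_SOURCES:
        problems.append(f"utm_source '{source}' is not in the closed AI source list")
    return problems


for url in [
    "https://example.com/docs?utm_medium=ai&utm_source=perplexity",
    "https://example.com/docs?utm_medium=ai&utm_source=bing-chat",
    "https://example.com/docs?utm_medium=AI&utm_source=chatgpt",
]:
    print(url, "->", validate_utm(url) or "ok")
```

Running a check like this in whatever tool builds your UTMs keeps ad hoc values out of the AI channel before they reach reporting.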

Codify it once. Reuse it in GA4, your data warehouse, your Looker dashboards, and every UTM your team builds. That is the whole point of taxonomy governance.

What to do this week

If you have not started yet:

Build the GA4 custom channel group with the regex above. Order it above Referral.
Open DebugView, click a citation from each of the five major AI tools, and confirm the session source matches.
Document your AI source values in whatever taxonomy system your team uses. Even a Notion page is fine to start.
Ship an llms.txt if you publish docs. Spend an hour, not a week.
Configure robots.txt for the three OpenAI bots, the three Claude bots, and the two Perplexity bots (PerplexityBot and Perplexity-User). Make a real decision about training vs search vs user-fetch.
Set a calendar reminder for August 2026 to revisit the AI Assistant channel coverage list. Google will expand it. Your regex should still be the source of truth.

Accept that you are measuring a floor. Half or more of your real AI-driven traffic will not be attributable in 2026, and that is fine. The teams that win in this environment are the ones with the cleanest taxonomy and the most honest understanding of what their numbers represent, not the ones that chase the missing 50% into the ground.

FAQ

Does ChatGPT send a referrer?

Sometimes. ChatGPT web (chatgpt.com) sends a Referer header when a user clicks a citation in browse or search mode from the desktop browser. The native iOS and Android apps usually do not send one, and traffic from those clicks lands in your analytics as Direct. Expect to capture only a fraction of true ChatGPT-driven visits this way.

What user agent does ChatGPT use?

OpenAI runs three distinct user agents. GPTBot crawls for model training. OAI-SearchBot indexes pages for ChatGPT search. ChatGPT-User fetches a page in real time when a user asks a question that requires a live lookup. Each is independently controllable via robots.txt. Blocking GPTBot does not block the search bot or the user-fetch bot.

How do I see AI traffic in GA4?

As of 13 May 2026, GA4 has a native AI Assistant channel in the Default Channel Group. It triggers automatically when a session’s referrer matches Google’s recognised list (ChatGPT, Gemini, Claude confirmed). For broader coverage and explicit control, build a custom channel group with a referrer regex matching chatgpt.com, claude.ai, perplexity.ai, gemini.google.com, copilot.microsoft.com, and any others you care about.

What is llms.txt and do I need it?

llms.txt is a community-proposed file at your domain root that lists the URLs and short descriptions you want language models to prioritise. As of mid-2026 no major AI vendor has publicly committed to honouring it in production. Studies show no measurable lift in citations. Ship one if you publish documentation, because the cost is near zero. Do not expect it to move referral volume on its own.

Can I tell if Claude cited my blog post?

Not reliably from server logs alone. Claude.ai often does not pass a Referer header on outbound clicks, so cited-and-clicked traffic appears as Direct. The cleanest signal comes from a citation-tracking tool like Profound, Otterly.AI, or BrandRank.AI that runs prompts against Claude on a schedule and records the source URLs it cites. Cross-reference the cited URLs with your published content to confirm.

What is the AI traffic regex for GA4?

A practical 2026 pattern, matched against the session source / referrer hostname:

^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|www\.perplexity\.ai|claude\.ai|gemini\.google\.com|bard\.google\.com|copilot\.microsoft\.com|deepseek\.com|grok\.com|x\.ai|meta\.ai|you\.com|poe\.com|character\.ai)$

Add or remove vendors based on your audience. Avoid matching bare openai.com or google.com, both of which pick up unrelated organic traffic.

Will AI traffic ever be a default GA4 channel?

It already is, as of 13 May 2026. Google added an AI Assistant channel to GA4’s Default Channel Group, with the medium ai-assistant applied automatically when the referrer matches a recognised AI tool. Coverage is narrow today (Google explicitly names ChatGPT, Gemini, and Claude), so a custom channel group is still useful for completeness.

Does iOS Link Tracking Protection affect AI referrals?

Not much. iOS 17 Link Tracking Protection strips click identifiers like gclid, fbclid, and msclkid in private browsing and certain other contexts. It does not strip utm_source, utm_medium, utm_campaign, or the Referer header itself, so AI referral attribution that relies on referrer hostnames or UTMs is largely unaffected.