LLM Traffic Monitoring: The Three Signals (Training, Citations, Referrals)
What LLM Traffic Actually Is
LLM traffic is often discussed as a single number, and that framing leaves a lot on the table. On your own site, AI activity actually shows up as three distinct signals you can measure directly, each produced by a different kind of AI behavior and each pointing to different work. A fourth signal, Share of Voice, is measured off-site. It is a less reliable performance dimension, but it is the approach most teams currently use for the off-site view. As teams focus on AI channel growth, we believe brands should orient around this three-signal model for measuring performance.
Each signal is worth tracking separately. The three are ordered by where the user is when each happens: training time, before any user is involved; a live conversation, where a user is asking a question right now; and a click-through, where a user lands on your site. This three-signal model captures how AI actually interacts with a site, which is what publishers and ecommerce operators need to make decisions against. We’ve built AI Channel Analytics around the same model.
Once you think in three signals instead of one number, the questions teams ask most often (how to monitor LLM traffic, what belongs on the dashboard, which numbers are most useful) become a lot easier to answer.
Why GA4 and Other Client-Side Analytics Tools Have a Hard Time With This
Before we get into the signals, a quick word on what existing analytics tools can and cannot show. GA4 is the dominant tool, and it has well-known limits when it comes to AI traffic. The same limits apply to Adobe Analytics, Mixpanel, Amplitude, Heap, Plausible, Fathom, Matomo, and any other tool that depends on a JavaScript tag firing in a real browser. The reason is architectural rather than configurable, so trying a different tool tends to land you in the same place.
We covered the full picture for GA4 specifically in LLM Traffic Is a Blind Spot in Your Analytics, so here is the short version. Each signal interacts with client-side analytics differently.
- Training crawls are not visible because the bots do not execute JavaScript. OpenAI, Anthropic, and Google AI training fetches do not trigger client-side tags by design. GA4, Adobe, Mixpanel, Amplitude, Heap, and the privacy-first tools like Plausible and Fathom all share this gap because none of them log a request unless a browser runs their tag.
- Conversation citations happen entirely off your site. The AI fetches your page server-side on a user’s behalf and renders the answer inside the chat. No browser opens, no analytics event fires, so this activity does not surface in any client-side tool.
- Real user referrals are partially visible across every client-side tool, but typically undercounted by 2.5x to 5x. Mobile LLM apps render outbound links in isolated WebViews that strip the referrer. Gemini and Claude pass no attribution signal at all on most platforms tested. Google AI Overviews are bucketed under organic search, which makes them difficult to separate. That bucketing looks identical whether the report you are reading is in GA4, Adobe, or a privacy-first alternative.
The fix is not a better tag, a cleaner UTM strategy, or a switch to a different client-side tool. The fix is server-side capture at the edge, classified by user-agent, verified IP range, and reverse DNS, and stitched together per AI surface. That is the only way to see all three signals at once, and it is independent of whichever client-side analytics tool you keep running for the rest of your traffic.
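As a concrete illustration of that classification step, here is a minimal sketch in Python. The user-agent tokens shown are a deliberately short, illustrative subset (a real deployment maintains a fuller, regularly refreshed list per vendor), and the forward-confirmed reverse DNS check is one verification method; engines that publish official IP ranges should be checked against those ranges as well.

```python
import socket

# Illustrative subset of known AI user-agent tokens and the signal each one
# represents; a production list is longer and refreshed regularly.
UA_TOKENS = {
    "GPTBot": ("OpenAI", "training"),
    "ChatGPT-User": ("OpenAI", "citation"),
    "ClaudeBot": ("Anthropic", "training"),
    "Claude-User": ("Anthropic", "citation"),
    "PerplexityBot": ("Perplexity", "training"),
    "Perplexity-User": ("Perplexity", "citation"),
    "CCBot": ("Common Crawl", "training"),
    "Bytespider": ("ByteDance", "training"),
}

def verify_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS: look up the PTR record for the IP, then
    resolve that hostname forward and confirm the original IP is returned."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
        return ip in forward_ips
    except OSError:
        return False

def classify_request(user_agent: str, ip: str) -> dict | None:
    """Classify one edge request into (engine, signal) and verify the claim.
    Returns None for ordinary traffic, which falls through to normal analytics."""
    for token, (engine, signal) in UA_TOKENS.items():
        if token.lower() in user_agent.lower():
            return {"engine": engine, "signal": signal, "verified": verify_ip(ip)}
    return None
```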
Signal 1: LLM Training
LLM Training is AI reading your brand and content to feed the next version of its models. Training crawlers from OpenAI, Anthropic, Google, Common Crawl, and ByteDance fetch your pages on a continuous schedule, ingest the content, and roll it into the next training cycle. This is the foundation of every later interaction. If a model has not absorbed your page, it cannot cite you, cannot recommend your product, and cannot send a user your way.
This is also the signal GA4 and every other client-side analytics tool has zero visibility into. Training crawls are visible only in your server-side request logs.
AI training crawls are now at search-engine scale
The context most teams miss: on a typical content-rich site, AI training crawl volume now rivals that of the leading search engines. OpenAI, Anthropic, Google, Common Crawl, and ByteDance training crawlers together produce request volumes on the same order of magnitude as Google Search and Bing combined. On many sites, AI training fetches already exceed search-engine fetches in frequency.
This is not a minor or side-channel signal anymore. The crawlers that decide what AI tools know about your brand are arriving at the same intensity as the crawlers that decided what Google Search knew about your brand for the last twenty years.
What to look at
With the right tracking in place, the training signal breaks down across several dimensions that matter for action:
- Daily volume by AI engine over rolling 30-day and all-time windows.
- Training coverage as a percentage of the high-value pages on your site, broken out by section.
- Top fetched pages with link-out, refresh frequency, and which AI surfaces are pulling them.
- Page-type rollup so you can see at a glance whether AI is reading your product pages, your category pages, your articles, or all three.
- AI engine breakdown. OpenAI may dominate while Google’s AI training is silent; Anthropic may stick to your blog while Common Crawl blankets the catalog.
- Media types. This is the dimension most teams forget. More on it next.
Media as Training Data: Beyond the Article Body
The biggest gap in most teams’ thinking is that training is not just text. AI bots also fetch your images, video, and audio. Each is a separate training surface, and each carries different signal value.
With better tracking in place, you can split pages from non-page media and see what each AI engine actually consumes; a minimal classification sketch follows.
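The sketch below assumes the classified bot log carries the engine and the fetched URL for each request; the extension buckets are illustrative and worth extending to whatever formats your site actually serves.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative extension buckets; extend to the formats your site serves.
MEDIA_BUCKETS = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif", ".svg"},
    "video": {".mp4", ".webm", ".mov", ".m3u8"},
    "audio": {".mp3", ".m4a", ".ogg", ".wav"},
}

def media_type(url: str) -> str:
    """Bucket a fetched URL into page / image / video / audio by extension."""
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    for bucket, extensions in MEDIA_BUCKETS.items():
        if ext in extensions:
            return bucket
    return "page"

def media_mix(bot_log: list[dict]) -> dict:
    """Count fetches per (engine, media type); each row is assumed to carry
    'engine' and 'url' keys from the classification step."""
    return dict(Counter((row["engine"], media_type(row["url"])) for row in bot_log))
```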
The practical implication: your alt text, transcripts, and on-page schema are not just SEO hygiene anymore. They are training inputs. The clearer they are, the more accurately a model represents your brand and content to a future user. The Shopify side of this story, where product feeds and structured data become the training surface, is covered in Shopify Agentic Plan: Product Data Beyond Your Control.
KPIs that matter
The right question for the training signal is not “how much are we being crawled” but “is the right content being crawled, often enough, by the AI engines that matter.” Methodology for getting from raw logs to that question is in AI Bot Behavior: A Log Analysis Methodology.
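One way to turn raw, classified log rows into that question, sketched under the assumption that each row carries the engine, the URL, the signal, and the day of the fetch: "often enough" becomes days since the last training fetch of each high-value page, per engine.

```python
from datetime import date

def training_recency(bot_log: list[dict], high_value_urls: set[str], today: date) -> dict:
    """Days since each high-value URL was last fetched by each engine's
    training crawler; None means the engine has never fetched it in the log."""
    last_seen: dict[tuple[str, str], date] = {}
    engines: set[str] = set()
    for row in bot_log:
        if row["signal"] != "training":
            continue
        engines.add(row["engine"])
        if row["url"] in high_value_urls:
            key = (row["engine"], row["url"])
            last_seen[key] = max(last_seen.get(key, row["day"]), row["day"])
    return {
        (engine, url): (today - last_seen[(engine, url)]).days
        if (engine, url) in last_seen else None
        for engine in engines
        for url in high_value_urls
    }
```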
Signal 2: Conversation Citations, the Real-Time Layer
Conversation Citations are AI fetching your page mid-chat to answer a user’s live question. These are the live fetches from ChatGPT, Claude, and Perplexity. Every fetch is, by definition, a moment where someone asked a question and the model decided your page was the best answer. This is the highest-intent of the three signals because the question is being answered right now.
It is also the signal that proves which content is actually reference-grade in the AI’s view. Training tells you what was absorbed. Citations tell you what gets used.
What to look at
The citation signal breaks down along the same dimensions as the training signal, but the meaning of each shifts:
- Daily volume of live fetches per assistant. A spike on a specific page often correlates with a topical news cycle or a new comparison question being widely asked; a simple spike-flagging sketch follows this list.
- Citation coverage, the share of your reference-grade content that has been fetched at least once in the last window.
- Top fetched pages with the assistant that fetched them. Surprises here are the rule, not the exception. The pages AI cites are rarely the pages you would expect.
- AI surface breakdown. Different assistants prioritize different content. ChatGPT and Perplexity tend to favor structured product and reference content; Claude tends to favor long-form expository writing.
- Media types. Live citation fetches are mostly pages; images and video are cited less often but are growing as multimodal answers become more common.
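The spike-flagging sketch referenced above assumes each citation-log row carries the URL and the day of the fetch; the 3x-over-trailing-average threshold is an illustrative choice, not a recommendation.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

def citation_spikes(citation_log: list[dict], today: date,
                    window: int = 28, factor: float = 3.0) -> list[tuple]:
    """Flag URLs whose live-citation fetches today run well above their own
    trailing daily average over the last `window` days."""
    daily: dict[tuple[str, date], int] = defaultdict(int)
    for row in citation_log:
        daily[(row["url"], row["day"])] += 1

    flagged = []
    for url in {u for u, _ in daily}:
        history = [daily.get((url, today - timedelta(days=d)), 0)
                   for d in range(1, window + 1)]
        baseline = mean(history)
        today_count = daily.get((url, today), 0)
        if baseline > 0 and today_count > factor * baseline:
            flagged.append((url, today_count, round(baseline, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```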
Why this signal is uniquely valuable
A live citation is the closest thing to a vote of confidence the AI ecosystem produces. It is the model saying, in front of a real user, “this page is the right source for this question.” If you were tracking only one signal, this would be it.
The catch is that none of it is visible to GA4 or any client-side tool, because the fetch happens server-side and the user never lands on your page. The only way to see citations is at the edge.
Citations vs. Share of Voice (SOV)
Share of Voice is the off-site alternative most teams currently use when they cannot see server-side citations. It is a measurement of how often your brand shows up in AI answers across a basket of representative prompts. A growing category of tools, including Profound, Otterly, Peec, Evertune, AthenaHQ, BrandRank.AI, Ahrefs Brand Radar, and the AI modules inside Semrush and BrightEdge, query ChatGPT, Claude, Gemini, and Perplexity at scale, count brand mentions, and report a share relative to a competitive set.
It is a useful directional signal, but our preference for measuring AI channel performance is the citation signal above, because it captures actual fetches at the page level by the surface that did the fetching. SOV has structural blind spots that citations do not.
SOV is probabilistic, citations are observed. AI models give different answers to the same prompt depending on temperature settings, conversation context, and time of day. Run the same query twice and you can get two different sets of brand mentions. SOV tools mitigate this by sampling and averaging, but the underlying volatility is real, and the precision of any single number is lower than most reports suggest. Citation data, by contrast, is a server log of actual fetches: each one happened, and each one is countable.
SOV samples prompts; citations are anchored to real conversations. A SOV tool runs a basket of prompts the analyst chose. Real users ask different questions, in different ways, with different prior context. The sample may or may not match what your customers actually ask, and most tools do not make their prompt set transparent. A live citation, by definition, came from a real user asking a real question, so there is no sampling bias.
Models change. Each new model release, system-prompt tweak, or retrieval change shifts what gets cited. A SOV figure measured against last quarter’s models is a different measurement from one taken against this quarter’s, and most tools are not transparent about model versioning in their reports. Citation activity adjusts in real time because it is captured directly from the fetch, with no inference layer in between.
The cleanest way to use SOV is alongside the citation signal, not as a replacement. SOV is a directional, off-site benchmark of how often your brand surfaces in answers. Citations are the verified, on-site record of which pages AI actually used to produce them. Used together, the off-site simulation and the on-site truth give a fuller picture than either alone.
KPIs that matter
A typical pattern, drawn from real publisher and ecommerce deployments: a small set of evergreen pages produces the bulk of citations, while the homepage rarely makes the top 50. If the model has decided your in-depth comparison guide is the right answer to a question, it will fetch that page hundreds of times a week and not your homepage at all. That is the operational signal you want.
Signal 3: Real Users, the High Intent Referral
AI engines are research engines, not interruption channels. A user clicking a citation in ChatGPT, Claude, Perplexity, or Copilot has already asked their question, evaluated the answer, and chosen your page as the next step. By the time they land on your site, they are further down the consideration curve than a user from any channel that interrupted them. Across our deployments, that translates into higher conversion rates and shorter consideration windows than social, display, and often even paid search. The framing matters because AI-referred users are pre-qualified by the time they arrive; for the broader buyer-journey context, see AI Is a Research Engine, Not a Sales Channel.
This is also the only signal a traditional analytics tool can see at all, and the one most teams default to when they hear “LLM traffic.” It is also the signal where the WebView gap and the Gemini/Claude no-referrer gap make GA4 underreport by a factor of 2.5 to 5, so more of this high-intent traffic goes uncounted than in any other channel on your site.
What to look at
- Sessions by source assistant, classified server-side rather than relying on referrer headers; a classification sketch follows this list.
- Landing page distribution. AI-referred users tend to land on deep pages, not the homepage.
- Conversion rate by source, benchmarked against your organic baseline. Across our deployments, AI-referred conversion typically runs higher than social on a per-session basis.
- Time-to-purchase windows. ChatGPT users tend to convert same-day; Perplexity users often take three to five days; Gemini sits in between.
- Revenue attribution. Verified IP-to-order matching for the cleanest cases, probabilistic matching for the rest.
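The classification sketch referenced in the first bullet assumes the raw referrer and the landing URL are available per session; the host list is illustrative, and a None result corresponds to the WebView and no-referrer cases described above.

```python
from urllib.parse import parse_qs, urlparse

# Illustrative referrer hosts for the major assistants.
REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}

def source_assistant(referrer: str | None, landing_url: str) -> str | None:
    """Resolve the assistant behind a session from the referrer host, falling
    back to utm_source; None means no attribution signal survived."""
    if referrer:
        host = (urlparse(referrer).hostname or "").lower()
        for known, assistant in REFERRER_HOSTS.items():
            if host == known or host.endswith("." + known):
                return assistant
    # ChatGPT appends utm_source=chatgpt.com to outbound links (see the GA4
    # section below); other assistants pass no UTM on most platforms.
    utm = parse_qs(urlparse(landing_url).query).get("utm_source", [""])[0].lower()
    if "chatgpt" in utm:
        return "ChatGPT"
    return None
```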
How to measure referral traffic from LLMs in Google Analytics (the honest answer)
The short answer is: you cannot, not completely. UTMs are present from ChatGPT but not from Gemini or Claude. Referrers are present from desktop browsers but stripped in mobile apps. AI Overviews show up under organic search with no way to separate them. You can build a partial picture by filtering on utm_source=chatgpt.com and on chatgpt.com, claude.ai, perplexity.ai, gemini.google.com, and copilot.microsoft.com referrers, but you will be looking at roughly a quarter of the actual traffic.
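For that partial picture, the same host list can be expressed as a single pattern. The sketch below assumes you are filtering exported session rows in Python; the identical regex also works as a matches-regex condition on the Session source or Page referrer dimension in a GA4 Exploration. It is partial by design, for the reasons above.

```python
import re

# Partial by design: mobile WebViews strip the referrer and Gemini/Claude
# pass no attribution signal on most platforms, so this only catches the
# fraction of AI referrals that arrive with a usable source.
LLM_SOURCES = re.compile(
    r"chatgpt\.com|chat\.openai\.com|claude\.ai|perplexity\.ai|"
    r"gemini\.google\.com|copilot\.microsoft\.com",
    re.IGNORECASE,
)

def is_visible_llm_referral(session_source: str | None, page_referrer: str | None) -> bool:
    """True when either exported field carries a known assistant host."""
    return bool(LLM_SOURCES.search(session_source or "")
                or LLM_SOURCES.search(page_referrer or ""))
```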
The honest answer is to capture this signal server-side. The full attribution map, with every device-by-device test we ran, is in LLM Traffic Is a Blind Spot in Your Analytics.
Mobile is where the LLM attribution gap is largest
Here is the part of the referral undercount that matters most: the device most of your audience uses is also the device where AI attribution is hardest to see clearly. Mobile drives 70 to 90 percent of consumer-facing site sessions, and it is where the LLM apps live. Across the platforms we tested, mobile is the least-tracked surface for every LLM that matters.
The structural reason is that mobile LLM apps render outbound links in isolated WebViews, which strip the referrer and detach the session from any prior browser context. iOS WebKit’s Intelligent Tracking Prevention compounds the problem on iPhone, where the WebView is subject to cookie restrictions even when the conversion happens inside it.
The result is that a referral-only view of LLM traffic ends up looking mostly like a desktop view, even though the actual audience is mostly mobile. Cross-platform comparisons drawn from GA4 numbers are easier to interpret once that mobile gap is taken into account. The full device-by-device matrix, including the desktop scenarios, is in LLM Traffic Is a Blind Spot in Your Analytics.
KPIs that matter
For a deeper look at how this fits the broader buyer journey, see AI Is a Research Engine, Not a Sales Channel. The framing matters because AI-referred users are often researchers in mid-funnel, and judging the channel only on same-session conversion will undervalue it.
From Signal to Action: How LLM Traffic Data Drives Impact
Measurement only earns its keep when it leads to action. With the three signals tracked together, two recommendation patterns reliably surface from the cross-signal data, and a team can keep a running log of what has been worked on against each.
Content Freshness
This pattern surfaces the top training-crawled URLs over the last 30 days, ranked by fetch frequency. The recommendation is direct: keep them accurate, because models will learn whatever is on the page right now and represent your brand accordingly to future users. A stale page that OpenAI is training on nightly is a stale page that ChatGPT will misrepresent after the next training cycle.
Action: review each top-crawled page on a recurring cadence, fix anything outdated, and keep a record of what was reviewed and when so freshness becomes a tracked workflow rather than a one-off audit.
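A sketch of that workflow, assuming the classified bot log from earlier plus a simple url-to-last-review-date record kept by the editorial team (both names are placeholders):

```python
from collections import Counter
from datetime import date, timedelta

def freshness_queue(bot_log: list[dict], reviewed: dict[str, date],
                    today: date, top_n: int = 50) -> list[dict]:
    """Rank the most training-crawled URLs over the trailing 30 days and pair
    each with its last recorded review date, so heavily read but long-unreviewed
    pages rise to the top of the editing queue."""
    cutoff = today - timedelta(days=30)
    fetches = Counter(
        row["url"] for row in bot_log
        if row["signal"] == "training" and row["day"] >= cutoff
    )
    return [
        {"url": url, "fetches_30d": count, "last_reviewed": reviewed.get(url)}
        for url, count in fetches.most_common(top_n)
    ]
```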
Pages Crawled But Not Cited
This pattern surfaces pages AI training bots are fetching but that have produced zero citations and zero referrals in the same window. Split them into Products and Categories for ecommerce, and into article types for publishers. This is the citation gap, and it is usually the highest-leverage pattern to act on.
The bots have read the page. The model has not chosen to cite it. The fix is almost always the same family of changes: clearer titles, better descriptions, FAQ-style copy, and JSON-LD that names the product, the audience, and the answer to the obvious question.
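As one example of the JSON-LD piece, here is a minimal schema.org FAQPage block built in Python; the question and answer strings are hypothetical placeholders, while the @type and property names follow the published schema.org vocabulary.

```python
import json

def faq_jsonld(question: str, answer: str) -> str:
    """Render a minimal schema.org FAQPage block that states the obvious
    question and its answer in machine-readable form."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [{
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }],
    }
    return f'<script type="application/ld+json">{json.dumps(data, indent=2)}</script>'

# Hypothetical example for a product page that is crawled but never cited.
print(faq_jsonld(
    "Which of these running shoes suits wide feet?",
    "The wide-fit model has a roomier toe box and comes in half sizes.",
))
```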
Action: rewrite the page to be more citable, then track the citation signal on that URL over the next two weeks to verify the fix.
The cross-signal action map
Once you have all three signals captured, four patterns surface, and each points to specific work: AI superstars, citation gaps, click-through problems, and coverage gaps. For each pattern, every signal is either active (the page is showing up there), missing (the signal is absent), weak (present but underperforming), or not applicable when the question does not arise for that pattern.
This is what we mean by “LLM traffic data drives action.” When the three signals sit side by side, each pattern points to a specific kind of work, so a team can prioritize a change to ship this week and check the impact next week.
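A minimal sketch of that map as a classifier, using the four pattern names above; the zero/non-zero cuts are illustrative assumptions, and real deployments tune thresholds per site and per section.

```python
def cross_signal_pattern(training: int, citations: int, referrals: int) -> str:
    """Map one page's 30-day counts of the three signals to an action pattern."""
    if training == 0:
        return "coverage gap"           # not being read: discoverability work
    if citations == 0:
        return "citation gap"           # read but never used: rewrite for citability
    if referrals == 0:
        return "click-through problem"  # cited but not clicked: snippet framing, load speed
    return "AI superstar"               # all three active: protect and keep fresh

# Hypothetical rows: url -> (training fetches, citation fetches, referral sessions)
pages = {
    "/guides/best-widget-comparison": (120, 45, 30),
    "/products/widget-a": (80, 0, 0),
    "/archive/old-announcement": (0, 0, 0),
}
for url, counts in pages.items():
    print(url, "->", cross_signal_pattern(*counts))
```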
How to Benchmark Your Content Coverage for Training and Citations
In the Google Search era, indexation was the foundational health metric. Before a page could rank, earn clicks, or convert, it had to be in the index. Coverage is the same metric for the LLM era, and it deserves to be tracked as its own thing, not folded into any one signal.
Coverage is the percentage of the content you care about that AI is actually reading and using. It is the closest thing to a north-star number for the AI channel because it sits upstream of every other signal. If a model has not absorbed your page, AI cannot cite it. If a page is never cited, AI cannot send a user to it. Coverage is the gate that decides whether the rest of the funnel is even possible.
Treating coverage as its own metric, distinct from any one of the three signals, is what makes the AI channel measurable in a way teams familiar with organic search already understand. The question shifts from “are we being crawled” to “is enough of the right content reaching the answer.” Coverage splits cleanly along the first two signals.
Training coverage benchmark
Define your content universe of record first. For an ecommerce team this is typically all active product pages, all category pages, and all evergreen guides. For a publisher it is the article archive plus reference and topic landing pages. The universe is the denominator.
Then ask: in the last 30 days, what percentage of those URLs were fetched at least once by a verified training crawler from any of the major AI engines? That is your 30-day training coverage. A healthy site with reasonable internal linking and a clean sitemap should run 90 percent or higher. Below that, you have a discoverability problem: the bots cannot find or do not return to a meaningful slice of your inventory.
Then split the same number by AI engine. Per-engine coverage is where the leverage lives. OpenAI might cover 95 percent while Google covers 40 percent and Anthropic covers 70 percent. That spread tells you exactly where to invest in surface-specific access (robots.txt review, sitemap submission, structured data improvements) and which models will represent your brand accurately versus poorly when a future user asks.
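A sketch of the computation, under the same assumed log shape as earlier; citation coverage in the next subsection is the identical calculation with the signal set to citation fetches and the reference-grade pages as the denominator.

```python
from datetime import date, timedelta

def coverage_by_engine(bot_log: list[dict], universe: set[str], today: date,
                       signal: str = "training") -> dict[str, float]:
    """30-day coverage per AI engine: the share of the content universe fetched
    at least once by that engine's verified crawler for the given signal."""
    cutoff = today - timedelta(days=30)
    fetched: dict[str, set[str]] = {}
    for row in bot_log:
        if (row["signal"] == signal and row["verified"]
                and row["day"] >= cutoff and row["url"] in universe):
            fetched.setdefault(row["engine"], set()).add(row["url"])
    return {engine: round(100 * len(urls) / len(universe), 1)
            for engine, urls in sorted(fetched.items())}
```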
Citation coverage benchmark
Citation coverage is built the same way but against a tighter denominator: your reference-grade content, the pages you would expect AI to cite if it understood your site correctly. For an ecommerce team this is typically the guides, comparisons, and FAQ-style content, not the product pages themselves. For a publisher it is your evergreen and reference articles.
Ask: in the last 30 days, what percentage of those reference URLs received at least one live fetch from ChatGPT, Claude, or Perplexity? That is your 30-day citation coverage. A site whose reference content is well-structured, well-titled, and JSON-LD enriched should run 60 to 80 percent. Below 40 percent is a citability problem, almost always solvable by clearer titles, better descriptions, FAQ-style copy, and more structured data.
Then split by assistant. ChatGPT will dominate volume on most sites; Claude will be under-represented in any tool that does not handle unverifiable bot traffic correctly; Perplexity will over-index on structured product and reference content.
What healthy looks like overall
Exact thresholds vary by archive depth, content mix, and audience, but the pattern of a healthy profile is consistent enough to publish:
- Training coverage of your top 100 pages: at or near 100 percent on a 30-day window, with at least three to four AI engines actively fetching.
- Citation coverage of your reference-grade content: 60 to 80 percent on a 30-day window. Below 40 percent points to citability gaps in titles, descriptions, and structured data.
- Top citation pages: evergreen, in-depth, reference-grade content. The homepage should not be in the top 20.
- Citation concentration: 60 to 80 percent of all citations landing on your top 30 pages is normal. Concentration on only your top 5 means your reference surface is too narrow. If your numbers diverge sharply from this pattern, the diagnosis is usually structured-data gaps, content the model does not consider citable, or a coverage problem that internal linking and sitemaps can fix.
For the foundational primer on AI visibility before any of this, see Understanding AI Visibility. For the seven-KPI framework that maps neatly onto the three-signal model, see AI Performance Metrics: Seven KPIs Every Brand Should Track.
Stop Estimating, Start Activating
When LLM traffic is reported as a single number, a lot of useful context goes missing. Strong content can look quieter than it really is because most of its impact lands outside the analytics tool. Promising performance is hard to trace back to the upstream activity that earned it. Teams end up leaning on indirect signals because the direct ones are not visible yet.
Three signals on your site fill that picture in. Training shows you what AI is absorbing about your brand and content. Citations show you which pages AI is reaching for to answer real questions in real time. Referrals show you the high-intent users that AI sends through. Sitting upstream of all three is Coverage, the LLM-era equivalent of indexation, and the gate that decides whether the rest of the funnel is even possible.
Read together, the three signals give a team a clear cross-signal pattern to act on each week. AI superstars are pages to protect, citation gaps are pages to rewrite, click-through problems point to load speed and snippet framing, and coverage gaps point to discoverability. Off-site, Share of Voice is a useful directional benchmark for the conversations happening in AI answers, but for measuring real performance impact, the on-site three-signal model is what we believe brands should orient around as they invest in AI channel growth.
Together, they turn the AI channel from something to estimate into something your team can plan, measure, and act on with confidence.