Marketing teams love data and dashboards. So when a new wave of AI trackers promises insights into business performance inside AI platforms, it sounds exciting. Unfortunately, the data will never show you what you want it to.
The idea is seductive.
“Track how often your brand appears in AI answers.”
“Measure your visibility inside ChatGPT.”
“Understand your share of voice in AI conversations.”
It sounds like the next evolution of SEO and brand monitoring. The logic feels familiar, but the game has changed significantly, and we can no longer track the same visibility metrics.
It’s nonsense, not because the tools are badly built, but because the thing they’re trying to measure doesn’t exist in a stable, observable form.
Tracking brand mentions in AI isn’t just noisy; it’s structurally impossible to do well right now. And no tool, however polished, can fix that.
This article is not a hit piece on any one company. It is a blunt look at why this whole category struggles to help serious marketers and business owners who need reliable AI visibility data.
AI answers aren’t shared experiences
Search results on Google and Bing aren't perfect, but they’re at least fairly easy to compare.
If two people search the same keyword, in roughly the same place, they’ll see something recognisable. Rankings shift and personalisation creeps in, but there’s still a common reference point.

AI doesn’t work like that. Every response is shaped by:
- What the person is asking. We can see that data for search engines but not for AI conversations; the framing has moved from keywords and keyphrases to full, deep, detailed questions. Everyone phrases things differently, so the answers vary.
- Prior conversation history, which has a huge impact on the answer. Nobody has the same history, so nobody sees the same answers.
- Inferred intent. How the question is asked changes the framing completely.
- Location and language nuance
- Model-side experimentation
- Memory and preference signals
- When the data was collected
- Where the data was gathered
Two people can ask what appears to be a similar question and get entirely different answers. Different examples. Different framing. Different brands.
So when a tracker says, “Your brand appears in 300 AI conversations,” the only sensible response is: whose conversations? What about?
There is no stable baseline. Without one, you’re not measuring visibility; you’re collecting anecdotes at scale. Ironically, the scale you can actually collect at is still tiny compared with the scale you’d need for the data to mean anything.
AI outputs aren’t repeatable, so trends are fiction
Ask an LLM the same question twice, and you'll get different answers. The structure changes. The examples change. Sometimes even the conclusion changes.
That’s not a bug. It’s how probabilistic and predictive models work.
AI trackers rely on sampling and repetition to infer trends. But when the underlying system is non-deterministic, repetition doesn’t reveal truth; it just produces different dice rolls.
Charts go up. Charts go down. None of it reliably maps to user reality. You’re not watching trends or movement. You’re watching variance and calling it insight.
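To see why, here is a minimal simulation. The numbers are purely hypothetical assumptions for illustration: a brand with a fixed 30% chance of being mentioned and a tracker sampling 50 prompts a week. Nothing about the brand’s real position changes, yet the weekly “visibility” score still rises and falls from sampling variance alone.

```python
import random

TRUE_MENTION_RATE = 0.30   # assumed, constant underlying "visibility"
PROMPTS_PER_WEEK = 50      # assumed tracker sample size

def weekly_visibility_score():
    """Simulate one week of sampled AI answers and return a visibility %."""
    mentions = sum(random.random() < TRUE_MENTION_RATE
                   for _ in range(PROMPTS_PER_WEEK))
    return 100 * mentions / PROMPTS_PER_WEEK

# The underlying rate never moves, but the chart does.
for week in range(1, 9):
    print(f"Week {week}: {weekly_visibility_score():.0f}% visibility")
```

Run it a few times and you get plausible-looking “trends” every time, none of which reflect any real change.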
You can’t see the real questions people are asking
This is the fatal flaw that most tools quietly step around.
In search, we can discover keywords, model demand, and then estimate and analyse coverage.

In AI, you have no idea:
- What people are actually asking
- How questions evolve mid-conversation
- Which prompts matter commercially
Queries aren’t short keywords. They’re messy, conversational, multi-layered, and often unguessable.
AI trackers invent a small, artificial query set and extrapolate confidence from it. Ultimately, it’s theatre, not science.
You’re measuring what could be asked, not what is being asked, and guessing at the answers.
Brand mentions don’t equal sentiment or impact
Even if your brand appears, you still don’t know:
- Why it was mentioned
- Whether it was framed positively, neutrally, or cautiously
- Whether it helped or harmed perception
- Whether the user trusted it or ignored it
LLMs hedge constantly. They compare. They list options. They soften language.
A brand included in a paragraph alongside five others isn't always a recommendation. A warning wrapped in polite language is still a warning.
Sentiment analysis on AI answers is guesswork stacked on top of guesswork.
There is no exposure data, so the numbers float in space
Every serious channel gives you some sense of exposure:
- Impressions
- Reach
- Frequency
- Clicks
AI platforms give you none of that.
You don’t know how often an answer was shown, whether anyone read it, or whether it influenced a decision.
There’s no denominator, so the metric can’t mean anything.
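A sketch of the missing arithmetic (the figures below are invented) makes the gap obvious: search metrics have a denominator to divide by; an AI mention count doesn’t.

```python
# Search: exposure data exists, so a rate means something (invented figures).
impressions = 12_000
clicks = 480
ctr = clicks / impressions        # 0.04 -> a 4% click-through rate

# AI tracker: a mention count with no exposure data behind it.
ai_mentions = 300
ai_answers_shown = None           # platforms don't report this
# visibility_rate = ai_mentions / ai_answers_shown  # undefined: no denominator
```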
You’re measuring outside the system, blindfolded
Until AI platforms provide brands with first-party analytics, citation reporting, referral transparency, and query class data, everything else is reverse-engineering shadows.
AI trackers don’t observe reality. They infer it from the outside and sell confidence through visualisation.
That’s not analytics. It’s speculation with a UI.
What actually matters instead
If AI visibility matters, it will show up downstream. Not in dashboards, but in referral traffic from AI platforms, brand search lift, assisted conversions, and sales conversations that mention AI as the source.
Those signals are imperfect; they’re delayed, they’re messy. But they're also vital. If AI is influencing demand, your pipeline will feel it.
The best way of thinking about it, in my opinion, is that if you need a tracker to prove it, it probably isn’t happening.
Why AI tracking tools fail
AI brand tracking isn’t failing because the tools are immature. It’s failing because:
- There’s no shared surface
- No fixed query set
- No impression data
- No platform feedback
- No stable outputs

It’s not a tooling gap; it’s a physics problem. Until AI platforms open up first-party data, the smartest move isn’t finding a unicorn tool; it’s restraint.
Focus on being genuinely useful. Focus on being cited naturally. Focus on what shows up in traffic, leads, and revenue. Work on your brand, coverage, and messaging.
Three persistent illusions that keep AI trackers alive
These tools survive because they tap into the kind of analysis teams expect to see, and they provide comfort. The same three illusions keep coming up.

Illusion 1: More metrics mean more control
AI trackers love complexity: more segments, more filters, more “AI-specific” views and slices.
They feel precise. They look sophisticated in a board deck. But precision built on unstable inputs is just decoration.
If the underlying feed is based on guessed queries, shifting models, non-repeatable outputs, and loose attribution, piling on detail doesn’t fix the problem; it hides it. You end up debating decimal points on numbers that were never solid to begin with.
Illusion 2: Benchmark gaming
As soon as benchmarks exist, behaviour bends around them. Vendors can so easily tune data to score well. Users can design prompts to “win” tests. Teams can optimise for benchmark lift instead of commercial impact.
You see the same pattern everywhere: AI visibility scores creep up, internal reports look healthier, and nothing meaningful changes in leads, pipeline, or revenue.
When success is defined by narrow, synthetic tests, people optimise the test and not the outcome.
Illusion 3: Sampled answers represent reality
AI trackers rely on a small set of sampled prompts and outputs, then present the results as if they describe what users actually experience - they don’t.
AI answers are non-deterministic, heavily personalised, and shaped by prior context you cannot see. The same question can produce different brands, different framing, and different conclusions from one moment to the next. Sampling a handful of prompts and calling it “AI visibility” assumes stability that simply isn’t there.
What these tools really measure is how a model responded in a controlled test environment, not how real users encounter brands in live conversations. That gap matters a lot!
Treating sampled AI outputs as representative insights is like testing a single search result and assuming it explains the entire SERP landscape.
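A rough back-of-envelope calculation shows how wide that gap is. Even assuming, generously, that the sampled prompts were independent and representative, the standard margin of error for a proportion tells you how much uncertainty a given sample size carries (the sample sizes below are hypothetical):

```python
import math

def margin_of_error(sample_size, observed_rate=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(observed_rate * (1 - observed_rate) / sample_size)

# Hypothetical tracker sample sizes and the uncertainty they carry.
for n in (25, 50, 200, 10_000):
    print(f"{n:>6} sampled prompts -> visibility accurate to roughly ±{margin_of_error(n):.0%}")
```

Twenty-five prompts gives a margin of around ±20 percentage points, which dwarfs the week-on-week movements these dashboards report, and that is before accounting for the fact that the samples are neither independent nor representative in the first place.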
Conclusion
AI trackers sell a comforting idea: if something can be tracked, it can be controlled. With AI, that promise falls flat. These tools try to pin stable numbers onto systems that shift at extremely high speed: LLM performance drift, constant model releases, and rapid changes in answers. The result is a stack of graphs that look solid but wobble under light pressure.
There is a better option. Keep measuring traffic, leads, and revenue with the analytics tools you already trust. Let those numbers tell you whether content and campaigns are working, instead of chasing misleading AI usage reports or AI tool popularity myths.
Yes, that means leaving some things untracked. False measurement creates more risk than honest gaps.
