Michalik (Claneo): Experiment shows Answer Engines can be gamed too easily
- Feb 19
- 2 min read
Updated: Mar 6

Key Takeaways:
Michalik created three listicles for a made-up matcha tea brand and got it listed in most major AI chats
Claude and AIO cited the fake matcha quite frequently
ChatGPT and AI Mode also cited the fake product from time to time
Gemini did the best job of not mentioning the fake product
All chats have since caught up, but in the short run they remain far too easy to game
A while ago, Fishkin reported that an AI chat returns the same list of recommended brands for the same prompt in fewer than 1% of runs. His findings:
AIs rarely give the same list of brands or recommendations twice (<1 in 100 times, no matter the question)
AIs almost never give the same list of brands/recs in the same order, even in spaces with limited options like LA Volvo dealers or SaaS Cloud Computing Providers (<1 in 1,000 times, no matter the question)
These tools are probability engines: they’re designed to generate unique answers every time. Thinking of them as sources of truth or consistency is provably nonsensical.
Real users almost never phrase prompts the same way, even when they share the same intent. The variation of brands/recs in AI answers around a space in the messy wilds of AI prompting is likely much higher than what our controlled experiments revealed here.
Measuring your brand’s presence in AI answers with precision is a fool’s errand. You can, with enough prompts run enough times, get a dartboard-pattern-like answer comparing you with others. I’ve been swayed from my initial position and now believe visibility % across dozens to hundreds of prompts run multiple times is a reasonable metric.
But, any tool that gives a “ranking position in AI” is full of baloney.
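The "visibility %" metric Fishkin endorses can be sketched as a simple aggregation over repeated runs: count how often a brand appears at all, ignoring rank. A minimal sketch — the function name and the run data are hypothetical, not from the experiment:

```python
def visibility_pct(runs, brand):
    """Share of runs (in %) in which `brand` appears anywhere in the
    recommendation list, regardless of position."""
    hits = sum(1 for rec_list in runs if brand in rec_list)
    return 100 * hits / len(runs)

# Hypothetical recommendation lists from repeated runs of similar prompts
runs = [
    ["BrandA", "BrandB", "BrandC"],
    ["BrandB", "BrandA"],
    ["BrandC", "BrandB", "BrandD"],
    ["BrandA", "BrandC"],
]

print(visibility_pct(runs, "BrandA"))  # 75.0
print(visibility_pct(runs, "BrandD"))  # 25.0
```

Note that the metric deliberately discards ordering — which is exactly why it stays meaningful while a "ranking position in AI" does not.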
