Michalik (Claneo): Experiment shows Answer Engines can be gamed too easily
- Feb 19
- 2 min read
Updated: Mar 6

Key Takeaways:
Michalik created three listicles for a made-up matcha tea brand and got it listed in most major AI chats
Claude and AIO cited the fake matcha quite frequently
ChatGPT and AI Mode also cited the fake product from time to time
Gemini did the best job of not mentioning the fake product
All chats have since caught up, but in the short run they remain far too easy to game
A while ago, Fishkin reported that an AI chat returns the same list of recommended brands for the same prompt in fewer than 1% of runs. His findings:
AIs rarely give the same list of brands or recommendations twice (<1 in 100 times, no matter the question)
AIs almost never give the same list of brands/recs in the same order, even in spaces with limited options like LA Volvo dealers or SaaS Cloud Computing Providers (<1 in 1,000 times, no matter the question)
These tools are probability engines: they’re designed to generate unique answers every time. Thinking of them as sources of truth or consistency is provably nonsensical.
Real users almost never phrase prompts the same way, even when they share the same intent. The variation of brands/recs in AI answers around a space in the messy wilds of AI prompting is likely much higher than what our controlled experiments revealed here.
Measuring your brand’s presence in AI answers with precision is a fool’s errand. You can, with enough prompts run enough times, get a dartboard-pattern-like answer comparing you with others. I’ve been swayed from my initial position and now believe visibility % across dozens to hundreds of prompts run multiple times is a reasonable metric.
But, any tool that gives a “ranking position in AI” is full of baloney.
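The "visibility %" metric Fishkin endorses can be sketched as a simple aggregation over repeated runs: count how often a brand appears at all, ignoring rank. A minimal sketch — the function name and the run data are hypothetical, not from the experiment:

```python
def visibility_pct(runs, brand):
    """Share of runs (in %) in which `brand` appears anywhere in the
    recommendation list, regardless of position."""
    hits = sum(1 for rec_list in runs if brand in rec_list)
    return 100 * hits / len(runs)

# Hypothetical recommendation lists from repeated runs of similar prompts
runs = [
    ["BrandA", "BrandB", "BrandC"],
    ["BrandB", "BrandA"],
    ["BrandC", "BrandB", "BrandD"],
    ["BrandA", "BrandC"],
]

print(visibility_pct(runs, "BrandA"))  # 75.0
print(visibility_pct(runs, "BrandD"))  # 25.0
```

Note that the metric deliberately discards ordering — which is exactly why it stays meaningful while a "ranking position in AI" does not.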
