Metehan & Gabe: How Common Crawl Rank might influence your Authority in AI visibility
- Jan 22
- 2 min read
Updated: Jan 26
Key Takeaways:
Metehan analyzed Common Crawl (CC) data and found that a domain's CC authority correlates with how often it is cited in LLM chats such as ChatGPT, Perplexity and others:
Most major LLMs were trained on CC data (64% of models studied, 80%+ of GPT-3 tokens)
CC prioritizes high-authority domains in its crawling via Harmonic Centrality
These same domains tend to be cited most frequently by LLMs
Feel free to use his tool to analyze Common Crawl authority scores for your domain or industry: https://webgraph.metehan.ai/
Common Crawl crawls the web and provides analytical data such as authority scores for domains:
Harmonic Centrality (HC) - measures how “close” a domain is to all other domains in the link graph
PageRank - rank of a page based on the quality and quantity of incoming links
Domain Rankings - ranking of domains based on the metrics above
This data is released every month (covering approx. 94-163 million domains per crawl period) and represents one of the largest publicly available authority datasets.
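To make the Harmonic Centrality definition above more concrete, here is a minimal sketch on a toy link graph. It assumes the common textbook formulation (for each domain, sum 1/distance over all other domains that can reach it via links); the domain names are hypothetical, and Common Crawl's own pipeline may differ in details such as normalization.

```python
from collections import deque

def harmonic_centrality(graph):
    """Return {node: sum of 1/d(u, node) over all other nodes u that can reach node}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    # Reverse the edges so a BFS started at `node` walks incoming links backwards.
    reverse = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].append(src)
    scores = {}
    for node in nodes:
        dist = {node: 0}
        queue = deque([node])
        while queue:
            cur = queue.popleft()
            for prev in reverse[cur]:
                if prev not in dist:
                    dist[prev] = dist[cur] + 1
                    queue.append(prev)
        # Unreachable domains contribute nothing; the node itself (d = 0) is skipped.
        scores[node] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores

# Hypothetical toy link graph: domain -> domains it links to.
toy_graph = {
    "hub.example": ["a.example", "b.example"],
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example"],
}
scores = harmonic_centrality(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, `hub.example` is linked directly by every other domain and therefore comes out on top, while `c.example`, which nobody links to, scores zero.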
Kopp drew attention to a Google patent from 2015: "Producing a ranking for pages using distances in a web-link graph"
Seed pages are selected based on their position in the web graph and considered important
The outgoing links from seed pages to other pages in the set to be ranked are critically important
The practical takeaway: use the web graph to identify relevant seed pages for your industry, or their close neighbours, and aim to earn links or mentions from them.
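The seed-page idea can be sketched as a multi-source breadth-first search: start at the seed pages, follow outgoing links, and record how many hops each page is away. This is a simplified illustration only. It assumes plain hop counts as the distance, which is not necessarily how the patent measures link distance, and all domain names are hypothetical.

```python
from collections import deque

def distance_from_seeds(graph, seeds):
    """Return {page: fewest link hops from any seed page}, following outgoing links."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        cur = queue.popleft()
        for nxt in graph.get(cur, []):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist  # pages unreachable from every seed are simply absent

# Hypothetical toy link graph: page -> pages it links to.
toy_links = {
    "seed.example": ["trade-body.example", "news.example"],
    "trade-body.example": ["insurer-a.example"],
    "news.example": [],
    "insurer-a.example": ["insurer-b.example"],
}
distances = distance_from_seeds(toy_links, ["seed.example"])
closest_first = sorted(distances, key=distances.get)
```

Pages with a small distance are the "close neighbours" worth targeting: a link from `trade-body.example` puts a site two hops from the seed, whereas a site only linked from `insurer-a.example` sits three hops away.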





Top 20 Domains (October-November-December 2025):
Rank | Domain | HC Rank | PageRank |
Example: Insurance companies in Germany, 2025 - sorted by HC Rank
| Domain | Sub-domains | HC Rank | HC Trend | CC Rank | CC Trend |
|---|---|---|---|---|---|
| | 5 | 7,369 | +3,123 | 6,062 | -1,732 |
| | 76 | 10,686 | -1,798 | 6,443 | -370 |
| | 8 | 13,338 | -1,751 | 5,134 | +667 |
| | 132 | 15,710 | +22,886 | 4,077 | -423 |
| | 18 | 19,155 | +328,011 | 22,414 | +4,024 |
| | 215 | 21,314 | -4,737 | 21,720 | +1,069 |
| | 7 | 29,226 | +34,597 | 72,368 | +16,016 |
| | 3,013 | 31,227 | +342,094 | 34,346 | -1,929 |
| | 344 | 43,631 | +668,189 | 61,428 | -3,294 |
| | 2 | 62,320 | -3,004 | 55,111 | +615 |
| | 17 | 77,200 | -4,268 | 161,590 | -26,410 |
| | 5 | 93,794 | -22,917 | 128,134 | -13,681 |
| | 15 | 422,501 | +36,467 | 28,542 | +1,616 |
| | 34 | 433,325 | -330,981 | 55,203 | +2,590 |
| | 22 | 437,305 | -8,500 | 46,309 | +5,591 |
| | 5 | 449,630 | +2,467,213 | 144,098 | +160,840 |
| | 19 | 454,666 | +65,197 | 53,146 | +9,543 |
| | 29 | 471,036 | -411,855 | 75,468 | -20,599 |
| | 5 | 480,006 | -39,945 | 64,376 | -2,061 |
| | 19 | 503,423 | +281,306 | 159,222 | +129,511 |
| | 341 | 512,536 | +195,967 | 74,697 | -17,039 |
| | 2 | 550,813 | +244,427 | 91,925 | +12,771 |
| | 13 | 551,071 | +2,293,850 | 106,593 | +2,403 |
| | 4 | 578,180 | -461,829 | 126,621 | -25,640 |
| | 8 | 638,127 | +161,542 | 73,106 | -9,404 |
| | 7 | 922,227 | -213,647 | 83,593 | +68,304 |
| | 5 | 1,259,191 | +348,468 | 162,731 | +73,867 |
| | 13 | 1,440,449 | +81,071 | 188,669 | -36,774 |
| | 13 | 2,546,977 | -1,731,620 | 334,430 | +325,784 |
| | 23 | 2,733,178 | +1,301,347 | 194,200 | +43,684 |
| | 11 | 2,736,198 | -1,966,447 | 290,086 | -83,430 |
| | 8 | 7,384,759 | -6,503,663 | 96,936 | +20,177 |
| | 2 | 8,399,916 | 1,485,122 | | |
| | 3 | 9,097,511 | -6,014,790 | 311,634 | -110,495 |