
Metehan & Gabe: How Common Crawl Rank might influence your Authority in AI visibility

  • Jan 22
  • 2 min read

Updated: Jan 26

Key Takeaways:

  • Metehan analyzed Common Crawl (CC) data and found that it correlates with how often domains are cited in LLM chats such as ChatGPT and Perplexity:

    • Most major LLMs were trained on CC data (64% of models studied, 80%+ of GPT-3 tokens)

    • CC uses Harmonic Centrality to prioritize high-authority domains when crawling

    • These same domains tend to be cited most frequently by LLMs

  • Feel free to use his tool to analyze Common Crawl authority scores for your domain or industry: https://webgraph.metehan.ai/

  • Common Crawl crawls the web and provides analytical data such as authority scores for domains:

    • Harmonic Centrality (HC): measures how “close” a domain is to all other domains in the link graph

    • PageRank: ranks a page based on the quality and quantity of its incoming links

    • Domain Rankings: a ranking of domains based on the metrics above

    This data is released every month (covering approximately 94-163 million domains per crawl period) and represents one of the largest publicly available authority datasets.

  • Kopp drew attention to a Google patent from 2015: "Producing a ranking for pages using distances in a web-link graph"

    • Seed pages are selected based on their position in the web graph and considered important

    • The outgoing links from seed pages to other pages in the set to be ranked are critically important

    • It is therefore worth working with the web graph to identify the relevant seed pages for your industry, or their close neighbours, and to earn links or mentions from them
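
To make the Harmonic Centrality idea from the takeaways more concrete, here is a minimal sketch on a toy link graph. It uses the standard textbook definition (the sum of 1/distance over all other nodes that can reach a domain, with unreachable nodes contributing 0). The domain names and the `harmonic_centrality` helper are made up for illustration; this is not Common Crawl's actual pipeline, which runs on a graph of many millions of domains.

```python
from collections import deque


def harmonic_centrality(links):
    """Return {domain: harmonic centrality} for a small directed link graph.

    `links` maps each domain to the list of domains it links to.
    (Hypothetical helper for illustration only.)
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # Build the reversed graph so we can walk links backwards from a target.
    incoming = {node: [] for node in nodes}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    scores = {}
    for target in nodes:
        # Breadth-first search over incoming links gives the shortest
        # distance from every domain that can reach `target`.
        dist = {target: 0}
        queue = deque([target])
        while queue:
            current = queue.popleft()
            for previous in incoming[current]:
                if previous not in dist:
                    dist[previous] = dist[current] + 1
                    queue.append(previous)
        # Domains that cannot reach `target` never enter `dist`,
        # so they contribute 0 to the sum.
        scores[target] = sum(1 / d for node, d in dist.items() if node != target)
    return scores


# Toy graph with invented domains.
toy_links = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["b.example"],
    "hub.example": ["a.example"],
}
hc_scores = harmonic_centrality(toy_links)
hc_ranking = sorted(hc_scores, key=hc_scores.get, reverse=True)
```

In this toy graph `hub.example` comes out on top because every other domain can reach it in one or two clicks, which is the intuition behind an HC Rank: the lower the rank number, the "closer" the domain sits to the rest of the web graph.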

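The seed-page idea from the patent bullet can be sketched in a similar way. The example below is a deliberate simplification: it assumes every link has the same length of 1 and ranks pages by the number of clicks from the nearest seed page, whereas the patent describes link lengths that depend on properties of the links and pages. The page names, the seed choice, and both helper functions are hypothetical.

```python
from collections import deque


def distance_from_seeds(links, seeds):
    """Shortest click distance from any seed page, following outgoing links.

    Simplifying assumption: every link has length 1.
    Pages that no seed can reach are left out of the result.
    """
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in dist:
                dist[target] = dist[current] + 1
                queue.append(target)
    return dist


def rank_by_seed_distance(links, seeds):
    """Order reachable pages by distance to the nearest seed (ties by name)."""
    dist = distance_from_seeds(links, seeds)
    return sorted(dist, key=lambda page: (dist[page], page))


# Toy graph with invented pages; "seed.example" plays the trusted seed.
toy_pages = {
    "seed.example": ["insurer-a.example", "portal.example"],
    "portal.example": ["insurer-b.example"],
    "insurer-b.example": ["blog.example"],
    "orphan.example": ["insurer-a.example"],
}
seed_distances = distance_from_seeds(toy_pages, ["seed.example"])
seed_ranking = rank_by_seed_distance(toy_pages, ["seed.example"])
```

Here `insurer-a.example` and `portal.example` sit one link away from the seed, while `orphan.example` is never reached because nothing on the path from the seed links to it. This is why a link or mention from a seed page, or from one of its close neighbours, shortens your own distance.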


Top 20 Domains (October-November-December 2025):


Example: Insurance domains in Germany, 2025 (sorted by HC Rank)

| Domain | Sub-domains | HC Rank | HC Trend | CC Rank | CC Trend |
|---|---|---|---|---|---|
| | 5 | 7,369 | +3,123 | 6,062 | -1,732 |
| | 76 | 10,686 | -1,798 | 6,443 | -370 |
| | 8 | 13,338 | -1,751 | 5,134 | +667 |
| | 132 | 15,710 | +22,886 | 4,077 | -423 |
| | 18 | 19,155 | +328,011 | 22,414 | +4,024 |
| | 215 | 21,314 | -4,737 | 21,720 | +1,069 |
| | 7 | 29,226 | +34,597 | 72,368 | +16,016 |
| | 3,013 | 31,227 | +342,094 | 34,346 | -1,929 |
| | 344 | 43,631 | +668,189 | 61,428 | -3,294 |
| | 2 | 62,320 | -3,004 | 55,111 | +615 |
| | 17 | 77,200 | -4,268 | 161,590 | -26,410 |
| | 5 | 93,794 | -22,917 | 128,134 | -13,681 |
| | 15 | 422,501 | +36,467 | 28,542 | +1,616 |
| | 34 | 433,325 | -330,981 | 55,203 | +2,590 |
| | 22 | 437,305 | -8,500 | 46,309 | +5,591 |
| | 5 | 449,630 | +2,467,213 | 144,098 | +160,840 |
| | 19 | 454,666 | +65,197 | 53,146 | +9,543 |
| | 29 | 471,036 | -411,855 | 75,468 | -20,599 |
| | 5 | 480,006 | -39,945 | 64,376 | -2,061 |
| | 19 | 503,423 | +281,306 | 159,222 | +129,511 |
| | 341 | 512,536 | +195,967 | 74,697 | -17,039 |
| | 2 | 550,813 | +244,427 | 91,925 | +12,771 |
| | 13 | 551,071 | +2,293,850 | 106,593 | +2,403 |
| | 4 | 578,180 | -461,829 | 126,621 | -25,640 |
| | 8 | 638,127 | +161,542 | 73,106 | -9,404 |
| | 7 | 922,227 | -213,647 | 83,593 | +68,304 |
| | 5 | 1,259,191 | +348,468 | 162,731 | +73,867 |
| | 13 | 1,440,449 | +81,071 | 188,669 | -36,774 |
| | 13 | 2,546,977 | -1,731,620 | 334,430 | +325,784 |
| | 23 | 2,733,178 | +1,301,347 | 194,200 | +43,684 |
| | 11 | 2,736,198 | -1,966,447 | 290,086 | -83,430 |
| | 8 | 7,384,759 | -6,503,663 | 96,936 | +20,177 |
| | 2 | 8,399,916 | | 1,485,122 | |
| | 3 | 9,097,511 | -6,014,790 | 311,634 | -110,495 |




© 2026 David Epding. Created with Wix.com.


David Epding is a GEO & SEO, data analytics and automation manager with more than 10 years of experience in technical SEO, broad expertise in LLMs and many years of experience in data analysis.
