1. 5 DISCUSSING
  • Rand Fishkin   Aug 10 2012

    Really good, detailed piece. Interestingly, Google just stated they crawl ~20 billion webpages each day. At Moz, we're crawling ~1 billion/day, and can go as high as 3 billion/day with our current infrastructure. We think we can get close to 6 or 7 billion with some upgrades, but Google's 20 billion is very impressive and will be hard to match. Of course, for us, processing is a far bigger bottleneck than crawling...

  • Ravish   Aug 12 2012

    Mind sharing how much it costs you to crawl billions of web pages?

  • Dan Bochichio   Aug 11 2012

    Very cool. I saw this on Hacker News yesterday.

  • Tad Chef   Aug 11 2012

    Why should you crawl them? There are only a few use cases I can think of, most of them black hat.

  • dchuk   Aug 11 2012

    Take off that tinfoil hat, champ. There are lots of reasons to crawl the top million domains: backlink spread, content analysis (keywords, sentiment analysis, etc.), social sharing diversity. Just because something is automated doesn't mean it's evil.

  • Tad Chef   Aug 12 2012

    I didn't say it's evil. I said that there must be reasons to do so. It's not just l'art pour l'art.
