Really good, detailed piece. Interestingly, Google just stated they crawl ~20 billion webpages each day. At Moz, we're crawling ~1 billion/day, and can go as high as 3 billion/day with our current infrastructure. We think we can get close to 6 or 7 billion with some upgrades, but Google's 20 billion is very impressive and will be hard to match. Of course, for us, processing is a far bigger bottleneck than crawling...
Mind sharing how much it costs you to crawl billions of web pages?
Very cool. I saw this on Hacker News yesterday.
Why should you crawl them? There are only a few use cases I can think of, most of them black hat.
Take off that tinfoil hat, champ. There are lots of reasons to crawl the top million domains: backlink spread, content analysis (keywords, sentiment analysis, etc.), social sharing diversity. Just because something is automated doesn't mean it's evil.
I didn't say it's evil. I said that there must be reasons to do so. It's not just l'art pour l'art.