commit: 6b2ca8a - #414 (2014-03-05 12:54:44 -0500)
Great article on the art of scraping. The one issue it didn't bring up is that, depending on the content you are scraping, you could run into duplicate-content problems with Google.
It depends on what you actually *do* with the content. I left it up to the imagination of the scraper to decide how to use those methods. Obviously, if you're re-posting other sites' content as-is, you're probably going to have some issues (not just SEO ones), especially if the site that got scraped feels you're damaging its branding or hurting its revenue stream.
If you're using Python, as alluded to in the post, I'd suggest taking a look at Scrapy (http://scrapy.org/). It's got a great parser and a bunch of other built-in niceties, like the ability to write middlewares, so you can have a full crawler without having to cobble other libraries together to get at what you need.
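
To give a rough idea of what a middleware looks like, here is a minimal sketch of a downloader middleware that sets a User-Agent header on every outgoing request. The class name, the `USER_AGENTS` list, and the example.com strings are made up for illustration. The only Scrapy-specific part is the `process_request(request, spider)` hook, which is the long-standing signature for downloader middlewares; check the docs for your Scrapy version before relying on it.

```python
import random

# Hypothetical pool of user-agent strings; replace with your own values.
USER_AGENTS = [
    "example-crawler/1.0 (+http://example.com/bot)",
    "example-crawler/1.1 (+http://example.com/bot)",
]


class RotateUserAgentMiddleware:
    """Downloader middleware sketch: pick a User-Agent for each request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Called for each request that passes through the downloader
        # middleware chain; the header is overwritten with a random choice.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets the request continue to the next middleware
        # and eventually to the downloader.
        return None
```

You would enable it through the `DOWNLOADER_MIDDLEWARES` setting in your project's settings.py, for example `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RotateUserAgentMiddleware": 400}`, where the module path and the priority number are assumptions about your project layout. The class itself needs no Scrapy import, since Scrapy only looks for the hook methods.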