commit: 6b2ca8a - #414 (2014-03-05 12:54:44 -0500)
Definitely a topic I'd like to research more in my own time. I remember that when Rand published that post, I thought it would spark a ton of theories about co-citation, but it really didn't until this post.
Ouch! Rand's original article confused this a lot by calling something loosely similar to co-occurrence by the name of co-citation. They aren't the same thing, and Rand's post somehow made the confusion even worse. Co-occurrence works like this: you perform a search for a particular query and then look at the top n documents, where n could be 10, 100, or even more results. Within those documents, you want to see which words or phrases tend to appear over and over.
For example, if the query was "baseball stadiums," you might see certain terms or phrases appearing over and over again in the top results, such as "home plate," "left field," or "dugout." Since these terms or phrases tend to occur again and again in the results, they are said to co-occur. This is part of the process behind Google's phrase-based indexing, as described in a number of Google patents.
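A minimal sketch of that counting step, in Python. It assumes the top results are already available as plain text and that a list of candidate phrases is supplied up front; real phrase-based indexing discovers its phrases from the index itself, and the `min_doc_fraction` cutoff is a hypothetical parameter, not a value from any Google patent. The documents are made-up stand-ins for results on [baseball stadiums].

```python
import re
from collections import Counter

def contains_phrase(text, phrase):
    """True if the phrase appears in the text as whole words (case-insensitive)."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

def co_occurring_phrases(top_docs, candidate_phrases, min_doc_fraction=0.5):
    """Return {phrase: number of top documents containing it} for phrases that
    appear in at least min_doc_fraction of the top documents (hypothetical cutoff)."""
    doc_counts = Counter()
    for doc in top_docs:
        for phrase in candidate_phrases:
            # Count each phrase at most once per document (document frequency).
            if contains_phrase(doc, phrase):
                doc_counts[phrase] += 1
    threshold = min_doc_fraction * len(top_docs)
    return {p: c for p, c in doc_counts.items() if c >= threshold}

# Hypothetical stand-ins for the top results of a search for [baseball stadiums].
top_docs = [
    "The dugout sits along the first base line, close to home plate.",
    "Fans in left field can see home plate clearly from the bleachers.",
    "Our stadium tour visits the dugout and the left field wall.",
    "Ticket prices vary by section near home plate.",
]
candidates = ["home plate", "left field", "dugout", "hot dog"]
print(co_occurring_phrases(top_docs, candidates))
# → {'home plate': 3, 'dugout': 2, 'left field': 2}
```

Counting documents rather than raw occurrences keeps a single page that repeats "dugout" fifty times from making it look like a co-occurring phrase on its own.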
Documents in a query's result set that contain a number of the words or phrases that co-occur in its top results might be boosted in search results, at least up to a point. If they contain too many of the co-occurring terms, beyond a statistical threshold, they might be considered spam.
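The boost-then-penalize idea can be sketched like this. The statistical threshold used here (flag a document whose related-phrase count is more than `k` population standard deviations above the mean for the result set) and the capped per-phrase boost are assumptions chosen for illustration; the phrase-based indexing patents describe their own comparison against an expected count, and none of the names or values below come from them.

```python
import re
import statistics

def contains_phrase(text, phrase):
    """True if the phrase appears in the text as whole words (case-insensitive)."""
    return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text.lower()) is not None

def rerank(docs, base_scores, related_phrases, k=2.0,
           boost_per_phrase=0.1, max_boost=0.5):
    """Return (adjusted_score, flagged_as_spam) for each document.

    Hypothetical rule: a document whose related-phrase count exceeds the
    result-set mean by more than k standard deviations is flagged and gets
    no boost; every other document gets a capped boost per related phrase."""
    counts = [sum(contains_phrase(d, p) for p in related_phrases) for d in docs]
    mean = statistics.mean(counts)
    std = statistics.pstdev(counts)
    results = []
    for count, base in zip(counts, base_scores):
        if std > 0 and count > mean + k * std:
            results.append((base, True))  # statistically excessive: possible spam
        else:
            results.append((base + min(count * boost_per_phrase, max_boost), False))
    return results

related = ["home plate", "left field", "right field", "dugout", "bullpen", "scoreboard"]
docs = [
    "Tour the dugout before the game.",               # 1 related phrase
    "Seats behind home plate and along left field.",  # 2
    "The scoreboard was renovated last year.",        # 1
    "Home plate, left field, right field, dugout, bullpen, scoreboard: we have it all!",  # 6
]
for score, flagged in rerank(docs, [1.0] * len(docs), related, k=1.5):
    print(round(score, 2), flagged)
# prints 1.1 False, 1.2 False, 1.1 False, 1.0 True (one per line)
```

With the default `k=2.0`, the same fourth page is not flagged and simply hits the `max_boost` cap, which shows how sensitive this kind of rule is to where the threshold is set.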
Co-occurrence also appeared recently in another Google patent, which attempts to understand when queries might be reasonable substitutes for each other by looking at the terms that co-occur in a top number of results (10 or 100, for instance) for each query. So, if a lot of the same words co-occurred in searches for [frenchopen] and [french open], those queries could be said to be substitutes, and Google might expand a search for one of them to show results for the other as well.
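One way to sketch the substitute-query comparison: collect the terms that co-occur across each query's top results, then measure how much the two sets overlap. Jaccard similarity, the 0.5 threshold, the `min_docs` cutoff, and the tiny stopword list are all assumptions for illustration, not details taken from the patent, and the result snippets are made up.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "at", "on", "for", "of", "in", "and"}

def co_occurring_terms(top_docs, min_docs=2):
    """Terms (minus stopwords) that appear in at least min_docs of the top documents."""
    doc_freq = Counter()
    for doc in top_docs:
        # Use a set so each term counts once per document.
        doc_freq.update(set(re.findall(r"[a-z']+", doc.lower())) - STOPWORDS)
    return {term for term, df in doc_freq.items() if df >= min_docs}

def jaccard(a, b):
    """Set overlap len(a & b) / len(a | b), defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def are_substitutes(results_a, results_b, threshold=0.5):
    """Treat two queries as substitutes if their co-occurring terms overlap enough."""
    return jaccard(co_occurring_terms(results_a), co_occurring_terms(results_b)) >= threshold

# Made-up top results for [frenchopen] and [french open].
results_frenchopen = [
    "Nadal wins the French Open at Roland Garros on clay.",
    "Roland Garros clay court schedule for the French Open.",
    "Tickets for Roland Garros.",
]
results_french_open = [
    "French Open draw: Nadal seeded first at Roland Garros.",
    "Clay season peaks at Roland Garros.",
    "Nadal practices on clay.",
]
print(are_substitutes(results_frenchopen, results_french_open))  # → True
```

Here the two term sets share "roland," "garros," and "clay" out of six distinct terms, so the overlap lands exactly on the hypothetical 0.5 threshold.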
Co-occurrence is not about which words tend to appear frequently near your brand name or your site's name on a page. That's neither co-occurrence nor co-citation. :(
Thanks for the clarification, Bill. Could you explain, then, what you think co-citation actually is?
I don't know why, but a lot of people seem to be confusing co-occurrence with co-citation these days. What the author of the post you linked appears to have missed is that, after I wrote a response to Rand's article explaining that what he described in the Whiteboard Friday was a lot closer to co-occurrence than co-citation, Rand renamed the post, removed the mention of co-citation, and retitled it "Prediction: Anchor Text is Weakening...And May Be Replaced by Co-Occurrence- Whiteboard Friday".
One of the clearest expressions of co-citation on the Web over the last few years is the "similar sites" feature you've probably seen in Google.
When Website A and Website B are both linked to by Website C, they can be said to be somewhat similar because of it. When Website A and Website B are linked to by a lot of the same sites, they can be said to be even more similar to each other. That's co-citation. Jim Boykin wrote a good article about it back in 2006:
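Co-citation can be sketched from a link graph alone. The example below inverts a hypothetical `{source: targets}` graph to find who links to each site, counts the linkers two sites share, and ranks other sites by that count. It illustrates the idea only and is not how Google computes "similar sites"; normalizing with Jaccard overlap is an assumption added for comparison.

```python
def inbound_links(outlinks):
    """Invert {source: set of targets} into {target: set of sources linking to it}."""
    inbound = {}
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # ignore self-links
                inbound.setdefault(target, set()).add(source)
    return inbound

def co_citation(outlinks, a, b):
    """Return (number of sites linking to both a and b, Jaccard overlap of their linkers)."""
    inbound = inbound_links(outlinks)
    linkers_a = inbound.get(a, set())
    linkers_b = inbound.get(b, set())
    shared = linkers_a & linkers_b
    union = linkers_a | linkers_b
    return len(shared), (len(shared) / len(union) if union else 0.0)

def most_similar(outlinks, site, top_n=3):
    """Rank other sites by how many linkers they share with `site` (ties broken by name)."""
    inbound = inbound_links(outlinks)
    linkers = inbound.get(site, set())
    scores = {other: len(linkers & sources)
              for other, sources in inbound.items() if other != site}
    ranked = sorted((s for s in scores if scores[s] > 0), key=lambda s: (-scores[s], s))
    return ranked[:top_n]

# Hypothetical link graph: each key links out to the sites in its set.
outlinks = {
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "X"},
    "F": {"B"},
}
print(co_citation(outlinks, "A", "B"))  # → (2, 0.5)
print(most_similar(outlinks, "A"))      # → ['B', 'X']
```

A and B are both linked to by C and D, so they come out as more similar to each other than A and X, which share only E as a linker.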
Co Citation – understanding how it effects your SEO: http://www.internetmarketingninjas.com/blog/jim/co-citation-understanding-how-it-effects-your-seo/
I go into a lot more detail in my response to Rand's article here:
Not All Anchor Text is Equal and other Co-Citation Observations
Somehow, though, people started thinking of co-occurrence as meaning that when certain words appear around a mention of a brand name or URL, those words become associated with that brand name or URL. That's not quite how the search engines seem to have implemented co-occurrence either. See the two uses I described above (reranking for phrase-based indexing, and finding substitute query terms).
Shortly after I wrote about the Co-Occurrence Whiteboard Friday, Google published a patent, separate from the phrase-based indexing patents, that uses co-occurrence in ranking web pages. It's titled "Document ranking using word relationships," and I wrote about it in:
Ranking Webpages Based upon Relationships Between Words (Google's Co-Occurrence Patent)
For another perspective on this co-citation/co-occurrence topic, I highly recommend a post from Joshua Giardino:
It’s Not Co-Citation.. but it’s still awesome! (Or what’s really going on in the SERPs?)
Thanks Bill! Great list of resources on this topic.
You're welcome, Mark. :)