My take on Semantic Search and Hummingbird.
A hummingbird is a beautiful, lively and ever-so-loveable creature that gives one a feeling of animation and personality.
What I think people miss is that an algorithm is absolutely none of these things. I'm not saying that people actually believe the Hummingbird algo to be loveable and lively, but that they see it as caring about things which it actually cannot care about.
An algorithm is a cold and calculating recipe for pulling data based on certain pre-defined elements. Even though it can be very expansive and take in a massive number of such elements before it spits out the results, it doesn't actually "consider" anything at all.
Looking anew at Hummingbird and Semantic Search in this rather cold and calculating manner will help SEOs tailor their sites to it and hence rank better, in my humble opinion.
Figuring out what the algorithm is programmed to feature requires that we look at the actual facts.
As David Amerland (likely the leading name in Semantic Search) writes here:
"When complex analysis and mapping takes place, it inevitably reveals usage patterns that can signal intent. Google makes full use of this capability. Hummingbird can now broaden the search horizon by suggestion links and information that are related to a search but were not asked for. A request for local museums, for instance, may lead to data being shown about artists whose works are exhibited there or a search for a local restaurant can also bring up reviews, recipes and food styles".
"The point is that from a practical perspective those seeking to be found need to focus on the creation of content that is as information-rich as possible. Keywords on their own, now ... simply ain't gonna cut it".
I agree with him, especially on creating content that answers questions, but how do keywords play into it?
The cold algorithm still requires keywords in order to determine anything at all (without words you cannot have a language-based search engine). This is axiomatic. Hummingbird must seek to determine user intent via the keywords, phrasing and other information, e.g. previous searches by the very same person, the results they clicked on and did not bounce from, the same signals from others who've searched Google, plus the authority of the content and of the writer behind those low-bounce results that appear to answer the question asked.
David's explanation of why 100% Not Provided was announced, coincidentally, at the launch of Hummingbird (during Google's 15th birthday) is only partially right (IMO):
"By removing keyword reporting in Google Analytics Google has removed the "what keyword am I going to build content, around, today?" strategy. Content will always contain keywords, but it should be created to answer specific, potential, end user questions rather than surface a page because of a keyword".
I think there is an additional, and even more important, reason for this convenient timing of the two launches. I believe that long-tail keyword data from Analytics could have allowed a more in-depth reverse engineering of Hummingbird. SEOs could have tracked how the keywords, terms, phrases and interrogatories that began generating traffic differed from the more general keywords that provided traffic (ergo rankings) beforehand.
Seeing how Google's Hummingbird specifically changed the landscape (i.e. which keywords now do and do not perform, as well as which pages rose and fell) could allow clever SEOs to lift the curtain on the algo. Google was being the mama bear again, protecting its cub, the algorithm, from SEOs' poking sticks.
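To make that before-and-after comparison concrete, here is a rough sketch of the kind of analysis I mean. It assumes you still had two keyword reports (phrase mapped to visits), one from before the launch and one from after. The function names and all of the numbers are made up for illustration; nothing here reflects real Analytics data or anything Google has published.

```python
def keyword_shift(before, after):
    """Return the change in visits per keyword between two periods.

    `before` and `after` are hypothetical keyword reports:
    dicts mapping a search phrase to its visit count.
    """
    keys = set(before) | set(after)
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def gained_and_lost(before, after):
    """Split keywords into those that gained and those that lost traffic."""
    shift = keyword_shift(before, after)
    # Biggest gains first, biggest losses first; ties broken alphabetically.
    gained = sorted((k for k, d in shift.items() if d > 0),
                    key=lambda k: (-shift[k], k))
    lost = sorted((k for k, d in shift.items() if d < 0),
                  key=lambda k: (shift[k], k))
    return gained, lost


# Invented numbers, purely for illustration.
before = {
    "plumber": 120,
    "los angeles plumber": 80,
    "how to fix a leaking tap": 5,
}
after = {
    "plumber": 60,
    "los angeles plumber": 75,
    "how to fix a leaking tap": 40,
    "who is the best plumber near me": 22,
}

gained, lost = gained_and_lost(before, after)
print("gained:", gained)
print("lost:", lost)
```

In this invented example the question-style phrases gain while the short general terms lose, which is exactly the sort of pattern that 100% Not Provided now hides from us.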
With the vast majority of keywords falling into the Not Provided designation, we require other tactics. This means that we look at what Hummingbird does, coldly and mechanically, so to speak.
If it is able to glean user intent (to some degree) and it looks at topics to a greater degree, then it is important for the content writer or SEO to determine what questions and desires their potential market is searching on and how they can best answer them (best meaning in comparison to others answering them).
Also it's important to look at other features that bolster the relevance of your content to the question or desire. This means looking into supplemental content and things that can increase the information gained. Supplemental content can be links to other topically related content or things that assist the user such as a mortgage calculator on a real estate site or a translation feature on an international site.
Authority is becoming more and more important. What counts is how those who are considered authoritative deal with your contributions to the topic (e.g. linking to, citing, or even mentioning you, your name, your site, your brand, etc.), and how those who are not deemed to be authoritative do the same.
Building authority is primarily based on citation (see Google's original PageRank paper entitled The PageRank Citation Ranking: Bringing Order to the Web).
It shows you that citations are how Google connects the dots between people, sites, brands, etc. Links are citations, but so are mentions.
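For anyone who wants to see the citation idea in its coldest form, below is a minimal sketch of the basic PageRank iteration on a tiny made-up link graph. It is a simplified textbook version, not what Google runs today. The site names are hypothetical, and the damping factor of 0.85 and the 50 iterations are my assumptions (0.85 is the value usually quoted for the original algorithm).

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the list of pages it cites (links to).
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its citations.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page that cites nobody spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical sites: three of them cite "authority", which cites "blog_a".
citations = {
    "blog_a": ["authority"],
    "blog_b": ["authority"],
    "newcomer": ["authority"],
    "authority": ["blog_a"],
}

ranks = pagerank(citations)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Notice that blog_a outranks blog_b even though both link out in the same way. The only difference is that the most-cited site cites blog_a back, which is the whole point about who is doing the citing.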
Branding is ever more important and research has shown that established brands rank better simply because they're established brands. Branding is a clearly defined thing... it requires you to connect your name or mark to the topic and keywords so that they become closely associated with each other via usage. Google did this so well that their brand became a verb synonymous with searching online.
Building authority and branding can go hand in hand so getting people to use your name or mark in association with the terms and topics helps on very many levels. Sometimes optimizing for semantic search can be very difficult.
You can use this topical "find question or desire – then answer" approach more easily on subpages of content, but if you're a service provider, for example, then you may not be able to describe on your homepage just how you're a Los Angeles Plumber that is more a Los Angeles Plumber than the next Los Angeles Plumber. So one must look at which qualifications of a plumber Google finds most relevant to people seeking one.
Location in this case is going to be important, as will reviews and social and authoritative signals. But when it comes to the content and the terms used, it will require some reverse engineering of your ranking competition as well as those who rank below you. Find what appears to work and what appears not to and begin an approach based on that data.
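As a starting point for that kind of reverse engineering, here is a small sketch that compares the wording of pages ranking above you with pages ranking below you. Everything in it is an assumption for illustration: the sample page texts are invented, the stopword list is a tiny hand-picked one, and the scoring (share of higher-ranking pages using a term minus share of lower-ranking pages using it) is just one simple way to do it. It shows correlation at best, not what the algorithm actually rewards.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; extend as needed).
STOPWORDS = {"a", "and", "in", "of", "our", "the", "with"}


def terms(text):
    """Return the set of distinct non-stopword terms in a page's text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def distinguishing_terms(higher_pages, lower_pages, top=5):
    """Terms used by a larger share of higher-ranking pages than lower ones.

    Both arguments are non-empty lists of page texts. Returns up to `top`
    (term, score) pairs, where score is the difference in page share.
    """
    def page_share(pages):
        counts = Counter()
        for text in pages:
            counts.update(terms(text))
        return {t: c / len(pages) for t, c in counts.items()}

    high, low = page_share(higher_pages), page_share(lower_pages)
    score = {t: high.get(t, 0.0) - low.get(t, 0.0) for t in set(high) | set(low)}
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return [(t, round(score[t], 2)) for t in ranked[:top] if score[t] > 0]


# Invented page copy, purely for illustration.
higher = [
    "Licensed Los Angeles plumber with 24 hour emergency service and reviews",
    "Emergency plumber in Los Angeles, licensed and insured, read our reviews",
]
lower = [
    "Los Angeles plumber. Best plumber. Cheap plumber in Los Angeles.",
]

result = distinguishing_terms(higher, lower)
print(result)
```

In this made-up case "Los Angeles plumber" tells you nothing because everybody uses it, while terms like "licensed", "emergency" and "reviews" separate the pages on top from the page underneath. Treat any such finding as a theory to test, not a conclusion.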
For SEOs it still comes down to analysis, theories, sleeve-rolling, creation, trial and error, results, analytics analysis, conclusions and a repetition of that very process. Remember, in the end, you're not fighting against a loveable little bird; you're fighting against a cold, calculating algorithm.
Well that's my two cents – I'd love to hear yours.
Fantastic "Mechanical Hummingbird" art by JossHorror