Monday, July 10, 2017

Here’s how RankBrain does (and doesn’t) impact SEO

In the past couple of weeks there has been a reinvigorated fervor surrounding artificial intelligence, with “AIO” (Artificial Intelligence Optimization) rearing its head on agency websites and blogs.

HTTPS and mobile first seem to be cooling as topics, so attention is turning to RankBrain.

The reality, however, is that “artificial intelligence optimization” is a paradoxical notion. Imagine that Google is a child: when the child goes to school and reads a book, we want the child to learn and understand the information in that book. If the book isn’t “optimized” for the child to learn (structured information, images, engaging content, a positive user experience and so on), then the child won’t learn or understand the content.

Optimizing for RankBrain isn’t new or complicated. A tweet from Google’s Gary Illyes on June 27, 2017 echoes this. So why is there a need to turn RankBrain optimization into a product of its own, when the practices aren’t anything new?

In this post I’m going to explore exactly what RankBrain is, and isn’t, as well as how the pre-existing concepts and practices of good SEO (as outlined by Google’s guidelines) apply to RankBrain.

What is RankBrain?

RankBrain uses a form of machine learning, and Google uses it to convert vast amounts of qualitative data (written content) into quantitative data (mathematical entities): vectors that the algorithm and other computers can understand.

15% of all queries that Google processes are new, so it’s common for RankBrain to encounter a query or phrase it hasn’t seen before. Using previously processed data in vectors and shards, RankBrain looks to make an intelligent guess based on similar queries, and similar meanings.
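To make the “intelligent guess” idea concrete, here is a minimal sketch of matching an unseen query to the closest known one by comparing vectors. It is an illustration only, not Google’s implementation: the three-dimensional vectors, the example queries and the `closest_known_query` helper are all made up for this post, and cosine similarity is just one common way of comparing vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors for queries that have already been processed.
# Real query vectors would have far more dimensions.
known_queries = {
    "cheap flights to paris": [0.9, 0.1, 0.0],
    "paris hotel deals": [0.6, 0.7, 0.1],
    "python list sort": [0.0, 0.1, 0.9],
}


def closest_known_query(new_vector):
    """Return the known query whose vector is most similar to new_vector."""
    return max(
        known_queries,
        key=lambda query: cosine_similarity(known_queries[query], new_vector),
    )


# Assumed vector for a query that has never been seen before.
new_query_vector = [0.8, 0.2, 0.05]
print(closest_known_query(new_query_vector))
```

The point is simply that once text is represented as numbers, “similar meaning” becomes something a computer can measure.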

The share of new queries has fallen from 25% in 2007, but overall query volume has grown enormously thanks to the rise of smartphones and increased internet penetration rates globally.

Simply put, RankBrain:

  • Interprets the user query
  • Determines search intent
  • Selects results (items) from the databases

What is machine learning?

Machine learning is a field of computer science and was defined in 1959 by Arthur Samuel as follows: “Machine learning gives computers the ability to learn without being explicitly programmed.” Samuel conducted the initial research into this field, which evolved from pattern recognition studies and computational learning theory.

Machine learning in essence explores the construction of algorithms that learn from data and make predictions based on that data and its statistical frequencies. Machine learning was used in a number of software applications prior to RankBrain, including spam email filtering, network threat and intruder detection, and optical character recognition (OCR).

While this is a form of artificial intelligence, it’s not a high-functioning form.

Association rule learning

ARL (association rule learning) is a method of machine learning for discovering relationships between variables in large databases using predetermined measures of interestingness.

This has previously been used by supermarkets to determine consumer buying behavior, and is used to produce loyalty coupons and other educated outreach methods. For instance, through store loyalty/points cards, a store can gather data that, when analyzed, can predict buying patterns and behaviors.

ARL can also be used to predict associations. For example, if a user buys cheese slices and onions, it could be assumed they are also going to buy burger meat. RankBrain uses this principle in providing intelligent search results, especially when a phrase can have multiple meanings.

An example of this is the English slang term “dench”. If a user searches for dench, it can have three meanings: the slang term, a line of clothing, or the actress Judi Dench. The term can also be associated with individuals, such as professional athlete Emmanuel Frimpong and rapper Lethal Bizzle.

As the query is ambiguous, Google’s own search quality evaluator guidelines explain that the search engine will show as many variations as deemed possible in order to satisfy the user’s search intent as well as it can.

Concepts of association rule learning

The main concepts and rules of ARL are Support, Confidence, Lift and Conviction, but for the purposes of RankBrain I’m going to focus on Support and Confidence.


Support in ARL is the measure of how frequently the item in question appears in the database. This is not the same as keyword density, or the number of times keyword variants appear.


Confidence in ARL is a measure of how often the rule has been found to be true. This is based on associative terms, i.e. if a user searches for “POTUS”, then there is an X% chance that they will also search for Donald Trump, or find him a satisfactory result. They may also find Barack Obama, George Bush or Abraham Lincoln satisfactory results.

Confidence can often be confused with probabilities, as the two principles with regards to organic search are quite similar (if a user searches for X, then Y and Z can also be valid).
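Support and confidence are easier to see with numbers. The sketch below reuses the supermarket example from earlier; the five baskets are invented purely for illustration, and `support` and `confidence` are small helper functions written for this post rather than part of any library.

```python
# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"cheese slices", "onions", "burger meat"},
    {"cheese slices", "onions", "burger meat", "buns"},
    {"cheese slices", "onions"},
    {"onions", "carrots"},
    {"burger meat", "buns"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the rule 'antecedent => consequent' holds when the antecedent appears."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule: {cheese slices, onions} => {burger meat}
rule_support = support({"cheese slices", "onions", "burger meat"}, baskets)
rule_confidence = confidence({"cheese slices", "onions"}, {"burger meat"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here the three items appear together in 2 of the 5 baskets (support 0.40), and 2 of the 3 baskets containing cheese slices and onions also contain burger meat (confidence of roughly 0.67).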

RankBrain uses association rules that satisfy a user-specified minimum support and a user-specified minimum confidence at the same time, and rule generation is generally split into two separate steps:

  1. A minimum support threshold is applied to find all frequent itemsets in the database.
  2. A minimum confidence constraint is applied to those frequent itemsets in order to form rules.
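The two steps above can be sketched as follows. This is a brute-force illustration of the general ARL procedure, not a description of what RankBrain actually runs: the baskets, the thresholds (0.4 and 0.6) and the function names are all assumptions made for the example, and real systems use more efficient algorithms such as Apriori.

```python
from itertools import combinations

# Hypothetical transactions, invented for this example.
transactions = [
    {"cheese slices", "onions", "burger meat"},
    {"cheese slices", "onions", "burger meat", "buns"},
    {"cheese slices", "onions"},
    {"onions", "carrots"},
    {"burger meat", "buns"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: keep every itemset whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
                found_any = True
        if not found_any:
            # No frequent itemset of this size means none can exist at larger sizes.
            break
    return frequent


def form_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into 'antecedent => consequent' rules."""
    rules = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support was already computed in step 1.
                conf = itemset_support / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.4)
found_rules = form_rules(frequent, min_confidence=0.6)
for antecedent, consequent, conf in found_rules:
    print(sorted(antecedent), "=>", sorted(consequent), f"(confidence {conf:.2f})")
```

Raising either threshold shrinks the list of rules, which is why the two minimums are usually tuned together.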

Using these rules, RankBrain helps Google prioritize which ranking signals are most relevant to the user query, and how to weight those signals.

RankBrain and SEO

RankBrain was launched in a dozen or so languages (as confirmed by Gary Illyes on Twitter in June 2017) ranging from English to Hindi, and its sole purpose is to help Google provide more accurate results and an overall better search experience for users, satisfying their queries.

The main difference between the pre- and post-RankBrain world is that before RankBrain, Google’s team of software engineers would amend and alter the mathematical algorithm(s) that determine search results and rankings, and this algorithm would remain constant until an update was made. However, RankBrain is part of the core algorithm and is used by Google for all searches (as of 2016), meaning that there is constant change and fluctuation.

This means that search results are now reactive to real world events, as well as a lot more volatile outside of the big algorithm update announcements.

“Optimizing” for RankBrain

Given how RankBrain interacts with the core algorithm and other ranking signals, there may be a need to change strategic focus (especially if the strategy is built on backlinks). But RankBrain is not a “classic algorithm” like Panda and Penguin.

With the classic algorithms, we know how to avoid Penguin penalties and, thanks to guidelines, how to satisfy Panda. RankBrain, on the other hand, is an interpretation model that can’t be optimized for specifically. There are, however, a number of standard SEO practices that are now more relevant than ever.

Doorway pages are dead

The idea of writing content with a “focus keyword” and producing one page for one keyword is outdated. The Hummingbird update killed this in 2013, and RankBrain has taken this one step further.

I’ve seen this practice still being used in a number of sectors. When creating content and URL structures, both user experience and keyword matrices should be used, with the focus being on creating high value and resourceful pages.

Different queries = different weighting factors

Because RankBrain has changed how certain variables and ranking factors are weighted for different queries, a one-size-fits-all approach to queries (and query categories) is no longer practical.

Taking queries that trigger Venice results and the map pack out of the equation, some queries may demand high velocities of fresh content, shorter content, longer content, lots of links… The new weighting model that RankBrain presents means that there will need to be deviations from the standard best practice.
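As a purely hypothetical sketch of what query-dependent weighting means in practice, the snippet below scores the same two pages under two different sets of weights. The categories, signal names, weights and page values are all invented for illustration; Google has not published anything like this table.

```python
# Hypothetical signal weights per query category; none of these numbers come from Google.
WEIGHTS = {
    "news": {"freshness": 0.6, "links": 0.2, "depth": 0.2},
    "evergreen guide": {"freshness": 0.1, "links": 0.4, "depth": 0.5},
}


def score(page_signals, category):
    """Weighted sum of a page's signals using the weights for one query category."""
    weights = WEIGHTS[category]
    return sum(weights[name] * page_signals[name] for name in weights)


# Two made-up pages, with each signal normalized to the range 0..1.
fresh_short_page = {"freshness": 0.9, "links": 0.3, "depth": 0.2}
old_long_page = {"freshness": 0.2, "links": 0.7, "depth": 0.9}

for category in WEIGHTS:
    print(
        category,
        round(score(fresh_short_page, category), 2),
        round(score(old_long_page, category), 2),
    )
```

The fresh, short page comes out ahead for the “news” category and the older, in-depth page for the “evergreen guide” category, even though neither page has changed.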

Internal linking structures

We know from Google’s search quality evaluation guidelines that Google considers main and supplemental content when ranking a page; this extends to pages within a URL subdirectory and pages linked to from the main content.

It’s standard to optimize internal linking structures so that link equity is passed to key pages on the site (as well as deeper pages), but it’s also important to include a good number of internal links to improve the user experience.

What does the future hold?

When RankBrain was first launched in 2015 it only handled around 15% of queries, but by the same time in 2016 Google’s confidence in the algorithm had grown, and it let RankBrain loose on all queries. This will have been a phased rollout, and was likely responsible for a number of changes we saw in 2016.

As RankBrain learns on the job, it will only get better at understanding semantics and concepts, and the relationships between topics and queries. This will benefit the accuracy of voice search results, as well as traditional search results pages and Now cards.

In summary

A number of leading figures in the SEO community (including Gary Illyes and Rand Fishkin) have come out in various ways highlighting that RankBrain isn’t something that can be specifically optimized for.

That being said, understanding how the RankBrain algorithm works is important to understanding the ranking volatility in your (or your clients’) verticals.
