BrandPepper


Semantic keyword clustering is an indispensable part of keyword research. But it is often a time-consuming and complicated task. Fortunately, developments in artificial intelligence are accompanied by the advance of tools and techniques that help you automate this work. This way you apply automation without having to worry about complicated code, such as Python. What does automatic clustering mean? Why do we cluster keywords in the first place? How do you ensure that you cluster properly? And most importantly: why is this so important? In this article I provide answers to these questions.

From Google Hummingbird to MUM

In 2013, Google announced Hummingbird: the codename for a new algorithm update. It consists of several components, to which RankBrain was later added. Hummingbird is able to understand the semantics of a user’s query. It considers the entire query – one word or a whole phrase – rather than individual words.

RankBrain

In 2015, Google announced RankBrain, a machine learning technology and an extension of Hummingbird that helps interpret search queries even better. RankBrain is able to see patterns between seemingly unrelated queries and learn how they are similar.

RankBrain works together with Hummingbird to provide better results for user queries, but RankBrain goes further than semantic search. The self-learning algorithm applies what it has learned to future searches. These can be similar searches, but also searches it has never seen before or new combinations of searches.

BERT

In 2019, Google introduced a new algorithm update called BERT. This model uses natural language processing (NLP) to understand each word in a query in relation to all the other words in the sentence.

MUM

In 2021, Google announced that it is developing a new technology that, according to Google, is 1,000 times more powerful than BERT: MUM.

“MUM is a technique that enables transferring knowledge across languages. MUM not only understands language, but also generates it. It’s trained across 75 different languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge than previous models.” (Nayak, Google, 2021)

MUM is also multimodal. This means that MUM can simultaneously understand information from different content formats, such as web pages, photos, videos and more.

“... MUM is multimodal, so it understands information across text and images and, in the future, can expand to more modalities like video and audio.” (Nayak, Google, 2021)

According to Google, search engines are not yet able to solve very complex searches in one go. Often the models do not ‘understand’ the context or user need behind a query. This results in multiple searches before the user need is met.

“People issue eight queries on average for complex tasks ...” (Nayak, Google, 2021)

With MUM, Google is getting closer to providing immediate answers to complex questions. Think, for example, of the “next query you’re going to type in”: Google wants to show the answer to your third question during your first search. Latent needs thus become clearer while the user is still searching.

It is therefore essential for Google’s models to understand what someone’s intention is, and which keywords share the same intention and information needs.

Why is semantic keyword clustering relevant?

Two (or more) apparently unrelated searches can therefore respond to the same information need and intention of the searcher. How does this work in practice? Take the following example:

Queries 1 and 2

“Arabica coffee beans”

“Robusta coffee beans”

At first sight, behind the keywords lie the following intentions and information needs.

  • Informative intent: information about what Arabica / Robusta coffee beans are (and possibly where you can buy them)
  • Commercial / transactional intent: there is a chance that Google Ads and Shopping campaigns will be shown

If you were to manually group these keywords based only on syntax or underlying meaning, a possible cluster name could be “types of coffee beans”. There is a semantic relationship, but is this information enough for you as an SEO marketer?

The search results mainly show blog articles

The SERP (search engine results page) shows that the top results in Google are dominated almost entirely by blog articles that explain the difference between Arabica and Robusta coffee beans.

The semantic link is obvious at first glance, but without this analysis you might have overlooked that the user’s intent and information needs are pretty much the same in both searches.

This is the power of RankBrain.

Various ads also respond to the commercial and transactional intent. With one exception, however, the organic search results did not contain any URLs that catered to these intents. For example, you don’t see an e-commerce landing page listing Robusta (or Arabica) beans. Apparently, the informational intent and need predominate in these searches.

Think before you start creating content

You probably only arrive at the insights above by analyzing the SERP for each keyword separately.

It will also be clear that you cannot manually perform this analysis for a keyword research of 5,000 or more keywords.

As an SEO consultant, you have the task of connecting the right actions to the insights from a keyword analysis. Before creating content, you first want to be able to answer the following questions:

  • Which type of pages (and functionalities) best capture the search query?
  • What intention(s) does the searcher have?
  • What explicit and underlying latent questions and needs does the searcher have?
  • Which content formats are best suited to answering the questions?

And ideally also:

  • Who is looking for this? (who is your target audience)

Before you can link the right actions to a keyword, you first need to know which keywords are primarily semantically related and which keywords respond to the same intentions and information needs.

Clustering keywords correctly gives you a rock-solid start for a content strategy. You get a better understanding of how to organize pages and which clusters of keywords you can rank for.

What is Semantic Keyword Clustering?

Keyword clustering means that you cluster keywords that are semantically related and respond to the same search intent in a group. How does this work in practice?

In 2017 I wrote an article about ‘whiskey for beginners’ for Gall & Gall. At the time I was a novice whiskey drinker myself, so I used my own experience to work out what the article should contain. I asked myself the four questions mentioned earlier:

  • Which type of page (and functionalities) best captures the search query?
    • A blog article
  • What intentions does the searcher have?
    • Informative and possibly commercial. The searcher especially wants to know which whiskey she or he can try as a novice whiskey drinker.
  • What explicit and underlying latent questions and needs does the searcher have?
    • How do I know if I like whiskey?
    • What is good whiskey?
    • Which whiskey is soft?
    • I don’t understand whiskey jargon, so I want to understand what I read
    • I don’t want to spend too much money
    • Where do I start?
  • Which content formats are best suited to answering the questions?
    • Text, images and products

Below you can see the top 18 current article rankings without sitelink rankings, 4 years later.

What stands out about this list and the content of the article?

First of all, the main keyword: ‘whiskey for beginners’ appears zero times in the text. Is this bad?

No.

At the time, I sometimes forgot to place keywords in the text. Instead, I focused on the questions above.

There are also several themes or topics in the article:

  • Whiskey for beginners / best whiskey for beginners
  • Learn to drink whiskey
  • Whiskey flavours
  • Soft whiskey
  • Sweet whiskey
  • Tasty whiskey
  • Combination of soft / tasty / sweet whiskey

What makes the article rank for all these terms in the top 3?

Largely because the semantic relationship and user search intent are roughly the same for all terms. In addition, the overarching theme is “whiskey for beginners”. A beginner looks for specific things that an experienced whiskey drinker simply doesn’t look for.

A novice whiskey drinker looks for sweet or soft whiskey, because those are the entry-level whiskeys: the ones that are predominantly easy to drink. Learning to drink whiskey is something you only do as a beginner. The underlying information need is that you want to know how to get started and with which whiskeys. “Whiskey tastes” and “nice whiskey” are typical searches for someone who knows little or nothing about whiskey.

With the help of RankBrain, among others, Google Search was already able to determine the semantic relationship between these searches in 2017 via the self-learning algorithm. That is why the article also ranks for keywords that do not necessarily appear in the article, but do respond to the underlying intention and information needs.


Benefits of Semantic Clustering

Other benefits of semantic clustering include:

  • Stronger rankings for long-tail keywords
  • A better understanding of the underlying relationship between clusters
  • Improved rankings for short-tail keywords
  • More options for internal linking
  • Building expertise and authority in a niche

Semantic keyword clustering, how do you do that?

Keyword clustering obviously starts with keyword research. Collect as many relevant search terms as possible including all variations, long-tail keywords and subtopics.

When creating a keyword list, use relevance and search intent as your guideline. After all, you only want to spend time and budget on content that brings relevant visitors to the site. Determining exactly why a query is relevant can be quite a job, especially since relevance is a commonly used buzzword.

Improve the accuracy of your keyword research

The following factors help narrow the scope and improve the accuracy of keyword research:

  • The type of website. Is it an inspiration platform? Or a news site? An e-commerce platform? Or a combination?
  • The partiality or impartiality of the site. For example, do you sell one brand with a single product or with different products? Or do you run a comparison site that features different brands?
  • B2B or B2C. Are you active in B2B or B2C, or perhaps both?

Depending on these factors, certain search queries may or may not be captured by a site.

The type of page and content formats also play a role

In addition, the type of page and content formats also play a role in understanding information needs. For example, some queries can only be met with a certain type of page and content format. This depends on how the user wants to see and consume content.

An example

Suppose you are hired as an SEO marketer for the imaginary bicycle brand SnelFietsie. The bicycle brand has its own platform with a webshop. You found the keyword “buy a bicycle” in the keyword research. Is this keyword relevant?

No.

The top search results in Google show established parties that:

  • Sell different brands
  • Rank with landing pages that give an overview of a wide range of bicycles and brands
  • Both offer products and provide additional information

People who search for “buy a bicycle” are considering buying a bicycle, but do not yet have a specific brand in mind. It feels logical to see an overview of different brands and bicycles with accompanying content that responds to any explicit and latent questions.

It is likely that SnelFietsie as an individual bicycle brand cannot respond to this information need and intentions. A relevant page is therefore also beyond the reach of SnelFietsie.

Understanding which keywords are or are not relevant based on these factors is essential to create the right content clusters and thus drive relevant traffic to the website.

Automatically cluster keywords

AI-driven keyword clustering tools are appearing more and more often. They analyze a list of thousands of keywords in minutes and group them semantically. The disadvantage of most of these tools, which are often free of charge, is that the clustering algorithm is based on language: groups or clusters are defined by the terms that the search terms in the keyword list have in common. That is a relatively imprecise way of mapping search intent and semantic relationships.
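To make this limitation concrete, here is a minimal Python sketch of term-based grouping. The function names, the stopword list and the 0.5 threshold are assumptions chosen for illustration; this is not how any particular tool works.

```python
def tokens(keyword, stopwords=frozenset({"a", "the", "for", "of"})):
    """Lowercase a keyword and split it into a set of terms, dropping stopwords."""
    return {t for t in keyword.lower().split() if t not in stopwords}


def term_overlap(kw_a, kw_b):
    """Jaccard similarity of the two term sets: shared terms / all terms."""
    a, b = tokens(kw_a), tokens(kw_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_terms(keywords, threshold=0.5):
    """Greedy grouping: a keyword joins the first cluster whose first keyword
    shares enough terms with it, otherwise it starts a new cluster."""
    clusters = []
    for kw in keywords:
        for cluster in clusters:
            if term_overlap(kw, cluster[0]) >= threshold:
                cluster.append(kw)
                break
        else:
            clusters.append([kw])
    return clusters


keywords = [
    "whiskey for beginners",
    "best whiskey for beginners",
    "soft whiskey",
    "sweet whiskey",
    "learn to drink whiskey",
]
clusters = cluster_by_terms(keywords)
for cluster in clusters:
    print(cluster)
```

With this input, only the two "for beginners" keywords end up together. "Soft whiskey", "sweet whiskey" and "learn to drink whiskey" each land in their own cluster, even though in the whiskey example earlier a single article ranks for all of them.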

Most paid tools go a step further by also factoring the SERP results of each individual keyword into the algorithm. By looking at the search results in the SERP, search intentions can be derived for each individual keyword. In addition, it is also possible to discover semantic relationships by comparing all SERP URL rankings of the entire set of keywords.
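The idea of comparing SERP URL rankings can be sketched as follows. This is a simplified illustration, not the algorithm of a specific tool: it assumes you have already collected the ranking URLs per keyword, the URLs below are made up, and the threshold of three shared URLs is an arbitrary choice.

```python
from itertools import combinations


def shared_urls(serp_a, serp_b):
    """Number of URLs that appear in both result lists."""
    return len(set(serp_a) & set(serp_b))


def cluster_by_serp(serps, min_shared=3):
    """Group keywords whose SERPs share at least `min_shared` URLs.

    `serps` maps keyword -> list of ranking URLs. Keywords are linked
    transitively (union-find), so A~B and B~C put A, B and C in one cluster.
    """
    parent = {kw: kw for kw in serps}

    def find(kw):
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]  # path compression
            kw = parent[kw]
        return kw

    for a, b in combinations(serps, 2):
        if shared_urls(serps[a], serps[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for kw in serps:
        clusters.setdefault(find(kw), []).append(kw)
    return sorted(clusters.values())


# Hypothetical SERP data, for illustration only.
serps = {
    "arabica coffee beans": [
        "https://blog.example/arabica-vs-robusta",
        "https://mag.example/coffee-bean-types",
        "https://guide.example/bean-differences",
        "https://shop.example/arabica",
    ],
    "robusta coffee beans": [
        "https://blog.example/arabica-vs-robusta",
        "https://mag.example/coffee-bean-types",
        "https://guide.example/bean-differences",
        "https://shop.example/robusta",
    ],
    "coffee grinder": [
        "https://shop.example/grinders",
        "https://mag.example/best-grinders",
        "https://blog.example/arabica-vs-robusta",
    ],
}
serp_clusters = cluster_by_serp(serps)
print(serp_clusters)
```

In this made-up data the Arabica and Robusta queries share three URLs and are grouped, while "coffee grinder" shares only one URL with each and stays separate. Because the grouping relies on URLs rather than words, the same sketch works regardless of the language of the keywords.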

Algorithms with own variables

Each tool has its own developed algorithm that uses different variables to determine semantics and intent. For example, variables help to determine whether there is a semantic relationship and, if so, how strong, what intentions a keyword has and in which cluster shell or layer a keyword ends up.

Some examples of variables could be:

  • Semantics – the number of matching URLs in the top X of the SERP
  • Semantics – the number of matching root domains in the top X of the SERP
  • Intent – the presence or absence of Google Ads
  • Intent – the presence of certain words that reveal intent (e.g. “buy”)
  • Intent – the presence of certain SERP features
  • Shell – merging small clusters into the groups with which they have the most semantic overlap
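As an illustration of the last variable, the sketch below shows one possible reading of the merge step: clusters with too few keywords are folded into the larger cluster whose ranking URLs overlap most with theirs. The data structure, the `min_size` parameter and the placeholder URLs are assumptions made for this example.

```python
def merge_small_clusters(clusters, min_size=2):
    """Fold clusters with fewer than `min_size` keywords into the larger
    cluster whose URL set overlaps most with theirs.

    `clusters` is a list of dicts: {"keywords": [...], "urls": set(...)}.
    Small clusters without any URL overlap are kept as they are.
    """
    # Copy the large clusters so the input is not modified.
    large = [
        {"keywords": list(c["keywords"]), "urls": set(c["urls"])}
        for c in clusters
        if len(c["keywords"]) >= min_size
    ]
    leftovers = []
    for small in (c for c in clusters if len(c["keywords"]) < min_size):
        best = max(
            large, key=lambda c: len(c["urls"] & small["urls"]), default=None
        )
        if best is not None and best["urls"] & small["urls"]:
            best["keywords"].extend(small["keywords"])
            best["urls"] |= small["urls"]
        else:
            leftovers.append(small)
    return large + leftovers


# Hypothetical clusters; "u1", "u2", ... stand in for ranking URLs.
clusters = [
    {
        "keywords": ["whiskey for beginners", "best whiskey for beginners"],
        "urls": {"u1", "u2", "u3"},
    },
    {"keywords": ["soft whiskey"], "urls": {"u2", "u3", "u9"}},
    {"keywords": ["whiskey glasses"], "urls": {"u7", "u8"}},
]
merged = merge_small_clusters(clusters)
for cluster in merged:
    print(cluster["keywords"])
```

Here "soft whiskey" shares two URLs with the beginners cluster and is merged into it, while "whiskey glasses" has no overlap and remains a cluster of its own.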

The disadvantage of clustering based on language

Clustering based on language has clear disadvantages. You can’t easily switch between languages or countries. You also run into ambiguity. For example, someone who searches for “buying a house” does not necessarily want to buy a house right away; they may also want information about the process of buying a house.
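One way to surface this kind of mixed intent is to combine word signals with the SERP signals from the variable list above. The following rule-based sketch is only an illustration: the word lists, the SERP feature names and the function itself are hypothetical, and a real tool would likely weigh these signals rather than apply simple rules.

```python
# Hypothetical word lists; extend or translate them for your own market.
TRANSACTIONAL_WORDS = {"buy", "buying", "order", "price", "cheap"}
INFORMATIONAL_WORDS = {"what", "how", "why", "difference"}


def intent_signals(keyword, has_ads, serp_features):
    """Return the set of intents suggested by the keyword and its SERP.

    `has_ads` says whether ads were shown; `serp_features` is a set of
    feature names such as "shopping" or "people_also_ask" (assumed labels).
    """
    words = set(keyword.lower().split())
    intents = set()
    if has_ads or words & TRANSACTIONAL_WORDS or "shopping" in serp_features:
        intents.add("transactional")
    if (
        words & INFORMATIONAL_WORDS
        or "featured_snippet" in serp_features
        or "people_also_ask" in serp_features
    ):
        intents.add("informational")
    return intents or {"unknown"}


# The word "buying" alone suggests a purchase, but the (made-up) SERP
# features show that searchers also want information.
house = intent_signals("buying a house", has_ads=True,
                       serp_features={"people_also_ask"})
print(sorted(house))
```

Looking only at the word "buying" would label this keyword as transactional. Adding the SERP signals shows that an informational need is present as well.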

When do you opt for automatic semantic clustering?

The value of automatic semantic clustering stands or falls with the quality of the algorithm, the flexibility of the tool and the time you save with it. Flexibility means, for example, being able to cluster a keyword set per region or language, so that your keywords are grouped semantically based on local search results and a preset language. Convenience means getting an entire set of keywords clustered in 10 to 20 minutes.

But automatic solutions for semantic clustering are especially useful if the clustering is of better quality than manual clustering. The time you save therefore depends partly on the quality of the clustering: it makes little sense to spend hours correcting clusters yourself. Time savings and flexibility alone are therefore not enough reason to use a tool.

What does automatic keyword clustering yield if the quality is good enough?

  • Clusters are data-driven and not based on assumptions or intuitive clustering
  • It reduces clustering errors, especially if a tool is not language dependent
  • You have a set of clusters and subclusters within minutes
  • You maximize the number of keywords to rank for
  • No more irrelevant keywords in the same cluster
  • Direct insight into user intentions for both individual search terms and (sub)clusters
  • Direct insight into the visibility of your domain for both keywords and (sub)clusters

Is automatic clustering always better?

This need not be the case.

Are you working on a keyword analysis with 200 keywords? Then manual clustering is a good alternative. In practice, however, SEO marketers at SMEs and large companies usually work with thousands of keywords. It is precisely at these companies that the existing work processes and the existing set of SEO tools partly determine whether a new tool is adopted. Companies already work with templates in Google Sheets, dashboards and Excel documents for processes such as keyword analysis and content planning, and in some cases even with software developed in-house. Switching to a different or additional working method and tool is not always desirable.

Automation also does not replace the analysis itself. Automatic clustering gives you more time for gathering insights, drawing up actions and formulating strategy. Pressing the button, leaning back and waiting does not yield anything. Automation boosts your productivity; it is not a substitute for activity.

Want to get started with a tool yourself?

The following questions will help you determine whether a tool or service that offers automatic clustering is of sufficient quality:

  1. Is the tool language independent?
  2. Do you have the flexibility to switch between language, country and region?
  3. Does the tool look at the SERP results to determine intent and semantics?
  4. Can you observe intent per keyword or (sub)cluster?
  5. Do your manual checks in the SERP match the results of the tool?
  6. Can you get actionable insights from the output of the tool?
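Question 5 can be turned into a rough number. The sketch below compares a manually clustered sample with the tool's output by checking, for every pair of keywords, whether both clusterings agree on putting them together or keeping them apart (this measure is known as the Rand index). The labels and the sample are invented for illustration.

```python
from itertools import combinations


def pairwise_agreement(manual, tool):
    """Share of keyword pairs on which two clusterings agree.

    Both arguments map keyword -> cluster label. A pair counts as agreement
    when both clusterings put the two keywords together, or both keep them
    apart. The cluster names themselves do not need to match.
    """
    keywords = sorted(set(manual) & set(tool))
    pairs = list(combinations(keywords, 2))
    if not pairs:
        return 1.0
    agree = sum(
        (manual[a] == manual[b]) == (tool[a] == tool[b]) for a, b in pairs
    )
    return agree / len(pairs)


# Hypothetical sample: your own SERP-checked clusters versus a tool's output.
manual = {
    "soft whiskey": "beginners",
    "sweet whiskey": "beginners",
    "whiskey for beginners": "beginners",
    "whiskey glasses": "accessories",
}
tool = {
    "soft whiskey": "c1",
    "sweet whiskey": "c1",
    "whiskey for beginners": "c2",
    "whiskey glasses": "c3",
}
score = pairwise_agreement(manual, tool)
print(round(score, 2))
```

A score close to 1 means the tool groups the sample much like you would after checking the SERPs yourself. A low score is a reason to look at where the clusters differ before relying on the tool for the full keyword set.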

The following questions will help you determine whether a tool or service that offers automatic clustering is right for your service type or organization:

  1. Is there the flexibility to set up work processes around the tool?
  2. Does the tool offer semantic clustering software exclusively, or is it part of a suite of SEO tool products?
  3. What data do you need to provide and what data and insights do you get?
  4. What are the limiting factors of the software for your organization?

Do you have questions about semantic keyword clustering? Feel free to use the comment option below this article.