internacious Blog

Indexing and Ranking. Take Nothing for Granted

This article is about both the mundane and the fascinating: the reasons, beyond Google’s Webmaster Guidelines and a thousand helpful blogs, why your web page may not be indexed at all. And if it is indexed, why your page may not be served.

Then if your page is ranked and served with any visibility, how stable is your SERP?

Prompted by How to earn your place in Google’s index in 2020 by Bartosz Góralewicz (CEO of Onely), who continues his thorough analysis of how inclusion in Google’s search index is not a given.

I’m interested in how index inclusion and subsequent ranking performance can suffer for reasons beyond the usual possibilities, such as those listed in Google’s support article Why is my page missing from Google Search? There are plenty of good reasons why your web page is not turning up in search, and plenty of great advice available on how to remediate both routine and more esoteric site indexing issues.
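Some of the most routine blockers are ones you can check for yourself before reaching for the esoteric explanations. The sketch below is illustrative only (it is not Google’s logic, and the sample page and function names are invented for this example): it scans a page’s markup for a robots `noindex` meta directive and checks response headers for an `X-Robots-Tag` noindex.

```python
# Illustrative sketch (not Google's logic): check a page's own markup and
# response headers for the most common self-inflicted indexing blockers.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in attrs.get("content", "").split(","))

def indexing_blockers(html, headers=None):
    """Return a list of reasons this page blocks itself from the index."""
    headers = headers or {}
    parser = RobotsMetaParser()
    parser.feed(html)
    reasons = []
    if "noindex" in parser.directives:
        reasons.append("meta robots noindex")
    # The X-Robots-Tag response header can also carry a noindex directive
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        reasons.append("X-Robots-Tag noindex")
    return reasons

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(indexing_blockers(page))  # ['meta robots noindex']
```

A real audit would also look at robots.txt rules, canonical tags, and HTTP status codes; this only covers the two directives a page carries with it.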

Of more interest to me are conversations like this one.

Yes, we know. No place for complacency, ever. Past achievements in gaining SERP visibility mean nothing today.

The “all bets are off” stance rippling through Google Search reps’ public commentary on indexing and ranking is what it is. Capacity in Google’s data centres is not infinite.

More here:

We found that: 51% of those pages weren’t crawled by Googlebot, 37% weren’t indexable, and 77% weren’t getting any organic search traffic.

Why is that?

While there’s no simple answer, we knew that the growing size and complexity of the web had some role to play…

Mundane considerations amplify the stakes

Pragmatic trade-offs between page quality and storage (Google’s server disk and RAM) are a deflating reality for website owners. These mundane considerations amplify the stakes.

Assume we’re talking about great content here – it ticks all the boxes. Captain Obvious would say: not only must your content beat your competitors’ content on every criterion to rank, it must do so to get indexed in the first place.

Captain Obvious has a potentially misguided assumption about control. Let’s hand the microphone to Captain Uh-Oh, who has the same mantra. But Captain Uh-Oh sounds a little panicky now, with the realisation that factors outside anyone’s control impact site performance in search.

Dropped for totally innocuous reasons

My favourite part amongst all the helpful advice in Google’s Why is my page missing from Google Search? troubleshooting document is this:

“The page may have been dropped or omitted from the index for totally innocuous reasons. (The web is immense, and Google doesn’t get to every page, though we try to!)…”

The sheer scale of the job – discovery, crawling, rendering, and indexing; making sense of and storing all that data – has necessary limitations.

We already have that …

This is anecdotal because I can’t find the YouTube video (I’ll get back to citing adequately after this!) – Google’s Martin Splitt on why certain web pages are not indexed. I paraphrase:

“Maybe we chose not to index that page because we already have that”.

Beyond the sheer power of those words, my reaction to this Google spokesperson’s comment is a certain revelatory clarity. Or, more likely, I’m about to indulge in more obvious thoughts.

State of play in the search space

A mission statement like “to organise the world’s information” has a broad, fluid definition of “organise”. In today, out tomorrow. Ranking today. Tomorrow? Potentially not so much.

As it is, enough is hidden about how search works, for reasons that are, to be fair, valid. Perhaps SEO maps neatly to the famous saying “half of the money spent on advertising is wasted; I just don’t know which half” – even though online marketing has a massive data advantage, with so much being measurable. Maybe the corollary for SEO is: we don’t know what is going on half the time, due to the search engine black box. Necessary for a level playing field on the SERPs.

Organise the world’s information – within certain constraints and design choices

Google’s mission is “to organise the world’s information”.

However, constraints (capacity and processing constraints – disk, RAM, network) and design choices (algorithmic implementations, determinations on quality, and so on) create a single corpus with a distinct curation of the world’s information. Like any curation, how the edits are arrived at is necessarily flawed.

And transient. No index inclusion or ranking position is safe.

Gary Illyes of Google described index selection challenges here:

Indexing and ranking are edits

And it is editing: I regard indexing as an edit. Getting into the index becomes a pragmatic exercise – relevance and quality versus capacity. For those web pages that are indexed, some signals are evaluated at this point.

Ranking is an edit: algorithmic curation to pick ranking winners. Edits performed by systems built to scale.
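The index-selection half of that edit can be caricatured as a capacity-constrained top-k selection. This is a toy model, not Google’s actual system – the scores and URLs are invented inputs standing in for whatever signals a real system evaluates at indexing time.

```python
import heapq

def select_for_index(pages, capacity):
    """Toy model of index selection as an edit: score every discovered page,
    keep only the top `capacity` scorers. Everything below the cut is
    dropped, however decent it may be."""
    # nlargest keeps the highest-scoring pages within the capacity budget
    return heapq.nlargest(capacity, pages, key=lambda p: p["score"])

discovered = [
    {"url": "/great-guide", "score": 0.92},
    {"url": "/thin-page", "score": 0.18},
    {"url": "/duplicate-ish", "score": 0.55},
    {"url": "/solid-post", "score": 0.74},
]
indexed = select_for_index(discovered, capacity=2)
print([p["url"] for p in indexed])  # ['/great-guide', '/solid-post']
```

The point of the caricature: when capacity shrinks or a competitor’s score rises, pages fall out of the selection through no fault of their own – the “edit” is relative, not absolute.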

The world’s information – Google’s corpus

I’m currently taken by the idea of the world’s information as “Google’s corpus”. Imagine the ability to say: yep, our entire body of work is the world’s information. Not bad.

The tone is just right in the Google Search Off the Record podcast. There is good value in hearing John Mueller, Gary Illyes, and Martin Splitt mostly evade saying much at all on certain search subjects.

What we tend to miss is the intention running through their breezy, open-ended conversation: to reiterate foundational website ranking principles that will protect your website from ignominiously falling out of their index.

John Mueller reinforces my view of Google’s index as a subjective curation.

At 21:55 in the episode “How to think about ranking in search and much more”.

“I mean, it’s something that we have to keep working on. Kind of, like I said, it’s not like a science where you can say, this is the absolute correct answer and everything else is wrong, but rather you could discuss that maybe this is the best answer or something else is the best answer.

It might be today this is the best answer, but maybe next week or next month we discovered, well, actually we should have been looking at it differently. Maybe something else is the best answer. And obviously there are some things where there is a scientific answer.

A lot of things really don’t have absolute answers and this changes over time, fairly quickly as well.”

You can create clearly superior content, be rewarded for it with search visibility, then all of a sudden – actually no, it’s not great at all. Why? Our comprehension of your content has improved yet another increment closer to how you humans comprehend.

A search query’s half-second trip along the path of pride or route of regret

This reinforces the reality of Google Search as a massive collection of continuously improving systems that constantly change how they treat a given search query. The search query’s traversal of Google’s search systems becomes a half-second trip to the SERP along a path of pride or route of regret for your web page.

An example of Google Search’s many systems is described in this presentation by Paul Haahr, Distinguished Engineer on the Google Search team. He spoke about the synonym system and the emoji system. Emojis are interesting as a case study in building a discrete system to address evolving search needs. Paul Haahr describes the issues of integrating the new emoji system into the entire Google Search suite of systems through which user queries are routed.

How to think about ranking

In closing, I come back to John Mueller and his “how to think about ranking” podcast spiel.

His language is elusive, yet genuinely helpful.

“…The best way for a website to kind of remain in a stable position, which is not guaranteed at all, is really to make sure that you have a wide variety of different factors that you work on and kind of keep this diversity of your website upright.

So similar to how you might want to improve diversity in a team to get different viewpoints. That’s the same thing that you’d want to see on a website. So that regardless of how things are routed through this network to find the Search results, we can understand that this website is relevant in different ways.

And all of these add up into kind of telling us that it’s actually relevant for a particular query…”

SEO matters more than ever, but stay nimble, given massively scaled, generalised machine-learning search systems making decisions about indexing and ranking. Those decisions change quickly. A rapidly, continuously improving Google means perpetually unstable SERPs that require attention and, yes, optimisation.

Entities, Keywords, and Links

Entities (people, places, organisations, things) are things that exist distinctly and independently of other things.

Search engines are building a database of every-thing: their characteristics, the classifications they belong to, and the relationships between them. Google’s Knowledge Graph is one example.

Andrei Prakharevich of SEO PowerSuite has created a very accessible introduction to entities, including the core concept of entity classes, which enhance Google’s ability to understand content at a concept level – to a greater extent than is possible with a keyword-based approach or with links. Links are a great input into understanding the quality of a page’s content, but not its relevance.

A deep dive on entities can be found here. The fascinating point there is how the move to mobile-first indexing was not primarily about mobile-first indexing!