Google’s John Mueller answered whether removing pages from a large site helps with the problem of pages being discovered by Google but not crawled. He offered general guidance on how to approach the problem.
Discovered – Currently not indexed
Search Console is a service provided by Google for communicating search-related issues and feedback.
Indexing status is an important part of Search Console because it tells publishers how well a site is indexed and eligible for ranking.
The indexing status of web pages is found in the Search Console Page Indexing report.
A report that a page has been discovered by Google but not indexed is usually a sign of a problem that needs to be resolved.
There are many reasons why Google might discover a page but refuse to index it, even though the official Google documentation lists only one reason.
“Discovered – currently not indexed
The page was found by Google, but not crawled yet. Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl.
This is why the last crawl date is empty on the report.”
Google’s John Mueller offered more reasons why a page might be discovered but not indexed.
De-index non-indexed pages to improve site-wide indexing?
It has been suggested that removing certain pages will help Google crawl the rest of the site because there are fewer pages to crawl.
This is tied to the idea that Google allots a limited crawling capacity (a crawl budget) to every website.
Googlers have repeatedly said that there is no such thing as a crawl budget the way SEOs perceive it.
Rather, Google weighs several considerations when deciding how many pages to crawl, including the capacity of the site’s server to handle extensive crawling.
One fundamental reason why Google is picky about how much to crawl is that it does not have the capacity to store every single web page on the Internet.
That’s why Google tends to index pages with some value (if the server can handle it) and not other pages.
For more information on Crawl Budgets, read: Google shares crawl budget insights
Here is the question that was asked:
“Will de-indexing and aggregating 8 million used products into 2 million unique indexable product pages improve crawlability and indexability (the Discovered – currently not indexed issue)?”
Google’s John Mueller first acknowledged that it was impossible to solve the person’s specific problem, then offered general recommendations.
He replied:
“It’s impossible to say.
I recommend reviewing our crawl budget guide for large sites in our documentation.
For large sites, sometimes more crawling is limited by how your site can handle more crawling.
However, in most cases, it’s about overall site quality.
Are you significantly improving the overall quality of your site by going from 8 million pages to 2 million?
Unless you’re focused on actually improving quality, it’s easy to spend a lot of time reducing the number of indexable pages without really making the site better, and that won’t improve anything for search.”
Mueller gave two reasons for Discovered – currently not indexed
Google’s John Mueller gave two reasons why Google might discover a page but decline to index it.
- Server capacity
- Overall site quality
1. Server capacity
Mueller said that Google’s ability to crawl and index web pages may be “limited by how much more crawling your site can handle.”
The larger the site, the more crawling is needed to index it. Complicating matters, Google is not the only bot crawling a large website.
There are other legitimate bots, for example from Microsoft and Apple, that also try to crawl the site. In addition, there are many other bots, some legitimate and others related to hacking and data scraping.
That means that for a large site, especially in the evening hours, there could be thousands of bots using server resources to crawl it.
That’s why one of the first questions I ask publishers about indexing is the state of their servers.
In general, a website with millions of pages, or even hundreds of thousands of pages, will need a dedicated server or a cloud server (since cloud servers provide scalable resources such as bandwidth, CPU and RAM).
Sometimes the hosting environment may need more memory allotted to a process, such as the PHP memory limit, to help the server cope with high traffic and prevent 500 error responses.
Troubleshooting server-related problems involves analyzing the server error logs.
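If server capacity is a suspect, a quick pass over the web server’s access log can show how much crawler traffic the site receives and how many of those requests end in 5xx errors. The following is a minimal sketch, assuming a Python environment and an Nginx/Apache access log in the common “combined” format; the log path and the regular expression are illustrative assumptions to adapt, not anything specified by Google or Mueller.

```python
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path; adjust for your server

# Combined log format:
# IP - user [time] "METHOD /path HTTP/1.1" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

status_counts = Counter()
errors_by_agent = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match:
            continue
        status = match.group("status")
        status_counts[status] += 1
        # Track which clients (Googlebot, Bingbot, scrapers...) receive 5xx errors.
        if status.startswith("5"):
            errors_by_agent[match.group("agent")] += 1

total = sum(status_counts.values())
errors = sum(n for code, n in status_counts.items() if code.startswith("5"))
if total:
    print(f"Requests parsed: {total}, 5xx responses: {errors} ({errors / total:.1%})")

print("Top user agents receiving 5xx responses:")
for agent, count in errors_by_agent.most_common(10):
    print(f"{count:6d}  {agent}")
```

If Googlebot shows up high on that 5xx list, it supports the server-capacity explanation and points toward upgrading hosting resources before pruning pages.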
2. Overall site quality
This is an interesting reason for not indexing more pages. Overall site quality is something like a score or determination that Google assigns to a website as a whole.
Parts of the website can affect the overall site quality
John Mueller has said that a section of a website can affect the determination of overall site quality.
Mueller said:
“…for some things, we look at the quality of the site as a whole.
And when we look at the overall quality of the site, if significant sections are lower quality, it doesn’t matter to us why they are lower quality.
…if we find that there are significant parts of lower quality, we may think that the site overall is not as great as we thought.”
Definition of website quality
Google’s John Mueller gave his definition of website quality in another Office Hours video:
“When it comes to the quality of the content, we don’t mean just the text of your content.
It really is the quality of your overall website.
And that includes everything from layout to design.
Like, how you lay things out on your pages, how you integrate images, how you work with speed, all those elements that come into play there.”
How long does it take to determine overall site quality
Another fact about overall site quality is how long it takes Google to determine it, which can be months.
Mueller said:
“It takes us a lot of time to understand how a website fits in with the rest of the Internet.
…And that’s something that can easily take, I don’t know, a couple of months, half a year, sometimes even longer than half a year…”
Website optimization for crawling and indexing
Optimizing an entire site or a section of it is a big-picture way of looking at the problem. In practice it usually comes down to optimizing individual pages at scale.
Particularly for e-commerce sites with thousands or even millions of products, optimization can take many forms.
Things to pay attention to:
The main menu
Make sure the main menu is optimized to take users to the important parts of the site that most users are interested in. The main menu can also link to the most popular pages.
Links to popular sections and pages
The most popular pages and sections can also be linked from the featured section of the homepage.
This helps users get to the pages and sections that are most important to them, but also signals to Google that these are important pages that need to be indexed.
Improve thin content pages
Thin content is basically pages with little useful content, or pages that are largely duplicates of other pages (template content).
Just filling the pages with words is not enough. Words and sentences should be meaningful and relevant to website visitors.
For a product, that could be measurements, weight, available colors, suggestions for other products to pair with it, brands the product works best with, links to instructions for use, FAQs, ratings, and other information that users will find valuable.
Solving Discovered – currently not indexed for more online sales
In a physical store, it may seem like simply putting the product on the shelf is enough.
But the reality is that it often takes knowledgeable salespeople to get those products off the shelves.
A website can play the role of a savvy salesperson who can inform Google why the page should be indexed and help customers choose those products.
Watch Google SEO Office Hours at 13:41: