404 vs. soft 404 errors: What is the difference and how to fix both

Google Search Console warns publishers about two kinds of 404 errors: 404 and soft 404.

Although they are both called 404, they are very different.

Consequently, it is important to understand the difference between the errors in order to fix them.

HTTP status codes

When a browser requests a web page, the server responds with a status code that indicates whether the request was successful and, if not, why not.

These codes are commonly called HTTP response codes, but their official name is HTTP status codes.

Status codes fall into five classes, identified by their first digit (1xx through 5xx). This article is about one specific response: the 404 Not Found status code.
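Since a code’s first digit identifies its class, the five classes can be illustrated with Python’s standard `http.HTTPStatus` enum. This is only a reading aid, not something a site needs; `describe_status` and `STATUS_CLASSES` are hypothetical names chosen for the example.

```python
from http import HTTPStatus

# The five status-code classes, keyed by the code's first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (Client Error)'."""
    status = HTTPStatus(code)  # raises ValueError for unregistered codes
    category = STATUS_CLASSES[code // 100]
    return f"{status.value} {status.phrase} ({category})"


for code in (200, 301, 404, 500):
    print(describe_status(code))
```

For 404, the helper prints `404 Not Found (Client Error)`.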

The meaning of a 404 response code

Codes in the 4xx series indicate a client error: the request could not be fulfilled. A missing page (404) is the best-known example, but not the only one.

The official definition is:

4xx (Client Error): The request contains bad syntax or cannot be fulfilled

A 404 response does not say whether the missing page is gone for good or might come back.

Examples of why a page returns 404 Not Found

  • If someone accidentally deletes a web page, the server responds with 404 Not Found.
  • If someone links to a page that does not exist, the server responds that the page was not found (404).

The official specification is explicit that a 404 says nothing about whether a page disappeared temporarily or permanently:

“The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is unwilling to disclose that one exists.

A 404 status code does not indicate whether this lack of representation is temporary or permanent…”

In summary, a 404 Not Found code means that the browser’s request failed because the server could not find the requested page.
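To make the distinction concrete, here is a minimal sketch of a Python WSGI application that answers unknown paths with a genuine 404 status. The `PAGES` dictionary and its paths are hypothetical stand-ins for a real site; a production web server or CMS handles this in its own way.

```python
# Hypothetical site content: path -> HTML body.
PAGES = {
    "/": "<h1>Home</h1>",
    "/shoes/": "<h1>Shoes</h1>",
}


def app(environ, start_response):
    """Minimal WSGI app that answers unknown paths with a real 404 status."""
    # Fresh header list per request, since servers may modify it.
    headers = [("Content-Type", "text/html; charset=utf-8")]
    path = environ.get("PATH_INFO", "/")
    body = PAGES.get(path)
    if body is None:
        # The page is missing: the body can be a friendly custom page,
        # but the status line must still say 404, not 200.
        start_response("404 Not Found", headers)
        return [b"<h1>Sorry, that page could not be found</h1>"]
    start_response("200 OK", headers)
    return [body.encode("utf-8")]
```

To try it locally, you could serve it with the standard library’s `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` and request a missing path. A custom "not found" page is fine; what matters is that its status line says 404.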

What is a soft 404 error?

A soft 404 error is not an official status code. The server does not send a soft 404 response to a browser because there is no soft 404 status code.

A soft 404 describes a situation where the server presents a web page and responds with a 200 OK status code, indicating success, even though the web page or its content is actually missing.
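Google does not publish exactly how it detects soft 404s, so the following is only a rough heuristic for auditing your own pages. The phrase list and the 50-word threshold are assumptions chosen for illustration: a response is flagged when it claims success (200) but reads like an error page or carries almost no text.

```python
import re

# Illustrative phrases and threshold; Google's real signals are not public.
NOT_FOUND_PHRASES = ("not found", "no longer available", "does not exist")
MIN_WORDS = 50


def looks_like_soft_404(status: int, html: str) -> bool:
    """Flag a 200 OK response whose content suggests the page is missing."""
    if status != 200:
        return False  # a real 404 or 410 is not a *soft* 404
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    says_missing = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    too_thin = len(text.split()) < MIN_WORDS
    return says_missing or too_thin
```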

Four common reasons for a soft 404

A page is missing but the server sends a 200 OK status.

This type of soft 404 occurs when a page is missing, but the server configuration redirects the missing page to the home page or a custom URL.

The page is gone, but the publisher has taken steps to address the missing page request.

Content is missing or “thin”.

When content is completely missing or very sparse (thin content), the server still responds with a 200 status code, meaning the request for the page was successful.

But because such pages offer nothing worth indexing, search engines call them soft 404 errors.

The missing page redirects to the home page.

Some publishers mistakenly believe that a 404 error response is a problem in itself.

So, to avoid 404 responses, they redirect the missing page to the home page, even though the home page is not the page that was requested.

Google calls these failed page requests soft 404s.
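One way to spot this pattern on your own site is to request old URLs and see where they end up. In Python, `urllib.request.urlopen(url)` follows redirects, and the response’s `geturl()` and `status` give the final URL and status code. The `redirects_to_home` helper below is a hypothetical sketch that flags an inner-page request which finally lands on the home page with 200 OK.

```python
from urllib.parse import urlsplit


def redirects_to_home(requested_url: str, final_url: str, final_status: int) -> bool:
    """True if a request for an inner page ended on the home page with 200 OK."""
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    asked_for_inner_page = requested.path not in ("", "/")
    # Same host, root path: the visitor was sent to the home page.
    landed_on_home = final.netloc == requested.netloc and final.path in ("", "/")
    return final_status == 200 and asked_for_inner_page and landed_on_home
```

For example, `redirects_to_home("https://example.com/old-page", "https://example.com/", 200)` returns `True`.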

Missing page redirected to a custom webpage.

Sometimes missing pages redirect to a custom webpage that returns a 200 status code, which leads Google to flag these pages as soft 404 errors.

Who invented the phrase soft 404?

The concept of a soft 404 may have come from a 2004 research paper titled “Towards an Understanding of the Web’s Decay.”

The missing pages that are improperly replaced pose a problem for search engines trying to index real pages.

Here’s how the research paper frames soft 404s:

“According to the HTTP protocol, when a request is made to a server for a page that is no longer available, the server should return an error code…

… In fact, many servers, including the most reputable ones, do not return a 404 code – instead, the servers return a replacement page and an OK code (200).

… Our study shows that these types of substitutions, called “soft 404s,” account for more than 15% of dead links.”

Soft 404 due to coding errors

There are cases where the page is not missing, but certain issues (e.g., coding errors) cause Google to categorize it as a missing page.

Soft 404 errors are important to investigate as they could signal bad code.

Typical coding problems:

  • Missing file or include that should fill a webpage with content.
  • Database error.
  • Missing JavaScript.
  • Empty search results pages.
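When a page fails for one of these reasons, one option is to make the failure visible in the status code rather than serving an empty template with 200 OK. A minimal sketch, assuming a hypothetical `load_product` data-access function that returns `None` for unknown products and raises `ConnectionError` when the database is unreachable:

```python
from html import escape


def render_product_page(product_id, load_product):
    """Return (status, body) for a product page.

    `load_product` is a hypothetical data-access function: it returns a dict,
    returns None for unknown products, or raises ConnectionError when the
    database cannot be reached.
    """
    try:
        product = load_product(product_id)
    except ConnectionError:
        # Temporary failure: signal it with 503 instead of serving an
        # empty template with 200 OK, which could look like a soft 404.
        return 503, "Service temporarily unavailable"
    if product is None:
        # The product really does not exist: say so with a 404.
        return 404, "Product not found"
    body = f"<h1>{escape(product['name'])}</h1><p>{escape(product['description'])}</p>"
    return 200, body
```

503 Service Unavailable is the standard way to say a failure is temporary; whether it suits a given outage depends on the site.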

404 errors have two main causes

  • An error in a link sends users to a page that doesn’t exist.
  • A link to a page that used to exist but has suddenly disappeared.

Link error

If the cause of the 404 is a linking error, you need to fix the links.

The tricky part of this task is finding all the broken links on a website. Large, complex websites with thousands or millions of pages can be more difficult to crawl.

In such cases, crawling tools come in handy.

There are many site crawlers to choose from: free tools such as Xenu and Greenflare, and paid software such as Screaming Frog, DeepCrawl, Botify, Sitebulb, and OnCrawl, some of which offer free trials or free versions with limited features.
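To show what these tools automate, here is a minimal single-page sketch using only Python’s standard library: it extracts the `<a href>` links from one HTML page and reports those whose target answers 404 to a HEAD request. The function names are hypothetical, and `fetch` can be swapped for a stub when testing.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if urlsplit(url).scheme in ("http", "https"):  # skip mailto:, tel:, etc.
                self.links.append(url)


def fetch_status(url):
    """Return the HTTP status code for url using a HEAD request."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions


def find_broken_links(page_url, html, fetch=fetch_status):
    """Return links on one page whose target answers 404."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    return [link for link in parser.links if fetch(link) == 404]
```

A real crawler also follows internal links across the whole site, deduplicates URLs, handles servers that reject HEAD requests or do not respond (which raise `URLError` here), and throttles its requests, which is why dedicated crawling tools are worth using on large sites.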

A page that no longer exists

If a page no longer exists, you have two options:

  • Restore the page if the removal was accidental.
  • 301 redirect it to the most closely related page if the removal was intentional.
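The choice between these two options can be expressed as a small lookup, sketched here with hypothetical paths: live pages return 200, intentionally removed pages return a 301 pointing at their closest replacement, and anything else returns an honest 404.

```python
# Hypothetical pages that still exist.
LIVE_PAGES = {"/", "/running-shoes/", "/catalog/"}

# Intentionally removed URLs mapped to their closest related replacement.
REDIRECTS = {
    "/blue-running-shoes/": "/running-shoes/",
    "/2019-catalog/": "/catalog/",
}


def resolve(path):
    """Return (status, headers) for a requested path."""
    if path in LIVE_PAGES:
        return 200, {}
    if path in REDIRECTS:
        # 301 = moved permanently; Location tells clients where it went.
        return 301, {"Location": REDIRECTS[path]}
    return 404, {}  # gone with no replacement: an honest 404
```

In practice this mapping usually lives in the web server, CMS, or redirect plugin configuration rather than in application code.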

First, you need to find the missing pages that are still being linked to. As with finding linking errors on a large website, you can use crawling tools.

However, crawling tools may not find orphan pages: pages that are not linked from the site’s navigation or from any other page on the site.

Orphan pages often exist because they used to be part of the site: after a redesign, the internal links to an old page disappear, but external links from other sites can still point to it.

To check if these types of pages are present on your website, you can use different tools.
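Whatever tools you use, the underlying check is a set difference: URLs you know about from other sources (for example, server logs, backlink exports, or old sitemaps) minus the URLs that a crawl of your internal links actually reached. A minimal sketch with hypothetical paths:

```python
def find_orphan_candidates(known_urls, crawled_urls):
    """URLs seen in logs, sitemaps, or backlinks that an internal crawl never reached."""
    return sorted(set(known_urls) - set(crawled_urls))


# Hypothetical inputs: paths from logs/backlink exports vs. a site crawl.
known = ["/", "/shoes/", "/old-sale-page/", "/shoes/"]
crawled = ["/", "/shoes/"]
print(find_orphan_candidates(known, crawled))  # prints ['/old-sale-page/']
```

Each candidate then needs a manual decision: restore it, link to it, redirect it, or let it return 404.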

How to identify 404 response pages

Google Search Console reports

The coverage report lists 404 error URLs on a website.

Screenshot from Google Search Console, August 2022

Search Console reports the 404 pages that Google encounters while crawling every page it can find. This may include links from other websites to a page that previously existed on your website.

Google Analytics

By default, you won’t find a missing pages report in Google Analytics. However, you can track missing pages in several ways.

For one, you can create a custom report that segments pages whose page title mentions “Error 404 – Page not found.”

Another way to track 404 pages in Google Analytics is to create custom content groupings and assign all 404 pages to a content group.

site: operator search command

You cannot use the site: search command to find 404 errors because Google does not index 404 or soft 404 webpages.

Google’s site: search operator is useful for finding web pages on a website that contain a specific keyword phrase in their content.

Google’s Search Console is the best source for identifying a list of soft 404s and regular 404s.

Your server’s traffic and error logs are another useful source for identifying 404 error responses.
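Server access logs record the status code of every request, so 404s can be tallied directly. The sketch below assumes entries in the widely used Common or Combined Log Format; other log formats need a different pattern, and `count_404s` is a hypothetical helper name.

```python
import re
from collections import Counter

# Request line and status of a Common/Combined Log Format entry, e.g.
# 1.2.3.4 - - [10/Oct/2022:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 512 ...
LOG_ENTRY = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Count how often each path was answered with a 404."""
    counts = Counter()
    for line in lines:
        match = LOG_ENTRY.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts
```

You could pass it the lines of your access log (the file location depends on your server) and call `.most_common(20)` on the result to list the most frequently requested missing URLs.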

Backlink research tools

Backlink research tools like Majestic, Ahrefs, Moz Open Site Explorer, Sistrix, Semrush, LinkResearchTools and CognitiveSEO can also be helpful.

Most of these tools can export a list of backlinks pointing to your domain. From there, you can check each linked page for 404 errors.
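Once you have an export, checking the linked pages is easy to script. This sketch assumes a CSV export with a `Target URL` column; that column name is a guess, because each tool names its columns differently. `fetch` is any function that returns a status code for a URL, and both helper names are hypothetical.

```python
import csv
import io


def linked_targets(csv_text, column="Target URL"):
    """Unique pages on your site that external backlinks point to.

    The column name is an assumption; export formats differ between tools.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[column].strip() for row in reader if row.get(column)})


def dead_backlink_targets(csv_text, fetch, column="Target URL"):
    """Backlinked pages that now answer 404. `fetch(url)` returns a status code."""
    return [url for url in linked_targets(csv_text, column) if fetch(url) == 404]
```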

How to fix soft 404 errors

Crawling tools don’t recognize a soft 404 because the page does not return a 404 status code. But you can use crawling tools to catch the problems that cause soft 404s.

Here are a few things to look for:

  • Thin content: Some crawling tools report each page’s word count in a sortable column. Start with the pages that have the fewest words to assess whether they have thin content.
  • Duplicate content: Some crawling tools are sophisticated enough to recognize what percentage of a page is template content. There are also tools designed specifically for finding internal duplicate content, such as Siteliner. If the main content is almost the same as on many other pages, look at those pages and determine why there is duplicate content on your site.
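Both checks can be approximated in a few lines once a crawler has exported each page’s word count and main text. This sketch uses `difflib.SequenceMatcher` from Python’s standard library as a simple similarity measure; the 0.9 threshold and the function names are assumptions, and dedicated tools use more scalable methods.

```python
from difflib import SequenceMatcher
from itertools import combinations


def thinnest_pages(word_counts, limit=10):
    """Sort {url: word_count} so the sparsest pages are reviewed first."""
    return sorted(word_counts.items(), key=lambda item: item[1])[:limit]


def near_duplicates(page_texts, threshold=0.9):
    """Return (url_a, url_b, similarity) for pairs of near-identical main content.

    Compares every pair, so this is only practical for a modest set of pages.
    """
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(page_texts.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((url_a, url_b, round(ratio, 2)))
    return pairs
```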

Aside from crawling tools, you can also use Google Search Console and look in the Coverage report for pages listed as Soft 404.

By scouring an entire website to find problems that cause soft 404 errors, you can pinpoint and fix problems before Google finds them.

After identifying these soft 404 problems, you need to fix them.

Most of the time, the solutions are common sense. They can include simple things like expanding pages with thin content or replacing duplicate content with new and unique content.

There are a few things to keep in mind during this process:

Consolidate pages

Sometimes thin content is caused by the page topic being too specific, giving you little to say.

Merging multiple thin pages into one page may make more sense if the themes are related. Not only does this solve thin content issues, but it can also fix duplicate content issues.

For example, an e-commerce site that sells shoes in different colors and sizes might have a different URL for each size and color combination. This leaves a large number of thin, nearly identical pages.

A more effective approach is to consolidate these variants on one page and list the available options.
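Before merging, it helps to see which options are scattered across separate URLs. This sketch assumes variants are selected with hypothetical `color` and `size` query parameters; sites that encode variants in the path would need a different parser.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Hypothetical query parameters that only select a variant of one product.
VARIANT_PARAMS = ("color", "size")


def group_variants(urls):
    """Map each product path to the options currently spread across variant URLs."""
    groups = defaultdict(lambda: {param: set() for param in VARIANT_PARAMS})
    for url in urls:
        parts = urlsplit(url)
        query = parse_qs(parts.query)
        for param in VARIANT_PARAMS:
            groups[parts.path][param].update(query.get(param, []))
    return {
        path: {param: sorted(values) for param, values in options.items()}
        for path, options in groups.items()
    }
```

Each resulting path is a candidate for a single consolidated page that lists the collected options.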

Find technical issues causing duplicate content

Even with the most basic web crawling tool, like Xenu (which doesn’t analyze content, only URLs, response codes, and title tags), you can still find duplicate content issues by looking at URLs.

Look for URL variants such as www vs. non-www, HTTP vs. HTTPS, with and without index.html, and with and without tracking parameters.
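These variants can be collapsed programmatically so duplicates stand out in a crawl export. The sketch assumes HTTPS without www is the preferred form and treats a small, illustrative set of parameters as tracking-only; both choices are assumptions to adjust for your site, and the helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking-only parameters (illustrative, not exhaustive).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}


def normalize(url):
    """Collapse common duplicate-URL variants into one comparable form.

    Assumes the preferred version is HTTPS without www; adjust to your site.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # /shoes/index.html -> /shoes/
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Group crawled URLs that normalize to the same address."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```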

404 errors and soft 404 errors

The most important thing to remember about 404 errors is that if the pages are really missing, there’s nothing to fix. It’s okay to show a 404 response for requests for pages that don’t exist.

But if the pages exist under a different URL, that needs to be fixed: correct the broken link so it points to the actual URL, restore the missing page, or redirect the old URL to the new page that replaced it.

A soft 404 is always the result of an issue that needs to be diagnosed and fixed.

Understanding the difference between the two kinds of 404 errors is essential to running a high-performing website.


Featured image: Paulo Bobita/Search Engine Journal
