JavaScript Website & Crawler Issues

Googlebot was once unable to crawl and index content that was created dynamically using JavaScript. Static HTML source code was far more straightforward for the bots to extract and understand content from.

Google Advances

However, with advancements in recent years, Google are now able to render and understand web pages much like a modern-day browser does.

Google Guidelines

Google updated their guidelines in 2019, recommending that pages are rendered server-side (server-side rendering) or, as some call it, pre-rendered or dynamically rendered, rather than relying on client-side JavaScript.

Google explained – "it’s difficult to process JavaScript and not all search engine crawlers are able to process it successfully or immediately".

This has in no way prevented the rise in people using JavaScript to build websites, with JavaScript frameworks such as AngularJS (an MVW framework) and React, along with single-page applications (SPAs) and progressive web apps (PWAs), all increasing in usage.

When crawling and evaluating websites, search engines now essentially need to read the DOM after JavaScript has come into play, construct the web page from it and understand how it differs from the original HTML response.
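The gap between the initial HTML response and the rendered DOM can be illustrated with a minimal sketch; the markup and URLs here are hypothetical, purely for illustration:

```javascript
// Hypothetical markup showing what a crawler sees before and after
// JavaScript runs. A client-rendered site often ships an empty shell:
const initialHtml = '<div id="app"></div>';

// After the browser (or Google's renderer) executes the JS, the DOM
// contains the real content and links:
const renderedDom =
  '<div id="app"><h1>Our Products</h1><a href="/products">Products</a></div>';

// The content and links only exist in the rendered DOM, so a crawler
// that does not execute JavaScript would miss them entirely:
console.log(renderedDom.includes("href") && !initialHtml.includes("href")); // true
```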

In 2019 Google updated their web rendering service (WRS) to be evergreen, so it’s now up to date with the most recent version of Chrome. So, rendering and indexing JS websites for Google is no longer an issue.

How to Crawl a JS Website

So how do you go about performing a crawl of a JS website? Screaming Frog enables this functionality, though you’ll need a licence; there are also a whole host of websites offering this service. Personally, we go straight to Search Console, check performance and then page data to ensure all URLs are listed, confirming indexation. Alternatively, a site: search will show all pages currently indexed in Google.

This works for some websites, but for a site with 500,000+ pages the above method may not work, at least not at a manageable level.

JavaScript Core Principles

All the resources contained on a web page add weight; images, CSS and JS calls all add weight to a page as it’s being loaded.

A web page is only fully rendered once all its resources have been pulled in. This isn’t really an issue for small websites, though a large website with thousands of pages may encounter problems.

Ultimately, JS used to significantly manipulate a web page dynamically will cause issues in terms of load times, as such sites more often than not rely on client-side JS for key content or links. JS frameworks can be quite different from one another, and their SEO implications are very different from those of a traditional HTML website.

To fully ensure a web page will be crawled and indexed the following principles and limitations need to be understood:

  • All the resources on the page have to be available to be crawled, rendered and indexed.
  • Clean and unique URLs are required by Google, and links need to be in a proper HTML anchor tag (static as well as JS-functioning links can be offered).
  • JS isn’t used to load additional content after the render occurs (triggered by hover, clicks or scrolling, for example).
  • All aspects of the page are rendered for the snapshot, ideally once network activity has stopped, or after a certain time threshold. If rendering takes too long, elements of the page may be skipped and therefore not seen or indexed.
  • Google will queue all pages for rendering, but won’t render pages that have a ‘noindex’ in the initial HTTP response or static HTML.
  • With Google, rendering is completely separate from indexing. Initially the static HTML is crawled, and rendering is deferred until resources are available. Only then does Google discover the further content and links available in the rendered HTML. Google now do this quickly, sometimes in seconds, where in the past it could take up to a week.
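The anchor-tag principle above can be sketched as follows; the helper functions and URLs are hypothetical, purely to illustrate which kinds of links crawlers can and cannot follow:

```javascript
// Crawlable: a proper HTML anchor with a clean, unique, static URL.
function crawlableLink(url, text) {
  return `<a href="${url}">${text}</a>`;
}

// Not reliably crawlable: navigation that exists only as a JS event,
// with no href for the crawler to follow.
function jsOnlyLink(text) {
  return `<span onclick="loadPage()">${text}</span>`;
}

// Offering both, as the principle suggests: keep the static href and
// attach the JS behaviour on top of it.
function hybridLink(url, text) {
  return `<a href="${url}" onclick="loadPage(event)">${text}</a>`;
}

console.log(crawlableLink("/products/blue-widget", "Blue widget"));
// → <a href="/products/blue-widget">Blue widget</a>
```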

Google advises against relying on client-side JavaScript and recommends developing with progressive enhancement: build the site’s structure and navigation using only HTML, then layer improvements on top with AJAX as a secondary enhancement.

If you are using a JS framework, rather than relying on a fully client-side rendered approach, Google recommends using server-side rendering, pre-rendering or hybrid rendering which can improve performance for users and search engine crawlers.

Server-side rendering (SSR) and pre-rendering execute the page’s JavaScript and deliver a rendered initial HTML version of the page to both users and search engines.
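As a minimal sketch of that idea (the function and data here are hypothetical, not taken from any particular framework), the server executes the rendering logic itself and sends fully formed HTML, so neither users nor crawlers depend on client-side JavaScript for the key content:

```javascript
// Server-side rendering sketch: build the complete HTML on the server
// before the response is sent, rather than shipping an empty shell
// that client-side JS fills in later.
function renderProductPage(product) {
  return [
    "<!doctype html>",
    `<html><head><title>${product.name}</title></head>`,
    "<body>",
    `<h1>${product.name}</h1>`,
    `<p>${product.description}</p>`,
    "</body></html>",
  ].join("\n");
}

// The crawler's initial HTTP response already contains the content:
const html = renderProductPage({
  name: "Blue widget",
  description: "A widget, in blue.",
});
console.log(html.includes("<h1>Blue widget</h1>")); // true
```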

Still having problems? Then get in touch; we’ve dealt with many websites that utilise JS to aid functionality.
