The Site Crawler simulates how Googlebot moves through your site and reports back the issues a search engine would hit: broken links, redirect chains, orphan pages, duplicate content, blocked URLs and sitemap mismatches. This index covers exactly how to fix every category of finding it raises, with concrete examples for static sites, WordPress, and crawler-friendly architectures.
Most crawler issues fall into one of these categories. Click through for the step-by-step fix:
nofollow on outbound that you cannot fix, and Plesk/cPanel/htaccess redirect patterns.
http:// → https://www. → https:// destroys crawl budget. How to collapse multi-hop redirects to a single 301, detect redirect loops, and handle the case where your CDN and origin both want to redirect.
rel="canonical" on the duplicate pointing at the primary, or 301 if you do not need the duplicate URL. Covers query-string variants, pagination, and printer-friendly URLs.
Disallow: in robots.txt or noindex in meta. Common cause: staging configuration left in production. Audit, remove, and verify in Google Search Console.
noindex, when to merge, and when to delete entirely. Covers WordPress, Drupal and custom CMS patterns.
Where the fixes are actually configured depends on your CMS or hosting setup:
noindex trap in Reading Settings, fixing pagination canonicals, and the wp-admin crawl trap.
/products/ vs /collections/products/ issue, and what you cannot fix on Shopify.
The crawler simulates a real Googlebot crawl on up to 10,000 URLs per scan. It checks response codes, redirect chains, internal link structure, canonical consistency, sitemap accuracy, robots.txt compliance, page rendering speed, and content depth. For the complete reference of every check, read the complete Site Crawler Guide, or see the example crawl report for what a clean crawl looks like.
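As a rough illustration of the redirect-chain check, the sketch below takes a map of "URL redirects to URL" pairs and works out where each URL finally lands, flagging loops along the way. It is not the Site Crawler's own implementation: the function names, the `sample_redirects` data and the example.com URLs are all assumptions made for the example, and a real audit would build the map from live response headers rather than a hard-coded dictionary.

```python
from typing import Dict, List, Tuple


def resolve_chain(start: str, redirects: Dict[str, str]) -> Tuple[List[str], bool]:
    """Follow `start` through the redirect map.

    Returns (chain, is_loop): the list of URLs visited in order, and
    whether the chain ran back into a URL it had already visited.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # revisited a URL: redirect loop
        seen.add(current)
    return chain, False


def collapse_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every redirecting URL straight at its final destination.

    URLs caught in a loop are left out, since they have no final
    destination and need fixing by hand.
    """
    flat = {}
    for source in redirects:
        chain, is_loop = resolve_chain(source, redirects)
        if not is_loop:
            flat[source] = chain[-1]
    return flat


# Hypothetical crawl data: one two-hop chain and one loop.
sample_redirects = {
    "http://example.com/": "https://www.example.com/",
    "https://www.example.com/": "https://example.com/",
    "https://example.com/a": "https://example.com/b",
    "https://example.com/b": "https://example.com/a",
}

if __name__ == "__main__":
    for source, target in collapse_redirects(sample_redirects).items():
        print(f"301 {source} -> {target}")
```

Each entry in the collapsed map corresponds to a single 301 rule you could configure at the origin or CDN, so every old URL reaches its destination in one hop. The looping pair is deliberately excluded rather than guessed at.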
Before fixing, run the crawler and see exactly what fails. Most sites have 3-5 main categories of finding, and fixing those clears 80% of issues.
Run Site Crawler →