
How to Fix Broken Internal Links

Broken internal links are the most common Site Crawler finding on established sites. Every link that 404s wastes crawl budget, leaks link equity, and harms user experience. The fix is conceptually simple — replace, redirect or remove — but doing it at scale across thousands of pages takes a structured approach. This guide walks through diagnosis, bulk-fixing across major CMS platforms, and the CI safeguards that prevent recurrence.

1. Generate the complete list

Before fixing anything, get the full list of broken internal links with their source page and target URL. Spot-fixing one at a time wastes hours.

Step 1
Run the Site Crawler
Run a fresh Site Crawler scan on your domain and export the broken-links report as CSV. The export has columns for source URL, target URL, anchor text, and status code (404, 410, 500, etc.).
Step 2
Group findings by target
Sort the CSV by target URL. You'll typically see that 50-200 broken links collapse into 10-30 unique broken targets. Fixing one target eliminates many findings at once.
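If the export is too large to sort comfortably in a spreadsheet, the grouping can be scripted. The sketch below is one possible approach, not part of the Site Crawler itself. The column names source_url and target_url are assumptions, so match them to the header row of your actual export. The small parser handles quoted fields but not line breaks inside a field.

```javascript
// Group broken-link rows by target URL so each unique target is fixed once.
// ASSUMPTION: the column names "source_url" and "target_url" are hypothetical.
function parseCsvLine(line) {
  // Minimal CSV splitter: supports quoted fields and doubled quotes,
  // but not newlines inside a field.
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else current += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ',') { fields.push(current); current = ''; }
    else current += ch;
  }
  fields.push(current);
  return fields;
}

function groupByTarget(csvText) {
  const lines = csvText.split(/\r?\n/).filter((l) => l.trim() !== '');
  const header = parseCsvLine(lines[0]);
  const sourceIdx = header.indexOf('source_url');
  const targetIdx = header.indexOf('target_url');
  if (sourceIdx === -1 || targetIdx === -1) {
    throw new Error('Expected source_url and target_url columns');
  }
  const groups = new Map();
  for (const line of lines.slice(1)) {
    const row = parseCsvLine(line);
    const target = row[targetIdx];
    if (!groups.has(target)) groups.set(target, []);
    groups.get(target).push(row[sourceIdx]);
  }
  // Most-linked broken targets first: fixing these clears the most findings.
  return [...groups.entries()]
    .map(([target, sources]) => ({ target, count: sources.length, sources }))
    .sort((a, b) => b.count - a.count);
}
```

To use it, read the exported file into a string (for example with fs.readFileSync) and pass it to groupByTarget. The first entries in the result are the targets worth fixing first.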

2. Classify each broken target

Every broken target falls into one of these buckets, each with a different fix path:

Bucket 1
Typo in the URL
/blog/post-titl/ instead of /blog/post-title/. The destination exists; the link has a typo. Fix the source link, not the destination.
Bucket 2
Trailing-slash mismatch
Site canonical is /path/ but the link omits the slash, and the server doesn't auto-redirect to the canonical. Either fix the server to 301 the non-canonical to canonical, or fix the links.
Bucket 3
HTTP-vs-HTTPS mismatch
Old internal links still pointing at http:// after HTTPS migration. The server should 301 these but the redirect is a wasted hop. Find and replace http://yourdomain.com with https://yourdomain.com in CMS content.
Bucket 4
Deleted destination
The destination page genuinely no longer exists. Two paths: 301 the old URL to the closest live replacement (and update internal links to point at the replacement to avoid the hop), or update internal links to point at a different page entirely.
Bucket 5
Case-sensitivity mismatch
/Blog/Post/ vs /blog/post/. URL paths are case-sensitive on most Linux-hosted servers, so the URL with the wrong case returns a 404. Force lowercase with a server rewrite (a 301 to the lowercase URL), or fix the links.
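With more than a handful of unique targets, a first-pass classification can be automated and then reviewed by hand. The following is a rough sketch under stated assumptions: livePaths is a Set of canonical live paths that you supply (for example from your sitemap), the bucket labels are names chosen for this example, and a typo is guessed whenever a live path is within two character edits of the broken one.

```javascript
// Classify a broken target against the set of paths known to be live.
// ASSUMPTION: `livePaths` and the bucket labels are illustrative only.
function editDistance(a, b) {
  // Levenshtein distance using a single rolling row.
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = row[j];
      row[j] = Math.min(
        row[j] + 1, // deletion
        row[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      prev = tmp;
    }
  }
  return row[b.length];
}

function classifyBrokenTarget(brokenUrl, livePaths) {
  // The base URL is only used when brokenUrl is a relative path.
  const url = new URL(brokenUrl, 'https://placeholder.invalid');
  const path = url.pathname;

  if (url.protocol === 'http:' && livePaths.has(path)) {
    return { bucket: 'http-vs-https', suggestion: path };
  }
  const toggled = path.endsWith('/') ? path.slice(0, -1) : path + '/';
  if (livePaths.has(toggled)) {
    return { bucket: 'trailing-slash', suggestion: toggled };
  }
  for (const live of livePaths) {
    if (live !== path && live.toLowerCase() === path.toLowerCase()) {
      return { bucket: 'case', suggestion: live };
    }
  }
  let best = null;
  for (const live of livePaths) {
    const distance = editDistance(path, live);
    if (distance > 0 && distance <= 2 && (best === null || distance < best.distance)) {
      best = { live, distance };
    }
  }
  if (best) return { bucket: 'typo', suggestion: best.live };
  // Nothing close exists, so treat the destination as deleted.
  return { bucket: 'deleted', suggestion: null };
}
```

The suggestion is only a guess at the intended destination. Check the typo and deleted buckets manually before changing any links.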

3. Bulk-fix in WordPress

Step 1
Install Better Search Replace
WP Admin → Plugins → Add New → search "Better Search Replace" → Install + Activate. The plugin runs database-level find-and-replace.
Step 2
Run a dry-run first
Tools → Better Search Replace:
  • Search for: https://yourdomain.com/old-broken-path
  • Replace with: https://yourdomain.com/new-path
  • Tables: select wp_posts, wp_postmeta, wp_options
  • Tick Run as dry run?
  • Click Run
Dry run reports how many rows would change. Verify the count looks right.
Step 3
Backup, then run for real
Take a database backup (UpdraftPlus or via hosting panel). Untick Dry Run. Click Run. Changes apply immediately.
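⚠️ Better Search Replace handles PHP-serialised data correctly by updating the stored string lengths. Do not use raw SQL UPDATE queries with REPLACE(): whenever the replacement differs in length from the original, they corrupt serialised values.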
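The reason is the length prefix. PHP stores a serialised string as s:<byte length>:"<value>"; and refuses to unserialise it when the declared length no longer matches. The snippet below is a small illustration of that mismatch, written for this guide. It is not how Better Search Replace is implemented, and it models only a single serialised string, not nested arrays.

```javascript
// Show why a raw find-and-replace breaks a PHP-serialised string.
function phpSerializeString(value) {
  // PHP counts bytes, not characters, in the length prefix.
  return `s:${Buffer.byteLength(value, 'utf8')}:"${value}";`;
}

function isConsistent(serialized) {
  // True when the declared length matches the actual byte length.
  const match = /^s:(\d+):"([\s\S]*)";$/.exec(serialized);
  return match !== null && Number(match[1]) === Buffer.byteLength(match[2], 'utf8');
}

const original = 'https://yourdomain.com/old-broken-path';
const before = phpSerializeString(original);

// Raw replace on the stored text: the prefix still claims the old length.
const naive = before.replace('old-broken-path', 'new-path');

// Serialisation-aware replace: change the value, then re-serialise it.
const safe = phpSerializeString(original.replace('old-broken-path', 'new-path'));

console.log(isConsistent(naive)); // false
console.log(isConsistent(safe)); // true
```

The naive result keeps the old prefix while holding a shorter value, which is exactly the state that makes WordPress lose widget settings or theme options after a raw SQL replace.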

4. Bulk-fix in Shopify

Step 1
Search theme code
Shopify Admin → Online Store → Themes → Actions → Edit code. Use the search box at the top of the file list. Search for the broken URL fragment (e.g. old-broken-path). Edit each occurrence in the theme's .liquid templates, sections and snippets.
Step 2
Search product descriptions
Theme code is only half the job. Products, collections, pages and blog posts contain HTML links too. Use Shopify's bulk-edit URL admin/bulk?resource_name=Product&edit=descriptionHtml to mass-edit product descriptions. For very large catalogues, export to CSV, find and replace in a spreadsheet, then re-import.
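A plain find and replace over description HTML also changes visible text that happens to mention the old path. If you script the CSV route, one option is to rewrite only href attribute values. The function below is a minimal sketch of that idea. The function names and the old-broken-path and new-path values are examples, and the regex covers quoted href attributes only, so it is not a full HTML parser.

```javascript
// Rewrite a broken link target inside an HTML fragment, touching only
// href attribute values and leaving visible text alone.
// ASSUMPTION: handles double- or single-quoted href attributes only.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function rewriteHrefs(html, oldPath, newPath) {
  const pattern = new RegExp(
    `(href\\s*=\\s*)(["'])([^"']*?)${escapeRegExp(oldPath)}([^"']*?)\\2`,
    'gi'
  );
  let count = 0;
  // A callback is used so "$" in newPath is never treated as a pattern.
  const result = html.replace(pattern, (_, attr, quote, prefix, suffix) => {
    count++;
    return `${attr}${quote}${prefix}${newPath}${suffix}${quote}`;
  });
  return { html: result, count };
}
```

Run it over each description cell after parsing the export with a proper CSV library, and compare the returned count with the number of findings you expected to fix. Note that a short oldPath also matches longer paths that contain it, so search for the most specific fragment you can.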

5. Bulk-fix in headless / custom CMS

Step 1
Use the CMS API
Contentful, Sanity, Strapi, Storyblok all expose APIs to bulk-update content. Write a small script that queries documents containing the broken URL, replaces it, and updates. Test on staging first.
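// Example: Contentful (Node.js). Try it on a staging environment first.
// 'blogPost', the body field and the 'en-US' locale are placeholders for your content model.
const contentful = require('contentful-management');
async function main() {
  const client = contentful.createClient({ accessToken: 'YOUR_TOKEN' });
  const space = await client.getSpace('SPACE_ID');
  const env = await space.getEnvironment('master');
  // Field filters require content_type. Results are paginated, so re-run until nothing matches.
  const entries = await env.getEntries({ content_type: 'blogPost', 'fields.body[match]': 'old-broken-path' });
  for (const entry of entries.items) {
    const body = entry.fields.body?.['en-US'];
    // Skip Rich Text fields, missing locales and loose full-text matches.
    if (typeof body !== 'string' || !body.includes('old-broken-path')) continue;
    entry.fields.body['en-US'] = body.replace(/old-broken-path/g, 'new-path');
    const live = entry.isPublished() && !entry.isUpdated(); // published, no other pending edits
    const updated = await entry.update(); // saves a draft only
    if (live) await updated.publish();
  }
}
main().catch(console.error);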

6. Re-crawl and verify

After bulk-fixing, run the Site Crawler again. The broken-links count should be near zero. Investigate any remaining findings — usually they're edge cases the bulk-fix missed (different URL format, hardcoded in template code, etc.).
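To see at a glance which findings survived the bulk fix, you can compare the two exports. This is a small sketch, assuming each finding has already been parsed into an object with source and target keys (hypothetical names chosen for this example).

```javascript
// Compare broken-link findings from before and after the bulk fix.
// ASSUMPTION: each finding is an object like { source, target }.
function diffFindings(before, after) {
  const key = (f) => JSON.stringify([f.source, f.target]);
  const beforeKeys = new Set(before.map(key));
  const afterKeys = new Set(after.map(key));
  return {
    // Present before, gone now.
    fixed: before.filter((f) => !afterKeys.has(key(f))),
    // Present in both crawls: the edge cases the bulk fix missed.
    remaining: after.filter((f) => beforeKeys.has(key(f))),
    // Only in the new crawl: possibly caused by the fix itself.
    introduced: after.filter((f) => !beforeKeys.has(key(f))),
  };
}
```

The remaining list is where to look for alternate URL formats and hardcoded template links. Anything under introduced deserves a check of the replacement URL you used.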

7. Prevent recurrence with CI

The fix is permanent only if you prevent future breakage. Add a link-check step to your deployment pipeline.

Step 1
Add a CI link-check step
Use linkinator (Node) or muffet (Go) in your CI pipeline:
# GitHub Actions example
- name: Check links
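  run: npx linkinator https://staging.yourdomain.com --recurse --skip "^https?://(twitter|facebook)"
linkinator exits with a non-zero status when it finds a broken link it was not told to skip, so the step fails the build and new broken links are caught before they reach production. Keep the --skip pattern quoted so the shell does not interpret its parentheses and pipe.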
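A link checker normally reports broken external links as well, which can fail a build for reasons outside your control. If you only want internal breakage to block a deploy, you could post-process the results. The function below is a sketch of that filter. It assumes each result is an object with url and state fields where state is 'BROKEN' for a failed link, so adapt the field names to whatever your checker actually outputs.

```javascript
// Keep only broken links that point at the site itself.
// ASSUMPTION: `results` is an array of { url, state } objects.
function brokenInternalLinks(results, siteOrigin) {
  const siteHost = new URL(siteOrigin).host;
  return results.filter((link) => {
    if (link.state !== 'BROKEN') return false;
    try {
      return new URL(link.url).host === siteHost;
    } catch {
      // Unparseable URLs are kept so that someone reviews them.
      return true;
    }
  });
}
```

In a CI script you would call this on the checker's results and set process.exitCode to 1 when the returned array is not empty. Broken external links can then be logged as warnings without blocking the deploy.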
💡 Many CMS platforms offer "broken link checker" plugins that run continuously. They catch breakage caused by external changes (linked-to sites deleting their pages) faster than periodic crawls.
