Broken internal links are the most common Site Crawler finding on established sites. Every link that 404s wastes crawl budget, leaks link equity, and harms user experience. The fix is conceptually simple — replace, redirect or remove — but doing it at scale across thousands of pages takes a structured approach. This guide walks through diagnosis, bulk-fixing across major CMS platforms, and the CI safeguards that prevent recurrence.
Before fixing anything, get the full list of broken internal links with their source page and target URL. Spot-fixing one at a time wastes hours.
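One way to work from the full list rather than spot-fixing is to group the crawler export by broken target, so each fix is applied once for every source page that needs it. The sketch below assumes a hypothetical export shape (an array of objects with `source`, `target`, and `status` fields); the real Site Crawler export may use different names.

```javascript
// Group broken internal links by target URL.
// Assumption: `rows` mirrors a crawler export with { source, target, status };
// these field names are hypothetical, not a documented export format.
function groupBrokenLinksByTarget(rows) {
  const byTarget = new Map();
  for (const { source, target, status } of rows) {
    if (status !== 404) continue; // only 404 targets are treated as broken here
    if (!byTarget.has(target)) byTarget.set(target, new Set());
    byTarget.get(target).add(source);
  }
  // Most-linked broken targets first: fixing these clears the most findings.
  return [...byTarget.entries()]
    .map(([target, sources]) => ({ target, sources: [...sources].sort() }))
    .sort((a, b) => b.sources.length - a.sources.length || a.target.localeCompare(b.target));
}

const exampleRows = [
  { source: '/about/', target: '/blog/post-titl/', status: 404 },
  { source: '/', target: '/blog/post-titl/', status: 404 },
  { source: '/', target: '/Blog/Post/', status: 404 },
  { source: '/', target: '/blog/post-title/', status: 200 },
];
console.log(groupBrokenLinksByTarget(exampleRows));
```

Sorting by the number of source pages is a design choice: it puts the targets whose fix removes the most findings at the top of the work list.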
Every broken target falls into one of these buckets, each with a different fix path:
Typo in the URL: /blog/post-titl/ instead of /blog/post-title/. The destination exists; the link has a typo. Fix the source link, not the destination.
Missing trailing slash: the canonical URL is /path/ but the link omits the slash, and the server doesn't auto-redirect to the canonical. Either configure the server to 301 the non-canonical URL to the canonical one, or fix the links.
Wrong protocol: links still point to http:// after an HTTPS migration. The server should 301 these, but the redirect is a wasted hop. Find and replace http://yourdomain.com with https://yourdomain.com in CMS content.
Wrong case: /Blog/Post/ vs /blog/post/. URL paths on Linux servers are case-sensitive, so the URL with the wrong case 404s. Force lowercase via a server rewrite, or fix the links.
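The buckets above can be assigned automatically when a list of known-good URLs is available. The following is a minimal sketch, assuming `canonicalUrls` is a Set of absolute URLs that return 200 (for example, taken from a sitemap). The function names, bucket labels, and the edit-distance threshold of 2 are illustrative choices, not part of any crawler's output.

```javascript
// Levenshtein edit distance between two strings (two-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Classify a broken URL against known-good canonical URLs.
// Specific checks run before the typo check, because a protocol, slash, or
// case difference would otherwise also look like a small edit distance.
function classifyBrokenTarget(brokenUrl, canonicalUrls) {
  if (brokenUrl.startsWith('http://')) {
    const httpsUrl = 'https://' + brokenUrl.slice('http://'.length);
    if (canonicalUrls.has(httpsUrl)) return { bucket: 'protocol', fix: httpsUrl };
  }
  const toggled = brokenUrl.endsWith('/') ? brokenUrl.slice(0, -1) : brokenUrl + '/';
  if (canonicalUrls.has(toggled)) return { bucket: 'trailing-slash', fix: toggled };
  for (const candidate of canonicalUrls) {
    if (candidate.toLowerCase() === brokenUrl.toLowerCase()) return { bucket: 'case', fix: candidate };
  }
  for (const candidate of canonicalUrls) {
    if (editDistance(candidate, brokenUrl) <= 2) return { bucket: 'typo', fix: candidate };
  }
  return { bucket: 'unknown', fix: null }; // likely a removed page; needs a redirect or removal
}

const known = new Set([
  'https://yourdomain.com/blog/post-title/',
  'https://yourdomain.com/path/',
  'https://yourdomain.com/blog/post/',
]);
console.log(classifyBrokenTarget('https://yourdomain.com/blog/post-titl/', known));
console.log(classifyBrokenTarget('https://yourdomain.com/Blog/Post/', known));
```

Anything that lands in the `unknown` bucket is a candidate for the redirect-or-remove path rather than a simple link correction.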
WordPress: search for https://yourdomain.com/old-broken-path and replace it with https://yourdomain.com/new-path in the wp_posts, wp_postmeta, and wp_options tables. Avoid raw UPDATE queries with REPLACE() because they corrupt serialised data.
Shopify themes: search the theme code for the broken path (old-broken-path). Edit each occurrence in .liquid templates and .html snippets.
Shopify products: open admin/bulk?resource_name=Product&edit=descriptionHtml to mass-edit descriptions. For very large catalogues, export to CSV, find and replace in a spreadsheet, and re-import.
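The warning about raw UPDATE queries with REPLACE() comes down to how PHP serialisation stores strings: each string carries a byte-length prefix, so changing the text without updating the prefix leaves a value PHP can no longer unserialise. Below is a minimal sketch of that failure mode, written in Node.js for consistency with the other examples. The helper names are hypothetical, and it handles only a single serialised string, not nested arrays or objects.

```javascript
// PHP serialises a string as s:<byte length>:"<value>";
function phpSerializeString(value) {
  return `s:${Buffer.byteLength(value, 'utf8')}:"${value}";`;
}

// Returns true when the declared byte length matches the actual value.
function hasValidLength(serialized) {
  const match = /^s:(\d+):"([\s\S]*)";$/.exec(serialized);
  if (!match) return false;
  return Number(match[1]) === Buffer.byteLength(match[2], 'utf8');
}

const stored = phpSerializeString('https://yourdomain.com/old-broken-path');

// What a raw SQL REPLACE() does: swaps the text, leaves the length prefix alone.
const naive = stored.replace('old-broken-path', 'new-path');

// A serialisation-aware replace changes the value, then re-serialises it.
const safe = phpSerializeString(
  'https://yourdomain.com/old-broken-path'.replace('old-broken-path', 'new-path')
);

console.log(hasValidLength(stored)); // true
console.log(hasValidLength(naive));  // false: prefix still says 38 bytes
console.log(hasValidLength(safe));   // true
```

The replacement path is shorter than the original, so the naive result declares more bytes than it contains. Any replacement that changes the byte length produces the same mismatch.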
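// Example: Contentful (Node.js, contentful-management SDK)
const contentful = require('contentful-management');
async function replaceBrokenPath() {
  const client = contentful.createClient({ accessToken: 'YOUR_TOKEN' });
  const space = await client.getSpace('SPACE_ID');
  const env = await space.getEnvironment('master');
  // Field filters require a content_type; 'CONTENT_TYPE_ID' is a placeholder.
  const entries = await env.getEntries({
    content_type: 'CONTENT_TYPE_ID',
    'fields.body[match]': 'old-broken-path',
  });
  for (const entry of entries.items) {
    const body = entry.fields.body && entry.fields.body['en-US'];
    if (typeof body !== 'string') continue; // skip entries without an en-US text body
    entry.fields.body['en-US'] = body.replace(/old-broken-path/g, 'new-path');
    await entry.update(); // saves the change as a draft; publish after review
  }
}
replaceBrokenPath().catch(console.error);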
After bulk-fixing, run the Site Crawler again. The broken-links count should be near zero. Investigate any remaining findings — usually they're edge cases the bulk-fix missed (different URL format, hardcoded in template code, etc.).
The fix is permanent only if you prevent future breakage. Add a link-check step to your deployment pipeline.
Run linkinator (Node) or muffet (Go) in your CI pipeline:
# GitHub Actions example
- name: Check links
  run: npx linkinator https://staging.yourdomain.com --recurse --skip "^https?://(twitter|facebook)"
Fail the build on any broken internal links, so that new broken links are caught before they reach production.
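Failing the build only on internal links means splitting the checker's results by origin, so a flaky third-party site does not block a deploy. The sketch below shows one way to do that in a small Node.js gate script. It assumes a hypothetical result shape (`url`, `status`, `parent`) that would need adapting to the real output of linkinator or muffet.

```javascript
// Select the results that should fail the build: broken links on our own origin.
// Assumption: each result is { url, status, parent }; this shape is hypothetical.
function findBrokenInternalLinks(results, siteOrigin) {
  return results.filter(({ url, status }) => {
    let origin;
    try {
      origin = new URL(url).origin;
    } catch {
      return false; // unparseable URL; not counted as an internal link
    }
    return origin === siteOrigin && status >= 400;
  });
}

const results = [
  { url: 'https://staging.yourdomain.com/blog/post-titl/', status: 404, parent: 'https://staging.yourdomain.com/' },
  { url: 'https://twitter.com/example', status: 404, parent: 'https://staging.yourdomain.com/' },
  { url: 'https://staging.yourdomain.com/blog/post-title/', status: 200, parent: 'https://staging.yourdomain.com/' },
];

const broken = findBrokenInternalLinks(results, 'https://staging.yourdomain.com');
for (const link of broken) {
  console.log(`${link.status} ${link.url} (linked from ${link.parent})`);
}

// In a CI step, assign this value to process.exitCode so a non-zero result fails the job.
const exitCode = broken.length > 0 ? 1 : 0;
console.log(`exit code: ${exitCode}`);
```

Broken external links are ignored by this gate on purpose; they can still be reported separately without blocking the deploy.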