⭐ Beginner — No coding experience needed
Ecommerce Crawl: How to Audit a Product Catalogue
Learn how to crawl an ecommerce site, what unique problems product pages have, and how to handle the faceted navigation crawl-explosion problem.
What you will learn in this guide
- What's different about crawling ecommerce sites
- Product Detail Pages (PDPs) vs Product Listing Pages (PLPs)
- The faceted navigation problem and how to handle it
- How to deal with out-of-stock products
- Orphan products and how to find them
1 Why ecommerce is different
Ecommerce sites are crawl-budget intensive. A modest catalogue of 1,000 products can generate 100,000+ indexable URL variants once you factor in faceted navigation, sorting, pagination and variant URLs.
The core challenge: telling Google which URLs matter (canonical PDPs and PLPs) and which don't (filter combinations, sort orders, paginated tail). Get this wrong and Google wastes its crawl budget on URL variants.
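To see how quickly listing URLs multiply, here is a rough back-of-the-envelope sketch. The facet counts, sort orders and page count are made-up numbers for a single hypothetical category, and the function assumes each facet is single-select and every combination is reachable, so treat the result as an upper bound rather than a measurement.

```python
from math import prod


def count_listing_urls(facet_values, sort_orders, pages):
    """Upper bound on distinct URLs for one category listing page.

    Each facet is either unset or set to one of its values, so a facet
    with n values contributes (n + 1) choices. Sort order and page number
    multiply the total again.
    """
    facet_combinations = prod(n + 1 for n in facet_values.values())
    return facet_combinations * sort_orders * pages


# Hypothetical category: 5 colours, 6 sizes, 4 price bands, 10 brands.
facets = {"colour": 5, "size": 6, "price": 4, "brand": 10}

# (5+1) * (6+1) * (4+1) * (10+1) = 2,310 filter states,
# times 4 sort orders and 5 pages.
print(count_listing_urls(facets, sort_orders=4, pages=5))  # prints 46200
```

A single category with four ordinary facets already produces tens of thousands of URL variants, which is why a small catalogue can exceed 100,000 crawlable URLs.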
2 PDP vs PLP audit
| Page type | What matters | Common issues |
|---|---|---|
| Product Detail Page (PDP) | Product schema, images, reviews, price | Missing schema fields, stock-out handling, variant URLs |
| Product Listing Page (PLP) | Faceted nav, pagination, sorting | Crawl explosion, thin or duplicate content |
| Category hub | Internal linking, descriptive content | Thin content above products |
| Collections | Curated product groups | Often orphaned from main nav |
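The "missing schema fields" check for PDPs can be automated. The sketch below is a minimal example that assumes the page carries a single JSON-LD object with a plain string `@type`; real pages may use a list or an `@graph` wrapper. The field lists are an illustrative assumption, not an official requirement list, so compare them against your search engine's current structured-data documentation.

```python
import json

# Assumed field lists for illustration only.
REQUIRED_FIELDS = ("name", "image", "offers")
REQUIRED_OFFER_FIELDS = ("price", "priceCurrency", "availability")


def _is_blank(value):
    """True for missing or empty values, but not for a numeric 0."""
    return value is None or value == "" or value == [] or value == {}


def missing_product_fields(json_ld_text):
    """Return the names of expected Product fields that are absent."""
    data = json.loads(json_ld_text)
    if not isinstance(data, dict) or data.get("@type") != "Product":
        return ["@type"]

    missing = [f for f in REQUIRED_FIELDS if _is_blank(data.get(f))]

    offers = data.get("offers")
    if isinstance(offers, list):
        # Several offers: this sketch only inspects the first one.
        offers = offers[0] if offers else None
    if isinstance(offers, dict):
        missing += [
            f"offers.{f}"
            for f in REQUIRED_OFFER_FIELDS
            if _is_blank(offers.get(f))
        ]
    return missing


sample = json.dumps({
    "@type": "Product",
    "name": "Trail Shoe",
    "image": "https://shop.example/img/trail-shoe.jpg",
    "offers": {"price": "59.00", "priceCurrency": "GBP"},
})
print(missing_product_fields(sample))  # prints ['offers.availability']
```

Running a check like this over every PDP in a crawl export gives a per-field count of what is missing, which is more actionable than a single pass/fail figure.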
3 Faceted navigation
Faceted nav lets users filter products: by colour, size, price, brand, etc. The SEO problem: every combination is a unique URL.
- 1. Identify which facets are valuable for SEO. Brand and major category facets often have search demand. Sort order and price range usually don't.
- 2. Allow valuable facets to be indexed. Brand pages can rank well. `?brand=nike` may be worth indexing with appropriate canonical and content.
- 3. Block low-value facets from crawling. For sort, view, price-range and in-stock filter combinations, add `Disallow:` rules to robots.txt for those parameters.
- 4. Canonicalise variants to the parent. Multi-facet URLs (colour+size+price) usually shouldn't be indexed. Canonicalise them to the parent category.
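The four steps above amount to a decision rule per URL. Here is a minimal sketch of that rule, assuming facets are passed as query parameters. The parameter names (`sort`, `view`, `price`, `in_stock`, `brand`) and the `shop.example` domain are hypothetical, so substitute whatever your platform actually emits.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical parameter names; adjust to your platform.
BLOCK_PARAMS = {"sort", "view", "price", "in_stock"}
INDEXABLE_FACETS = {"brand"}


def classify_url(url):
    """Return (action, canonical_url) for a listing URL.

    action is "index", "block" or "canonicalise". canonical_url is None
    for blocked URLs, because a crawler that cannot fetch the page never
    sees its canonical tag.
    """
    parts = urlsplit(url)
    params = set(parse_qs(parts.query, keep_blank_values=True))
    parent = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    if not params:
        return ("index", url)             # the plain category page
    if params & BLOCK_PARAMS:
        return ("block", None)            # low-value parameter present
    if len(params) == 1 and params <= INDEXABLE_FACETS:
        return ("index", url)             # single valuable facet
    return ("canonicalise", parent)       # any other facet combination


for u in (
    "https://shop.example/shoes",
    "https://shop.example/shoes?brand=nike",
    "https://shop.example/shoes?sort=price_asc",
    "https://shop.example/shoes?colour=red&size=9",
):
    print(u, classify_url(u))
```

In this sketch the block rule is checked first, so a URL that mixes a valuable facet with a sort parameter is still blocked. That ordering is a design choice: a robots.txt block and a canonical tag cannot both take effect on the same URL.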
4 Out-of-stock and orphan products
- 1. Audit OOS products. Temporary OOS: emit Product schema with availability=OutOfStock, keep the page live. Permanent OOS: 301 redirect to the parent category or a similar product.
- 2. Find orphan products. These are products that exist in the database and sitemap but have zero internal links from any category, collection, or related-product widget. Either link them or hide them from the sitemap.
- 3. Audit related-product widgets. Most ecommerce platforms have related-product carousels. Make sure they actually link to other products, not just back to the same product.
- 4. Track per-section health. Use the ecommerce crawl to measure how many PDPs are indexed, how many have valid schema, and how many have unique content. Each metric is a separate health indicator.
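Finding orphan products comes down to a set difference: URLs listed in the sitemap minus URLs that receive at least one internal link. The sketch below assumes you already have both inputs from a crawl export; the data shapes and the example URLs are hypothetical. It ignores self-links, so a related-product widget that only points back to its own page does not rescue that product from being an orphan.

```python
def find_orphans(sitemap_urls, internal_links):
    """Return sitemap URLs that no other page links to.

    internal_links maps a source page URL to the URLs it links to.
    """
    linked = set()
    for source, targets in internal_links.items():
        for target in targets:
            if target != source:  # a page linking to itself doesn't count
                linked.add(target)
    return sorted(set(sitemap_urls) - linked)


# Hypothetical crawl data.
sitemap = [
    "https://shop.example/p/trail-shoe",
    "https://shop.example/p/road-shoe",
    "https://shop.example/p/old-sandal",
]
links = {
    "https://shop.example/shoes": ["https://shop.example/p/trail-shoe"],
    "https://shop.example/p/trail-shoe": ["https://shop.example/p/road-shoe"],
    # Widget that only links back to the same product:
    "https://shop.example/p/old-sandal": ["https://shop.example/p/old-sandal"],
}
print(find_orphans(sitemap, links))
# prints ['https://shop.example/p/old-sandal']
```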
Quick win for most ecommerce sites: Block sort/order/view parameters from crawling. This usually reduces indexable URLs by 60-80% without hurting any actual rankings, and dramatically improves crawl efficiency.
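As a rough illustration of that quick win, the sketch below generates the robots.txt lines for a list of query parameters. The parameter names are assumptions, and the patterns rely on the `*` wildcard, which Google and other major crawlers support. Each parameter gets two rules, one for when it is the first query parameter (`?sort=`) and one for when it follows another (`&sort=`). Check the output against real URLs in a robots.txt tester before deploying it.

```python
def robots_rules(params):
    """Build robots.txt lines that block the given query parameters."""
    lines = ["User-agent: *"]
    for name in sorted(params):
        lines.append(f"Disallow: /*?{name}=")  # parameter comes first
        lines.append(f"Disallow: /*&{name}=")  # parameter comes later
    return "\n".join(lines)


# Hypothetical parameter names for sort, order and view controls.
print(robots_rules(["sort", "order", "view"]))
```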