How to Fix Sitemap Size

Sitemap files have hard limits: 50,000 URLs or 50MB (52,428,800 bytes) uncompressed, whichever comes first. Exceed either and Google rejects the file entirely or processes only the first valid portion. Most sites never hit these limits, but ecommerce stores, large publishers, and forums regularly do. The fix is the sitemap index pattern: one master file references multiple smaller child sitemaps, each within limits. This guide covers split strategies, generation patterns, and gzip compression.

1. Check current state

Step 1
Count URLs in your sitemap
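curl -s https://example.com/sitemap.xml | grep -o "<loc>" | wc -l

# grep -o counts every <loc>, even when the XML is minified onto one line
# (grep -c counts matching lines, so a single-line sitemap would report 1)
# If > 50,000 you need to split
# If close to 50k, split sooner rather than later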
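Step 2
Check file size
curl -s https://example.com/sitemap.xml | wc -c

# Counts the bytes actually downloaded; Content-Length can be missing
# or reflect a compressed size when the server gzips on the fly
# Compare to 52428800 bytes (50MB)
# If close to 50MB, split now; gzip does not help here because the limit applies to the uncompressed size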
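Both checks can also be run in one pass over the downloaded bytes. The following is a minimal sketch, assuming the sitemap has already been fetched and is uncompressed; the function name check_sitemap_limits and the sample data are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, measured uncompressed

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(xml_bytes: bytes) -> dict:
    """Count <url> entries and measure the uncompressed size of one sitemap."""
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{NS}url"))
    size = len(xml_bytes)
    return {
        "urls": url_count,
        "bytes": size,
        "over_url_limit": url_count > MAX_URLS,
        "over_size_limit": size > MAX_BYTES,
    }


# Tiny hypothetical sitemap used only to exercise the function
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/about/</loc></url>"
    b"<url><loc>https://example.com/contact/</loc></url>"
    b"</urlset>"
)
report = check_sitemap_limits(sample)
print(report["urls"], report["over_url_limit"])
```

Parsing the XML rather than pattern-matching means the count is unaffected by line breaks or minification.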

2. The sitemap index pattern

A single index file references all child sitemaps:

<!-- sitemap.xml (the index) -->
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products-1.xml</loc>
    <lastmod>2024-01-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products-2.xml</loc>
    <lastmod>2024-01-20</lastmod>
  </sitemap>
</sitemapindex>

Child sitemap files have the same format as a standalone sitemap:

<!-- sitemap-pages.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <!-- ... up to 50,000 URLs ... -->
</urlset>
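To see what a crawler gets out of an index file, it can help to parse one. A minimal sketch, assuming the index is available as bytes; list_child_sitemaps is a hypothetical helper name and the sample index is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_child_sitemaps(index_xml: bytes) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) for every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    children = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue  # skip entries without a <loc>
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        children.append((loc.strip(), lastmod.strip() if lastmod else None))
    return children


sample_index = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>https://example.com/sitemap-pages.xml</loc>"
    b"<lastmod>2024-01-15</lastmod></sitemap>"
    b"<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>"
    b"</sitemapindex>"
)
children = list_child_sitemaps(sample_index)
for loc, lastmod in children:
    print(loc, lastmod)
```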

3. Pick a split strategy

Strategy 1: By content type

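Most natural for most sites: one child sitemap per content type, as in the index example above (sitemap-pages.xml, sitemap-posts.xml, and the sitemap-products files).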

Strategy 2: By date (large publishers)

Useful for news sites and high-volume blogs:

sitemap-posts-2024.xml
sitemap-posts-2023.xml
sitemap-posts-2022.xml
sitemap-archive.xml

Older content rarely updates, so older sitemaps don't need frequent regeneration.
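The date split can be sketched as a grouping step that runs before file generation. This assumes each URL record carries a 'published' date in YYYY-MM-DD form (a hypothetical field name) and that everything before a chosen cutoff year goes into sitemap-archive.xml.

```python
from collections import defaultdict


def group_posts_by_year(urls, archive_before=2022):
    """Map child sitemap filenames to the URL records they should contain.

    Each record is a dict with 'loc' and an ISO 'published' date (YYYY-MM-DD).
    Years earlier than archive_before are pooled into sitemap-archive.xml.
    """
    groups = defaultdict(list)
    for url in urls:
        year = int(url["published"][:4])
        if year < archive_before:
            filename = "sitemap-archive.xml"
        else:
            filename = f"sitemap-posts-{year}.xml"
        groups[filename].append(url)
    return dict(groups)


# Hypothetical records for illustration
posts = [
    {"loc": "https://example.com/a/", "published": "2024-01-20"},
    {"loc": "https://example.com/b/", "published": "2023-06-02"},
    {"loc": "https://example.com/c/", "published": "2019-11-30"},
    {"loc": "https://example.com/d/", "published": "2024-03-01"},
]
by_year = group_posts_by_year(posts)
print(sorted(by_year))
```

Grouping on publication date rather than lastmod keeps a post in the same file when it is edited, so only that year's sitemap needs regenerating.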

Strategy 3: By section (ecommerce / marketplace)

sitemap-products-electronics.xml
sitemap-products-clothing.xml
sitemap-products-home.xml
sitemap-stores.xml
sitemap-brands.xml
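A section split can be derived from the URL path when the site's structure allows it. The sketch below assumes product URLs look like /products/<category>/..., which is an assumption about the site rather than anything the sitemap protocol requires; the function name is illustrative.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(locs):
    """Map child sitemap filenames to URLs based on leading path segments."""
    groups = defaultdict(list)
    for loc in locs:
        parts = [p for p in urlparse(loc).path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "products":
            # /products/<category>/... gets one file per category
            filename = f"sitemap-products-{parts[1]}.xml"
        elif parts:
            # everything else is grouped by its first path segment
            filename = f"sitemap-{parts[0]}.xml"
        else:
            filename = "sitemap-pages.xml"  # the site root
        groups[filename].append(loc)
    return dict(groups)


sections = group_by_section([
    "https://example.com/products/electronics/tv-55/",
    "https://example.com/products/clothing/jacket/",
    "https://example.com/brands/acme/",
    "https://example.com/",
])
print(sorted(sections))
```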

Strategy 4: Numbered chunks (auto-split)

When one logical group exceeds 50k:

sitemap-products-1.xml  (URLs 1-50,000)
sitemap-products-2.xml  (URLs 50,001-100,000)
sitemap-products-3.xml  (URLs 100,001-150,000)
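The numbering above follows directly from the chunk size. A short sketch of the arithmetic, with plan_chunks as a hypothetical helper and 120,000 as an example catalogue size:

```python
def plan_chunks(total_urls, chunk_size=50_000, prefix="sitemap-products"):
    """Return (filename, first_url_number, last_url_number) per chunk, 1-based."""
    plan = []
    for index, start in enumerate(range(0, total_urls, chunk_size), start=1):
        end = min(start + chunk_size, total_urls)
        plan.append((f"{prefix}-{index}.xml", start + 1, end))
    return plan


chunk_plan = plan_chunks(120_000)
for filename, first, last in chunk_plan:
    print(f"{filename}  (URLs {first:,}-{last:,})")
```

In practice a chunk size somewhat below 50,000 leaves headroom for the 50MB limit when URLs are long.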

4. Plugin / framework patterns

WordPress: Yoast SEO

Auto-generates sitemap index at /sitemap_index.xml. Splits by content type by default, further splits each at 1,000 URLs.

// Customise URLs per sitemap
add_filter('wpseo_sitemap_entries_per_page', function() {
  return 2000;  // Default 1000, can go up to 50000
});

// Exclude content types
add_filter('wpseo_sitemap_exclude_post_type', function($excluded, $type) {
  if ($type === 'attachment') return true;
  return $excluded;
}, 10, 2);

WordPress: Rank Math

Similar pattern at /sitemap_index.xml. Configurable per content type in plugin settings.

Next.js

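// app/sitemap.xml/route.ts: sitemap index
// Note: app/sitemap.ts (MetadataRoute.Sitemap) renders a <urlset>, not a
// <sitemapindex>, so the index is served from a route handler instead.
const CHILD_SITEMAPS = ['sitemap-pages.xml', 'sitemap-posts.xml'];

export async function GET() {
  const lastmod = new Date().toISOString();
  const entries = CHILD_SITEMAPS.map(
    (name) =>
      `  <sitemap><loc>https://example.com/${name}</loc><lastmod>${lastmod}</lastmod></sitemap>`
  ).join('\n');
  const xml =
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}\n</sitemapindex>`;
  return new Response(xml, { headers: { 'Content-Type': 'application/xml' } });
}

// app/sitemap-pages.xml/route.ts: generate child sitemap
// generateSitemap and getPagesForSitemap are your own project helpers, not Next.js APIs
import { generateSitemap, getPagesForSitemap } from '@/lib/sitemap';

export async function GET() {
  const pages = await getPagesForSitemap();
  return new Response(generateSitemap(pages), {
    headers: { 'Content-Type': 'application/xml' }
  });
}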

Astro

// astro.config.mjs
// @astrojs/sitemap auto-splits at 45,000 URLs per file
// Output: sitemap-index.xml plus sitemap-0.xml, sitemap-1.xml, ...
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://example.com',
  integrations: [sitemap({
    entryLimit: 45000,  // optional, default 45000
    filter: (page) => !page.includes('/admin/'),
  })]
});

Custom generator (Python)

from datetime import date
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

def generate_sitemap_chunk(urls, output_path):
    """Write one child sitemap (<urlset>) from a list of {'loc', 'lastmod'} dicts."""
    urlset = Element('urlset', xmlns=SITEMAP_NS)
    for url in urls:
        u = SubElement(urlset, 'url')
        SubElement(u, 'loc').text = url['loc']
        if 'lastmod' in url:
            SubElement(u, 'lastmod').text = url['lastmod']
    with open(output_path, 'wb') as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(tostring(urlset))

def generate_sitemaps(all_urls, base_url, chunk_size=45000):
    """Split all_urls into numbered child sitemaps and write the index."""
    base_url = base_url.rstrip('/')  # avoid a double slash in child URLs
    chunks = [all_urls[i:i+chunk_size] for i in range(0, len(all_urls), chunk_size)]

    sitemap_urls = []
    for i, chunk in enumerate(chunks):
        filename = f'sitemap-{i+1}.xml'
        generate_sitemap_chunk(chunk, f'public/{filename}')
        sitemap_urls.append(f'{base_url}/{filename}')

    # Generate index
    sitemapindex = Element('sitemapindex', xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        s = SubElement(sitemapindex, 'sitemap')
        SubElement(s, 'loc').text = url
        # Date-only lastmod (YYYY-MM-DD); a naive datetime.now().isoformat()
        # has no timezone and is not a valid W3C datetime
        SubElement(s, 'lastmod').text = date.today().isoformat()

    with open('public/sitemap.xml', 'wb') as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(tostring(sitemapindex))
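The generator above chunks by URL count only. Sites with very long URLs can hit the 50MB limit first, so a splitter may also need to track bytes. The following is one possible sketch, not a drop-in replacement: it serializes entries by hand so it can measure each one, and all names here (render_entry, split_by_count_and_size) are illustrative.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # the limit applies to the uncompressed file

HEADER = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
)
FOOTER = b"</urlset>\n"


def render_entry(url):
    """Serialize one {'loc', optional 'lastmod'} dict as a <url> element."""
    parts = [f"<loc>{escape(url['loc'])}</loc>"]
    if "lastmod" in url:
        parts.append(f"<lastmod>{escape(url['lastmod'])}</lastmod>")
    return ("<url>" + "".join(parts) + "</url>\n").encode("utf-8")


def split_by_count_and_size(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Yield complete sitemap documents (bytes), each within both limits."""
    wrapper = len(HEADER) + len(FOOTER)
    entries, size = [], wrapper
    for url in urls:
        entry = render_entry(url)
        too_many = len(entries) >= max_urls
        too_big = size + len(entry) > max_bytes
        if entries and (too_many or too_big):
            # Close the current file and start a new one
            yield HEADER + b"".join(entries) + FOOTER
            entries, size = [], wrapper
        entries.append(entry)
        size += len(entry)
    if entries:
        yield HEADER + b"".join(entries) + FOOTER


# Small limits make the behaviour visible on a handful of made-up URLs
demo_urls = [{"loc": f"https://example.com/p/{i}"} for i in range(5)]
docs_by_count = list(split_by_count_and_size(demo_urls, max_urls=2))
print(len(docs_by_count))
```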

5. Gzip compression

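XML compresses extremely well: sitemap files often shrink 10x with gzip. Google and Bing both support .xml.gz. Compression only saves bandwidth, though. The 50MB limit applies to the uncompressed file, so gzip is not a substitute for splitting.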

Static gzip files

# Generate compressed version
gzip -k sitemap.xml
# Creates sitemap.xml.gz, keeps original

Reference the gzipped version in robots.txt:

Sitemap: https://example.com/sitemap.xml.gz
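When sitemaps are generated from a script, the same step can be done with Python's standard gzip module. A minimal sketch that mirrors gzip -k; gzip_keep_original is a hypothetical helper, and the temporary file stands in for a real sitemap.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_keep_original(path):
    """Write <path>.gz next to the original file and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "sitemap.xml"
    # Repetitive made-up content, similar in shape to a real sitemap
    original.write_bytes(
        b"<urlset>" + b"<url><loc>https://example.com/</loc></url>" * 1000 + b"</urlset>"
    )
    compressed = gzip_keep_original(original)
    both_exist = original.exists() and compressed.exists()
    uncompressed_size = original.stat().st_size
    compressed_size = compressed.stat().st_size
    original_bytes = original.read_bytes()
    round_trip = gzip.decompress(compressed.read_bytes())

print(uncompressed_size, compressed_size)
```

The limit check should still use the uncompressed size, since that is the figure the 50MB limit is measured against.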

nginx on-the-fly gzip

server {
  gzip on;
  gzip_types application/xml text/xml;
  gzip_min_length 1000;
  gzip_comp_level 6;
}

The crawler sends Accept-Encoding: gzip and nginx compresses the response automatically. The original file stays uncompressed on disk.

6. Verify with Google

Step 1
Submit index to Search Console
Submit only the index URL (https://example.com/sitemap.xml or /sitemap_index.xml). Google discovers child sitemaps automatically.
Step 2
Wait for processing
Search Console → Sitemaps. Each child sitemap appears listed with URL count, status, last read date.
Step 3
Check for errors
Status "Couldn't fetch" = file too large, malformed, or 404. "Has issues" = some URLs problematic. Click for details.

7. Common mistakes

Mistake 1: Submitting child sitemaps individually

Submit only the index. Google reads the index and fetches children automatically. Submitting children separately creates redundant management work.

Mistake 2: Inconsistent lastmod between index and children

Child sitemap has lastmod 2024-01-20. Index references that child with lastmod 2023-12-01. Crawlers that trust the index see the stale date and may not re-fetch the child. Always update both.
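One way to keep the two in sync is to derive the index entry's lastmod from the child's own URLs instead of setting it by hand. A minimal sketch, assuming every lastmod uses the same YYYY-MM-DD format so that string comparison matches date order; index_lastmod is an illustrative name.

```python
def index_lastmod(child_urls):
    """Latest lastmod among a child sitemap's URL records, or None if none is set."""
    dates = [u["lastmod"] for u in child_urls if u.get("lastmod")]
    return max(dates) if dates else None


# Hypothetical URL records for one child sitemap
posts = [
    {"loc": "https://example.com/a/", "lastmod": "2023-12-01"},
    {"loc": "https://example.com/b/", "lastmod": "2024-01-20"},
    {"loc": "https://example.com/c/"},
]
latest = index_lastmod(posts)
print(latest)
```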

Mistake 3: Forgetting to update index when adding children

New child sitemap deployed but not added to index. Google never finds it. Always regenerate the index after child changes.
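A deploy-time check can catch this before a crawler does. The sketch below compares the filenames referenced by the index against the child files that exist; it assumes children are matched by filename only, and missing_from_index is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def missing_from_index(index_xml, child_filenames):
    """Return child sitemap filenames that the index does not reference."""
    root = ET.fromstring(index_xml)
    indexed = {
        PurePosixPath(urlparse(loc.text.strip()).path).name
        for loc in root.iter(f"{NS}loc")
        if loc.text
    }
    return sorted(set(child_filenames) - indexed)


sample_index = (
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
    b"<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>"
    b"</sitemapindex>"
)
on_disk = ["sitemap-pages.xml", "sitemap-posts.xml", "sitemap-products-1.xml"]
missing = missing_from_index(sample_index, on_disk)
print(missing)
```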

Mistake 4: Mixing sitemap and sitemap-index in one file

One file is either a sitemap (urlset root) or an index (sitemapindex root). You can't put <url> entries in a sitemapindex root or <sitemap> entries in a urlset root.
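This rule is easy to check mechanically. A minimal sketch that classifies a file by its root element and rejects mixed content; sitemap_kind is an illustrative name, and the check covers only direct children of the root.

```python
import xml.etree.ElementTree as ET

# Which direct child element each root type may contain
EXPECTED_CHILD = {"urlset": "url", "sitemapindex": "sitemap"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def sitemap_kind(xml_bytes):
    """Return 'urlset' or 'sitemapindex'; raise ValueError on mixed content."""
    root = ET.fromstring(xml_bytes)
    kind = local_name(root.tag)
    if kind not in EXPECTED_CHILD:
        raise ValueError(f"unexpected root element <{kind}>")
    for child in root:
        name = local_name(child.tag)
        if name != EXPECTED_CHILD[kind]:
            raise ValueError(f"<{name}> is not allowed directly inside <{kind}>")
    return kind


valid = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/about/</loc></url>"
    b"</urlset>"
)
mixed = (
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
    b"<url><loc>https://example.com/about/</loc></url>"
    b"</sitemapindex>"
)
kind = sitemap_kind(valid)
print(kind)
```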

8. Verify resolution

Step 1
Each file under limits
for f in /var/www/html/sitemap-*.xml; do
  size=$(stat -c%s "$f")                  # GNU stat; on macOS/BSD use: stat -f%z "$f"
  count=$(grep -o "<loc>" "$f" | wc -l)   # counts every <loc>, even in single-line XML
  echo "$count URLs, $((size/1024)) KB - $f"
done

# Each line should show count <= 50000, size <= 51200 KB
Step 2
Re-run Robots Tester
Size warnings clear. Each sitemap validates. URL counts match expectations.
💡 Start splitting by content type even before hitting limits. Smaller specialized sitemaps regenerate faster and let crawlers focus their attention. A 10k-URL pages sitemap that's always fresh is more useful than a 60k-URL mega-sitemap that updates slowly.
