How to Fix Semantic HTML for AI Engines

Div soup (pages built entirely of nested divs with class names) works visually but tells AI engines nothing about content structure. Where does the article end and the sidebar begin? Is that navigation or main content? Semantic HTML5 elements (article, main, nav, aside, header, footer, section) make those boundaries explicit, so a crawler or extractor can isolate the main content and skip boilerplate without guessing from class names. This guide covers the swap.
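To make the extraction point concrete, here is a minimal sketch in Python using only the standard-library html.parser. It is an illustration of what semantic boundaries make possible, not a description of how any particular AI engine works; the names MainTextExtractor and extract_main_text are hypothetical.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collects text inside <main>, skipping nested boilerplate elements."""

    SKIP = {"nav", "aside", "script", "style"}

    def __init__(self):
        super().__init__()
        self.main_depth = 0  # > 0 while inside <main>
        self.skip_depth = 0  # > 0 while inside a skipped element within <main>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif tag in self.SKIP and self.main_depth:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth:
            self.main_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.main_depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Return the visible text found inside <main>, joined with spaces."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

On a div-only page this function returns an empty string, because nothing in the markup says which div holds the content. A real extractor would then fall back to heuristics, which is exactly the guesswork semantic elements remove.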

1. The semantic elements

| Element | Purpose | Implicit ARIA role |
| <header> | Page or section intro, branding | banner (if page-level) |
| <nav> | Major navigation block | navigation |
| <main> | Primary content of the document | main |
| <article> | Self-contained content (post, product, comment) | article |
| <section> | Thematic grouping with heading | region (only if it has an accessible name) |
| <aside> | Tangentially related content (sidebar, callout) | complementary |
| <footer> | Footer of document or section | contentinfo (if page-level) |
| <address> | Contact info for nearest article/body | group |
| <time> | Date/time with machine-readable datetime | time |
| <figure> + <figcaption> | Image/diagram with caption | figure |

2. Page-level structure

Bad: div soup

<div class="page">
  <div class="header">
    <div class="logo">...</div>
    <div class="nav-bar">
      <div class="nav-item">...</div>
    </div>
  </div>
  <div class="content">
    <div class="article">
      <div class="title">...</div>
      <div class="body">...</div>
    </div>
    <div class="sidebar">...</div>
  </div>
  <div class="footer">...</div>
</div>

Good: semantic HTML5

<body>
  <header>
    <a href="/" class="logo">Acme</a>
    <nav aria-label="Main">
      <ul>
        <li><a href="/audit-tools.html">Products</a></li>
        <li><a href="/seo-audit-platform.html">About</a></li>
      </ul>
    </nav>
  </header>
  
  <main>
    <article>
      <header>
        <h1>Article title</h1>
        <p>By <a href="/learning-hub.html">Jane</a>, 
           <time datetime="2026-05-18">18 May 2026</time></p>
      </header>
      
      <section>
        <h2>Introduction</h2>
        <p>...</p>
      </section>
      
      <section>
        <h2>Methodology</h2>
        <p>...</p>
      </section>
      
      <footer>
        <p>Tags: <a href="/learning-hub.html">CRM</a></p>
      </footer>
    </article>
    
    <aside aria-label="Related">
      <h2>Related articles</h2>
      <ul>...</ul>
    </aside>
  </main>
  
  <footer>
    <p>© 2026 Acme. <a href="/seo-auth/privacy.html">Privacy</a></p>
  </footer>
</body>
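One way to see the difference between the two versions is to print the structural outline a parser can recover from each. The sketch below is a rough illustration using Python's standard-library html.parser; the names STRUCTURAL_TAGS, LandmarkOutline, and outline are hypothetical, and it assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

STRUCTURAL_TAGS = {"header", "nav", "main", "article", "section", "aside", "footer"}


class LandmarkOutline(HTMLParser):
    """Records each structural element with its nesting depth and aria-label."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open structural tags
        self.entries = []    # (depth, tag, aria-label or None)

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            label = dict(attrs).get("aria-label")
            self.entries.append((len(self.open_tags), tag, label))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the close tag matches the innermost open element.
        if tag in STRUCTURAL_TAGS and self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


def outline(html):
    """Return the structural outline as a list of indented lines."""
    parser = LandmarkOutline()
    parser.feed(html)
    parser.close()
    lines = []
    for depth, tag, label in parser.entries:
        suffix = f' "{label}"' if label else ""
        lines.append("  " * depth + tag + suffix)
    return lines
```

Run against the div soup example, outline() returns an empty list. Run against the semantic version, it returns the header, nav, main, article, aside, and footer hierarchy without looking at a single class name.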

3. Article vs section decision

Article: self-contained content that would still make sense if syndicated on its own. Blog post, news item, product card, comment, forum reply.

Section: a thematic chunk within larger content, normally with its own heading. Introduction, Methodology, FAQ within an article.

<!-- Article contains multiple sections -->
<article>
  <h1>Complete CRM buyer's guide</h1>
  
  <section>
    <h2>What is a CRM?</h2>
    <p>...</p>
  </section>
  
  <section>
    <h2>How to evaluate options</h2>
    <p>...</p>
  </section>
</article>

<!-- Multiple articles on a listing page -->
<main>
  <h1>Latest posts</h1>
  
  <article>
    <h2><a href="/learning-hub.html">Post 1</a></h2>
    <p>Excerpt...</p>
  </article>
  
  <article>
    <h2><a href="/learning-hub.html">Post 2</a></h2>
    <p>Excerpt...</p>
  </article>
</main>
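Because each article is a self-contained unit, a parser can treat it as one item and take its first heading as the title. The following is a minimal sketch under that assumption, using Python's standard-library html.parser; ArticleTitles and article_titles are hypothetical names, and nested articles (such as comments) are deliberately ignored.

```python
from html.parser import HTMLParser


class ArticleTitles(HTMLParser):
    """Captures the first heading of each outermost <article>."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # nesting depth of open <article> elements
        self.capturing = False  # True while inside an article's first heading
        self.titles = []        # one entry per outermost article

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
            if self.article_depth == 1:
                self.titles.append(None)  # placeholder until a heading appears
        elif (tag in self.HEADINGS and self.article_depth == 1
              and self.titles[-1] is None):
            self.capturing = True
            self.titles[-1] = ""

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.HEADINGS:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.titles[-1] += data


def article_titles(html):
    """Return one title (or None) per outermost <article> in the markup."""
    parser = ArticleTitles()
    parser.feed(html)
    parser.close()
    return [t.strip() if t is not None else None for t in parser.titles]
```

On the listing page above this yields one title per post, while the page-level h1 is skipped because it sits outside any article. On the buyer's guide it yields a single title, since the section headings belong to the same article.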

4. Headings rules

One h1 per page. h2 starts each main section, h3 nests under h2, and so on. Don't skip levels on the way down (h2 → h4 is wrong).

<main>
  <article>
    <h1>Complete CRM buyer's guide</h1>
    
    <section>
      <h2>Pricing</h2>
      
      <section>
        <h3>Subscription vs perpetual</h3>
        <p>...</p>
      </section>
      
      <section>
        <h3>Per-user vs flat-rate</h3>
        <p>...</p>
      </section>
    </section>
    
    <section>
      <h2>Implementation</h2>
      <p>...</p>
    </section>
  </article>
</main>
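The heading rules are easy to check mechanically. Here is a minimal sketch of such a check using Python's standard-library html.parser; HeadingCollector and check_headings are hypothetical names, and the check looks only at heading order in the source, not at the sectioning elements around them.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def check_headings(html):
    """Return a list of problems: wrong h1 count or skipped levels."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    problems = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = 0
    for level in collector.levels:
        # Going deeper by more than one level skips a heading level.
        if previous and level > previous + 1:
            problems.append(f"h{previous} followed by h{level} skips a level")
        previous = level
    return problems
```

Moving back up the hierarchy (h3 followed by h2) is allowed, which is why only downward jumps of more than one level are reported.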

5. Figure and figcaption

<figure>
  <img src="/charts/cwv-improvement.png" 
       alt="CWV improvement chart showing 40% LCP reduction over 3 months"
       width="800" height="400" />
  <figcaption>
    Core Web Vitals improvement after CDN migration: LCP dropped 40%, 
    FCP dropped 35%, TTFB dropped 75% (Jan-Mar 2026)
  </figcaption>
</figure>

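<!-- figcaption is visible text, so it is extracted with the page; alt text is the fallback when the image itself is not available -->
<!-- Both are plain text in the markup, so both can reach AI engines -->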
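As a rough illustration of why both texts are recoverable, the sketch below pairs each figure's alt text with its caption using Python's standard-library html.parser. FigureCollector and collect_figures are hypothetical names, and nested figures are not handled.

```python
from html.parser import HTMLParser


class FigureCollector(HTMLParser):
    """Pairs the alt text and figcaption text of each <figure>."""

    def __init__(self):
        super().__init__()
        self.figures = []
        self.current = None      # dict for the figure being parsed, if any
        self.in_caption = False  # True while inside <figcaption>

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.current = {"alt": None, "caption": ""}
        elif tag == "img" and self.current is not None:
            self.current["alt"] = dict(attrs).get("alt")
        elif tag == "figcaption" and self.current is not None:
            self.in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self.in_caption = False
        elif tag == "figure" and self.current is not None:
            # Collapse line breaks and indentation inside the caption.
            self.current["caption"] = " ".join(self.current["caption"].split())
            self.figures.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.in_caption and self.current is not None:
            self.current["caption"] += data


def collect_figures(html):
    """Return a list of {'alt': ..., 'caption': ...} dicts, one per figure."""
    parser = FigureCollector()
    parser.feed(html)
    parser.close()
    return parser.figures
```

An image placed in a plain div with no caption would show up here with nothing but its alt text, if it shows up at all, which is the case for wrapping charts and diagrams in figure.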

6. Time element

<p>Published 
  <time datetime="2026-05-18T09:00:00Z">18 May 2026</time>
</p>

<p>Updated 
  <time datetime="2026-08-22T14:30:00Z">22 August 2026</time>
</p>

<!-- The machine-readable datetime attribute lets AI engines parse the exact instant -->
<!-- The visible text can stay human-friendly -->
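To show what "machine-readable" buys you, here is a minimal sketch that reads the datetime attributes and turns them into Python datetime objects. TimeCollector and parse_times are hypothetical names. The sketch assumes full dates or date-times like the ones above; other valid datetime values (durations, year-month strings) would raise ValueError here.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeCollector(HTMLParser):
    """Collects the datetime attribute of every <time> element."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.values.append(value)


def parse_times(html):
    """Return a datetime object for each <time datetime="..."> found."""
    collector = TimeCollector()
    collector.feed(html)
    collector.close()
    parsed = []
    for value in collector.values:
        # fromisoformat() only accepts a trailing "Z" on Python 3.11+,
        # so normalise it to an explicit UTC offset first.
        parsed.append(datetime.fromisoformat(value.replace("Z", "+00:00")))
    return parsed
```

The visible text ("18 May 2026") never has to be parsed, so it can follow any locale or house style without affecting what a machine reads.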

7. Multiple navs — disambiguate

<nav aria-label="Main">
  <ul>
    <li><a href="/audit-tools.html">Products</a></li>
    <!-- ... -->
  </ul>
</nav>

<nav aria-label="Breadcrumb">
  <ol>
    <li><a href="/">Home</a></li>
    <li><a href="/audit-tools.html">Products</a></li>
    <li aria-current="page">Widget</li>
  </ol>
</nav>

<nav aria-label="On this page">
  <ul>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#pricing">Pricing</a></li>
  </ul>
</nav>

<!-- aria-label differentiates multiple navs to AI engines and screen readers -->
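A quick way to catch unlabelled or duplicate navs across templates is a small script. The sketch below uses Python's standard-library html.parser; NavLabelCollector and check_nav_labels are hypothetical names, and for simplicity it looks only at aria-label, not aria-labelledby.

```python
from collections import Counter
from html.parser import HTMLParser


class NavLabelCollector(HTMLParser):
    """Records the aria-label (or None) of every <nav> element."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.labels.append(dict(attrs).get("aria-label"))


def check_nav_labels(html):
    """Report missing or duplicated labels when a page has several navs."""
    collector = NavLabelCollector()
    collector.feed(html)
    collector.close()
    labels = collector.labels
    problems = []
    if len(labels) > 1:  # a single nav needs no disambiguation
        missing = sum(1 for label in labels if not label)
        if missing:
            problems.append(f"{missing} of {len(labels)} nav elements have no label")
        counts = Counter(label for label in labels if label)
        for label, count in counts.items():
            if count > 1:
                problems.append(f"label {label!r} is used by {count} nav elements")
    return problems
```

For the three navs above (Main, Breadcrumb, On this page) the function returns an empty list.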

8. Validate

Step 1: W3C HTML validator
Open validator.w3.org/nu and paste your URL. It catches nesting errors, missing required elements, and deprecated attributes.

Step 2: Outline checker
Open browser DevTools and inspect the accessibility tree (for example, the Accessibility pane in Chrome's Elements panel or the Accessibility inspector in Firefox). The landmark roles there show the semantic structure as screen readers see it, which is a good proxy for what a parser can infer from the same markup. A missing main, multiple unlabelled navs, or no banner are all visible immediately.

💡 The 80/20 fix: wrap your existing template in <header> + <main> + <footer> at minimum. That single change marks where your boilerplate ends and content begins for any parser that reads landmarks. Then iteratively swap inner divs for article, section, nav, and aside as you touch templates.
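To track progress on that incremental swap, a small report per template can help. The following is a minimal sketch using Python's standard-library html.parser; SEMANTIC_TAGS, TagCounter, and template_report are hypothetical names, and the counts are a rough signal rather than a quality score.

```python
from collections import Counter
from html.parser import HTMLParser

SEMANTIC_TAGS = {"header", "nav", "main", "article", "section", "aside", "footer"}


class TagCounter(HTMLParser):
    """Counts how often each element appears in the markup."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def template_report(html):
    """Summarise div usage, semantic usage, and missing page wrappers."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {
        "div": counter.counts["div"],
        "semantic": sum(counter.counts[tag] for tag in SEMANTIC_TAGS),
        # The minimum wrappers named in the 80/20 fix.
        "missing": sorted(
            tag for tag in ("header", "main", "footer") if counter.counts[tag] == 0
        ),
    }
```

An empty "missing" list means the minimum wrappers are in place; the div and semantic counts then show how much of the inner markup is still to be converted.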
