
How to Fix Character Encoding Errors

Encoding errors produce mojibake: text like "café" rendering as "cafÃ©", or "—" rendering as "â€”". The root cause is a mismatch somewhere in the chain: the file is saved in one encoding, the server declares another, the browser interprets a third, or the database stores a fourth. The fix is alignment: UTF-8 everywhere, BOM nowhere, utf8mb4 in MySQL. This guide walks through auditing each layer and fixing the breaks. For related fixes, see the HTML Checker Fixes index.
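The mismatch is easy to reproduce. The following is a minimal Python sketch, assuming the wrong decoder is Windows-1252 (a common case, but only one of several possible mismatches). The function name misdecode is made up for illustration.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


if __name__ == "__main__":
    # "\u2014" is the em dash used as an example in this guide.
    for sample in ("café", "\u2014", "plain ASCII"):
        print(repr(sample), "->", repr(misdecode(sample)))
```

ASCII-only text survives because both encodings agree on those bytes, which is why the problem often goes unnoticed until an accented character or typographic punctuation appears.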

1. Audit your current encoding

Step 1
Check meta charset
View page source. The <head> should contain near the top:
<meta charset="UTF-8">
If you see charset="ISO-8859-1", charset="windows-1252", or no charset at all — that's a problem.
Step 2
Check HTTP Content-Type header
curl -sI https://yourdomain.com/ | grep -i content-type
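Expected: content-type: text/html; charset=UTF-8. If the charset is missing or differs from the meta tag, that mismatch is a likely source of mojibake. When both are present, browsers give the HTTP header precedence over the meta tag.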
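To compare the header against the meta tag in a script, the charset parameter can be pulled out of the header value with Python's standard library. This is a sketch only: header_charset and charsets_match are illustrative names, and the header string is assumed to have been fetched already (for example with the curl command above).

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charsets_match(content_type: str, meta_charset: str) -> bool:
    """True when the header declares a charset and it equals the meta charset."""
    declared = header_charset(content_type)
    return declared is not None and declared == meta_charset.strip().lower()
```

For example, charsets_match("text/html; charset=ISO-8859-1", "UTF-8") reports a mismatch, and a header with no charset parameter at all is also treated as a failure.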
Step 3
Check file encoding
file -i /path/to/template.php
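Expected: charset=utf-8 (charset=us-ascii is also fine, since ASCII is a subset of UTF-8). charset=iso-8859-1 means the file is saved in the wrong encoding. On macOS the equivalent flag is file -I. The MIME output does not mention a BOM; run plain file on the same path and look for "(with BOM)" in the description, which means there's a Byte Order Mark to remove.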
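Where the file utility is unavailable, the same check can be approximated from the raw bytes. A hedged Python sketch follows; classify_encoding and its four labels are invented for this example, and it only distinguishes UTF-8 from "something else" without trying to guess which legacy encoding was used.

```python
import codecs


def classify_encoding(data: bytes) -> str:
    """Label raw file bytes as 'utf-8-bom', 'ascii', 'utf-8' or 'not-utf-8'."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "ascii" if text.isascii() else "utf-8"
```

Feed it the result of reading a template in binary mode. A result of 'not-utf-8' means the file needs re-saving, and 'utf-8-bom' means the BOM needs stripping.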
Step 4
Check database charset
MySQL:
SHOW VARIABLES LIKE 'character_set%';
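The client, connection, database, results and server entries should all be utf8mb4. character_set_filesystem (binary) and character_set_system (utf8) are set by MySQL itself and can be ignored. MySQL's utf8 is an alias for utf8mb3, which stores at most 3 bytes per character, so it cannot hold emoji or other characters outside the Basic Multilingual Plane.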
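The 3-byte limit is easiest to see by counting UTF-8 bytes per character. The Python sketch below is illustrative only: needs_utf8mb4 is a hypothetical helper that flags text containing any code point above U+FFFF, which is what takes 4 bytes in UTF-8.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ("e", "\u00e9", "\u2014", "\U0001F680"):
        # Prints the code point and how many bytes UTF-8 needs for it.
        print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "byte(s)")
```

Accented Latin letters and typographic punctuation fit in 2 or 3 bytes. Emoji are the usual content that needs the fourth.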

2. Set meta charset correctly

Step 1
Put it first in head
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Page Title</title>
...
Browsers must see the charset declaration within the first 1024 bytes of the document. Putting it first in <head> guarantees that.
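The 1024-byte rule can be checked mechanically. The following Python sketch uses a deliberately simplified regular expression (the HTML specification's prescan algorithm handles more cases); early_meta_charset and its limit parameter are names chosen for this example.

```python
import re
from typing import Optional

# Simplified pattern: matches charset=... inside a <meta> tag, quoted or not.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def early_meta_charset(html: bytes, limit: int = 1024) -> Optional[str]:
    """Return the meta charset declared within the first `limit` bytes, else None."""
    match = _META_CHARSET.search(html[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

A return value of None on a page that does contain a meta charset usually means something bulky (a long comment, inline script or style) sits ahead of it in the document.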

3. Set Content-Type header on the server

nginx

http {
  charset utf-8;
  charset_types text/html text/css text/plain application/javascript application/json application/xml;
}

Apache

# In httpd.conf or .htaccess
AddDefaultCharset UTF-8
AddCharset UTF-8 .html .css .js .json .xml

PHP

// At the top of any PHP file or in a global bootstrap
header('Content-Type: text/html; charset=UTF-8');

4. Re-save files without BOM

Step 1
Find files with BOM
find . -type f \( -name "*.php" -o -name "*.html" \) -exec grep -l $'^\xef\xbb\xbf' {} +
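Lists every file with a UTF-8 BOM at the start of a line, which in practice means the start of the file. The $'...' quoting requires bash or a compatible shell.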
Step 2
Strip BOM in bulk
find . -type f \( -name "*.php" -o -name "*.html" \) -exec sed -i '1s/^\xef\xbb\xbf//' {} +
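Removes the BOM from the first line of each matching file. The \x escapes and the bare -i flag are GNU sed syntax; BSD and macOS sed need a different invocation. Test on a backup first.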
⚠️ Always commit BOM stripping in a dedicated commit so you can revert if something breaks. Some legacy tooling expects BOM and may misbehave when it's removed.
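Where GNU sed is not available, the same stripping can be done in a few lines of Python. This is a sketch rather than a drop-in tool: strip_bom is an illustrative name, it reads the whole file into memory, and it only touches a BOM at byte offset 0.

```python
import codecs
from pathlib import Path


def strip_bom(path: Path) -> bool:
    """Remove a UTF-8 BOM at byte offset 0. Returns True if the file was changed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the 3 BOM bytes, leaving the rest untouched.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Because it works on bytes, it never re-encodes the rest of the file, so a file that is not valid UTF-8 is left exactly as it was apart from the removed BOM.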

5. Convert MySQL database to utf8mb4

Step 1
Backup first
mysqldump -u user -p database > backup-before-utf8mb4.sql
Step 2
Convert per database, table, column
ALTER DATABASE mydatabase CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- repeat for every table
In WordPress, set DB_CHARSET in wp-config.php to 'utf8mb4' after migration.
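Repeating the ALTER TABLE statement by hand is error-prone on a schema with many tables. The Python sketch below generates one statement per table name. It assumes the table list has been obtained separately (for example from SHOW TABLES), and convert_statements is a made-up helper that only builds SQL text without connecting to any database.

```python
from typing import Iterable, List


def convert_statements(
    tables: Iterable[str],
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> List[str]:
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        # Quote the identifier and escape any backticks inside it.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements
```

Review the generated statements before running them, and run them only after the backup from Step 1 has been verified.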
Step 3
Fix the connection charset
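Tables in utf8mb4 with a connection in utf8 still lose data: in strict SQL mode, inserts containing 4-byte characters fail with an error, and in non-strict mode the value is silently truncated at the first such character. Set the connection charset on connect: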
// PHP / MySQLi
$mysqli->set_charset('utf8mb4');

// PHP / PDO
new PDO('mysql:host=localhost;dbname=db;charset=utf8mb4', ...);

// WordPress: handled by wp-config DB_CHARSET

6. Re-validate

Step 1
Re-run the HTML Checker
Encoding-related findings should clear. Spot-check that problem characters (em-dash, smart quotes, accented characters, emoji) render correctly across pages.
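Spot-checking by eye can be backed up with a rough automated test on page text. The heuristic below is a sketch, assuming the damage is UTF-8 misread as Windows-1252: if a string can be re-encoded as Windows-1252 and then decoded as UTF-8 into something different, it was probably mojibake. The name looks_like_mojibake is illustrative, and the check will miss other encoding pairs and strings that mix damaged and correct text.

```python
def looks_like_mojibake(text: str) -> bool:
    """Heuristic: UTF-8 misread as Windows-1252 can be reversed cleanly."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # not reversible, so not this kind of mojibake
    # Pure ASCII round-trips to itself and is therefore not flagged.
    return repaired != text
```

Correctly encoded accented text is not flagged, because its Windows-1252 bytes do not form valid UTF-8.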
