skills/seo-audit/references/international-seo.md
Detailed evidence backing the International SEO & Localization section of the SEO Audit skill. Organized by topic with source URLs and key quotes.
Google supports three equivalent methods for declaring hreflang: HTML <link> elements in <head>, HTTP Link headers, and XML sitemap <xhtml:link> elements. Google has confirmed that no method is prioritized over another.
Google combines signals from both HTML and sitemaps. If the same language-region pair points to different URLs across methods, Google drops that pair rather than guessing.
Google's docs: "If page X links to page Y, page Y must link back to page X. If not, those annotations may be ignored or not interpreted correctly."
Every page must include itself (self-referencing) in the hreflang set. Missing self-referencing is the #1 error found by Semrush audits. A study of 374,756 domains found 67% of hreflang implementations had issues.
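The return-link and self-reference rules above can be checked mechanically once the hreflang annotations have been extracted from each page. The following is a minimal sketch, assuming a hypothetical input shape (page URL mapped to its `{hreflang code: alternate URL}` set) and a hypothetical function name; it is not taken from any particular audit tool.

```python
from typing import Dict, List

# Hypothetical input shape: page URL -> {hreflang code: alternate URL}
HreflangMap = Dict[str, Dict[str, str]]


def find_hreflang_errors(pages: HreflangMap) -> List[str]:
    """Return readable errors for missing self-references and missing return links."""
    errors: List[str] = []
    for page_url, alternates in pages.items():
        # Every page should list itself in its own hreflang set.
        if page_url not in alternates.values():
            errors.append(f"{page_url}: missing self-reference")
        # Every alternate should link back to this page.
        for code, alt_url in alternates.items():
            if alt_url == page_url:
                continue
            back_links = pages.get(alt_url)
            if back_links is None:
                errors.append(f"{page_url}: {code} target {alt_url} has no annotations")
            elif page_url not in back_links.values():
                errors.append(f"{page_url}: {alt_url} does not link back")
    return errors


pages = {
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
    },
    "https://example.com/de/": {
        # Self-reference is present, but there is no return link to /en/.
        "de": "https://example.com/de/",
    },
}
errors = find_hreflang_errors(pages)
print(errors)
```

In this example the English page is flagged because the German page does not link back to it.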
The x-default value was introduced in April 2013. It designates the fallback page for users whose language or region matches no declared variant. It can point to the same URL as one of the language-specific alternates, and it must be included in the complete set of annotations on every variant page.
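As an illustration of a complete annotation set with a fallback, the sketch below renders the HTML <link> tags for one group of variants. The function name and the example.com URLs are placeholders; the same block of tags would be emitted in the <head> of every variant page.

```python
from html import escape
from typing import Dict


def render_hreflang_links(alternates: Dict[str, str], x_default: str) -> str:
    """Build the <link> tags for one hreflang set, including x-default."""
    entries = dict(alternates)
    entries["x-default"] = x_default  # may equal one of the language URLs
    lines = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in entries.items()
    ]
    return "\n".join(lines)


alternates = {
    "en": "https://example.com/en/",
    "de": "https://example.com/de/",
}
head_links = render_hreflang_links(alternates, x_default="https://example.com/en/")
print(head_links)
```

Here x-default reuses the English URL, which is allowed; it could equally point to a dedicated language-selector page.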
Language: ISO 639-1 (2-letter). Script (optional): ISO 15924 (4-letter). Region: ISO 3166-1 Alpha 2 (2-letter). Format: language[-script][-region].
You cannot specify a region code alone. Common mistakes: en-UK (should be en-GB) and es-419 (419 is a UN M.49 region code, not ISO 3166-1). A study found 8.9% of sites using hreflang contain invalid language codes.
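A rough validator for the format rules above might look like the following. It is a sketch under stated assumptions: the regular expression only checks the shape of a code, the mistakes table is illustrative rather than exhaustive, and `SAMPLE_LANGUAGES` is a tiny stand-in for the full ISO 639-1 list that a real audit would load.

```python
import re
from typing import Optional, Set

# Shape only: 2-letter language, optional 4-letter script, optional 2-letter region.
HREFLANG_SHAPE = re.compile(r"^([A-Za-z]{2})(-[A-Za-z]{4})?(-[A-Za-z]{2})?$")

# Illustrative list of frequent mistakes, not a complete registry.
KNOWN_MISTAKES = {"en-uk": "en-GB"}


def check_hreflang_code(code: str, languages: Optional[Set[str]] = None) -> str:
    """Return 'ok' or a short description of the problem."""
    lowered = code.lower()
    if lowered == "x-default":
        return "ok"
    if lowered in KNOWN_MISTAKES:
        return f"invalid: use {KNOWN_MISTAKES[lowered]}"
    match = HREFLANG_SHAPE.match(code)
    if match is None:
        return "invalid: expected language[-script][-region]"
    # Catches region-only values such as "GB", which pass the shape check.
    if languages is not None and match.group(1).lower() not in languages:
        return "invalid: unknown language code"
    return "ok"


# Assumption: a small sample standing in for the full ISO 639-1 list.
SAMPLE_LANGUAGES = {"en", "es", "de", "zh"}
results = {
    code: check_hreflang_code(code, SAMPLE_LANGUAGES)
    for code in ["en-GB", "en-UK", "es-419", "GB", "zh-Hant-TW"]
}
print(results)
```

The region part is not checked against ISO 3166-1 here; adding a region list in the same way as `languages` would close that gap.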
With 20 locales, HTML <head> hreflang adds ~1.5KB per page for zero user benefit. Sitemap-based hreflang has zero runtime performance impact. <xhtml:link> child elements do NOT count toward the 50,000 URL sitemap limit (only <loc> elements count).
John Mueller recommends focusing hreflang on pages receiving wrong-language traffic, not every page: "I wouldn't do it for any of the other pages of the site because it's so complex & hard to manage."
Bing treats hreflang as a "weak signal." Bing relies on content-language meta tag, HTML lang attribute, ccTLDs, and server location. Yandex supports hreflang like Google.
To cover all of these engines: implement hreflang (Google/Yandex) plus <html lang="..."> and <meta http-equiv="content-language"> (Bing).
Each locale page must canonical to itself. John Mueller: "Don't use a rel=canonical across languages/countries, only use it on a per-country/language basis."
Google's docs: "Specify a canonical page in the same language, or the best possible substitute language if a canonical doesn't exist for the same language."
Mueller: "If your canonical is pointing somewhere else, Google will follow that and ignore your hreflang annotation." The canonical URL must be one of the URLs in the hreflang set, or all hreflang markup is ignored.
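The two canonical rules above (self-referencing per locale, and canonical inside the hreflang set) can be expressed as a small check. This is a sketch with hypothetical names, and it assumes `page_url` is the clean URL of a locale variant, without tracking parameters.

```python
from typing import Dict, List


def audit_canonical(page_url: str, canonical_url: str, alternates: Dict[str, str]) -> List[str]:
    """Flag canonicals that would undermine the page's hreflang annotations."""
    issues: List[str] = []
    if canonical_url not in alternates.values():
        issues.append("canonical is not in the hreflang set")
    elif canonical_url != page_url:
        issues.append("canonical points to another locale's URL")
    return issues


alternates = {
    "en": "https://example.com/en/",
    "de": "https://example.com/de/",
}
# German page canonicalized to the English page: a cross-locale canonical.
cross_locale = audit_canonical("https://example.com/de/", "https://example.com/en/", alternates)
# German page with a self-referencing canonical: no issues.
self_referencing = audit_canonical("https://example.com/de/", "https://example.com/de/", alternates)
print(cross_locale, self_referencing)
```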
Google also states: "Google prefers URLs that are part of hreflang clusters for canonicalization" -- when signals align, hreflang strengthens canonical selection.
Mueller (2023 Office Hours): "If the content is completely the same, and we can't tell any difference, then for simplicity and user experience we may just show one version -- even if hreflang is present."
Google's duplicate detection runs BEFORE hreflang evaluation. To keep both versions indexed, you need substantive content differences beyond currency symbols.
Google: "Don't use the first page of a paginated sequence as the canonical page. Instead, give each page its own canonical URL." Each paginated page in each locale gets a self-referencing canonical. Google announced in March 2019 that it no longer uses rel="next" and rel="prev".
Each <url> entry includes <xhtml:link> alternates for every locale. Requires xmlns:xhtml="http://www.w3.org/1999/xhtml" namespace.
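To make the sitemap structure concrete, the sketch below builds a sitemap with Python's standard `xml.etree.ElementTree`. The function name and example URLs are placeholders. Each group of alternates produces one <url> entry per distinct URL, and every entry carries the full set of <xhtml:link> alternates, including its own.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize with a default sitemap namespace and the xhtml: prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(groups: List[Dict[str, str]]) -> str:
    """Return sitemap XML with one <url> per distinct URL in each hreflang group."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for alternates in groups:
        # dict.fromkeys de-duplicates URLs (x-default may reuse a language URL).
        for loc in dict.fromkeys(alternates.values()):
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            for code, href in alternates.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": code, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


groups = [
    {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
        "x-default": "https://example.com/en/",
    }
]
sitemap_xml = build_sitemap(groups)
print(sitemap_xml)
```

A production file would also need the XML declaration and would be split once it approaches the size limits.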
Split sitemaps by content type, not by locale. Splitting by locale creates maintenance problems because every locale sitemap must reference every other locale (reciprocal requirement).
50,000 URLs / 50MB uncompressed per sitemap. Only <loc> elements count toward the 50K limit. But with 20 hreflang alternates per entry, the 50MB file size limit becomes the bottleneck. Plan for 2,000-5,000 URLs per sitemap when using full hreflang.
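A back-of-the-envelope estimate shows why file size, not URL count, becomes the constraint. The per-link and per-entry byte sizes below are assumptions chosen for illustration; actual sizes depend on URL length and formatting, so measure real sitemap output before relying on the numbers.

```python
SITEMAP_MAX_URLS = 50_000
SITEMAP_MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, uncompressed


def max_urls_per_sitemap(
    alternates_per_url: int,
    bytes_per_link: int = 150,            # assumed size of one <xhtml:link> line
    bytes_per_entry_overhead: int = 100,  # assumed size of <url> and <loc> markup
) -> int:
    """Estimate how many <url> entries fit before either limit is hit."""
    bytes_per_entry = bytes_per_entry_overhead + alternates_per_url * bytes_per_link
    by_size = SITEMAP_MAX_BYTES // bytes_per_entry
    return min(SITEMAP_MAX_URLS, by_size)


without_hreflang = max_urls_per_sitemap(0)
with_20_locales = max_urls_per_sitemap(20)
with_long_urls = max_urls_per_sitemap(20, bytes_per_link=400)
print(without_hreflang, with_20_locales, with_long_urls)
```

Under these assumed sizes, 20 alternates per entry bring the ceiling well below 50,000 URLs, and longer URLs lower it further. Planning for a few thousand URLs per file leaves headroom for additional locales and longer paths.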
Submit the sitemap index in Search Console AND reference it in robots.txt. Individual child sitemaps can be submitted separately for per-sitemap reporting.
Next.js alternates.languages does NOT automatically include a self-referencing <xhtml:link> for the <loc> URL. You must explicitly include the <loc> URL's own language in the languages object.
Google treats subdirectories and subdomains equivalently. Mueller: "From our point of view...they say subdomains and subdirectories are essentially equivalent."
URL parameters (?lang=en) are explicitly "Not recommended" per Google docs.
Mueller recommends: set / as x-default, put each language in its own prefix. Without marking / as x-default, "to Google it can look like '/' is a separate page from the others."
Google strongly advises against locale-adaptive pages. Googlebot crawls from US IPs and does not send Accept-Language headers. Separate URLs + hreflang are required.
Mueller: trailing slash is "a significant part of the URL and will change the URL if it's there or not." Pick one format for all locale paths, internal links, canonicals, hreflang, and sitemaps.
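One way to enforce a single format is to normalize every locale URL through the same helper before it is written to internal links, canonicals, hreflang, or sitemaps. The sketch below uses hypothetical function names and assumes directory-style paths; it would need an exception for file-like paths such as /sitemap.xml.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def normalize_trailing_slash(url: str, trailing_slash: bool = True) -> str:
    """Return the URL with the chosen trailing-slash convention applied to its path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path != "/":
        path = path.rstrip("/") + ("/" if trailing_slash else "")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def find_inconsistent_urls(urls: List[str], trailing_slash: bool = True) -> List[str]:
    """List URLs that do not already follow the chosen convention."""
    return [u for u in urls if normalize_trailing_slash(u, trailing_slash) != u]


urls = [
    "https://example.com/en/",
    "https://example.com/de",
    "https://example.com/fr/pricing/",
]
inconsistent = find_inconsistent_urls(urls)
print(inconsistent)
```

Running the same check over canonical, hreflang, and sitemap URLs together surfaces mismatches between the three sources.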
Mueller (2025): "Consistency is the biggest technical SEO factor."
The International Targeting report in Search Console is deprecated. Google now relies entirely on hreflang, content language analysis, and linking patterns. You can still add subdirectory properties in Search Console for per-locale reporting.
Use localePrefix: 'always' (next-intl) or equivalent. Never hide locale from URLs -- Google needs unique URLs per language. Using 'never' mode disables alternate links entirely.
Google removed longstanding guidance advising against auto-translated content in mid-2025. Current stance: "Our policies do not strictly define content that has been translated by AI as spam." The scaled content abuse policy mentions translation as a possible vector, but does not ban it.
Reddit scaled AI translations to 35+ languages with Google's knowledge. The key distinction is intent and quality, not the method.
Google: "Localized versions of a page are only considered duplicates if the main content of the page remains untranslated." Pages with only translated boilerplate get clustered as duplicates.
Do NOT use noindex for unwanted locale pages (wastes crawl budget). Do NOT canonical cross-locale (conflicts with hreflang). Best approach: don't create locale pages you can't make genuinely helpful.
The helpful content system was merged into Google's core ranking systems in March 2024. It acts as a site-wide signal: "any content -- not just unhelpful content -- on sites determined to have relatively high amounts of unhelpful content overall is less likely to perform well in Search."
Low-quality translated pages can drag down the entire site. This is the strongest argument against creating locale pages that aren't genuinely helpful.
Google: "Translating only the boilerplate text of your pages while keeping the bulk of your content in a single language...can create a bad user experience." Google uses visible content (not lang attribute) to determine page language.
Translate ALL content on a page if you create a locale version. Untranslated metadata (title, description) in the wrong language reduces CTR.
Crawl budget is only a concern for sites with 1M+ pages, or 10K+ pages that change daily. Even so, alternate URLs (hreflang targets) do consume crawl budget, and broken hreflang links both waste budget and invalidate signals.
Google identifies audience via: "local addresses and phone numbers on the pages, the use of local language and currency, links from other local sites, or signals from your Business Profile."