What Is an XML Sitemap?
An XML sitemap is a file that lists all the important URLs on a website in a format search engines can read — helping Google discover, crawl, and understand the site's structure, especially for large sites, new sites, or pages with limited internal linking.
Why It Matters
Google discovers pages by following links. If every page on your site is linked from at least one other page, Google can theoretically find everything. But in practice, some pages have weak internal linking, some are new and not yet linked widely, and some are buried deep in the site hierarchy. An XML sitemap provides a direct route — a complete list of URLs that Google should know about.
For large sites, sitemaps are essential. An ecommerce store with 10,000 products needs Google to know about all of them. A news site publishing 50 articles per day needs Google to discover them quickly. Without a sitemap, Google relies entirely on crawling links, which may miss pages or discover them slowly.
How It Works
XML sitemaps follow the Sitemaps protocol:
- URL list — The sitemap contains `<url>` entries, each with a `<loc>` tag specifying the full URL. Each entry can optionally include `<lastmod>` (last modification date), `<changefreq>` (how often the page changes), and `<priority>` (relative importance) — though Google has said it ignores `<changefreq>` and `<priority>`, and uses `<lastmod>` only when it is consistently accurate.
- Sitemap index — Large sites split URLs across multiple sitemaps (max 50,000 URLs or 50 MB uncompressed per file) and list them in a sitemap index file. The index is the file submitted to Google Search Console.
- Auto-generation — Most CMS platforms and static site generators create sitemaps automatically. The sitemap should update when pages are added, removed, or significantly modified.
- Submission — The sitemap URL is submitted to Google Search Console and referenced in robots.txt. Google then uses it alongside its normal crawling to discover and prioritise URLs.
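Putting the pieces above together, a minimal sitemap looks like this (the example.com URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/products/widget</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/launch-post</loc>
    <lastmod>2024-04-18</lastmod>
  </url>
</urlset>
```

A site that splits its URLs across multiple files lists them in a sitemap index instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
```

The robots.txt reference is a single line, e.g. `Sitemap: https://www.example.com/sitemap.xml`.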
Common Mistakes
The most common mistake is including URLs in the sitemap that should not be indexed — 404 pages, redirected URLs, noindex pages, and duplicate URLs without proper canonicals. The sitemap should be a curated list of indexable, canonical URLs. Google treats sitemap inclusion as a signal of importance, so polluting it with low-quality URLs undermines that signal.
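That curation step can be sketched as a simple pre-publication filter. This is an illustrative sketch, not a real API — the page records and their field names (`status`, `noindex`, `canonical`) are assumptions about what a crawler might hand you:

```python
def sitemap_candidates(pages):
    """Keep only indexable, canonical, live URLs for the sitemap.

    Each page is a dict with illustrative fields:
    url, status (HTTP status), noindex (robots meta), canonical (canonical URL).
    """
    keep = []
    for page in pages:
        if page["status"] != 200:
            continue  # drop 404s and redirected URLs
        if page["noindex"]:
            continue  # drop pages excluded from the index
        if page["canonical"] != page["url"]:
            continue  # drop duplicates pointing at another canonical
        keep.append(page["url"])
    return keep

pages = [
    {"url": "https://example.com/a", "status": 200, "noindex": False, "canonical": "https://example.com/a"},
    {"url": "https://example.com/old", "status": 301, "noindex": False, "canonical": "https://example.com/old"},
    {"url": "https://example.com/draft", "status": 200, "noindex": True, "canonical": "https://example.com/draft"},
    {"url": "https://example.com/b?ref=x", "status": 200, "noindex": False, "canonical": "https://example.com/b"},
]
print(sitemap_candidates(pages))  # only https://example.com/a survives
```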
The other mistake is neglecting <lastmod> dates or setting them all to the current date. Google uses lastmod to prioritise crawling — recently modified pages get crawled sooner. If every page shows today's date, the signal is meaningless. Accurate lastmod dates help Google allocate crawl budget efficiently.
How I Use This
My SEO automation audits sitemap quality — checking for excluded important pages, included noindex pages, broken URLs, and inaccurate lastmod dates. For Astro-built sites like this one, the sitemap generates automatically at build time with only indexable pages included. The advanced SEO audit cross-references the sitemap against the crawl to identify discrepancies.
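The cross-referencing step reduces to set arithmetic over two URL lists. A minimal sketch — the function name and report keys are my own, not part of any real audit tool:

```python
def sitemap_crawl_discrepancies(sitemap_urls, crawled_urls):
    """Compare the sitemap against a site crawl.

    Returns URLs listed in the sitemap that the crawl never reached
    (possible orphans or stale entries) and crawled URLs missing
    from the sitemap (possible coverage gaps)."""
    sitemap, crawled = set(sitemap_urls), set(crawled_urls)
    return {
        "in_sitemap_not_crawled": sorted(sitemap - crawled),
        "crawled_not_in_sitemap": sorted(crawled - sitemap),
    }

report = sitemap_crawl_discrepancies(
    ["https://example.com/", "https://example.com/pricing", "https://example.com/old-page"],
    ["https://example.com/", "https://example.com/pricing", "https://example.com/blog/"],
)
print(report["in_sitemap_not_crawled"])  # ['https://example.com/old-page']
print(report["crawled_not_in_sitemap"])  # ['https://example.com/blog/']
```

Each bucket suggests a different fix: the first points at URLs to remove from the sitemap (or to link internally), the second at pages the sitemap should probably include.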
Related Terms
Crawl Budget
Crawl budget is the number of pages a search engine will crawl on your site within a given timeframe — determined by your server's capacity and the perceived value of your content. Managing crawl budget ensures Google spends its limited crawling resources on the pages that matter.
Robots.txt
Robots.txt is a text file at the root of a website that tells search engine crawlers which pages or sections they are allowed or disallowed from crawling — controlling how search engines access and discover content on the site.
Technical SEO
Technical SEO is the foundation layer of search engine optimisation — the crawlability, indexability, site speed, and structural elements that determine whether search engines can find, understand, and rank your pages.