What is a sitemap?
A sitemap is a list (or lists) of links that represent part of a website or the totality of a website. Sitemaps can contain more information about the content like creation date, last update, importance, run-time, content rating, etc. For optimal results, a sitemap should update dynamically when new content is added, but there may be instances where a static sitemap is all you can manage.
A sitemap is not necessary for proper SEO, but is highly recommended and usually easy to implement. If you have a small site, and the pages are linked together well, you might not need a sitemap. It becomes more important for larger sites that publish fresh content often.
A sitemap is a good way to let search engines know about all the pages on your website, but it should not be depended on as the sole way to discover pages on the site. Creating a strong site structure with relevant contextual linking is still very important to a search engine’s understanding of a site.
We will review a couple of different types of sitemaps: XML sitemap and HTML sitemap.
What is an XML sitemap?
An XML sitemap is a list of links in a standard markup language that Google prefers. This language provides additional metadata and context to a list of items. You can reference this page on the XML sitemap standard.
You can submit your XML sitemap to Google directly in Google Search Console, or by linking to your sitemap in your robots.txt file. For very large sites, you may need an XML sitemap index page that lists multiple XML sitemaps. This can simply be split into manageable sizes or organized by type of content.
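As a rough illustration of the sitemap index idea, here is a minimal Python sketch that builds an index file listing several child sitemaps. The function name build_sitemap_index and the example.com URLs are made up for this example; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) that lists child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical child sitemaps, split by type of content.
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-news.xml",
    "https://www.example.com/sitemap-videos.xml",
])
print(index_xml)
```

To point crawlers at the result from robots.txt, you would add a line such as "Sitemap: https://www.example.com/sitemap-index.xml" (the file name here is only an example).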
Google offers guidelines on creating special XML sitemaps for Video, Images and News. By creating one of these special sitemaps, you can be included in a more rich search experience from Google. This article will focus on sitemaps for website pages only.
What are the guidelines for XML sitemaps?
Google shares best practices for XML sitemaps. A quick summary of the main things to be concerned with:
- Use fully qualified, absolute URLs that are status 200.
- Don’t submit relative URLs. Include the full URL, starting with https:// or http://, including the subdomain (if used) and the domain.
- Use canonical URLs; don’t submit URLs that create duplicate pages.
- Post the sitemap at the site root so that it will affect all files on the domain. It can be posted in a folder, but only affects files in that folder.
- Sitemap files must be UTF-8 encoded.
- Maximum sitemap size is 50,000 URLs/50MB uncompressed. Use a sitemap index file to list individual sitemaps and submit this single file to Google.
- If you have different URLs for mobile and desktop versions of a page, Google recommends pointing to only one version in a sitemap.
- Use only ASCII characters
- Google ignores priority and changefreq values, so don’t bother adding them.
- Google reads the lastmod value, but if you misrepresent this value, they will stop reading it.
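A few of the guidelines above can be sketched in code. The following Python example is only an illustration, not an official tool: the helper names (is_absolute_url, build_sitemap) and the example.com URLs are assumptions. It rejects relative URLs, enforces the 50,000 URL limit, and writes only loc and lastmod for each entry.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the guidelines above


def is_absolute_url(url):
    """True only for URLs with an http/https scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_sitemap(pages):
    """pages: list of (url, lastmod) tuples, lastmod as a 'YYYY-MM-DD' string."""
    if len(pages) > MAX_URLS:
        raise ValueError("too many URLs for one sitemap; split it and use an index")
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        if not is_absolute_url(url):
            raise ValueError(f"not an absolute URL: {url!r}")
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; only report a lastmod that reflects a real change.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/about", "2023-11-02"),
])
print(sitemap_xml)
```

This sketch does not check that each URL returns status 200 or is canonical; those checks need a crawl or knowledge of your own site.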
What is an HTML sitemap?
An HTML sitemap is a list of links representing the pages on your website in HTML format. You might create a simplistic HTML sitemap that just outlines the main sections of your website to give visitors a nice overview of what you have available.
A more powerful HTML sitemap covers every page on your website. If the linking is set up in an optimal way, it can help create a flat site structure that distributes PageRank more evenly to all pages.
Why are sitemaps important?
XML sitemaps can help search engines like Google discover new pages on your site or pages that are not linked very well on your site. They are usually easy to implement, and can be a good catch-all. An XML sitemap is a basic SEO recommendation, but don’t expect it to boost search rankings or traffic.
An HTML sitemap can be a powerful tool, especially if the website is large (over 2000 pages) and the linking architecture is only set up around topic or listing pages with pagination. The HTML sitemap can improve site structure and link value distribution on a very large site.
If your website is only a few pages, and you don’t publish many posts or articles, an HTML sitemap isn’t likely to help very much.
Crawl depth tells you about your site structure
To understand how a sitemap can help a website, you really need to know more about crawl depth. Crawl depth is a measurement of how far away a page is from the homepage. In other words, how many levels (or clicks) away from the homepage is another page? Crawl depth starts at 0 for the homepage, and every page linked from that page is depth of 1, every page linked from those pages is depth of 2 and so on. Crawl depth doesn’t technically have to start with the homepage, but could be any page where you started crawling links on a website. Crawl depth for a page is always measured at the lowest level that a link to that page is found on, so additional links found at higher (worse) depths do not affect crawl depth.
This metric is a good way to see if you have structural issues. You can use a web crawler, like Screaming Frog, to crawl a website and check crawl depths for pages. If you find important pages at very deep crawl levels then you should add more linking structures to support those pages.
Pages that are not linked to often will appear unimportant to a search engine. It may not even prioritize crawling a page and including it in the search index if the page has low link value.
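Crawl depth as defined above is the shortest click distance from the starting page, which a breadth-first search computes directly. Here is a minimal Python sketch, assuming you already have the internal links as a dictionary; the crawl_depths name and the tiny link graph are invented for illustration.

```python
from collections import deque


def crawl_depths(links, start):
    """Breadth-first search: depth of each page is its shortest click distance from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # The first time a page is reached is its lowest depth;
            # links found later at deeper levels do not change it.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: each key is a page, each value the pages it links to.
links = {
    "/": ["/news", "/about"],
    "/news": ["/news/page2", "/article-a"],
    "/news/page2": ["/article-b", "/article-a"],
    "/about": [],
}
depths = crawl_depths(links, "/")
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
```

In this toy graph, /article-a is linked from both /news and /news/page2, but its depth is 2 because only the shallowest link counts.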
Ideal site structure
The pages that you want to rank well in organic search results (usually your content pages) should be found at a very low crawl depth. A depth of 2-3 is great, 4-5 is OK, and 6 or more is usually going to be an issue. Efforts should be made to create as shallow a site architecture as possible.
When a site gets large enough to have listing pages paginated in the hundreds, you become dependent on some type of next/previous navigation on those page sets. Link value is exponentially degrading at each level, and by level 6 or higher the pages are receiving such low internal link value it can affect their search ranking performance. If your entire site architecture depends on long, paginated topic/category pages your older content is probably suffering because of it.
Building an optimal HTML sitemap
When building an HTML sitemap, make sure it is not set up in a long, paginated sequence. The goal is to create two basic page types: a sitemap index page and a sitemap page. The sitemap index contains links to sitemap pages, and each sitemap page is a list of links to content pages (or other important pages).
How many links should you list on a page? Google used to recommend no more than 100 on a page, but has since expanded that guidance saying there is no limit, but it should be a “reasonable amount.” I recommend 100-500 links on a page, leaning toward the lower end if possible.
For example, you might have 10,000 articles on a website. You would create an index page for your HTML sitemap, using a URL like this:
This 1 page would contain all the links to individual sitemap pages. With 10,000 articles, that means you will have 100 individual sitemap pages with 100 links on each one (100×100=10,000). It’s useful to break this up by content type or some other descriptive filter. If you did it by content type, it might look like this:
The above example page contains a simple list of links like this:
News page 1
News page 2
News page 3
News page 4
Videos page 1
Videos page 2
Podcasts page 1
Podcasts page 2
Podcasts page 3
The trick here is listing every sitemap page on the initial sitemap index page. This will create a “flat” site structure. In the above example, it would ensure that every content page is only 3 levels away from the homepage. A flat site architecture will distribute link value equally among all content pages. An HTML sitemap index page can be very large if the site is very large. Here is a good example of an HTML sitemap.
Depth 0: Homepage (links to /sitemap in footer)
Depth 1: /sitemap
Depth 2: /sitemap/news1
Depth 2: /sitemap/news2
Depth 3: /20349423/this-is-your-news-article-page
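The two-level structure described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the article URLs, the /sitemap/news URL pattern, and the helper functions chunk and render_link_list are hypothetical. It splits 10,000 articles into sitemap pages of 100 links each, then builds one index page that links to every sitemap page.

```python
from html import escape


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_link_list(links):
    """links: list of (href, label) pairs -> minimal, unstyled HTML list."""
    rows = [
        f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in links
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# Hypothetical content: 10,000 articles as (URL, title) pairs.
articles = [(f"/articles/{n}", f"Article {n}") for n in range(1, 10_001)]

# 100 links per sitemap page gives 100 sitemap pages.
sitemap_pages = chunk(articles, 100)

# The index page links to every sitemap page, so nothing is paginated.
index_links = [
    (f"/sitemap/news{i}", f"News page {i}")
    for i in range(1, len(sitemap_pages) + 1)
]
index_html = render_link_list(index_links)
first_page_html = render_link_list(sitemap_pages[0])

print(len(sitemap_pages), "sitemap pages")
```

Because the index lists every sitemap page directly, each article stays at depth 3 no matter how many articles are added; only the index page grows.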
Compare this to a site that doesn’t have an HTML sitemap and instead relies on its category pages to link to older content. Let’s assume that pagination links 1-10 are shown on page 1, and each page has 20 links to content pages. If the category has 4,000 content items, that’s 200 pages for this topic. The depth looks like this:
Depth 0 homepage (links to all topic pages)
Depth 1 parent topic (links to child topics)
Depth 2 child topic pages 1-10
Depth 3 200 content pages linked from page 1-10
Depth 4 Child topic pages 11-20
Depth 5 200 content pages linked from page 11-20
Depth 6 Child topic pages 21-30
Depth 7 200 content pages linked from page 21-30
Very quickly, the depth grows high and link value to older content pages shrinks to almost nothing. There are different types of creative pagination strategies, but they all have their drawbacks in page value distribution. A more effective way to let page value flow evenly through a site is by building an HTML sitemap.
I recommend creating an HTML sitemap with some value to people, like organizing it around content types or helping users get a better idea of everything the site covers. You can choose to build the pages with or without a template wrapper; they can be as simple as a list of links with no style. This can be a quick, low-effort strategy to flatten a large site architecture without building a more complex set of linking structures.
You might want to look into other ways to slice and organize your content: by author, by date, by alpha, by topic, by tag, by location, by industry, etc. There are many ways to build more linking structures to bolster a strong site structure. Don’t forget about the classic contextual link within your paragraphs. These are the strongest internal link signal of all because there is context (words) around the link.
Create a sitemap with an XML sitemap generator
If you need some help getting started, a sitemap generator might be useful. There are many tools that can create small or large XML sitemaps, and some will even update them dynamically. If your site is small, you are not using WordPress, and you don’t have many resources, you can still build a static XML sitemap with a crawler like Screaming Frog. Google also has a page of recommendations for XML sitemap generators.
If you have a website development team, they might want to build a system to update the XML sitemap daily or even hourly, depending how often you publish new pages.
WordPress and XML sitemap plugins
There are plenty of free plugin options for WordPress. Google sitemap generator is free and highly rated. The Yoast plugin is very popular and has many helpful SEO features; even the free version will generate a dynamic XML sitemap.