Sitemaps

From i.STAR Help

What are Sitemaps?

SiteMaps.org

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
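To make the format concrete, here is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree` to build a one-URL Sitemap. The page URL and the metadata values are made-up examples. The element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) and the namespace follow the sitemaps.org protocol. The helper name `build_sitemap` is hypothetical and is not part of any CAM or i.STAR tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Hypothetical helper: return Sitemap XML for dicts with 'loc' and optional metadata."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; lastmod, changefreq and priority are optional hints.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Example page with made-up metadata values.
xml_text = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "monthly", "priority": 0.8},
])
print(xml_text)
```

Each dictionary becomes one `<url>` entry. Any metadata key you leave out is simply omitted from the output.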

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.

Detailed information on the XML Sitemap protocol can be found here: http://www.sitemaps.org/protocol.php

Where is my Sitemap?

Current CAM build-outs provide one XML sitemap for each department, plus an XML sitemap index that points to them. All of these files are located in the site's root directory and are updated dynamically based on your content. The index and sitemaps are set up during the final build-out and are typically accessible on your server at:

http://www.yourwebsite.com/sitemapindex.asp

http://www.yourwebsite.com/sitemap.asp?dept=01

http://www.yourwebsite.com/sitemap.asp?dept=02

...and so on, where "yourwebsite" is your site's domain name and 01, 02, etc. are your department codes.

If you follow the links in the sitemap index, you will see that they point to the individual department sitemaps. You should therefore only need to submit the index itself to search engines.
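The sketch below shows how a sitemap index of this kind is structured. It is written in Python for illustration only and is not how the CAM build-out generates its ASP pages. The `sitemap.asp?dept=NN` URL pattern mirrors the example URLs above. The department codes and the helper name `build_sitemap_index` are hypothetical. ElementTree escapes characters such as `&` automatically, which the sitemaps.org protocol requires for URLs inside XML.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(site_root, dept_codes):
    """Hypothetical helper: one <sitemap> entry per department sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for code in dept_codes:
        sitemap = ET.SubElement(index, "sitemap")
        # URL pattern assumed from the sitemap.asp?dept=NN examples in this article.
        ET.SubElement(sitemap, "loc").text = f"{site_root}/sitemap.asp?dept={code}"
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical department codes for illustration.
print(build_sitemap_index("http://www.yourwebsite.com", ["01", "02"]))
```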

For further reading, this brief article introduces the sitemap index concept: http://spin.atomicobject.com/2011/03/23/making-your-site-visible-to-search-engines-with-sitemaps/

To help search engines locate your sitemap index, current CAM build-outs also declare its location in your site's robots.txt file as follows:

     Sitemap: http://www.yourwebsite.com/sitemapindex.asp
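If you want to confirm that a crawler can read this directive, Python's standard `urllib.robotparser` module collects `Sitemap:` lines through `RobotFileParser.site_maps()`, which is available in Python 3.8 and later. The robots.txt contents below are a hypothetical example. Only the `Sitemap:` line comes from this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; only the Sitemap line is taken from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.yourwebsite.com/sitemapindex.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# site_maps() returns the list of Sitemap URLs found, or None if there are none.
print(parser.site_maps())
```

The `Sitemap:` directive does not depend on any `User-agent` group, so it can appear anywhere in the file.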

Additional Resources

These and more FAQs are available at sitemaps.org:
  • Q: Where do I place my Sitemap?

It is strongly recommended that you place your Sitemap at the root directory of your HTML server; that is, place it at http://example.com/sitemap.xml. In some situations, you may want to produce different Sitemaps for different paths on your site, for example if security permissions in your organization compartmentalize write access to different directories.

We assume that if you have the permission to upload example.com/path/sitemap.xml, you also have permission to report metadata under example.com/path/.

All URLs listed in the Sitemap must reside on the same host as the Sitemap. For instance, if the Sitemap is located at www.example.com/sitemap.xml, it can't include URLs from subdomain.example.com. If the Sitemap is located at www.example.com/myfolder/sitemap.xml, it can't include URLs from www.example.com.
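The location rule above can be checked roughly in Python. The sketch below is an assumption-laden helper, not an official validator, and the name `allowed_in_sitemap` is hypothetical. Besides the host and directory checks described above, it also compares the scheme, because the sitemaps.org protocol page states that listed URLs must use the same protocol as the Sitemap.

```python
from urllib.parse import urlsplit

def allowed_in_sitemap(sitemap_url, page_url):
    """Hypothetical check: same scheme and host, and a path at or below the Sitemap's directory."""
    sm, page = urlsplit(sitemap_url), urlsplit(page_url)
    if sm.scheme != page.scheme or sm.netloc.lower() != page.netloc.lower():
        return False
    # Directory that holds the Sitemap, e.g. "/myfolder/" for /myfolder/sitemap.xml.
    sitemap_dir = sm.path.rsplit("/", 1)[0] + "/"
    # An empty page path (e.g. "http://www.example.com") is treated as "/".
    return (page.path or "/").startswith(sitemap_dir)

print(allowed_in_sitemap("http://www.example.com/sitemap.xml",
                         "http://subdomain.example.com/page.html"))  # False
print(allowed_in_sitemap("http://www.example.com/myfolder/sitemap.xml",
                         "http://www.example.com/myfolder/a.html"))  # True
```

A query string on the Sitemap URL, as in `sitemap.asp?dept=01`, does not affect the check. Only the path determines which directory the Sitemap covers.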

  • Q: How big can my Sitemap be?

Sitemaps should be no larger than 50MB (52,428,800 bytes) uncompressed and can contain a maximum of 50,000 URLs. These limits help to ensure that your web server does not get bogged down serving very large files. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 50MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 50MB. A Sitemap index file can include up to 50,000 Sitemaps and must not exceed 50MB (52,428,800 bytes). You can also use gzip to compress your Sitemaps, but the size limit applies to the uncompressed file.
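The arithmetic for splitting a large site can be sketched as follows. This hypothetical helper only enforces the 50,000-URL limit and does not measure byte size, so a site with very long URLs could still need smaller chunks. Each chunk would become its own Sitemap, and all of those Sitemaps would then be listed in one Sitemap index.

```python
def chunk_urls(urls, max_urls=50_000):
    """Hypothetical helper: split URLs into groups of at most max_urls (URL count only)."""
    if max_urls < 1:
        raise ValueError("max_urls must be positive")
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]

# Hypothetical site with 120,001 pages.
pages = [f"http://www.example.com/page{n}.html" for n in range(120_001)]
chunks = chunk_urls(pages)
print(len(chunks), [len(c) for c in chunks])  # 3 [50000, 50000, 20001]
```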

  • Q: Do URLs in the Sitemap need to be completely specified?

Yes. You need to include the protocol (for instance, http) in your URL. You also need to include a trailing slash in your URL if your web server requires one. For example, http://www.example.com/ is a valid URL for a Sitemap, whereas www.example.com is not.
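A quick way to catch entries without a scheme is to parse each URL and require both a scheme and a host. The minimal sketch below uses the hypothetical helper name `is_fully_specified`. It cannot decide whether a trailing slash is needed, because that depends on how your particular server responds.

```python
from urllib.parse import urlsplit

def is_fully_specified(url):
    """Hypothetical check: True if the URL has an http/https scheme and a host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(is_fully_specified("http://www.example.com/"))  # True
print(is_fully_specified("www.example.com"))          # False
```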

  • Q: My site has both "http" and "https" versions of URLs. Do I need to list both?

No. Please list only one version of a URL in your Sitemaps. Including multiple versions of URLs may result in incomplete crawling of your site.
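If your URL list mixes both versions, you could normalize it before writing the Sitemap. The sketch below assumes that the http and https versions of each page serve the same content and that `preferred_scheme`, a hypothetical parameter, is the version you want crawled. Pick the scheme your Sitemap itself is served over, because listed URLs should use the same protocol as the Sitemap.

```python
from urllib.parse import urlsplit, urlunsplit

def one_version_per_url(urls, preferred_scheme="https"):
    """Hypothetical helper: rewrite http/https URLs to one scheme and drop duplicates, keeping order."""
    seen, result = set(), []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme in ("http", "https"):
            url = urlunsplit(parts._replace(scheme=preferred_scheme))
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result

print(one_version_per_url([
    "http://www.example.com/a",
    "https://www.example.com/a",
    "http://www.example.com/b",
]))  # ['https://www.example.com/a', 'https://www.example.com/b']
```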

  • Q: URLs on my site have session IDs in them. Do I need to remove them?

Yes. Including session IDs in URLs may result in incomplete and redundant crawling of your site.
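One way to remove session IDs is to drop the relevant query parameters before the URLs go into the Sitemap. The parameter names in `SESSION_PARAMS` are hypothetical placeholders, so substitute whatever names your site actually uses. The sketch only handles query-string parameters, not IDs embedded in the path. It also re-encodes the remaining parameters, so unusual characters may be escaped differently than in the original URL.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; replace with the ones your site uses.
SESSION_PARAMS = {"sessionid", "sid"}

def strip_session_ids(url, session_params=SESSION_PARAMS):
    """Remove session-ID query parameters, keeping the others in their original order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in session_params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_session_ids("http://www.example.com/item.asp?id=42&sessionid=ABC123"))
# http://www.example.com/item.asp?id=42
```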

  • Q: Does position of a URL in a Sitemap influence its use?

No. The position of a URL in the Sitemap is not likely to impact how it is used or regarded by search engines.

  • Q: Some of the pages on my site use frames. Should I include the frameset URLs or the URLs of the frame contents?

Please include both URLs.

  • Q: Is there an XML schema that I can validate my XML Sitemap against?

Yes. An XML schema is available for Sitemap files at http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd, and a schema for Sitemap index files is available at http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd. You can read more about validating your Sitemap at sitemaps.org.

