This repository has been archived by the owner on Dec 17, 2019. It is now read-only.

Generate sitemaps for published projects so they can be indexed by search engines #168

Open
jbuck opened this issue Sep 23, 2015 · 0 comments

jbuck commented Sep 23, 2015

In MozillaFoundation/mofo-devops#191 Adam confirmed that we'll need to generate a sitemap so that search engines know what pages to crawl. Here's my suggestion for how to implement this feature:

  • Create a new app within publish. When you run the app, it will fetch the list of files ending in .html from the publish database, convert that list into a sitemap, upload the sitemap to S3, and then quit.
  • Make sure the sitemap follows the limits set out in the protocol. In particular:
    • It's valid XML, with all of the fun encoding issues that come with that
    • It has no more than 50,000 URLs within a single sitemap (you can include additional sitemaps to increase the total number of URLs you're indexing)
    • It's no larger than 10MB (10,485,760 bytes) uncompressed. You can use gzip to compress the sitemap, but the 10MB limit applies to the uncompressed size
  • Add the new app to the Procfile. We can then schedule it to run whenever we'd like by using https://devcenter.heroku.com/articles/scheduler
  • Submit the sitemap to various search engines. Maybe also add a robots.txt?
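
A rough sketch of the sitemap-building step described above, to make the limits concrete. It is written in Python purely for illustration, and every name in it (`build_sitemaps`, `build_index`, the `example.org` URLs) is hypothetical rather than taken from publish. Fetching the .html paths from the database and uploading to S3 are left out. It assumes the URLs arrive as plain strings, escapes them for XML, and starts a new sitemap whenever either the 50,000-URL or the 10,485,760-byte limit would be exceeded.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000              # per-sitemap URL limit quoted above
MAX_BYTES = 10 * 1024 * 1024   # 10,485,760 bytes, uncompressed
XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def _esc(url):
    # escape() handles &, < and >; the extra map also covers quotes.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


def _render(entries):
    # Wrap already-encoded <url> entries in a <urlset> document.
    header = (XML_DECL + f'<urlset xmlns="{XMLNS}">\n').encode("utf-8")
    return header + b"".join(entries) + b"</urlset>\n"


def build_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Return a list of sitemap documents (bytes), split to respect both limits.

    A single URL that is larger than max_bytes on its own is still emitted;
    this sketch does not try to handle that case.
    """
    overhead = len(_render([]))
    sitemaps, current, size = [], [], overhead
    for url in urls:
        entry = f"  <url><loc>{_esc(url)}</loc></url>\n".encode("utf-8")
        too_many = len(current) >= max_urls
        too_big = size + len(entry) > max_bytes
        if current and (too_many or too_big):
            sitemaps.append(_render(current))
            current, size = [], overhead
        current.append(entry)
        size += len(entry)
    if current:
        sitemaps.append(_render(current))
    return sitemaps


def build_index(sitemap_urls):
    """Return a sitemap index (bytes) pointing at the individual sitemaps."""
    parts = [XML_DECL, f'<sitemapindex xmlns="{XMLNS}">\n']
    parts += [f"  <sitemap><loc>{_esc(u)}</loc></sitemap>\n" for u in sitemap_urls]
    parts.append("</sitemapindex>\n")
    return "".join(parts).encode("utf-8")
```

If more than one sitemap comes back, the index from `build_index` is the file you would submit to search engines, with each `<loc>` pointing at wherever the individual sitemaps end up on S3.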
sedge changed the title from "Implement sitemap for search engine indexing" to "Generate sitemaps for published projects so they can be indexed by search engines" on Sep 24, 2015
gideonthomas added the P2 label on Jan 8, 2016