Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#3129 Sitemaps module #3229

Closed
wants to merge 30 commits into from
Closed

Conversation

ThisNetWorks
Copy link

This is a start towards a sitemaps module, loosely inspired by the IDeliverable.Seo sitemap components from O1.

It provides an extensible sitemap builder which other modules can plug into. Currently have just plugged autoroute in.

It is dependant on a robots.txt file so robots can locate. I'll get that together next week.

Not totally happy with the url generation in the autorouteprovider but was having issues with the UrlHelper failing to generate urls (kept getting index out of range exceptions), so that's something to resolve.

Hope I have placed projects / abstractions in the right places!

@Skrypt
Copy link
Contributor

Skrypt commented Feb 24, 2019

Some ideas : Add the ability to create multiple sitemaps ; which means to be able to select on which sitemap a content type should be indexed.

Some ideas could be taken from this blog post :

https://blog.spotibo.com/sitemap-guide/

  • We could Ping search engines immediately after every content update. We could add a setting to the module to list URL's where we want this to happen.
  • Add images, videos, rss feed, google news
  • Add related other culture URL's (need to be implement first on content items)

@ThisNetWorks
Copy link
Author

Multiple site maps sounds like a good idea - it might not add much to the seo functionality of them, but would improve the manageability of them from the backend. Ultimately the settings should probably provide a sitemap(s) view.

  • Ping search engines sounds good, not sure if we'd want to after each content update, or maybe just from the site settings on each sitemap, but that could be easily configurable.
  • Could possibly provide some tracking of last submitted date for a content item, for a view in the site settings.
  • Images, video, news feels a bit trickier. We don't have a good way at the moment of capturing metadata about media, so might need to look further at that. I found an old issue [Design] Media Services #561 where @sebastienros talks about Attachments as something that might be implemented, and there is another issue which talks about the Aspect system needing another look, so if there is something we could plug into that it would make more sense to collect metadata in one place and make this more achievable. Any thoughts there would be appreciated.
  • Culture URL's, absolutely, but would be pending Content Item implementation

@Jetski5822
Copy link
Member

Are you allowing the ability to mark different content types with different update frequencies?

@Skrypt
Copy link
Contributor

Skrypt commented Mar 1, 2019

That's why I requested to have multiple sitemaps files. Separate content that updates more often from content that pratically never changes. Also, we are capped to 50k records per sitemap files.

@deanmarcussen
Copy link
Member

Firstly, apologies for all the ghost comments on this pull request - had to rearrange my github account.

I've developed a working interface based on the AdminMenu module to support multiple sitemaps
code is on this branch (not merged as sitemap generation does not plug in yet) if anyone wants to take a look at
https://github.com/ThisNetWorks/OrchardCore/tree/sitemaps-multiple

Here's some screenshots
sitemap-set

sitemap-nodes

blog-sitemap

You can create sets of sitemaps which can contain indexes (only at the root level), or content, and select content items to go into each xml file.

Think I have the possible options mapped out, you can specify a base path for the sitemap set, or leave them all hanging of the root, and each sitemap, or index etc, can have a path based on the root, which allows the site to be split if required, and for each sitemap you can specify priority and change frequency. These can be extended to provide a media sitemap etc.

I need to figure out how best to do the routing next and could do with some help.

The sitemaps need to be able to participant in OC routing on varied routes such as
http://www.testoc.com/a-sitemap.xml
http://www.testoc.com/a-sitemap-index.xml
http://www.testoc.com/blogs/a-blog-sitemap.xml
or when count of items reaches over 50,000 containing (auto generated) links to
http://www.testoc.com/blogs/a-blog-sitemap0.xml
http://www.testoc.com/blogs/a-blog-sitemap1.xml

The options I've come up with so far are

  • Plug into Autoroute

This does seem the more sensible route. This would mean turning the sitemap sets and sitemaps into ContentItems and attaching an AutoRoutePart to them - something similar to the Menu ContentItem. The dificulty would be providing the parameter route - the site might have to count all the content items on startup to provide any extra fixed AutorouteEntries to the AutorouteRoute. Could drop the over 50,000 feature, or make it less automatic (a start stop index setting on the node)?

  • Implement a SitemapRoute : IRouter

This doesn't look dificult, but I'm unsure of the implications of a second IRouter instance.

Any help determining the best approach appreciated.

@sebastienros
Copy link
Member

@deanmarcussen that looks really good!

About the routes I think you can go with the custom router, though I don't see why you couldn't use the default route that would have a form like /sitemaps/{name}.xml and bind it to a custom action for sitemaps.

@deanmarcussen
Copy link
Member

thanks @sebastienros all credit to the guys that wrote the AdminMenus!

Routing not really my strongpoint but will have a crack at the custom router.
I think the problem with a default route is that /sitemaps/a-sitemap.xml tells the crawler that it only has access to content under /sitemaps/
I've always done sitemaps of the root of the site, but apparently it's valid to split them into areas as well

@sebastienros
Copy link
Member

oh I see, you need to be able to react to a sitemap.xml kind of url, but you can also do it with a simple middleware that would analyze if the url matches. No need for a custom route in this case.

@deanmarcussen
Copy link
Member

ah, ideal, I'll have a look at the middleware option, sounds much better. cheers

@agriffard agriffard mentioned this pull request Mar 21, 2019
…ng sitemap size. Included contained parts in ContentTypesSitemapNodeDriver. Not totally happy with that solution. Seems to restrictive
…s can't be contained, but sitemaps can, but can't contain childnodes)
# Conflicts:
#	src/OrchardCore.Modules/OrchardCore.Contents/Startup.cs
@deanmarcussen
Copy link
Member

This is coming along, so putting it up for some feedback

  • Sitemap Indexes and Content Type sitemaps are now supported. Have left room there to extend to support Image Sitemaps, and languages - not totally happy with the arrangement, but it's hard to say the best approach until languages are there.

  • Am still having to rely on detection of an AutoroutePart to determine whether the content should be included in the sitemap. This seems like an unnecessary dependance on Autoroute, and would cause problems if a dev used a different routing technique, but without it Content Items like the footer run the risk of appearing in the sitemap.

  • Same issue occurs when deciding which Content Types should be selectable in the settings. Wonder if a IsRoutable setting would be useful on Content Type Definition?

  • Sitemap part is now optional

  • Think I've got tenant routing / tenant sitemap support working using a RouteConstraint.

  • Not sure the home route scenario logic is good.

  • Have not implemented Webhooks to ping Google/Bing, but quite like the idea. Got started on this and accidently over engineered it to be more of a Webhook module that other modules could interact with and plugin into. Will open another issue to discuss.

@agriffard
Copy link
Member

Can you please fix the conflicts on the solution since new modules have been added?

# Conflicts:
#	OrchardCore.sln
@Jetski5822
Copy link
Member

Was looking for something like this.... How much is left to be done?

@deanmarcussen
Copy link
Member

Not much I hope

Probably me to demo it, and clean up the code after.

From memory the biggest issue is in stopping content like layer widgets and menu's turning up the sitemap. Which it does ok at the moment, but I'd like to do it a bit cleaner - see #3848

@deanmarcussen
Copy link
Member

Oh, and supporting localised sitemaps, as all the content localisation came in after I wrote this

@deanmarcussen
Copy link
Member

Moved to #4450

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants