Add a system to manage specific rules for some URL (spring configurable) #182
Hello everyone,
As we are discussing the future of the OpenWayback project, the BnF would like to present a small feature that might interest some of you and could be added to OpenWayback.
This feature was developed by Nicolas Giraud, the previous software engineer of the BnF web archive team. It provides a system for managing specific canonicalization rules for URLs. For instance, we have a news website that puts a session ID in its URLs, and we want to remove it during the search and replay process.
The URLs look like:
http://www.ouestfrance-enligne.com/scripts/consult/pdf/PDF_frame.asp?in_ses_id=2047240875806&pdf=S2192220&date=09/09/2014&art_id=68159490&zoom=125,11,78
We define the rules in Spring:
In this example, we want to remove the in_ses_id parameter, so we use a UriStripper processor with the corresponding pattern.
We also want to transcode another parameter that causes trouble, from ISO-8859-1 to UTF-8, so we use the UriTranscoder with the right pattern.
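To give an idea of what these two processors do, here is a minimal, self-contained sketch in plain Java. It is not the BnF implementation: the class name `UriRuleSketch`, the `strip` and `transcode` methods, the regular expressions, and the `titre` parameter used in the usage note are all assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names and signatures are hypothetical,
// not the actual UriStripper / UriTranscoder classes.
public class UriRuleSketch {

    // Stand-in for a stripper: removes every match of the pattern from the URL.
    public static String strip(String url, Pattern pattern) {
        return pattern.matcher(url).replaceAll("");
    }

    // Stand-in for a transcoder: capture group 1 of the pattern must hold a
    // percent-encoded value. It is decoded as ISO-8859-1 and re-encoded as UTF-8.
    public static String transcode(String url, Pattern pattern) {
        Matcher m = pattern.matcher(url);
        StringBuffer out = new StringBuffer();
        try {
            while (m.find()) {
                String decoded = URLDecoder.decode(m.group(1), "ISO-8859-1");
                String recoded = URLEncoder.encode(decoded, "UTF-8");
                // Keep the text around group 1 untouched, swap in the new value.
                String replacement = url.substring(m.start(), m.start(1))
                        + recoded
                        + url.substring(m.end(1), m.end());
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
        } catch (UnsupportedEncodingException e) {
            // Both charsets are mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the assumed pattern `in_ses_id=\d+&?`, `strip` turns the example URL above into one that starts its query string at `pdf=S2192220`. With the assumed pattern `titre=([^&]*)`, `transcode` rewrites `titre=caf%E9` as `titre=caf%C3%A9`. Note that `URLEncoder` applies form encoding (for example, `/` becomes `%2F`), so a real transcoder may need a stricter escaping policy.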
In BnfUrlCanonicalizer, we override the urlStringToKey method from AggressiveUrlCanonicalizer, adding:
which transforms the URL when it matches one of the configured rules.
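As a rough illustration of that idea, the sketch below applies a list of rules to a URL before handing it to the regular canonicalization step. It is an assumption-laden stand-in, not the BnF code: `RuleChainSketch`, `UriProcessor`, `Rule`, and `baseUrlStringToKey` are hypothetical names, and the lowercase step merely takes the place of the call to the parent canonicalizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch only; it does not extend the real AggressiveUrlCanonicalizer.
public class RuleChainSketch {

    /** One transformation step, such as stripping or transcoding (hypothetical). */
    public interface UriProcessor {
        String process(String url);
    }

    /** A rule: its processors run only on URLs that match urlPattern (hypothetical). */
    public static class Rule {
        final Pattern urlPattern;
        final List<UriProcessor> processors = new ArrayList<UriProcessor>();

        public Rule(String urlRegex) {
            this.urlPattern = Pattern.compile(urlRegex);
        }

        public Rule add(UriProcessor processor) {
            processors.add(processor);
            return this;
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();

    public void addRule(Rule rule) {
        rules.add(rule);
    }

    // Runs the processors of every matching rule, in configuration order.
    public String applyRules(String url) {
        String result = url;
        for (Rule rule : rules) {
            if (rule.urlPattern.matcher(result).find()) {
                for (UriProcessor processor : rule.processors) {
                    result = processor.process(result);
                }
            }
        }
        return result;
    }

    // Placeholder for the parent canonicalizer's urlStringToKey.
    protected String baseUrlStringToKey(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    // Site-specific rules first, then the usual canonicalization.
    public String urlStringToKey(String url) {
        return baseUrlStringToKey(applyRules(url));
    }
}
```

URLs that match no rule pass through `applyRules` unchanged, so the behaviour for every other site stays the same as with the parent canonicalizer.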
We could contribute to the OpenWayback project all the classes needed to run this feature.