Re-evaluate CleanLinks default whitelisted domains and patterns #20
Most login domains seem to still work the same (at least Google, eBay, maybe more); the others we should indeed re-evaluate. Let's contribute to this thread all the domains/regexes that are still valid or no longer valid. Rulesets sound good in theory, but I'm not sure I can commit to updating the list regularly, and you still have to let all users whitelist to their own taste. So let's just adjust the defaults for now; I'll think about it. As for removing tracking elements from links, that is not really the core functionality of CleanLinks: we rather remove redirections. This does cause some problems, e.g. if we whitelist a website we also stop cleaning the tracking elements in its links, even though we might just want to see a redirect page without being tracked. Anyhow, if this does not work as expected, please open another issue about it.
But in practice, isn't the tracking id usually removed anyway when the link is prevented from redirecting?
We remove the tracking ids of every link we examine, whether we remove a redirection or not. However, when a link is whitelisted, by pattern or domain, we leave it as is, including tracking ids. ClearURLs seems to do a more in-depth job at removing tracking ids than we do, as they have a bunch of (domain-specific) rules while we just have a regex matching parameters. So the add-ons are in essence complementary, even if there is some slight overlap: they block some domains like uMatrix does, we strip some ids like they do, etc.
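To make that behaviour concrete, here is a minimal sketch of the logic described above: a domain whitelist that leaves links completely untouched, redirect removal, and a single generic regex over parameter names. The function names, the regex and the example whitelist entries are illustrative only, not taken from the CleanLinks code base.

```typescript
// Minimal sketch of the behaviour described above; names and the regex are
// illustrative only, not the actual CleanLinks implementation.

// A single generic regex over parameter names, applied to every link…
const TRACKING_PARAM = /^(?:utm_\w+|ref\w*|aff\w*)$/;
// …and a whitelist of domains whose links are left completely untouched.
const WHITELISTED_DOMAINS = new Set(['accounts.google.com', 'signin.ebay.com']);

function cleanLink(raw: string): string {
  const link = new URL(raw);

  // Whitelisted links are returned as-is, tracking ids included.
  if (WHITELISTED_DOMAINS.has(link.hostname)) {
    return raw;
  }

  // Otherwise, first undo the redirection if a query parameter embeds the
  // real target URL…
  for (const value of link.searchParams.values()) {
    if (/^https?:\/\//.test(value)) {
      return cleanLink(value);
    }
  }

  // …then strip any parameter whose name matches the generic regex.
  for (const key of [...link.searchParams.keys()]) {
    if (TRACKING_PARAM.test(key)) {
      link.searchParams.delete(key);
    }
  }
  return link.toString();
}

// A redirect carrying a tracking id on its embedded target:
const target = 'https://example.org/page?utm_source=feed&id=42';
console.log(cleanLink('https://redirector.example/out?url=' + encodeURIComponent(target)));
// -> https://example.org/page?id=42
```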
That's what gets me. I wish I could simply rely on one extension to handle URL stuff. I didn't even mention the slew of other complementary or duplicative extensions all trying to do similar things. It's to the point where I really don't understand the differences amongst them.
HTTPS Everywhere, ...and a bunch of extensions to convert text links into clickable links. I'm tired of listing them. And then there are a bunch of about:config preferences. I just wish I had the smallest set of extensions that still does at least as much, if that makes sense.
I'm not putting this on you, @Cimbali. It just seems confusing that so many people have tried to achieve something with handling links and URLs. It would be nice if all these ideas were put into one solution. For example, I can't tell whether hiding the referrer is the same as stripping utm parameters off of a URL. I just don't understand the differences amongst all of these things. It's like, if only there were one extension that acted as the engine for a bunch of these things, and then the community could figure out how to build the rule set. I guess I hoped to have this talk with @diegocr one day.
Tbh I just resurrected an add-on I used to use to remove redirects, and I handle the rest with HTTPS Everywhere & uMatrix. I mean, you do have a point, but there are 2 ways around it:
I think there aren't a lot of extensions that remove redirects the way we do, so we seem to be gearing more towards the second option here, even though you're right that it's not a great way to build a strong community that can contribute rules, etc.
Here's a quick overview of what the add-ons you cite do:
I think Neat URL, Lean URL and Pure URL are forks of each other and mostly share the same codebase. They most probably identify target URLs in the same way we do. |
Thank you so much for that breakdown. You threw CleanLinks into the comparison, but then you didn't mark it as removing tracking parameters (e.g. utm_)? I thought it did; well, at least there's a rule for it, right? If so, I wonder if I can go through some of those similar add-ons and try to figure out a more thorough regex pattern.
I just misaligned some columns. Fixed now :) I'm wondering how many sites really have a <link rel="canonical"> tag, because that's definitely something we can leverage to auto-detect useless parameters. Any differences between the URL and the canonical link can be recorded, then stripped on subsequent visits.
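If the canonical-link idea were pursued, the parameter-diffing step could look roughly like the sketch below; the storage, function names and example URLs are hypothetical.

```typescript
// Hypothetical sketch of the <link rel="canonical"> idea: record parameters
// that the canonical URL drops, so they can be stripped on later visits.

// Parameters learnt to be useless, keyed by hostname (in a real add-on this
// would live in extension storage rather than an in-memory map).
const learntParams = new Map<string, Set<string>>();

function learnFromCanonical(visitedUrl: string, canonicalHref: string): void {
  const visited = new URL(visitedUrl);
  const canonical = new URL(canonicalHref, visitedUrl);

  // Only compare pages that canonicalise to the same path on the same host.
  if (canonical.hostname !== visited.hostname || canonical.pathname !== visited.pathname) {
    return;
  }

  const useless = learntParams.get(visited.hostname) ?? new Set<string>();
  for (const key of visited.searchParams.keys()) {
    // Any parameter present in the visited URL but absent from the canonical
    // link is recorded as strippable.
    if (!canonical.searchParams.has(key)) {
      useless.add(key);
    }
  }
  learntParams.set(visited.hostname, useless);
}

function stripLearntParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  const useless = learntParams.get(url.hostname);
  if (useless) {
    for (const key of [...url.searchParams.keys()]) {
      if (useless.has(key)) {
        url.searchParams.delete(key);
      }
    }
  }
  return url.toString();
}

// The first visit teaches us that ?ref= is dropped by the canonical link…
learnFromCanonical('https://shop.example/item/123?ref=homepage&colour=blue',
                   'https://shop.example/item/123?colour=blue');
// …so it gets stripped on the next visit.
console.log(stripLearntParams('https://shop.example/item/123?ref=newsletter&colour=red'));
// -> https://shop.example/item/123?colour=red
```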
I know Amazon uses them. I've tried that add-on in the past and it was changing links. I actually use smile.amazon.com so I can donate to the Electronic Frontier Foundation, but when I clicked on product pages it was redirecting me to the regular amazon.com URL.
In 5d71d2a I've separated the parameter parsing from the redirect cleaning, so we should be able to improve on that from there, e.g. by adding per-domain lists of parameters to strip.
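One possible shape for such per-domain lists, layered on top of a global list, is sketched below; the example domains and parameter names are made up and not part of the actual ruleset.

```typescript
// Purely illustrative shape for per-domain strip lists layered on top of a
// global list; none of these rules or names come from the actual code.

const globalStrip = [/^utm_\w+$/];

const perDomainStrip: Record<string, RegExp[]> = {
  'www.amazon.com': [/^(?:ref_?|psc|tag)$/],
  'www.google.com': [/^(?:ved|ei)$/],
};

function stripParams(raw: string): string {
  const url = new URL(raw);
  const rules = [...globalStrip, ...(perDomainStrip[url.hostname] ?? [])];
  for (const key of [...url.searchParams.keys()]) {
    if (rules.some(rule => rule.test(key))) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(stripParams('https://www.amazon.com/dp/B000?psc=1&tag=foo&keywords=bar'));
// -> https://www.amazon.com/dp/B000?keywords=bar
```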
As it stands, I've imported a number of rules into the new default ruleset, from our previous default rules and from ClearURLs. The following domains were fully whitelisted and no longer are:
The other domains in the old paths =
Again, in most cases another rule is probably overriding these, if they are necessary. Finally, exceptions in ClearURLs apply to the whole rule, whereas we whitelist individual query parameters. Therefore the following exceptions have not been integrated, and I'm posting them here for reference:
Skip Links Matching with
\/ServiceLogin|imgres\?|searchbyimage\?|watch%3Fv|auth\?client_id|signup|bing\.com\/widget|oauth|openid\.ns|\.mcstatic\.com|sVidLoc|[Ll]ogout|submit\?url=|magnet:|google\.com\/recaptcha\/
Remove from Links
(?:ref|aff)\\w*|utm_\\w+|(?:merchant|programme|media)ID
I'm not getting this regex to even work
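One guess, assuming the doubled backslashes in the quoted preference are just string escaping from wherever the value was stored, is that the pattern works once compiled with single backslashes:

```typescript
// Assumption: the doubled backslashes in the quoted preference are the
// string-escaped form; the pattern itself would then be the one below.
const removeFromLinks = /(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID/;

// Tested against a few parameter names:
for (const name of ['utm_source', 'ref', 'affsrc', 'merchantID', 'id']) {
  console.log(name, removeFromLinks.test(name)); // true for all but 'id'
}
```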
I also like ClearURLs' implementation of a mechanism to update the ruleset from GitHub. And reviewing theirs, we're clearly missing a ton.
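Such an update mechanism could be as simple as periodically fetching a JSON rules file from a raw GitHub URL; the URL, refresh interval and rules schema below are made up for the sketch.

```typescript
// Hypothetical sketch of a ruleset-update mechanism: periodically fetch a
// JSON rules file from a raw GitHub URL and cache it locally.

interface Ruleset {
  version: number;
  stripParams: string[];          // regexes over parameter names
  whitelistDomains: string[];
}

const RULESET_URL =
  'https://raw.githubusercontent.com/example-org/example-rules/master/rules.json';

let currentRules: Ruleset | null = null;

async function updateRuleset(): Promise<void> {
  try {
    const response = await fetch(RULESET_URL);
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`);
    }
    const rules = (await response.json()) as Ruleset;
    // Only replace the cached rules if the fetched file is newer.
    if (!currentRules || rules.version > currentRules.version) {
      currentRules = rules;
    }
  } catch (err) {
    console.warn('Ruleset update failed, keeping previous rules:', err);
  }
}

// Refresh once a day; in a WebExtension this would typically be an alarm
// rather than a bare timer.
updateRuleset();
setInterval(updateRuleset, 24 * 60 * 60 * 1000);
```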
Skip Domains
accounts.google.com,docs.google.com,translate.google.com,login.live.com,plus.google.com,twitter.com,static.ak.facebook.com,www.linkedin.com,www.virustotal.com,account.live.com,admin.brightcove.com,www.mywot.com,webcache.googleusercontent.com,web.archive.org,accounts.youtube.com,signin.ebay.com
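For reference, here is a sketch of how defaults like the ones quoted above would typically be consulted: a link is left untouched when its host is on the skip list or the whole URL matches the skip regex. The variable names are illustrative, not the add-on's actual code.

```typescript
// Sketch of consulting the old defaults quoted above: a link whose host is
// on the skip list, or which matches the skip regex, is left alone entirely.

const skipDomains = new Set(
  ('accounts.google.com,docs.google.com,translate.google.com,login.live.com,' +
   'plus.google.com,twitter.com,static.ak.facebook.com,www.linkedin.com,' +
   'www.virustotal.com,account.live.com,admin.brightcove.com,www.mywot.com,' +
   'webcache.googleusercontent.com,web.archive.org,accounts.youtube.com,' +
   'signin.ebay.com').split(',')
);

const skipLinksMatching =
  /\/ServiceLogin|imgres\?|searchbyimage\?|watch%3Fv|auth\?client_id|signup|bing\.com\/widget|oauth|openid\.ns|\.mcstatic\.com|sVidLoc|[Ll]ogout|submit\?url=|magnet:|google\.com\/recaptcha\//;

function shouldSkip(raw: string): boolean {
  return skipDomains.has(new URL(raw).hostname) || skipLinksMatching.test(raw);
}

console.log(shouldSkip('https://accounts.google.com/ServiceLogin?hl=en')); // true
console.log(shouldSkip('https://example.com/?utm_source=feed'));           // false
```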
Looking at Diego's commits, these rules are all at least 4 years old. Sites change. I'd rather start with a clean slate and see what is applicable today.