
Store some additional tracker data in tracking_map #2839

Merged: 8 commits, Apr 27, 2022

ghostwords (Member) commented Apr 25, 2022

This adds tracking_map, a new Badger storage area. It is similar to snitch_map, but where snitch_map maps tracker base domains to arrays of site base domains, tracking_map maps tracker base domains to objects keyed by site base domain, whose values are arrays of the tracking types detected on that site. For example:

"arkoselabs.com": {
  "cheaptickets.com": [
    "canvas"
  ],
  "expedia.ca": [
    "canvas"
  ],
  "expedia.co.uk": [
    "canvas"
  ]
},

For now, only two tracking types are recorded: "canvas" (canvas fingerprinting) and "pixelcookieshare".
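
To make the two shapes concrete, here is a minimal sketch operating on plain objects rather than Badger's storage classes. The helpers recordTracking and toSnitchMapShape are hypothetical, not Privacy Badger functions; the sketch only assumes the layout described above (tracker base domain, then site base domain, then an array of tracking types).

```javascript
// Hypothetical helpers illustrating the tracking_map layout; not Privacy Badger's code.
// The two tracking types mentioned in this PR.
const TRACKING_TYPES = ["canvas", "pixelcookieshare"];

// Records that trackerBase was seen doing `type` tracking on siteBase.
function recordTracking(trackingMap, trackerBase, siteBase, type) {
  if (!TRACKING_TYPES.includes(type)) {
    throw new Error("unknown tracking type: " + type);
  }
  const sites = trackingMap[trackerBase] || (trackingMap[trackerBase] = {});
  const types = sites[siteBase] || (sites[siteBase] = []);
  // Store each type at most once per tracker/site pair.
  if (!types.includes(type)) {
    types.push(type);
  }
  return trackingMap;
}

// Collapses the tracking_map shape into the snitch_map shape
// (tracker base domain -> array of site base domains).
function toSnitchMapShape(trackingMap) {
  const result = {};
  for (const [trackerBase, sites] of Object.entries(trackingMap)) {
    result[trackerBase] = Object.keys(sites);
  }
  return result;
}

const exampleMap = {};
recordTracking(exampleMap, "arkoselabs.com", "cheaptickets.com", "canvas");
recordTracking(exampleMap, "arkoselabs.com", "expedia.ca", "canvas");
recordTracking(exampleMap, "arkoselabs.com", "expedia.ca", "canvas"); // duplicate, ignored
console.log(JSON.stringify(exampleMap));
// {"arkoselabs.com":{"cheaptickets.com":["canvas"],"expedia.ca":["canvas"]}}
```

Since tracking_map only records the listed tracking types, the collapsed view is not necessarily identical to snitch_map for the same tracker.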

Here is a recent 20K-site Chrome run: results.zip. Let's compare this run (and bigger scans) against Disconnect's FingerprintingInvasive domains and Tracker Radar's fingerprinters.
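
One way to do that comparison, sketched under the assumption that both lists have already been reduced to arrays of base domains (loading and normalizing Disconnect's and Tracker Radar's data is not shown). compareDomainLists is a hypothetical helper, and the example inputs are made up.

```javascript
// Hypothetical helper: splits two domain lists into overlap and differences.
// Assumes both inputs are arrays of base domain strings.
function compareDomainLists(ours, theirs) {
  const ourSet = new Set(ours);
  const theirSet = new Set(theirs);
  return {
    both: [...ourSet].filter(d => theirSet.has(d)).sort(),
    onlyOurs: [...ourSet].filter(d => !theirSet.has(d)).sort(),
    onlyTheirs: [...theirSet].filter(d => !ourSet.has(d)).sort(),
  };
}

// Made-up inputs, for illustration only.
const comparison = compareDomainLists(
  ["a.example", "b.example"],
  ["a.example", "c.example"]
);
console.log(comparison);
```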

This should help with #1527, at least by helping us understand the extent of the problem.

This could be followed up by calling out domains that engage in particularly invasive tracking (such as canvas fingerprinting) in the UI (related to #963).

Replaces https://github.com/EFForg/privacybadger/tree/explain-blocking.

ghostwords (Member, Author) commented Apr 26, 2022

Also, here is a recent 20K-site Chrome run with blocking disabled (TRACKING_THRESHOLD set to 20001): results-no-blocking.zip.
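
A simplified model of why 20001 disables blocking on a 20K-site scan, assuming a tracker gets blocked once the number of distinct site base domains it was seen on reaches TRACKING_THRESHOLD. The shouldBlock helper is hypothetical, not Privacy Badger's actual check.

```javascript
// Simplified, hypothetical model of the threshold check.
// snitchMap: tracker base domain -> array of site base domains.
function shouldBlock(snitchMap, trackerBase, threshold) {
  const sites = snitchMap[trackerBase] || [];
  return sites.length >= threshold;
}

// A scan of N sites can place a tracker on at most N distinct site base domains,
// so any threshold above N can never be reached.
const SCANNED_SITES = 20000;
const everywhere = {
  "tracker.example": Array.from({ length: SCANNED_SITES }, (_, i) => "site" + i + ".example"),
};

console.log(shouldBlock(everywhere, "tracker.example", 3));     // true
console.log(shouldBlock(everywhere, "tracker.example", 20001)); // false
```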

Useful console queries:

Prints the ten most prevalent tracker base domains along with their prevalence (how many site base domains they were seen on):

let sm = badger.storage.snitch_map;
Object.keys(sm._store).map(d => [d, sm._store[d].length]).sort((a, b) => b[1] - a[1])
  .slice(0, 10).map(([d, count]) => d + ": " + count)

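Prints the 25 most prevalent canvas fingerprinters (tracker base domains with at least one "canvas" entry), along with how many site base domains each has any tracking_map entry for: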

let tm = badger.storage.tracking_map;
Object.keys(tm._store).filter(d => Object.keys(tm._store[d]).some(s => tm._store[d][s].includes("canvas")))
  .map(d => [d, Object.keys(tm._store[d]).length]).sort((a, b) => b[1] - a[1])
  .slice(0, 25).map(([d, count]) => d + ": " + count)
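
The canvas query ranks each fingerprinter by the number of sites it has any tracking_map entry on, which can include sites where only "pixelcookieshare" was seen. A hypothetical variant, topByTrackingType (not part of Privacy Badger), counts only the sites where the given type was recorded. It takes a plain object with the tracking_map layout, so in the console it could be called as topByTrackingType(badger.storage.tracking_map._store, "canvas", 25).

```javascript
// Hypothetical: rank trackers by the number of sites where `type` specifically was recorded.
// `store` uses the tracking_map layout: tracker -> { site -> [tracking types] }.
function topByTrackingType(store, type, n) {
  return Object.keys(store)
    .map(d => [d, Object.keys(store[d]).filter(s => store[d][s].includes(type)).length])
    .filter(([, count]) => count > 0)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([d, count]) => d + ": " + count);
}

// Made-up sample data.
const sample = {
  "a.example": { "s1.example": ["canvas"], "s2.example": ["pixelcookieshare"] },
  "b.example": { "s1.example": ["canvas"], "s2.example": ["canvas", "pixelcookieshare"], "s3.example": ["canvas"] },
  "c.example": { "s1.example": ["pixelcookieshare"] },
};
const ranked = topByTrackingType(sample, "canvas", 25);
// returns ["b.example: 3", "a.example: 1"]
```

On the same sample, the original query would report "a.example: 2", because it also counts the pixelcookieshare-only site.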

ghostwords merged commit c2c3776 into master on Apr 27, 2022
ghostwords deleted the add-tracking_map branch on April 27, 2022 21:12
ghostwords added the heuristic label (Badger's core learning-what-to-block functionality) on Apr 27, 2022
ghostwords added a commit that referenced this pull request May 11, 2022
ghostwords added a commit to EFForg/badger-sett that referenced this pull request May 20, 2022
Ran into this issue with a 100K site no-blocking mode scan
during the merge step.

Following up on Badger Sett's --no-blocking mode and
EFForg/privacybadger#2839
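
What went wrong in that merge step isn't described here. As a hypothetical illustration only (not Badger Sett's code), merging two tracking_map-shaped results means taking the union of sites per tracker and the union of tracking types per site:

```javascript
// Hypothetical merge of two tracking_map-shaped objects; the inputs are not modified.
function mergeTrackingMaps(a, b) {
  const merged = {};
  for (const source of [a, b]) {
    for (const [tracker, sites] of Object.entries(source)) {
      merged[tracker] = merged[tracker] || {};
      for (const [site, types] of Object.entries(sites)) {
        // Union of tracking types for this tracker/site pair, without duplicates.
        const existing = merged[tracker][site] || [];
        merged[tracker][site] = [...new Set([...existing, ...types])];
      }
    }
  }
  return merged;
}

const scanA = { "t.example": { "s1.example": ["canvas"] } };
const scanB = {
  "t.example": { "s1.example": ["canvas", "pixelcookieshare"], "s2.example": ["canvas"] },
  "u.example": { "s3.example": ["pixelcookieshare"] },
};
const mergedScans = mergeTrackingMaps(scanA, scanB);
console.log(JSON.stringify(mergedScans));
// {"t.example":{"s1.example":["canvas","pixelcookieshare"],"s2.example":["canvas"]},"u.example":{"s3.example":["pixelcookieshare"]}}
```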
ghostwords added a commit that referenced this pull request Apr 18, 2023
To prefer entries that also exist in snitch_map.

Following up on #2839