Identify cookie syncing as third-party tracking #2088

Closed
bcyphers opened this issue Jul 8, 2018 · 5 comments · Fixed by #2147
Labels
enhancement, heuristic (Badger's core learning-what-to-block functionality)

Comments

@bcyphers
Contributor

bcyphers commented Jul 8, 2018

Cookie syncing is the practice of "syncing" cookies attributed to one domain with another domain. For example, a.com could set a cookie like id=12345, then make a request like https://b.com/sync?id=12345. This allows a.com and b.com to sync their unique ID for the user and to merge their records of the user's browsing habits.
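As a rough illustration of the pattern described above, the sketch below checks whether any cookie value set for one domain shows up as a query parameter in a request to another domain. The function names and the `MIN_ID_LENGTH` threshold are hypothetical choices made for this example; this is not Privacy Badger's code and not the detection method from the survey.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption for this sketch: very short values are unlikely to be unique IDs.
MIN_ID_LENGTH = 5


def parse_cookie_header(header: str) -> dict[str, str]:
    """Parse a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def synced_cookie_names(cookie_header: str, request_url: str) -> list[str]:
    """Return names of cookies whose values appear as query values in request_url."""
    cookies = parse_cookie_header(cookie_header)
    query_values = {value for _, value in parse_qsl(urlsplit(request_url).query)}
    return [
        name
        for name, value in cookies.items()
        if len(value) >= MIN_ID_LENGTH and value in query_values
    ]


# a.com's cookie "id=12345" is passed to b.com in the query string.
print(synced_cookie_names("id=12345; theme=dark", "https://b.com/sync?id=12345"))
```

An exact-match check like this would miss IDs that are hashed, encoded, or embedded in the URL path, so a real heuristic would likely need to be more permissive.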

This practice was found to be extremely common in the tracking survey here: 460 of the top 1,000 most common third parties synced cookies with at least one other party. See section 5.6 of the paper for a summary of their findings and appendix 13.3 for a description of how they detected cookie syncing. We should build and test a heuristic based on their method that lets us count cookie syncing with third parties as a tracking action.

I think this is mostly used to sync cookies from different third party trackers with one another, and so some of its harm should be mitigated by Privacy Badger already. Still, here's a hypothetical scenario that PB doesn't prevent right now:

Suppose tracker.com is embedded on both a.com and b.com, and each of those sites sets its own first-party ID cookie. Alice visits a.com and her browser makes a request to https://tracker.com/sync?site=a.com&id=123. Alice then visits b.com from the same IP address, and her browser pings https://tracker.com/sync?site=b.com&id=456. Now tracker.com knows that user 123 on a.com is user 456 on b.com, and it can link their activity on those sites even after the user's IP address changes, and even if we block cookies for tracker.com.
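To make the scenario concrete, here is a minimal sketch of the linking a tracker could do on its side, using only the sync requests it receives and the client IP. The `link_ids_by_ip` function, the `site` and `id` parameter names (taken from the example URLs), and the IP address are illustrative assumptions, not a description of any real tracker.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def link_ids_by_ip(requests):
    """Group (site, id) pairs seen in sync requests by client IP.

    `requests` is an iterable of (client_ip, url) tuples. Only IPs that
    reported IDs for more than one site are returned, since those are the
    cases where per-site identities can be joined.
    """
    seen = defaultdict(set)
    for client_ip, url in requests:
        params = parse_qs(urlsplit(url).query)
        site = params.get("site", [None])[0]
        user_id = params.get("id", [None])[0]
        if site and user_id:
            seen[client_ip].add((site, user_id))
    return {
        ip: sorted(pairs)
        for ip, pairs in seen.items()
        if len({site for site, _ in pairs}) > 1
    }


log = [
    ("203.0.113.7", "https://tracker.com/sync?site=a.com&id=123"),
    ("203.0.113.7", "https://tracker.com/sync?site=b.com&id=456"),
]
print(link_ids_by_ip(log))
```

No tracker.com cookie is involved at any point, which is why blocking cookies for tracker.com does not prevent the link.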

Related: #367, #794, #1808

bcyphers added the enhancement and heuristic (Badger's core learning-what-to-block functionality) labels on Jul 8, 2018
@ghostwords
Member

Would the above example scenario be covered if we detected and blocked pixel tracking (#794)?

> I think this is mostly used to sync cookies from different third party trackers with one another

Right, my concern is that this will involve significant bookkeeping while overlapping (entirely?) with existing or planned detection approaches that should be easier to implement.

@bcyphers
Contributor Author

bcyphers commented Jul 9, 2018

Agreed, this overlaps a lot with #794, and may even be a strict subset of it.

I think this is worth pursuing because it's an easier heuristic to write and test than a general "tracking pixel" heuristic, and it will hopefully catch a lot of the harmful uses of tracking pixels. Ideally, the code for this issue would eventually be incorporated into a more general tracking pixel detection function.

@ghostwords
Member

Another cookie matching paper: https://lukaszolejnik.com/rtbdesc

@ghostwords
Member

ghostwords commented Jul 3, 2019

Related study: Tracking the Pixels: Detecting Unknown Web Trackers via Analysing Invisible Pixels.

We've implemented detection of "first to third party cookie syncing" (section 4.2.3 in the paper).

@ghostwords
Member

Removed from production builds (still used in pre-training) due to #2548.
