Privacy Review: handle start_url tracking #399
/CC @PaulKinlan
I think you want to avoid showing the URL to users by default. I assume it might frighten some users, and most wouldn't know what to do with it. What you could do is this: if the URL contains '?', show a short informational note and an edit button for power users. But I wouldn't show that by default if the URL doesn't even contain '?'.
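The heuristic suggested above can be sketched in a few lines. This is a minimal illustration, not browser code: the function name `should_show_start_url_notice` is hypothetical, and it reads "contains '?'" as "has a non-empty query string", which is a slight simplification.

```python
from urllib.parse import urlsplit


def should_show_start_url_notice(start_url: str) -> bool:
    """Hypothetical install-UI check: surface the start_url (with an edit
    button for power users) only when it carries a non-empty query string."""
    return urlsplit(start_url).query != ""
```

A start_url such as `https://example.com/index.html?user=123` would trigger the note, while `https://example.com/index.html` would not.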
Correct me if I'm mistaken, but is pushing the problem onto users the recommended solution to the security/privacy issues here? :)
We will look at mitigation strategies on the Firefox side and make some suggestions:
It's possible the recommendation could be that UAs strip any query string (or fragment identifier) from the URL when launching, but there are likely legitimate, non-privacy-invasive uses for these as well (e.g., language preference). I guess my question would be whether this particular potential abuse vector, a dynamic … To be clear, I'm not dismissing this as a concern; it very well may be a big privacy hole. I would just like to know whether the privacy concerns we identify would be unique to this particular case.
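As a rough sketch of the "strip on launch" idea (assuming a UA simply drops the query string and fragment before navigating; the helper name is hypothetical), Python's `urllib.parse` is enough:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(start_url: str) -> str:
    """Drop the query string and fragment identifier from a start_url."""
    parts = urlsplit(start_url)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_query_and_fragment("https://example.com/app/?lang=fr&uid=123#home"))
```

This prints `https://example.com/app/`. Note that it also throws away the legitimate `lang` parameter, and that an identifier placed in the path (e.g. `https://example.com/u/123/`) passes through unchanged, so stripping alone is not a complete fix.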
The meaningful difference from existing vectors is that they can all be explicitly cleared by the user (e.g., by clearing cookies and site data in Chrome, and equivalents in other browsers). A query parameter on the … For instance, one way to solve this is to clear query parameters from …
Can't a site embed a tracking ID in the path as easily as in the query parameters?
I made a study and indeed most uses of parameters are legitimate:
The points I raise are mostly: there is no way to manage these identifiers, their use is not transparent, and they allow respawning other identifiers (i.e., if the user removes cookies, they can be brought back to life).
Yep, you can absolutely generate a per-user page in the start_url. That would be functionally equivalent (so stripping parameters is not a 100% solution), but I did not use this particular approach.
@lknik I believe we have data from our Bing crawler on manifest usage (over a million manifests). Let me ask whether we can do some deeper digging as well.
That would be interesting. But do you have data on the actual start_urls used? If so, I would be happy to see how it looks at this scale.
I believe we have the full manifests, but I need to verify. It may be a week or so before I hear back.
@aarongustafson well, a package with loads of manifests and their URLs would be a nice present ;)
To build on @dominickng's suggestion, I think one option is to explicitly consider the … Alternatively, we could tell sites that they shouldn't use manifest data that is customized to the user in any way, and start work on the challenging problem of automatically identifying sites that are customizing … See this guidance on identifying local state mechanisms so that they can be cleared: https://www.w3.org/TR/fingerprinting-guidance/#clearing-all-local-state
Depends on a few things:
I agree, but this is different from a malicious supercookie (e.g., HSTS). Installing a web application requires an explicit opt-in, and the user has the opportunity to inspect the URL. Granted, examining the URL is useless for 99% of people. The mitigation strategy is really just to delete the shortcut to the PWA.
Generally yes, I agree, and the data purge should be supported. However, going back to the supercookie attack, I don't see how it helps when the start URL is "http://example.com?user=123": you can just restore user123's cookies/state from the server when they open the app.
We can amend:
Sure.
Well, even if the user has moved the shortcut, if the user initiates clearing local state while they are active in that browser-run app, there would need to be some control over it, right? I could see that involving coordination between the browser and the OS. (Is uninstall functionality included in the spec?)
I don't think that's what-about-ism at all; that seems like a totally plausible use case. I personally would like to be able to log in to my email, then click the 'make an app' button (which stores a bearer token or something) and then be logged in whenever I click on my new 'app'. The challenge, in that situation, is either to indicate to the user that clearing local state isn't possible or, when the user does choose to clear state, to get them back to the site with state cleared in such a way that they have to choose to re-create the state themselves (by logging in again, say) before 'installing'.
Right, that's exactly the attack that we're talking about. If you re-load from the same
In that case, we'd be saying that the feature doesn't support a bookmark/manifest-install from
Devil's advocate here. Let's assume the user is an avid PWA user and has, say, 50-100 of these. Then they choose "clear all private data" in the browser. Would that mean removing 50-100 apps and requiring reinstalling, logging in, and possibly reconfiguring? That would make today's experience of clearing data significantly worse.
Thanks for the lengthy reply. I wonder whether in the end we won't end up merging the two anyway (some browser/UI change; indication; researchers/browsers working on identifying misuses).
@npdoty wrote:
Yes, and it recommends purging storage, permissions, etc.
I guess the core question is: is the … I agree that a browser could classify and treat a start_url as a tracker, but I don't feel this rises to the level of a supercookie. So, I'm not saying we shouldn't do anything here, but I don't think it's a dire situation. @lknik wrote:
Sounds like a UX problem, tbh. I could "select all" apps and dump them in the trash, or select a bunch and dump them in the trash. Compare how Firefox and Chrome have "bookmark managers" that provide sophisticated UIs for managing this problem. One could imagine the same for PWAs.
Can current pages create unique to-be-bookmarked pages, and are they opened without displaying a URL?
Well, it does allow cookie respawning.
No, as the Fullscreen API requires a user gesture.
Yeah. 🤔
The bookmarks case is an interesting corollary: they offer pretty much the same capability to embed an identifier that's always present, even after site data deletion. To me, the only meaningful difference between bookmarks and installed web apps for this particular case is that installed web apps don't show the URL bar when they're opened from their shortcut. In the bookmarks case, relying on users to notice a unique tracking token in the URL bar effectively reduces to exactly the same problem as here: relying on users to inspect the start URL to notice a unique tracking token. In both cases, clearing site data and then using the shortcut to reopen the site could allow cookie respawning, and bookmarks have had this property for a very long time. We certainly could provide easier ways to inspect the start URL. Perhaps, for instance, we could show the location bar the first time you open an installed web app after clearing data. Wouldn't that reduce back to precisely the guarantees offered by bookmarks in this situation?
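The "show the location bar on first launch after clearing data" idea amounts to a one-shot flag per installed app. The sketch below is hypothetical (the class and method names are not from any browser); it only models the state transitions.

```python
from dataclasses import dataclass


@dataclass
class InstalledWebApp:
    start_url: str
    # Set when the user clears site data for this app's origin.
    show_url_bar_on_next_launch: bool = False

    def on_site_data_cleared(self) -> None:
        self.show_url_bar_on_next_launch = True

    def launch(self) -> bool:
        """Launch the app; return True if the location bar should be shown."""
        show = self.show_url_bar_on_next_launch
        # Only the first launch after clearing shows the bar.
        self.show_url_bar_on_next_launch = False
        return show
```

Under this model, a normal launch returns `False`, the first launch after `on_site_data_cleared()` returns `True`, and later launches return `False` again, mirroring what a user sees with a bookmark.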
User agents can solve this by reinstalling the web app from a fixed install URL designated by the user after every session, similar to using bookmarks in an incognito window.
@alancutter, I don't follow. Can you give a concrete example of what you mean? I'm looking at my bookmarks in private browsing mode, and I don't see the browser changing them in any way when I click on them.
Given an install URL decided by the user or admin policy (not by the app, and not containing tracking data), the browser could do a fresh install of the app for every user session. This resolves the tracking problem by making start_url ephemeral rather than persistent.
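A minimal model of this per-session fresh install, assuming the browser persists only the user-chosen install URL (treated here as the manifest URL) and injecting a stand-in `fetch_manifest` callable instead of doing real network I/O. `EphemeralApp` and its methods are hypothetical names; falling back to the install URL when `start_url` is missing is a simplification of the spec's default.

```python
from typing import Callable, Optional
from urllib.parse import urljoin


class EphemeralApp:
    """Persist only the install URL; start_url lives for one session."""

    def __init__(self, install_url: str, fetch_manifest: Callable[[str], dict]):
        self.install_url = install_url          # persisted across sessions
        self._fetch_manifest = fetch_manifest   # injected, no real network here
        self._session_start_url: Optional[str] = None  # per-session only

    def begin_session(self) -> str:
        manifest = self._fetch_manifest(self.install_url)
        # start_url is resolved against the manifest URL.
        self._session_start_url = urljoin(
            self.install_url, manifest.get("start_url", self.install_url)
        )
        return self._session_start_url

    def end_session(self) -> None:
        # Forget start_url so it never persists between sessions.
        self._session_start_url = None
```

If the server returns a different `start_url` (say `/app/?uid=1`, then `/app/?uid=2`) on each fetch, this model will dutifully use each one, which is exactly the weakness the next comment raises.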
@alancutter I don't think that's very useful (unless the app is being controlled by an administrator or a particularly careful user who inspects the URLs of the manifests being installed). You can always encode user-identifying info into one of the many URLs. If we fresh-install "the app" every session, we're fresh-installing from some manifest URL that could have user IDs in it, or from a start URL that has IDs in it. At some point, what we consider to be "the app" could in reality be one of millions of different apps, one for each user. The only way to prevent that is to have the user manually inspect all the URLs to see whether any of them contain something that looks like an ID. That's not feasible for the majority of users. Even as a power user, how am I going to know whether something is an ID or just, say, a content hash? The problem becomes quite intractable to solve properly, even for power users. I think we should just admit that it's a potential attack.
Agreed. I'm closing this because we acknowledge the problem, but it's not solvable because it's inherent to URLs. We let implementers know this is a problem and offer ways to mitigate it through the UI: https://www.w3.org/TR/appmanifest/#privacy-consideration-start_url-tracking
I laid out three possible approaches here: #399 (comment). To repeat that question: How should clearing local state interact with installed PWAs? The current privacy note in the spec just suggests that maybe users should be able to inspect the URL, and hopes they find, recognize, and realize they can remove identifiers. And if the user clears local state, do we expect the
I came to post the same solution, but for legitimate, useful parameters it's not very hard to expand the manifest to include these "legit, non-privacy-invasive uses". If language preferences are vital to a PWA, they can have their own key:value pair within the manifest, while start_url is still trimmed down to just the top-level domain. start_url could even be parameterized from a list of approved key:value pairs within the manifest, dropping any parameter whose key doesn't match or whose value format isn't recognized.
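A sketch of the allowlist idea, assuming the manifest could declare a set of approved keys with a value format for each. The `ALLOWED_PARAMS` table and `filter_start_url` are hypothetical; no such manifest member exists. "Trimmed down to just the top-level domain" is read here as "reduced to the origin root `/`".

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist a manifest might declare: key -> accepted value format.
ALLOWED_PARAMS = {
    "lang": re.compile(r"[a-z]{2}(-[A-Z]{2})?"),
    "source": re.compile(r"homescreen"),
}


def filter_start_url(start_url: str, allowed: dict = ALLOWED_PARAMS) -> str:
    """Reduce start_url to the origin root plus approved query parameters."""
    parts = urlsplit(start_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        # Drop unknown keys and values that don't match the declared format.
        if key in allowed and allowed[key].fullmatch(value)
    ]
    # Path and fragment are discarded, so per-user paths are removed too.
    return urlunsplit((parts.scheme, parts.netloc, "/", urlencode(kept), ""))
```

For example, `https://example.com/u/123/?lang=fr&uid=abc` becomes `https://example.com/?lang=fr`. A strict format per key matters: an allowlisted key with a free-form value would just become a new place to hide an ID.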
|
Sounds like what is needed is a watchdog service that checks manifest files for privacy concerns, and app stores that accept PWAs should check for this too. Also, if the spec mentioned disallowing randomly generated strings or user state in the URL, manifest validators might check for it. So while this isn't an immediate fix, having it in the spec gives better direction for validating manifests.
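A validator of the kind described might start from simple pattern heuristics. The patterns below (UUIDs, long hex runs, long digit runs) are illustrative assumptions, not rules from the spec or any existing validator, and, as noted earlier in the thread, they cannot tell a user ID from a content hash.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlsplit

# Hypothetical heuristics for "randomly generated strings".
ID_LIKE = [
    re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"),
    re.compile(r"[0-9a-fA-F]{16,}"),  # long hex runs (tokens, hashes)
    re.compile(r"\d{6,}"),            # long digit runs (numeric user IDs)
]


def suspicious_parts(start_url: str) -> List[str]:
    """Return path segments, query values and fragment that look like IDs."""
    parts = urlsplit(start_url)
    candidates = [seg for seg in parts.path.split("/") if seg]
    candidates += [value for _, value in parse_qsl(parts.query)]
    if parts.fragment:
        candidates.append(parts.fragment)
    return [c for c in candidates if any(p.search(c) for p in ID_LIKE)]
```

With these heuristics, `index.html?from_homescreen` yields nothing to flag, while `index.html?id=3f2504e0-4f89-11d3-9a0c-0305e82c3301` and `/u/12345678/` are both flagged.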
But it's not just the manifest file that may contain an ID.
What if I modify my server as follows: Whenever any HTML or Javascript file is fetched, the server adds the current time as a string to its contents (in a suitable place). If any browser tries to re-fetch it with the If-Modified-Since HTTP header set, the server returns Not Modified.
Using this server, you can make a Progressive Web App that 'knows' the exact time at which it was downloaded, without needing to put anything special in the Manifest. Provided the download rate is not too high, this timestamp can be used to identify a user across cookie-clear events etc, unless a fresh copy of ALL files is re-downloaded (i.e. app is uninstalled and reinstalled).
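The server behaviour described above can be modelled as a pure request handler. This is a simplified sketch: `serve_app_js` is a hypothetical name, header lookup is case-sensitive here (real HTTP headers are not), and it never compares dates. It also sends `Last-Modified`, since a browser only revalidates with `If-Modified-Since` if the original response carried one.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional, Tuple


def serve_app_js(
    request_headers: Dict[str, str], now: Optional[float] = None
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a script that embeds its fetch time."""
    if "If-Modified-Since" in request_headers:
        # Always claim "not modified": the cached copy, and the timestamp
        # baked into it, stays in use indefinitely.
        return 304, {}, ""
    stamp = time.time() if now is None else now
    headers = {"Last-Modified": formatdate(stamp, usegmt=True)}
    body = f"const INSTALLED_AT = {stamp!r};\n"
    return 200, headers, body
```

Nothing in the manifest changes, which is the point of the comment: a manifest-only check never sees this identifier.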
Policing the Manifest won't help unless you also police the server that serves the rest of the files (a trusted "app store" server should be OK, but a third-party server can do tricks).
|
I think you need to sign the app and have a predefined list of signed files. This is what happens in the native app process. It definitely makes changing code less flexible, but it also increases security. Right now most apps are open to man-in-the-middle attacks and file tampering. So I could see PWAs going in that direction no matter what.
I would be wary of inventing clever schemes; for many web features, something can always be done one way or another. I simply feel it would be more useful to limit the focus to the PWA/manifest/start_url. I fear that if we continue to expand the scope here, we may end up in an undesirable place ;-)
Interesting. So something like Certificate Transparency, but intended for dynamic manifest files, like, say, a Manifest Transparency extension? (It would require additional infrastructure, though. I just don't know whether, given how seriously privacy/tracking is treated in practice today, there is enough motivation to roll out such a scheme.)
Yes, I think something like that is going in the right direction. And yes, it would require more infrastructure. That's why I think the first step is to put it in the spec and then start a monitoring service. Giving PWAs a privacy "rating" through such a service might be enough of an incentive to avoid the practice without a huge transparency framework. Although as PWAs gain more traction and higher privileges, I think you'll go in that direction anyway, for both privacy and security.
A proposal made internally was just to use a well-known URL. That would basically solve most things: it strips fragments, queries, and arbitrary paths where identifying information could be stored (it doesn't solve for subdomains; only TLD+1 would do that, but that seems impractical for things like GitHub Pages). That could then be coupled with a hybrid solution: when a user installs an app, partition it into its own storage compartment. Then, for sites that depend on authentication, require the user to log in again using password autofill, WebAuthn, WebOTP, the Credential Management API, or whatever standard authentication mechanism the site depends on. It's a small inconvenience for a big privacy assurance.
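A sketch of the well-known-URL proposal combined with per-app storage partitioning. The proposal does not name a path, so this assumes the fixed launch URL is simply the origin root `/`; both function names and the `installed-app:` key prefix are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_launch_url(start_url: str) -> str:
    """Reduce a start_url to its origin root (assumed fixed path '/')."""
    parts = urlsplit(start_url)
    # Drop any userinfo ("id42@host"), which could also carry an identifier.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, host, "/", "", ""))


def storage_partition_key(start_url: str) -> str:
    """Hypothetical key giving an installed app its own storage compartment,
    separate from the same origin's regular browsing storage."""
    return "installed-app:" + well_known_launch_url(start_url)
```

Path, query and fragment all disappear (`https://id42@example.com/u/123/?uid=1#x` becomes `https://example.com/`), but a per-user subdomain such as `https://user123.example.github.io/` survives, which is the caveat the proposal itself mentions.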
Using a well-known URL creates a large migration problem: all currently installed PWAs that don't already conform to the well-known URL would break, and for many sites, fixing that would require a re-architecture that might not be likely to happen. How would that problem be practically addressed? Additionally, removing fragments, queries, and arbitrary paths removes a significant amount of positive utility (the classic tradeoff in the design of URLs). How could you replicate that utility?
We'd have to start warning and ask developers to migrate over the next N years. Or a browser vendor would need to take the compat hit. Alternatively, we see what percentage would be impacted and make some determination based on that.
Those we would need to look at on a case-by-case basis, to see whether we can provide the same utility in some other way.
Do the same privacy concerns apply to all URLs in the manifest that get navigated to, e.g., file handlers and shortcuts?
Yes. I don't think special-casing … This isn't specific to PWAs. This is true of bookmarks and any other mechanism that saves URLs to later navigate back to the site (as discussed much earlier in this thread). The most helpful approach, which I'd like to focus on, is @npdoty's line of thought about clearing storage. In my opinion, we should treat the existence of a PWA installed on the user's device as another form of local storage, like a cookie or IndexedDB. If you clear cookies for an origin but don't uninstall the PWA, then you haven't completely removed the presence of that origin from your device. Therefore, I think the best recommendation we can make to browser makers is that any dialog that offers to clear cookies and other local storage should also offer to uninstall any PWAs or shortcuts (and maybe bookmarks?) whose scope lies in that origin. A "clear all" button (or "select all" checkbox) should include clearing PWAs. Edit: I filed crbug.com/1112220 to track this in Chromium.
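The recommendation boils down to one lookup in the clear-data dialog: which installed apps have a scope in the origin being cleared. A minimal sketch, assuming origins are compared as plain `scheme://host[:port]` strings (no default-port normalization) and using hypothetical names:

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit


@dataclass
class InstalledPwa:
    name: str
    scope: str  # the manifest's scope URL


def origin_of(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def apps_to_offer_for_uninstall(origin: str, apps: List[InstalledPwa]) -> List[InstalledPwa]:
    """Apps whose scope lies in the origin being cleared; a dialog would list
    these as options alongside cookies and other local storage."""
    return [app for app in apps if origin_of(app.scope) == origin]
```

Because the match is by exact origin, clearing `https://example.com` would offer to remove an app scoped to `https://example.com/docs/` but not one scoped to `https://mail.example.com/`.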
As @mgiuca and others have said, special-casing the start_url would really only move the goalposts. Installing an application is inherently an act of storage. The best solution to this issue would be to allow the user to uninstall the application via the clear-data functions of the various UAs. I would suggest that any clear-all function allow opting out of uninstallation, especially for an option to "clear all data across multiple origins", as I am sure that removing a number of apps from the user's device by clearing data would confuse many users and lead to unintended data loss. It is also quite possible that users will see installed apps and their website counterparts as separate, and not expect clearing data from one to affect the other at all. So maybe, during a clearing dialog, the user is presented with the option to tick the current context (web / app) and the other context.
We (the editors) agree that something like option 1 would be good guidance. We can include language like that in the spec to discourage user IDs in the start_url. We will send a pull request.
We've put up some proposed text to address this at #1029. We would appreciate feedback on it so we can close off this issue.
I think it's fair; it just doesn't prescribe any solution. Isn't it too soon to close the issue? It has been open since 2015 for a good reason.
To summarize @npdoty's comments in https://lists.w3.org/Archives/Public/public-privacy/2015JanMar/0117.html, there are concerns about start_url containing special IDs, or simply something that hints that the user is coming from a home-screen application. This is fingerprinting/privacy-sensitive information that the user might not be aware of.
I think the issue of people doing start_url: 'index.html?from_homescreen' is something we might want to mention in the spec, but I don't think we should encourage browsers to prevent it, because it is clearly something websites want for various reasons (mostly statistics).
However, I am concerned about having start_url: 'index.html?$GUUID', because it is a way to track the user without them being aware of it. I'm not sure what the spec should say or what browsers could do. Maybe we could recommend showing the start_url to the user and allowing them to edit it?