Replace hashed email with manifest-based offline check #31
An idea I had is to use a CLI tool to authenticate to GitHub and prompt to install the SponsorLink app, and then get a certificate from a backend service and put it in the certificate store. This way we address two of the biggest complaints with the existing approach:
This is what I believe the author should consider as well. The validation process must work in offline mode. If I haven't forgotten how asymmetric encryption works, achieving this should be possible.
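For illustration, a minimal sketch of that asymmetric idea, assuming a vendor-held RSA key pair (the license payload and names are hypothetical, not SponsorLink's actual scheme): the license is signed once with the private key, and the client verifies it entirely offline with the embedded public key.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class OfflineLicenseDemo
{
    static void Main()
    {
        // Vendor side: sign the license once; the private key never
        // leaves the vendor's backend.
        using var vendorKey = RSA.Create(2048);
        byte[] license = Encoding.UTF8.GetBytes(
            "{\"sponsor\":\"alice\",\"expires\":\"2023-09-30\"}");
        byte[] signature = vendorKey.SignData(
            license, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        // Client side: ships only the public key and verifies fully offline.
        using var verifier = RSA.Create();
        verifier.ImportSubjectPublicKeyInfo(
            vendorKey.ExportSubjectPublicKeyInfo(), out _);
        bool valid = verifier.VerifyData(
            license, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        Console.WriteLine(valid); // True; any tampering flips this to False
    }
}
```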
If the premise is that users need the SponsorLink app installed on their dev boxes, perhaps that app could require the user to sign into GitHub (e.g. like the Copilot plugin does) and then it could query GitHub for the projects the user sponsors. That information could then be made available locally on the machine to the analyzer running on it. That way, there is no "behind the scenes" communication going on at build time: the SponsorLink app downloads and caches a list of sponsored projects to the user's machine. It could also enable users that are part of an organization that sponsors a project to be counted as sponsors, i.e. the organizations are the sponsors and not the individual devs working for the company. In many places, organizations are able to deduct sponsorships from taxes, whereas individuals are not, and would thus have to go through expensing this to their employer if the employer is to cover the sponsorship.
My thought was to put the certificate in the OS's certificate store, marking it non-exportable if possible.
This is of course not at all true. You can collect all the data you want in the EU as long as you just ask the user and the user gives consent. That is the entire problem with the current implementation. One or more emails are collected without asking permission first.
Read the source code: nothing happens if you don't install the GitHub sponsor app and give explicit consent, or if you are not using an IDE.
What about API keys? NuGet, Azure and all the others have API keys. I'm not sure how a hash of an email is somehow less private than an API key that directly maps to your email on another service. However, just in case, maybe people can just use a GitHub access token to read? So the token is the key? Then corporates can just set an envvar on CI/local and the build can do that?
You make two faulty assumptions.
The point is that you have to get consent from a user before collecting their info. A 3rd-party app or web site should not just go and collect your info without your consent, nor should it go and fish out API keys you may have access to without asking permission. For example, if SponsorLink uses OAuth2 to log in to GitHub, it can request the user's email address, and the user has to click the big green "Install and authorize" button. That is the user giving consent, and in that case, SponsorLink is not doing anything wrong.
If I read the explanation in the main readme, that does not seem to be true: check number 3 explicitly uses the email collected in 2 to send a remote request, all without any user consent.
@macco3k the SHA256 of the email. But point taken; that's what this issue is about. @mattleibow I was thinking of just depending on the GH CLI, so I don't have to deal with auth at all.
No licensing system sends you a certificate with a private key. They sign the license with their private key.
And obviously the private key material is stored somewhere, and an administrator can back up/export the private key.
Hashing does not anonymize data. A hash of PII should be treated as PII, as it effectively produces an easily-reproducible unique identifier. Email addresses are such a small domain of values that there is an infinitesimal chance of collisions, and one can produce the hashes for simpler email addresses (combinations of common names @ common domains) very rapidly. For those that require FIPS compliance in the US, this is not an approved use of any hash algorithm. Hash algorithms are only approved for verifying message integrity, in approved digital signature algorithms, and in approved key-derivation functions.

The current implementation is not sound from a technical standpoint. A license, digitally signed by you and verifiable at runtime, would be a better implementation. Though do allow for multiple types of algorithms, as nothing is universally acceptable. I'm not even going to get into the privacy laws you'll now need to comply with, nor the legal liability you'll be taking on with it being a paid product. Those are better discussed with your attorney, and not random people on the internet.

As a developer of some FOSS libraries that next to nobody uses, this move sucks. I don't monetize those projects, and consequently won't use any that require any monetary investment from me. As a manager of a development team, this would never pass muster, and would be an instant "pin/fork a version that doesn't require payment or else eliminate it as a dependency". And I say that not because we're cheap and don't want to pay for things; I say that because the implementation is poor, the value provided is minimal, and there is no option to sponsor at an organizational level to unburden my team. And perhaps most importantly, it makes me look like an idiot to my boss when we suddenly have an unplanned expense (no matter how small). For example:
While I am certain there are some organizations that prioritize getting their builds out with no delay or interference, mine is not one of them. If you add 1, 2, or even 10 seconds to my CI builds, you're wasting an almost-free amount of compute time. But if you add a millisecond to my developers' unit test times, we're going to break out the pitchforks and torches. Moq was already painfully slow, and only gets a pass because it was historically more convenient to write tests with than arranging an integration test. But with AI making integration tests far less tedious to write, and with them typically executing as fast or faster than Moq while providing much greater assurances of correctness with greater readability... Moq's current value lies almost entirely in not needing to rewrite existing tests. And while that is a tangible and measurable amount of value, it's not an intolerable one.

But the biggest issue, the underlying one that I haven't seen mentioned in this discussion, is the stain this places on the FOSS community in general. People want to monetize their work, and that is understandable. And while I don't know what the right way is, this is definitely one of the wrong ways. In this case, suddenly charging for something that was previously free is exactly what businesses that were wary of using FOSS were afraid would happen. There was so much FUD around FOSS in years past that it took ages to be allowed to use it in our code. Once we got past that (look, even Microsoft is doing open source now), it put developers in the driver's seat, and we became able to stop re-inventing the wheel and use existing packages because they saved us time. We didn't use free ones simply because we were cheap; we used them because it was the path of least resistance. They offer value to our productivity that doesn't require justifying a purchase order. Once money is involved, things get painful.

But today, FOSS has become yet another vector for malware. And where once we feared submarine patents or unfixed defects in abandoned projects, we're now being scalded by submarine licensing fees. We either need to admit we made a poor judgement call by asking the company to pay for it, or else pay for it out of our own pockets. It's a breach of trust that affects the whole community.
And now I present the most overkill solution to preserving anonymity: use a k-anonymity model with the repo name (this prevents giving out email digests willy-nilly; the digests are always scoped to the repo). The digest is calculated as H(sponsorEmail || sponsoredRepo), where H is the hash function. The API exposes an endpoint that takes the first 5 characters of the digest and returns the suffixes of matching digests.
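A minimal client-side sketch of that range query, assuming SHA-256 as H and a hypothetical range endpoint (the URL and plain-text response format are borrowed from the pwned-passwords API, not an actual SponsorLink service):

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class KAnonymitySponsorCheck
{
    static async Task<bool> IsSponsorAsync(string email, string repo)
    {
        // Digest = H(sponsorEmail || sponsoredRepo), hex-encoded.
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(email + repo));
        string digest = Convert.ToHexString(hash);

        // Only the first 5 characters of the digest ever leave the machine.
        string prefix = digest[..5];
        using var http = new HttpClient();
        string body = await http.GetStringAsync(
            $"https://example.org/range/{prefix}"); // hypothetical endpoint

        // The server returns every suffix sharing that prefix; the full
        // match is established locally, so the server never learns which
        // digest (if any) the client was actually interested in.
        return body.Split('\n').Select(s => s.Trim())
            .Contains(digest[5..], StringComparer.OrdinalIgnoreCase);
    }
}
```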
There is no solution that prevents the storage of PII, given that what you want to do here is identify individual persons (sponsors). But you can at least stop users who have not agreed to share their email with you (by linking their account) from sending their PII to you.
IIRC, haveibeenpwned.com does something similar to this to avoid ever sending email/password hashes over the wire: https://www.troyhunt.com/understanding-have-i-been-pwneds-use-of-sha-1-and-k-anonymity/
I've been exploring creating a GH CLI extension (initial proof of concept here). It might be the easiest to integrate since it automatically conveys that you have to be signed in to the GH CLI to operate. The idea is that the CLI tool would let you sync a local manifest file. At (IDE-only, like today) build time, the analyzer would look for this file and verify your (local git configured) email against it, entirely offline, and at most (say) once a day. So the email would never leave your machine if you hadn't installed the GH app at all, and neither would it if you had. Since a calendar month is the cut-off for sponsorships (one-time or recurring), you'd need to download this file periodically, as it will therefore have an "expiration" built in. The file would need to contain sufficient information to:
So, based on @Perksey's suggestion, the algorithm that would also support org-wide sponsorships might look like the following:
Does that sound correct? From @DavidAtImpactCubed's link to Troy's most excellent blog post, I take it that this setup would be fast, PII-preserving and accurate, even if we switched to SHA-1? Thanks a lot for the feedback!
I would probably switch to SHA256 or SHA3-256 given the known cryptographic problems with SHA1, to cover your backside a bit, but otherwise sounds good to me.

Troy's fairly recent post on the topic makes a strong case for actually sticking to SHA1 though: https://www.troyhunt.com/understanding-have-i-been-pwneds-use-of-sha-1-and-k-anonymity/
On reflection, I'm not sure that k-anonymity (à la haveibeenpwned.com) is achievable in your use case. As I understand it, it relies on being able to segment the reference data set (in your case, sponsors) into groups which share a common hash prefix and which are large enough that the response contains many possible matches (the "k" in k-anonymity). In the case of haveibeenpwned, the responses contain a large number of matching hashes (see an example here: api.pwnedpasswords.com/range/11111). In your case:
The flip side of this is that if the number of sponsors is public, and is a short list, then why not respond with a list of hashes of all sponsors, and do all checks client-side? If your list of sponsors is not public, none of these approaches are valid.
@kzu The use cases are different. HIBP protects known-public data, so the SHA-1 just acts as a minor obfuscation as well as handling the compromised data in a uniform manner. I strongly discourage using a known-insecure hash for data that is not strictly public. @DavidAtImpactCubed As discussed, the fact that sponsors will have PII stored and exposed as pseudonymised is unavoidable; k-anonymity isn't being used to prevent this. This model is only used to prevent entire hashes of emails being sent to the server by those who have not agreed to it (those who have their data stored agree to it at the time of linking the GitHub app).
When you say "people" do you mean the people actively proposing solutions on this thread or the people just saying it's not done properly and not doing a single thing proactive to suggest a compliant solution to be implemented? |
Translation:
I'm sorry that GDPR's lack of wiggle room offends you. I've offered a compliant solution: Don't do any checking and just put in the message regardless. Guaranteed 100% compliant with GDPR. |
@cmjdiff you're not the only GDPR-aware developer here. I'm in the UK and I'm in the data business. I've done this before, as have countless others. Just because you don't know how it can be done doesn't mean no one else does. Maybe @kzu knows how it's done, maybe he doesn't. Advise him on how it can be done, then. Not how it's impossible. We've all given him a list of "One Million Things That You Can't Do Under GDPR". We've all criticised him to hell and back, because he has done it wrong. Hell, I've given 6 different metaphors to explain what's wrong. He gets it. He's trying to improve it. Give the man a break.
Thanks @harunlegoz for the suggestions. The auth I was talking about was not about getting (locally) the user's sponsors list. The tricky part is coming up with some sort of SponsorLink-backend-signed manifest, not something just done on the client side. Otherwise, anyone could generate a manifest, share it with the whole team, and there would be no way to assert a given manifest has been validated/signed by SL itself. So, gathering sponsors locally is already doable (it's what my PR to the other gh-sponsors repo does already), since the user running the GH CLI is already authenticated to GH. On the backend, similarly, the user has to install the GH App (via github.com), which grants the permissions to the SL backend to read the user's email(s). The missing piece is matching up the locally calculated sponsors with the backend-determined sponsors (this info would be shared by the sponsorable account when they set up the webhook to notify SL of active sponsors) for the "current user" (which is authenticated to the GH CLI but not to the SL endpoint). 🤔 Does it make sense?
@kzu, ah, I see. Okay, now I understand. Sorry for making the wrong assumption. Why not have the gh sponsors app run a background agent that exposes a localhost API (with CORS set to only accept localhost requests)? It could accept the package name and return true/false to the caller, never store the values anywhere other than memory (and maybe in a SecureString?), and do checks on behalf of the calling packages. If done right, it wouldn't take more than maybe a couple MB of memory, and literally no CPU usage. Again, I know I'm not well-versed in SponsorLink's internal workings and only know it by your descriptions, so apologies if I'm not hitting the target.
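For concreteness, a rough sketch of such an agent, assuming an arbitrary port and a made-up `/sponsors/check` route (this illustrates the commenter's suggestion only, not anything SponsorLink implements):

```csharp
using System;
using System.Net;
using System.Text;

class LocalSponsorAgent
{
    static void Main()
    {
        var listener = new HttpListener();
        // Bound to localhost only, so no remote host can ever reach it.
        listener.Prefixes.Add("http://localhost:49234/sponsors/");
        listener.Start();
        while (true)
        {
            // e.g. GET http://localhost:49234/sponsors/check?package=moq
            HttpListenerContext ctx = listener.GetContext();
            string package = ctx.Request.QueryString["package"] ?? "";
            byte[] reply = Encoding.UTF8.GetBytes(
                IsSponsored(package) ? "true" : "false");
            ctx.Response.ContentType = "text/plain";
            ctx.Response.OutputStream.Write(reply, 0, reply.Length);
            ctx.Response.Close();
        }
    }

    // Kept in memory only, as suggested; nothing is persisted to disk.
    static bool IsSponsored(string package) => false; // placeholder
}
```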
@Perksey GDPR rules are not intricate. They are very straightforward, and not aimed at lawyers; they are aimed at us, the processors of data. If you think the GDPR is intricate, oh man, just wait for user stories.
A number of solutions have been proposed. One should not need to propose a solution, because this is a solved problem, and has been for a very long time. If you want to get paid, charge for your product. That means changing the license it is published under, and using a license key. It doesn't require a rocket scientist, brain surgeon, or rocket surgeon. It may require a lawyer, because you no longer get that "get out of lawsuit free" card that FOSS licenses tend to give you. If you publish under a FOSS license and then waste a developer's time or a business's compute time, you are undermining FOSS. So in short, the proper way to handle this is:
Full stop.
Well, previously I worked at a company in the health sector which involved sending/transferring estimates that contained A LOT of very critical PII, not just about the patient but also about their diagnosis and such, so I know a thing or two about GDPR, since the company HAD to comply with it. But no, not a lawyer. What's your professional background on GDPR?
No, it wasn't. There was no mention of it until people started asking and all of that boiled up. And even that only happened because the pipelines failed after upgrading (for one, because of it being emitted as a warning, which fails builds on "Warnings as Errors" pipelines, and some others even threw exceptions during the build, preventing people from shipping stuff). In the worst case, no one would even have noticed the PII being leaked to kzu!
@harunlegoz that wouldn't be an improvement over just checking a local file, I think. If anything, it would look even MORE suspicious to have a running process/agent.
Awesome, finally someone understands what this is about, since that's precisely what is being proposed. The "literally anything else if they don't" part, in particular. If there's no SponsorLink file, do nothing, just ask for a donation. Now, if the user IS sponsoring, the analyzer should be able to do something with that information. It can be changing the diagnostics message (or removing it entirely), or even feature-toggling some IDE-only feature. That's the part that needs solving in order for SL to actually provide value beyond just showing a static diagnostics message begging for sponsorships.
I very much doubt that was the case. The analyzer NEVER ran in CI. You can see the extensive checks for editor-only conditions I've performed since day 1: https://github.com/devlooped/SponsorLink/blob/main/src/Package/SessionManager.cs#L51-L74
How well tested is that?
It ALSO has to be `IsEditor`
@kzu if you did want some more checks in this area, you can find a number of CI providers that are checked for as part of the Cake Build Orchestration system. You can find that code here: https://github.com/cake-build/cake/tree/develop/src/Cake.Common/Build Each one has a
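In that spirit, a hedged sketch of environment-variable-based CI detection (the variable names below are ones those CI services document; the helper itself is illustrative, not Cake's or SponsorLink's actual code):

```csharp
using System;

static class BuildEnvironment
{
    static bool Has(string name) =>
        !string.IsNullOrEmpty(Environment.GetEnvironmentVariable(name));

    // True when any of the well-known CI markers is present.
    public static bool IsCI =>
        Has("CI")                  // GitHub Actions, GitLab CI, Travis, ...
        || Has("TF_BUILD")         // Azure Pipelines
        || Has("TEAMCITY_VERSION") // TeamCity
        || Has("JENKINS_URL")      // Jenkins
        || Has("APPVEYOR");        // AppVeyor
}
```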
In addition to the hashing, which might still be concerning since knowing your sponsors' emails and sponsored accounts might still allow reconstructing the hash, adding a random, per-install GUID completely removes this possibility. The new Session handles these environment variables so we don't even incur any I/O down the road from the analyzer:

* SPONSORLINK_INSTALLATION: a GUID created if not already present (can be cleared by the user at any time to completely change all future hashes as needed), used for salting all hashes.
* SPONSORLINK_TOKEN: an access token used to invoke the SponsorLink API to sign the manifest hashes. This is done only to allow integrity verification at analyzer/check time.
* SPONSORLINK_MANIFEST: last synced manifest JWT to check for sponsorships.

Since the hashes are now effectively opaque to the server, all the server would do is sign the JWT received at the `/sign` endpoint with the corresponding private key; otherwise, the JWT remains intact (only the expiration date is also set server-side when signing).

Related to devlooped/SponsorLink#31
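A sketch of the salted hashing that commit describes; the exact concatenation order and the hex encoding are assumptions, not the shipped implementation:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class SaltedHash
{
    // The salt is the per-install SPONSORLINK_INSTALLATION GUID; since the
    // server never sees it, it cannot recompute hashes from known emails
    // and sponsored accounts.
    public static string Compute(string salt, string email, string sponsorable)
    {
        byte[] bytes = Encoding.UTF8.GetBytes($"{salt}|{email}|{sponsorable}");
        return Convert.ToHexString(SHA256.HashData(bytes));
    }
}
```

Usage might look like `SaltedHash.Compute(installationGuid, "dev@contoso.com", "devlooped")` (hypothetical values), with the result carried as a `"hash"` claim in the manifest JWT; clearing the envvar regenerates the GUID and invalidates every previously derived hash.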
Since we're moving to an entirely offline and manifest-based SL check, we don't need any of the complex stuff we had before. We just need a SINGLE endpoint to sign the JWT manifest sent by the gh-sponsors extension (see devlooped/gh-sponsors#8). Everything else is unnecessary now. The functionality is reduced but simplified significantly, with a massive improvement in PII/GDPR compliance. There is no longer any direct or indirect telemetry tracked, since the user must explicitly run a command and install an extension on their machine before getting a signed manifest. We also no longer provide a webhook for the GH apps, which will be deprecated too. Fixes #31
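A minimal sketch of what such a single signing endpoint could look like, assuming .NET minimal APIs and the System.IdentityModel.Tokens.Jwt package (authentication and key management are elided; none of this is the actual SponsorLink backend):

```csharp
using System.IdentityModel.Tokens.Jwt;
using System.Security.Cryptography;
using Microsoft.IdentityModel.Tokens;

var app = WebApplication.CreateBuilder(args).Build();

var rsa = RSA.Create(); // in practice, load the service's persisted private key
var credentials = new SigningCredentials(
    new RsaSecurityKey(rsa), SecurityAlgorithms.RsaSha256);
var handler = new JwtSecurityTokenHandler();

app.MapPost("/sign", async (HttpRequest request) =>
{
    // The CLI POSTs the unsigned manifest JWT as the request body.
    using var reader = new StreamReader(request.Body);
    var incoming = handler.ReadJwtToken(await reader.ReadToEndAsync());

    // Claims pass through untouched; only the expiration is set server-side,
    // covering the current calendar month (sponsorships expire monthly).
    var now = DateTime.UtcNow;
    var signed = new JwtSecurityToken(
        claims: incoming.Claims,
        notBefore: now,
        expires: new DateTime(now.Year, now.Month, 1).AddMonths(1),
        signingCredentials: credentials);

    return handler.WriteToken(signed);
});

app.Run();
```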
This was previously ignored from a glob too. Fixes #31
While I do agree with you that this project is a horrible idea, I don't think your GDPR claims are valid. In fact, I think the proposed solution in the OP is GDPR-compliant in principle. No personal data is processed unless the data subject consents to it for the sake of verifying they are a paid customer. The consequences of the refusal to provide consent are arguably minor: this is akin to a user buying an ad-free version of the open source software (what a dystopian sentence to say).
@Moxinilian consent is only a small part of GDPR. It doesn't matter if there is consent and the technical implementation is ironclad if there aren't well-documented processes for how the data is stored, extracted and deleted according to the tenets of transparency, portability and the right to be forgotten. In all fairness, GDPR also applies to a commercial license model where there are paying customers with persisted information somewhere. The administrative overhead must still be handled.
Yes, hence why I said it is compliant in principle. The rest is formalities.
It was brought to my attention that this wasn't sufficiently anonymizing: especially for corporations, the pattern for emails is not hard to probe if you have a list of emails from somewhere, by attempting to access the cloud URL for the sponsorship.
The current proposal (client-side CLI implementation PR) would work as follows:

1. User installs the `gh-sponsors` CLI extension (basically running `gh extension install devlooped/gh-sponsors`).
2. User runs `gh sponsors sync` (`sync` being the default command). On the first run, the tool explains again what's going to happen and performs the following steps:
   a. Creates a user envvar with a random GUID to use for salting all hashes
   b. Gets the user's active sponsored accounts
   c. Gets the user's verified emails
   d. Gets the user's organizations and their verified domain(s), as well as their sponsored accounts
   e. Hashes each user email from c) with each sponsored account from b) (salted with a)) and turns them into JWT claims ("hash"=hash)
   f. Hashes each verified org domain from d) with sponsored accounts too (salted with a) too), and turns them into JWT claims too
   g. POSTs this to a GitHub-authenticated SponsorLink endpoint that signs the JWT with the SL private key. All the endpoint validates is that the logged-in GitHub user (via Auth0) is the same as for the GH CLI.
   h. The backend responds with a signed JWT with an expiration that covers the current month (sponsorships expire at the end of each month).
   i. The token is saved to the envvar checked in 1)
3. On a subsequent build (a sketch of this check follows the list):
   - If the token is missing or expired, the user is prompted to run `gh sponsors` again.
   - The analyzer computes `hash(salt, email, sponsorable)` and tries to find that claim within the JWT (all local). It also does a fallback check for `hash(salt, domain(email), sponsorable)`, to support org sponsorships.
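A sketch of that subsequent-build check, reading everything from the envvars above (the claim name, hash layout and helper are assumptions; signature validation is shown separately below):

```csharp
using System;
using System.IdentityModel.Tokens.Jwt;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class OfflineSponsorCheck
{
    // Same assumed layout as the earlier salted-hash sketch.
    static string Hash(string salt, string value, string sponsorable) =>
        Convert.ToHexString(SHA256.HashData(
            Encoding.UTF8.GetBytes($"{salt}|{value}|{sponsorable}")));

    public static bool IsSponsoring(string email, string sponsorable)
    {
        var salt = Environment.GetEnvironmentVariable("SPONSORLINK_INSTALLATION");
        var manifest = Environment.GetEnvironmentVariable("SPONSORLINK_MANIFEST");
        if (salt is null || manifest is null)
            return false; // never synced: just show the donation request

        var jwt = new JwtSecurityTokenHandler().ReadJwtToken(manifest);
        if (jwt.ValidTo < DateTime.UtcNow)
            return false; // expired: suggest running `gh sponsors` again

        // Direct sponsorship first, then the org-wide fallback by domain.
        var domain = email[(email.IndexOf('@') + 1)..];
        return jwt.Claims.Any(c => c.Type == "hash" &&
            (c.Value == Hash(salt, email, sponsorable) ||
             c.Value == Hash(salt, domain, sponsorable)));
    }
}
```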
Of note: the goal is for integrators to just have a documented standard mechanism for verifying the JWT manifest token, even without any SL-provided code. But a simple loose-file "helper" should be provided for simplicity.
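For the integrator side, verifying the manifest's signature needs nothing SL-specific beyond a published public key; a sketch using a stock JWT library (key distribution and the validation parameters here are assumptions):

```csharp
using System;
using System.IdentityModel.Tokens.Jwt;
using System.Security.Cryptography;
using Microsoft.IdentityModel.Tokens;

static class ManifestVerifier
{
    public static bool TryValidate(string manifest, byte[] publicKeyDer)
    {
        // The public key would be published/embedded; only the backend
        // holds the matching private key used at /sign.
        var rsa = RSA.Create();
        rsa.ImportSubjectPublicKeyInfo(publicKeyDer, out _);
        try
        {
            new JwtSecurityTokenHandler().ValidateToken(manifest,
                new TokenValidationParameters
                {
                    IssuerSigningKey = new RsaSecurityKey(rsa),
                    ValidateIssuer = false,
                    ValidateAudience = false,
                    ValidateLifetime = true, // sponsorships expire monthly
                }, out _);
            return true;
        }
        catch (SecurityTokenException)
        {
            return false; // tampered, forged, or expired manifest
        }
    }
}
```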
GH CLI experience is similar to the following: