Public analytics & metrics pipeline #97
Totally agree, this could be a great community feature, and is a key step towards making a "case" for binder tech as being impactful. One challenge: how would this work once Binder is federated? Would these statistics be kept at the BinderHub level? If there are multiple public streams out there, then it would be straightforward to aggregate them, so maybe not such a big deal so long as the data is there. |
Indeed, ideally every BinderHub would make its stream public and people can aggregate. See https://wikiapiary.com/wiki/Main_Page for how this sort of aggregation happens for MediaWiki instances (MediaWiki runs the Wikimedia sites but also other websites on the internet unrelated to Wikimedia) |
+1 on the federation question :). Does this mean we'll need to have a BinderHubHub? |
I prefer "binderbinderhubhub" |
then we can make it into a song like
ok no more coffee for me this morning |
merge jupyterhub#198 jupyterhub#199 binderhub and jupyterhub#95, jupyterhub#97, jupyterhub#99 of repo2docker
What kind of tools/setups would we use to collect the events emitted by binderhub? As a user of this data, I'd hit "events.mybinder.org" and receive all future events (similar to how you subscribe to the twitter 1% stream?). |
The way you'd usually do this is:
This accomplishes a few things:
|
We could also explicitly have a 'public' field in the JSON log output, thus whitelisting the things that appear in the public stream. This protects against things like secrets accidentally leaking. |
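A rough sketch of that whitelisting idea, assuming each log line is a JSON object and that anything safe to publish lives under a nested `public` key. The nested-object layout, the field names, and the `public_view` / `filter_stream` helpers are all hypothetical, not something BinderHub emits today:

```python
import json


def public_view(log_line):
    """Return the whitelisted part of one JSON log line, or None.

    Assumption: publishable data sits under a top-level "public" key
    that holds a JSON object. Everything else is treated as private.
    """
    try:
        event = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not JSON, so never publish it
    if not isinstance(event, dict):
        return None
    public = event.get("public")
    if not isinstance(public, dict):
        return None  # no explicit whitelist, so nothing is published
    return public


def filter_stream(log_lines):
    """Yield only the public portion of each log line that has one."""
    for line in log_lines:
        view = public_view(line)
        if view is not None:
            yield view


if __name__ == "__main__":
    # Hypothetical log lines; the field names are made up for illustration.
    lines = [
        json.dumps({"token": "s3cret", "public": {"event": "launch", "provider": "GitHub"}}),
        json.dumps({"token": "s3cret", "message": "internal only"}),
        "plain text log line",
    ]
    for view in filter_stream(lines):
        print(json.dumps(view))
```

The point of the design is that the default is private: a field only reaches the public stream if someone deliberately put it under `public`, so a new log field with a secret in it does not leak by accident.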
One of the coolest things about Wikimedia is the large amount of usage data it makes available publicly: Page Views API & Dumps, content dumps, client usage, live recent changes stream etc. This makes it very useful for a number of purposes - fundraising, quantifying impact, etc. By just making the data available, it enables a wide variety of people to derive whatever meaning they want from the raw data, enabling creativity & removing itself as a bottleneck.
We should take a similar approach, both because we strive to be open & because we're a small team who cannot do all the cool things that would be possible with this approach.
The simple proposal here is:
This can be our primary information source. On top of this, multiple other things can be built:
And far more. This also prevents us from being a bottleneck, and provides space for a developer community that uses binder (rather than just one that develops binder) to open up. We also determine what kinda info is emitted, making sure we preserve our users' privacy.
This issue is primarily to talk about this approach, rather than technical details. Thoughts?