Stats API #679

Merged: 14 commits on Feb 5, 2021

@ukutaht ukutaht commented Feb 4, 2021

Changes

Relevant discussion: #95

This adds a read-only API to retrieve stats from your Plausible dashboard. I will create a PR with full documentation on our docs repository soon. Will also publish a Postman collection for testing. Some relevant points to discuss:

Authentication

I decided to go with a simple Bearer token for authentication. The API key is generated with Erlang's :crypto.strong_rand_bytes(64) |> Base.url_encode64(), which produces 64 cryptographically strong random bytes encoded as URL-safe Base64. Before storing in the database, this value is hashed together with the server secret using sha256. The first 6 characters are kept as plaintext to make it easier to recognise the API key in the UI.
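
To make the scheme above concrete, here is a minimal Python sketch of it (not the Elixir code from this PR). The names SERVER_SECRET, make_record, key_prefix and key_hash are hypothetical, and the order in which the secret and key are concatenated before hashing is an assumption, since the description only says they are hashed together.

```python
import base64
import hashlib
import hmac
import secrets

# Assumption: a stand-in for the server secret mentioned in the PR description.
SERVER_SECRET = "hypothetical-server-secret"


def generate_api_key() -> str:
    """Return 64 random bytes, URL-safe Base64 encoded (88 characters with padding)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(64)).decode("ascii")


def hash_api_key(key: str, secret: str = SERVER_SECRET) -> str:
    """One-way sha256 hash of secret + key (the concatenation order is assumed)."""
    return hashlib.sha256((secret + key).encode("utf-8")).hexdigest()


def make_record(key: str) -> dict:
    """What would be stored: a 6-character plaintext prefix and the hash."""
    return {"key_prefix": key[:6], "key_hash": hash_api_key(key)}


def verify(presented_key: str, record: dict) -> bool:
    """Re-hash the presented key and compare against the stored hash in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), record["key_hash"])
```

The plaintext key would be shown to the user once and only the record kept; verifying a request then costs a single sha256 call.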

I didn't want to use Bcrypt here because it adds an artificial delay of about 250ms to each request. That's OK when you're logging in, but not OK for API requests. Since the API 'password' is 64 random bytes, it has far more entropy than a user-generated password, so a brute force attack is much less likely to succeed. It would be good for someone to confirm whether this is a reasonably secure approach.

Endpoints

Endpoints are namespaced with /api/v1/stats. The API is quite low level and flexible, almost like a restricted database interface. I worry about getting the abstractions right at this stage. I am building it with the aim of being able to rebuild the main dashboard using just the public API endpoints. The endpoints are:

  • /api/v1/stats/realtime/visitors for current visitors
  • /api/v1/stats/aggregate for aggregated stats like the top row of the dashboard (visitors, bounce rate, etc)
  • /api/v1/stats/timeseries for graph data like the main graph on the dashboard
  • /api/v1/stats/breakdown for breaking down properties like 'Top sources', 'Top pages' etc. Could also be called group-by. Not part of this PR; still under construction 🚧
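
As a rough illustration of how a client might call one of these endpoints with the Bearer token, here is a Python sketch that only builds the request and does not send it. The host name, the example key, and the site_id query parameter are assumptions made for illustration; this PR description does not specify query parameters, so check the docs once they are published.

```python
import urllib.parse
import urllib.request

# Assumption: hypothetical host, not a real Plausible instance.
BASE_URL = "https://plausible.example"


def build_stats_request(endpoint, api_key, params=None):
    """Build (but do not send) a GET request for /api/v1/stats/<endpoint>."""
    url = f"{BASE_URL}/api/v1/stats/{endpoint}"
    query = urllib.parse.urlencode(params or {})
    if query:
        url += "?" + query
    req = urllib.request.Request(url)
    # The API key travels as a Bearer token in the Authorization header.
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


# Hypothetical key and query parameter, for illustration only.
req = build_stats_request("aggregate", "hypothetical-key", {"site_id": "example.com"})
```

Passing the request to urllib.request.urlopen would perform the call against a real instance.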

Again, full documentation will be available soon on the docs repo.

The endpoints are built to accommodate features that we have firm plans to build in the near future:

  • Different filter operators (currently only == supported)
  • Combining filters with AND and OR (currently only AND supported)
  • Different comparison modes (previous period, month-over-month, year-over-year, etc)
  • Configurable interval for timeseries data (e.g. view last year of data but group daily instead of monthly)

Tests

  • Automated tests have been added

Changelog

  • Entry has been added to changelog

Documentation

  • Docs have been updated

@oliver-kriska
Contributor

What about Phoenix Token for API authentication? Performance should be fine, and you can even use fewer key iterations than the default of 1000. Security may be better as well. You can store the api_key id in the token and do some verification, or even skip it, whatever you want. If you really need performance, you can use Cachex to cache the keys that are actually in use and their verification.

https://hexdocs.pm/phoenix/Phoenix.Token.html#decrypt/4
Here is the code showing how it works: https://github.com/elixir-plug/plug_crypto/blob/master/lib/plug/crypto/message_encryptor.ex#L58-L67

@oliver-kriska
Contributor

The best practice is to show the API key only once. But you can store it with Cloak and allow it to be shown again. This kind of storage is not good for searching, but you don't have to search because the token can contain the regular api key id.

@ukutaht
Contributor Author

ukutaht commented Feb 5, 2021

> What about Phoenix Token for API authentication? Performance should be fine, and you can even use fewer key iterations than the default of 1000. Security may be better as well. You can store the api_key id in the token and do some verification, or even skip it, whatever you want. If you really need performance, you can use Cachex to cache the keys that are actually in use and their verification.
>
> https://hexdocs.pm/phoenix/Phoenix.Token.html#decrypt/4
> Here is the code showing how it works: https://github.com/elixir-plug/plug_crypto/blob/master/lib/plug/crypto/message_encryptor.ex#L58-L67

As you mentioned, the best practice is to show the key only once. After the key is generated, I would prefer to treat it like a password. This means that even if the database and secrets leak, it should not be possible to retrieve the plaintext API keys.

To achieve this I wanted to use a one-way hash function instead of a two-way encryption function. This is why I didn't use Phoenix Token: if the database and secrets leak, the attacker can just run Phoenix.Token.decrypt/4 on all of them and get access to the API keys.

With the current sha256 approach, the attacker would not have an easy way to retrieve the plaintext value of the API key. Even if they knew the server secret, they would have to brute force on the order of 10^77 (about 2^256, the size of the sha256 output space) possible combinations.
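
The 10^77 figure can be checked with a few lines of Python arithmetic. This is only a sanity check of the orders of magnitude involved, assuming the attacker's options are to guess the 64-byte key directly or to search for an input matching a stored 256-bit digest; the helper name order_of_magnitude is made up for this sketch.

```python
KEY_BYTES = 64      # length of the random API key, per the PR description
DIGEST_BITS = 256   # sha256 output size

key_space = 2 ** (8 * KEY_BYTES)   # number of possible 64-byte keys (2^512)
digest_space = 2 ** DIGEST_BITS    # number of possible sha256 digests (2^256)


def order_of_magnitude(n: int) -> int:
    """Return the power of ten of n, i.e. its number of decimal digits minus one."""
    return len(str(n)) - 1


print(order_of_magnitude(digest_space))  # 77, i.e. roughly 10^77 digests
print(order_of_magnitude(key_space))     # 154, i.e. roughly 10^154 keys
```

So the key space is far larger than the digest space, and the smaller of the two numbers, about 10^77, is the relevant bound for a brute force search.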

@ukutaht ukutaht marked this pull request as ready for review February 5, 2021 08:32
@oliver-kriska
Contributor

@ukutaht Got it, I understand what you want to achieve. We will see how it works in production. BTW: it will be a bigger problem if the DB and the phx secret are compromised ;)

@ukutaht ukutaht merged commit 5acb5b7 into master Feb 5, 2021
@ukutaht ukutaht deleted the api branch February 5, 2021 09:23