Accept header and infrastructure for auth tokens #2263
Merged
+140
−53
Problems
CKAN's GitHub downloads fail with an HTTP 403 status fairly frequently. We do not know the exact details that trigger it, but it mostly affects large downloads.
It also turns out that the URLs we're using are supposed to be for users and browsers only, not applications. We should migrate to using the GitHub API.
See #2210 for details.
Cause
GitHub's normal download links are throttled to prevent abuse, and CKAN's use of them is sometimes affected by this.
GitHub prefers applications to use the GitHub API for programmatic access. That requires support for two new HTTP headers. The first specifies the MIME type of the download; without it, the API returns a JSON description of the file instead of the file itself:
Accept: application/octet-stream
The second contains the user's GitHub API authentication token. It is technically optional, but downloads will still fail frequently without it, because unauthenticated API requests are limited to 60 per hour:
Authorization: token <OAuth token here>
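To illustrate how the two headers fit together, here is a minimal Python sketch (not CKAN's C# code). The function name `build_asset_request`, the placeholder URL, and the placeholder token are assumptions made for the example; the request is only constructed, never sent.

```python
from typing import Optional
from urllib.request import Request


def build_asset_request(asset_api_url: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) a request for a release asset via the API.

    Hypothetical helper for illustration only.
    """
    # Without this Accept value the API answers with JSON metadata
    # describing the asset rather than the file contents.
    headers = {"Accept": "application/octet-stream"}
    # The token is optional; when present it goes in the Authorization header.
    if token:
        headers["Authorization"] = f"token {token}"
    return Request(asset_api_url, headers=headers)


# Placeholder URL and token, not real values.
req = build_asset_request(
    "https://api.github.com/repos/OWNER/REPO/releases/assets/1",
    token="EXAMPLE_TOKEN",
)
```

Leaving the token out yields a request with only the `Accept` header, which matches the "technically optional" case described above.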
Changes
MIME types
The `Accept` header is now set for all module downloads. The default is `application/octet-stream`, but we will use a multi-value format as well when `download_content_type` is set for a module:
Accept: {CkanModule.download_content_type};q=1.0,application/octet-stream;q=0.9
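The header selection described here can be sketched as follows. This is an illustrative Python stand-in, not the C# implementation: the `DownloadTarget` dataclass and the `accept_header` function are hypothetical names that only mirror the idea of a per-download MIME type feeding the `Accept` header.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MIME = "application/octet-stream"


@dataclass
class DownloadTarget:
    """Hypothetical stand-in for a download description with an optional MIME type."""
    url: str
    mime_type: Optional[str] = None


def accept_header(target: DownloadTarget) -> str:
    """Return the Accept header value for one download."""
    if target.mime_type:
        # Prefer the module's declared type, fall back to the generic one.
        return f"{target.mime_type};q=1.0,{DEFAULT_MIME};q=0.9"
    # No declared type: ask for a plain binary stream.
    return DEFAULT_MIME


with_type = accept_header(DownloadTarget("https://example.invalid/mod.zip", "application/zip"))
without_type = accept_header(DownloadTarget("https://example.invalid/mod.zip"))
```

The `q` values express a preference order, so a server that cannot produce the module's declared type can still respond with `application/octet-stream`.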
- `CkanModule` has a new property `download_content_type` that maps to the same value in CKAN-meta (this was absent from the registry previously but stored to the metadata repo for all modules by netkan). This allows us to get the MIME type of a download.
- `NetAsyncDownloader` now uses `Net.DownloadTarget` objects instead of `KeyValuePair<Uri, int>` to describe requested downloads, as does all of its calling code.
- `Net.DownloadTarget` now has a `mimeType` property storing the MIME type of the download.
- The value from `CkanModule.download_content_type` is passed to `Net.DownloadTarget.mimeType` when it's available.
- The value from `Net.DownloadTarget.mimeType` is used to set the `Accept` header when it's available.
Auth tokens
The Windows/Mono registry now has a new `AuthTokens` key (secret value in screenshot deleted in post-processing).
When a download is initiated, its `Uri.Host` property is looked up as an entry under this key, and if it's found, then the corresponding value is plugged in to the `Authorization` header.
Still To Be Done In Future Pull Requests
Partial fix for #2210. Will be complete after:
`asset.url` instead of `asset.browser_download_url`
Note that you can set up auth tokens without UI components by using the registry editor in Windows or an equivalent tool for Mono. When this is done, this version of the client will be able to access GitHub API download URLs with that auth token.
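The host-based token lookup can be sketched in a few lines. This is a hedged Python illustration rather than the client's actual C# code: the `auth_headers` function and the in-memory `tokens` mapping are assumptions standing in for the registry's `AuthTokens` key, and the token value is a placeholder.

```python
from typing import Dict, Mapping
from urllib.parse import urlparse


def auth_headers(url: str, tokens: Mapping[str, str]) -> Dict[str, str]:
    """Return an Authorization header if a token is stored for the URL's host."""
    # hostname is the lower-cased host without any port, used here as the lookup key.
    host = urlparse(url).hostname
    token = tokens.get(host) if host else None
    # No stored token for this host: send no Authorization header at all.
    return {"Authorization": f"token {token}"} if token else {}


# Hypothetical token store keyed by host name.
tokens = {"api.github.com": "EXAMPLE_TOKEN"}

matched = auth_headers("https://api.github.com/repos/OWNER/REPO/releases/assets/1", tokens)
unmatched = auth_headers("https://example.invalid/mod.zip", tokens)
```

Keying the store by host means a token is only ever sent to the host it was configured for, and downloads from other hosts go out unauthenticated.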
Many thanks to @dbent for wise feedback and ideas that guided these changes in the comments of #2210.