More reliable favicon retrieval. #652
Force-pushed from b495b27 to 789b692.
The merge conflict is because I wrote and tested this against
`pkg/image/favicon.go` (outdated)

```go
}

// Icons are sorted widest first. We currently get the first one (icons[0]).
req, err := http.NewRequestWithContext(ctx, http.MethodGet, icons[0].URL, nil)
```
This will panic with a runtime error if `icons` is empty.
Super, thanks!
Previously, the application looked for favicons only at `/favicon.ico`. But favicons can be located anywhere on a website, and their location can be defined in HTML and in JSON manifests. Websites can also have multiple favicons. As a result, when a site without a `/favicon.ico` was added to stash-box, it was created without a proper favicon. Additionally, since stash-box didn't check the file type of the data returned by the `/favicon.ico` GET request, it stored various 404 pages and other HTML pages reached through redirects. A non-trivial amount of logic is required to discover all favicon locations, check file types, and sort them, so it makes sense to rely on a third-party package for this job. This patch uses `go.deanishe.net/favicon`.
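The file-type problem described above (HTML error pages stored as icons) can be illustrated with the standard library alone. This is a sketch of the check, not how `go.deanishe.net/favicon` does it; `looksLikeImage` is a hypothetical helper built on `net/http`'s `DetectContentType`, which implements the WHATWG MIME sniffing algorithm.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksLikeImage reports whether data sniffs as an image type. An HTML
// error page served for /favicon.ico sniffs as "text/html; charset=utf-8"
// and is rejected. Caveat: SVG is not recognised by content sniffing (it
// comes back as text/xml or text/plain), so SVG icons would need a
// separate check based on the declared type or file extension.
func looksLikeImage(data []byte) bool {
	return strings.HasPrefix(http.DetectContentType(data), "image/")
}

func main() {
	ico := []byte{0x00, 0x00, 0x01, 0x00, 0x01, 0x00} // start of an ICO header
	notFound := []byte("<!DOCTYPE html><html><body>Not Found</body></html>")
	fmt.Println(looksLikeImage(ico), looksLikeImage(notFound))
}
```

`DetectContentType` considers at most the first 512 bytes, so the check is cheap enough to run on every downloaded candidate before it is written to disk.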
Force-pushed from b70d24e to 0967925.
Force-pushed after rebasing on
I haven't been able to get this to work reliably. There are a few issues with the icon-downloading code that make debugging quite difficult. I think the whole thing needs some improvement, but that's probably out of scope for what you're doing. I tried creating a new site with URL Finally, to the part that is most relevant to you, the call to
There's a general absence of logging throughout this codebase even though there is a For now I'll look into the
EDIT: There are also two important config options
Apparently, there's an issue with twitter.com:
Previously, an empty file was created at the start of the `downloadIcon` function and remained empty even when favicon retrieval failed. This empty file prevented future attempts to download an icon for the same site ID. Now the icon file is created at the end, after everything else has succeeded.
Force-pushed from 4cd8cfa to 7834eaf.
Basically, the way twitter.com works is that on the first request it gives you a 302 redirect to With curl you can store the cookie like this:
So that's what's happening. But I'm still not sure how to handle it.
I think this can be solved by using the `WithClient` function to specify an
Cookies are required for certain websites, even for simple requests. See the code comment for details.
This more closely mirrors the GitHub workflow build process and helps avoid CI build failures on pull requests.
* More reliable favicon retrieval.

  Previously, the application looked for favicons only at `/favicon.ico`. But favicons can be located anywhere on a website, and their location can be defined in HTML and in JSON manifests. Websites can also have multiple favicons. As a result, when a site without a `/favicon.ico` was added to stash-box, it was created without a proper favicon. Additionally, since stash-box didn't check the file type of the data returned by the `/favicon.ico` GET request, it stored various 404 pages and other HTML pages reached through redirects. A non-trivial amount of logic is required to discover all favicon locations, check file types, and sort them, so it makes sense to rely on a third-party package for this job. This patch uses `go.deanishe.net/favicon`.
* Check the length of `icons` to avoid a runtime error when using `icons[0]` later.
* Specify `go.deanishe.net/favicon` as a direct requirement.
* favicon: create the favicon file only after retrieval has succeeded.

  Previously, an empty file was created at the start of the `downloadIcon` function and remained empty even when favicon retrieval failed. This empty file prevented future attempts to download an icon for the same site ID. Now the icon file is created at the end, after everything else has succeeded.
* Retrieve the smallest available favicon for sites.
* Use an `http.Client` with a cookie jar for the favicon finder.

  Cookies are required for certain websites, even for simple requests. See the code comment for details.
* Makefile: include generate in the default make target.

  This more closely mirrors the GitHub workflow build process and helps avoid CI build failures on pull requests.
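The "retrieve the smallest available favicon" commit is listed without its diff, so the following is only a sketch of one way to choose among several candidates. The `Icon` type with `Width`/`Height` fields and the rule "prefer icons of known size, treat `Width == 0` as unknown" are assumptions of this sketch, not the PR's code or the favicon package's API.

```go
package main

import "fmt"

// Icon is a hypothetical candidate favicon; Width == 0 means "size unknown".
type Icon struct {
	URL           string
	Width, Height int
}

// smallerIcon reports whether a should be preferred over b: icons with a
// known size come before icons of unknown size, then narrower wins.
func smallerIcon(a, b Icon) bool {
	aKnown, bKnown := a.Width > 0, b.Width > 0
	if aKnown != bKnown {
		return aKnown
	}
	return a.Width < b.Width
}

// smallestIcon scans the candidates once and returns the preferred one.
// ok is false when there are no candidates, so callers never index an
// empty slice.
func smallestIcon(icons []Icon) (best Icon, ok bool) {
	if len(icons) == 0 {
		return Icon{}, false
	}
	best = icons[0]
	for _, ic := range icons[1:] {
		if smallerIcon(ic, best) {
			best = ic
		}
	}
	return best, true
}

func main() {
	icons := []Icon{
		{URL: "/apple-touch-icon.png", Width: 180, Height: 180},
		{URL: "/favicon.ico"}, // size unknown
		{URL: "/favicon-16.png", Width: 16, Height: 16},
		{URL: "/favicon-32.png", Width: 32, Height: 32},
	}
	if ic, ok := smallestIcon(icons); ok {
		fmt.Println(ic.URL)
	}
}
```

A linear scan avoids depending on the order in which a discovery library happens to return icons (the reviewed code relied on them being sorted widest first).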