Speedup, Redux #24
Also add option to disable external link checks
Hey here's a funny bug: sometimes HEAD requests return a 404, even on legitimate pages! The bug seems to affect curl; try `curl -I -X HEAD https://help.github.com/changing-author-info`. So in these cases, try a regular `GET`. If it fails, it fails.
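Here is a minimal sketch of that HEAD-then-GET fallback. It uses Ruby's standard `Net::HTTP` rather than the Typhoeus client the branch appears to use. `http_status` and `link_status` are hypothetical names and the timeouts are arbitrary. The requester is injectable so the fallback logic can be exercised without touching the network.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: send one request of the given Net::HTTP request class
# (Net::HTTP::Head or Net::HTTP::Get) and return the integer status code.
def http_status(request_class, url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                                      open_timeout: 5, read_timeout: 10) do |http|
    http.request(request_class.new(uri)).code.to_i
  end
end

# HEAD first, because it skips the response body. Some servers wrongly answer
# HEAD with a 404, so only in that case retry with a full GET.
# If the GET also fails, the page really is missing.
def link_status(url, requester = method(:http_status))
  status = requester.call(Net::HTTP::Head, url)
  status = requester.call(Net::HTTP::Get, url) if status == 404
  status
end
```

Retrying only on a `404` keeps the fast path for the common case where the server honors `HEAD`. A real call would look like `link_status("https://help.github.com/changing-author-info")`.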
Does this affect anchors in external pages?
Perhaps there's a flag to follow redirects?
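libcurl-based clients such as Typhoeus can follow redirects through a request option (`followlocation: true`). Ruby's `Net::HTTP` does not follow them on its own. The sketch below follows them by hand. It assumes a hypothetical `final_status` helper and an injected `fetch` callable that returns a response responding to `#code` and `#[]`.

```ruby
require "uri"

# Hypothetical helper: follow at most `limit` redirects and return the final
# status code. `fetch` takes a URI and returns a response-like object with
# #code (a String, as in Net::HTTP) and #[] for header lookup.
def final_status(url, fetch, limit: 5)
  uri = URI(url)
  limit.downto(0) do
    response = fetch.call(uri)
    code = response.code.to_i
    location = response["location"]
    return code unless (300..399).cover?(code) && location

    # Location may be relative, so resolve it against the current URI.
    uri += location
  end
  raise ArgumentError, "too many redirects for #{url}"
end
```

With a live connection, `fetch` could be `->(uri) { Net::HTTP.get_response(uri) }`. `Net::HTTPResponse#[]` looks up the `Location` header case-insensitively.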
@options.delete opt
end

external_urls.each_pair do |href, filenames|
Doesn't have to be in this pull request, but may make sense to break some things out into methods rather than having a single long, functional run method. OO and all that.
@benbalter 👍 ^^
I assume since the underlying page is okay, the anchor is not checked. Probably similar to how the browser just goes "yeah here's a page, I don't know where the anchor is."
Yeah, that's the
Proofer now supports iterating over single files, so make sure we’re using the file’s dirname if it’s not actually a directory
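Here is a minimal sketch of that rule. The helper names `base_dir` and `resolve_link` are hypothetical. When the proofer is handed a single file, relative links resolve against that file's directory instead of the file path itself.

```ruby
# Hypothetical helper: the directory that relative links resolve against.
# A directory is used as-is; a single file contributes its parent directory.
def base_dir(path)
  File.directory?(path) ? path : File.dirname(path)
end

# Hypothetical helper: absolute path of an internal link found in `path`.
def resolve_link(path, href)
  File.expand_path(href, base_dir(path))
end
```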
Want to use HTML Proofer with your Jekyll site? Awesome. Simply add `gem 'html-proofer'`
to your `Gemfile` as described above, and add the following to your `Rakefile`,
using `rake test` to execute:
Have you thought about creating these tasks yourself? Then I could just do:
require 'html/proofer/tasks'
HTML::Proofer::Tasks.new(:jekyll, :test)
Which defines a new `jekyll` test and assigns it to the task name `test`?
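Here is a minimal sketch of such a task library built on Rake's `Rake::TaskLib`. `ProoferTasks` is a hypothetical stand-in for the proposed `HTML::Proofer::Tasks`. The Jekyll build command and the `./_site` output directory are assumptions. The proofing step is injectable; by default it lazily calls `HTML::Proofer.new(dir).run`.

```ruby
require "rake"
require "rake/tasklib"

# Hypothetical stand-in for the proposed HTML::Proofer::Tasks:
# ProoferTasks.new(:jekyll, :test) defines a Rake task named "test"
# that builds the site and then proofs the generated HTML.
class ProoferTasks < Rake::TaskLib
  # Assumed build command per generator; only Jekyll is sketched here.
  BUILD_COMMANDS = { jekyll: "bundle exec jekyll build" }.freeze

  def initialize(generator, task_name, output_dir: "./_site", proof: nil)
    super()
    command = BUILD_COMMANDS.fetch(generator) # KeyError for unknown generators
    proof ||= lambda do |dir|
      # Loaded lazily so the task can be defined without the gem installed.
      require "html/proofer"
      HTML::Proofer.new(dir).run
    end

    desc "Build the #{generator} site and check the generated HTML"
    task task_name do
      sh command
      proof.call(output_dir)
    end
  end
end
```

A `Rakefile` would then need only `ProoferTasks.new(:jekyll, :test)`, and `rake test` would build and proof the site.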
This looks AWESOME! Great work. :)
```ruby
HTML::Proofer.new("out/", {:ext => ".htm", :verbose => true, :ssl_verifyhost => 2 })
```
You may consider separating the Typhoeus options, so that they don't conflict with your own. Something like:
HTML::Proofer.new("out/", {:ext => ".htm", :typhoeus => {:verbose => true, :ssl_verifyhost => 2 }})
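Here is a minimal sketch of that separation using a hypothetical `split_options` helper. It pulls the nested `:typhoeus` hash out first, then merges the proofer's own options with their defaults. The default values are assumptions. `followlocation` is a real Typhoeus option, but whether the proofer should enable it by default is a judgment call.

```ruby
# Assumed defaults; the proofer's real option names and values may differ.
PROOFER_DEFAULTS = { ext: ".html", disable_external: false }.freeze
TYPHOEUS_DEFAULTS = { followlocation: true }.freeze

# Split user options into proofer settings and HTTP-client settings, so a
# Typhoeus key can never clash with one of the proofer's own keys.
def split_options(opts = {})
  opts = opts.dup
  typhoeus = TYPHOEUS_DEFAULTS.merge(opts.delete(:typhoeus) || {})
  [PROOFER_DEFAULTS.merge(opts), typhoeus]
end
```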
@failed_tests << "#{filenames.join(' ').blue}: External link #{href} failed: got a time out"
# hey here's a funny bug: sometimes HEAD requests return a 404, even on legitimate pages! The
# bug seems to affect curl; try `curl -I -X HEAD https://help.github.com/changing-author-info`
# so in these cases, try a regular `GET`. if it fails, it fails.
FWIW this is a problem w/ the server not honoring HEAD requests, so isn't curl-specific.
RestClient.head('https://help.github.com/changing-author-info')
#=> RestClient::ResourceNotFound: 404 Resource Not Found
I think I prefer the way the assertions were done in the tests before. It's great to add some "integration" tests to check that it outputs what you expect, but I wouldn't run it that way for every test.
This branch introduces a few significant changes and optimizations:

* Adds a `disable_external` option to `HTML::Proofer.new`, which skips the external link checks (in case you only want to run them on a CI or something).
* External links are now checked with a `HEAD` request. This is way, way faster, since unlike `GET`, we don't download the page contents, nor do we care for them. A `HEAD` request is good enough for checking a URL's existence.

Unfortunately, using `HEAD` ran into some weird bug with `curl` (I'm on version 7.30). For an example, run this command from the terminal:

`curl -I -X HEAD https://help.github.com/changing-author-info`

You'll get back a `404`, even though the browser sees it as a `301`, then a `200`. This seems to affect a wide variety of sites.

The workaround is to execute a `GET` request when you see a `404`. If that also fails, then the page really does not exist.

Although you're under no obligation to, since a lot of the code has changed, it'd be nice for some 👀. /cc @benbalter @afeld @parkr