Merge branch 'master' into html5
fulldecent authored Oct 7, 2019
2 parents 78b3e81 + 63663a4 commit 0d660ba
Showing 68 changed files with 10,414 additions and 86 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -26,3 +26,4 @@ doc/
.DS_Store
.idea
.byebug_history
.vscode
5 changes: 4 additions & 1 deletion .rubocop.yml
@@ -1,5 +1,5 @@
inherit_gem:
rubocop-github:
rubocop-standard:
- config/default.yml

Style/StringLiterals:
@@ -8,3 +8,6 @@ Style/StringLiterals:

RequireParentheses:
Enabled: true

Naming/FileName:
Enabled: false
3 changes: 2 additions & 1 deletion .travis.yml
@@ -3,6 +3,7 @@ rvm:
- 2.3.6
- 2.4.3
- 2.5.0
- 2.6.0

git:
depth: 10
@@ -17,4 +18,4 @@ cache: bundler
matrix:
include:
- script: bundle exec rake rubocop
rvm: 2.5.0
rvm: 2.6.0
1 change: 1 addition & 0 deletions Gemfile
@@ -1,3 +1,4 @@
# frozen_string_literal: true
source 'https://rubygems.org'

gem 'nokogumbo', git: 'https://github.com/rubys/nokogumbo'
44 changes: 42 additions & 2 deletions README.md
@@ -76,7 +76,14 @@ Below is mostly comprehensive list of checks that HTMLProofer can perform.

## Usage

You can configure HTMLProofer to run on a file, a directory, an array of directories, or an array of links.
You can configure HTMLProofer to run on:

* a file
* a directory
* an array of directories
* an array of links

It can also run through the command-line, Docker, or as Rack middleware.

### Using in a script

@@ -141,7 +148,7 @@ HTMLProofer.check_directories(['./one', './two']).run
With `check_links`, you can also pass in an array of links:

``` ruby
HTMLProofer.check_links(['http://github.com', 'http://jekyllrb.com'])
HTMLProofer.check_links(['http://github.com', 'http://jekyllrb.com']).run
```

This configures Proofer to just test those links to ensure they are valid. Note that for the command-line, you'll need to pass a special `--as-links` argument:
@@ -217,6 +224,19 @@ htmlproofer --assume-extension ./_site

If you have trouble with (or don't want to) install Ruby/Nokogumbo, the command-line tool can be run through Docker. See [html-proofer-docker](https://github.com/18F/html-proofer-docker) for more information.

### Using as Rack middleware

You can run html-proofer as part of your Rack middleware to validate your HTML at runtime. For example, in Rails, add these lines to `config/application.rb`:

```ruby
config.middleware.use HTMLProofer::Middleware if Rails.env.test?
config.middleware.use HTMLProofer::Middleware if Rails.env.development?
```

This will raise an error at runtime if your HTML is invalid. You can choose to skip validation of a page by adding `?proofer-ignore` to the URL.

This is particularly helpful for projects which have extensive CI, since any invalid HTML will fail your build.
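A rough sketch of what such a middleware does, assuming only the plain Rack contract (`call(env)` returns `[status, headers, body]`, and the body responds to `each`). The class name `ProofingMiddleware`, its `validator:` keyword and `InvalidHtmlError` are hypothetical stand-ins, not `HTMLProofer::Middleware`'s real API; the validator is any callable that returns an array of error messages.

```ruby
require 'uri'

# Hypothetical Rack-style middleware sketch, not HTMLProofer::Middleware's actual code.
class ProofingMiddleware
  class InvalidHtmlError < StandardError; end

  # app:       the downstream Rack application
  # validator: hypothetical callable taking an HTML String, returning an Array of error messages
  def initialize(app, validator:)
    @app = app
    @validator = validator
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Pass opted-out requests and non-HTML responses through untouched.
    return [status, headers, body] if skip?(env) || !html?(headers)

    chunks = []
    body.each { |chunk| chunks << chunk }
    body.close if body.respond_to?(:close)
    html = chunks.join

    errors = @validator.call(html)
    raise InvalidHtmlError, errors.join("\n") unless errors.empty?

    # The original body has been consumed, so return a fresh one.
    [status, headers, [html]]
  end

  private

  # True when the query string has a `proofer-ignore` key, e.g. "/about?proofer-ignore".
  def skip?(env)
    URI.decode_www_form(env.fetch('QUERY_STRING', '')).any? { |key, _value| key == 'proofer-ignore' }
  end

  # Matches the Content-Type header name case-insensitively.
  def html?(headers)
    headers.any? { |name, value| name.to_s.casecmp?('content-type') && value.to_s.include?('text/html') }
  end
end
```

Raising (rather than rewriting the response) mirrors the README's description: an invalid page surfaces as an error in the test or development run.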

## Ignoring content

Add the `data-proofer-ignore` attribute to any tag to ignore it from every check.
@@ -232,6 +252,23 @@ This can also apply to parent elements, all the way up to the `<html>` tag:
<a href="http://notareallink">Not checked because of parent.</a>
</div>
```

## Ignoring new files

Say you've got some new files in a pull request, and your tests are failing because links to those files are not live yet. One thing you can do is run a diff against your base branch and explicitly ignore the new files, like this:

```ruby
directories = %w(content)
merge_base = `git merge-base origin/production HEAD`.chomp
diffable_files = `git diff -z --name-only --diff-filter=AC #{merge_base}`.split("\0")
diffable_files = diffable_files.select do |filename|
next true if directories.include?(File.dirname(filename))
filename.end_with?('.md')
end.map { |f| Regexp.new(File.basename(f, File.extname(f))) }

HTMLProofer.check_directory('./output', { url_ignore: diffable_files }).run
```
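The select/map step of that snippet can be tried without git. The sketch below assumes a hypothetical `ignore_patterns` helper that receives the changed paths directly. Unlike the snippet, it passes each basename through `Regexp.escape`, so characters such as `.` or `+` in a filename match literally.

```ruby
# Hypothetical helper mirroring the snippet's select/map step, minus the git calls.
# changed_files: paths relative to the repo root (what `git diff --name-only` prints)
# directories:   directories whose new files are ignored regardless of extension
def ignore_patterns(changed_files, directories)
  changed_files
    .select { |path| directories.include?(File.dirname(path)) || path.end_with?('.md') }
    .map { |path| Regexp.new(Regexp.escape(File.basename(path, File.extname(path)))) }
end

# lib/helper.rb is dropped: it is neither under content/ nor a Markdown file.
patterns = ignore_patterns(%w[content/new-post.html lib/helper.rb docs/guide.md], %w[content])
```

The resulting array of Regexps is what the snippet hands to `url_ignore`.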

## Configuration

The `HTMLProofer` constructor takes an optional hash of additional options:
@@ -260,6 +297,7 @@ The `HTMLProofer` constructor takes an optional hash of additional options:
| `internal_domains`| An array of Strings containing domains that will be treated as internal urls. | `[]` |
| `log_level` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). One of `:debug`, `:info`, `:warn`, `:error`, or `:fatal`. | `:info`
| `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
| `typhoeus_config` | A JSON-formatted string. Parsed using `JSON.parse` and mapped on top of the default configuration values so that they can be overridden. | `{}` |
| `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
| `url_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`. | `{}` |
| `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**| `false` |
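As a rough illustration of the `typhoeus_config` row, here is one way a JSON string could be parsed and laid over a set of defaults. The `DEFAULT_TYPHOEUS` values and the `typhoeus_options` helper are assumptions made for the example. html-proofer's real defaults and parsing code may differ.

```ruby
require 'json'

# Assumed defaults for illustration only; not necessarily html-proofer's values.
DEFAULT_TYPHOEUS = {
  followlocation: true,
  connecttimeout: 10,
  timeout: 30
}.freeze

# Hypothetical helper: parse a JSON object and let its keys override the defaults.
def typhoeus_options(json_string, option_name = 'typhoeus_config')
  overrides = JSON.parse(json_string, symbolize_names: true)
  raise ArgumentError, "#{option_name} must be a JSON object" unless overrides.is_a?(Hash)

  DEFAULT_TYPHOEUS.merge(overrides)
rescue JSON::ParserError => e
  # Report malformed JSON against the option name the user typed.
  raise ArgumentError, "#{option_name} is not valid JSON: #{e.message}"
end
```

`Hash#merge` keeps every default the user did not mention and adds any extra keys unchanged.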
@@ -413,6 +451,8 @@ class MailToOctocat < ::HTMLProofer::Check
end
```

See our [list of third-party custom classes](https://github.com/gjtorikian/html-proofer/wiki/Extensions-(custom-classes)) and add your own to this list.

## Troubleshooting

Here are some brief snippets identifying some common problems that you can work around. For more information, check out [our wiki](https://github.com/gjtorikian/html-proofer/wiki).
6 changes: 5 additions & 1 deletion Rakefile
@@ -1,3 +1,5 @@
# frozen_string_literal: true

require 'bundler'
Bundler::GemHelper.install_tasks

@@ -23,6 +25,8 @@ task :proof_readme do
mkdir_p 'out'
File.write('out/README.html', html)

opts = { url_ignore: [/badge.fury.io/] }
opts = {
url_ignore: [/badge.fury.io/, /codecov.io/]
}
HTMLProofer.check_directory('./out', opts).run
end
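The `url_ignore` entries in this task are Regexps. The sketch below shows one plausible reading of the README's "Strings or RegExps" rule: Strings are compared to the whole URL, and Regexps may match anywhere in it. The `ignored_url?` helper is hypothetical. The sketch also shows why escaping matters: in `/badge.fury.io/` each unescaped `.` matches any character.

```ruby
# Hypothetical check: a String must equal the URL; a Regexp may match anywhere in it.
def ignored_url?(url, url_ignore)
  url_ignore.any? do |pattern|
    pattern.is_a?(Regexp) ? pattern.match?(url) : pattern == url
  end
end

# Escaped dots match only literal dots.
ignore = [/badge\.fury\.io/, /codecov\.io/, 'https://example.com/draft']
```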
46 changes: 29 additions & 17 deletions bin/htmlproofer
@@ -1,4 +1,6 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

STDOUT.sync = true

$LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
@@ -25,26 +27,27 @@ Mercenary.program(:htmlproofer) do |p|
p.option 'check_img_http', '--check-img-http', 'Fails an image if it\'s marked as `http` (default: `false`).'
p.option 'check_opengraph', '--check-opengraph', 'Enables the Open Graph checker (default: `false`).'
p.option 'check_sri', '--check-sri', 'Check that `<link>` and `<script>` external resources do use SRI (default: `false`).'
p.option 'directory_index_file', '--directory-index-file', String, 'Sets the file to look for when a link refers to a directory. (default: `index.html`)'
p.option 'directory_index_file', '--directory-index-file <filename>', String, 'Sets the file to look for when a link refers to a directory. (default: `index.html`)'
p.option 'disable_external', '--disable-external', 'If `true`, does not run the external link checker, which can take a lot of time (default: `false`)'
p.option 'empty_alt_ignore', '--empty-alt-ignore', 'If `true`, ignores images with empty alt tags'
p.option 'error_sort', '--error-sort SORT', 'Defines the sort order for error output. Can be `:path`, `:desc`, or `:status` (default: `:path`).'
p.option 'error_sort', '--error-sort <sort>', String, 'Defines the sort order for error output. Can be `:path`, `:desc`, or `:status` (default: `:path`).'
p.option 'enforce_https', '--enforce-https', 'Fails a link if it\'s not marked as `https` (default: `false`).'
p.option 'extension', '--extension EXT', String, 'The extension of your HTML files including the dot. (default: `.html`)'
p.option 'extension', '--extension <ext>', String, 'The extension of your HTML files including the dot. (default: `.html`)'
p.option 'external_only', '--external_only', 'Only checks problems with external references'
p.option 'file_ignore', '--file-ignore file1,[file2,...]', Array, 'A comma-separated list of Strings or RegExps containing file paths that are safe to ignore'
p.option 'http_status_ignore', '--http-status-ignore 123,[xxx, ...]', Array, 'A comma-separated list of numbers representing status codes to ignore.'
p.option 'report_invalid_tags', '--report-invalid-tags', 'Report `check_html` errors associated with unknown markup (default: `false`)'
p.option 'report_missing_names', '--report-missing-names', 'Report `check_html` errors associated with missing entities (default: `false`)'
p.option 'report_script_embeds', '--report-script-embeds', 'Report `check_html` errors associated with `script`s (default: `false`)'
p.option 'internal_domains', '--internal-domains domain1,[domain2,...]', Array, 'A comma-separated list of Strings containing domains that will be treated as internal urls.'
p.option 'report_invalid_tags', '--report-invalid-tags', 'Ignore `check_html` errors associated with unknown markup (default: `false`)'
p.option 'report_missing_names', '--report-missing-names', 'Ignore `check_html` errors associated with missing entities (default: `false`)'
p.option 'report_script_embeds', '--report-script-embeds', 'Ignore `check_html` errors associated with `script`s (default: `false`)'
p.option 'report_missing_doctype', '--report-missing-doctype', 'Report `check_html` errors associated with missing or out-of-order DOCTYPE (default: `false`)'
p.option 'log_level', '--log-level <level>', String, 'Sets the logging level, as determined by Yell. One of `:debug`, `:info`, `:warn`, `:error`, or `:fatal`. (default: `:info`)'
p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range'
p.option 'storage_dir', '--storage-dir PATH', String, 'Directory where to store the cache log (default: "tmp/.htmlproofer")'
p.option 'timeframe', '--timeframe <time>', String, 'A string representing the caching timeframe.'
p.option 'typhoeus_config', '--typhoeus-config CONFIG', String, 'JSON-formatted string of Typhoeus config. Will override the html-proofer defaults.'
p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'A comma-separated list of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored'
p.option 'url_swap', '--url-swap re:string,[re:string,...]', Array, 'A comma-separated list containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`. The escape sequences `\\:` should be used to produce literal `:`s.'
p.option 'internal_domains', '--internal-domains domain1,[domain2,...]', Array, 'A comma-separated list of Strings containing domains that will be treated as internal urls.'
p.option 'storage_dir', '--storage-dir PATH', String, 'Directory where to store the cache log (default: "tmp/.htmlproofer")'

p.action do |args, opts|
args = ['.'] if args.empty?
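The `--url-swap` option above takes `re:string` pairs, and a backslash before a colon (written `\\:` inside the Ruby string literal) produces a literal `:`. Here is a minimal sketch of such a parser. It assumes the pair is split on the first colon that is not preceded by a backslash. `parse_url_swap` is a hypothetical name, not the CLI's actual code.

```ruby
# Hypothetical parser for `--url-swap re:string,[re:string,...]` values.
# The Array option type has already split the argument on commas.
def parse_url_swap(pairs)
  pairs.each_with_object({}) do |pair, swaps|
    # Split on the first ':' that is not escaped as '\:'.
    pattern, replacement = pair.split(/(?<!\\):/, 2)
    raise ArgumentError, "url_swap pair #{pair.inspect} has no unescaped ':'" if replacement.nil?

    # Turn the remaining '\:' escapes into literal colons on both sides.
    swaps[Regexp.new(pattern.gsub('\:', ':'))] = replacement.gsub('\:', ':')
  end
end

swaps = parse_url_swap(['^https\://old\.example:https\://new.example'])
```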
@@ -75,16 +78,25 @@ Mercenary.program(:htmlproofer) do |p|
options[:error_sort] = opts['error-sort'].to_sym unless opts['error-sort'].nil?
options[:log_level] = opts['log_level'].to_sym unless opts['log_level'].nil?

# FIXME: this is gross
options[:validation] = {}
options[:validation][:report_script_embeds] = opts['report_script_embeds']
options[:validation][:report_missing_names] = opts['report_missing_names']
options[:validation][:report_invalid_tags] = opts['report_invalid_tags']
options[:validation][:report_missing_doctype] = opts['report_missing_doctype']
options[:validation] = HTMLProofer::Configuration::VALIDATION_DEFAULTS.dup
options[:validation][:report_script_embeds] = opts['report_script_embeds'] unless opts['report_script_embeds'].nil?
options[:validation][:report_missing_names] = opts['report_missing_names'] unless opts['report_missing_names'].nil?
options[:validation][:report_invalid_tags] = opts['report_invalid_tags'] unless opts['report_invalid_tags'].nil?
options[:validation][:report_missing_doctype] = opts['report_missing_doctype'] unless opts['report_missing_doctype'].nil?

options[:cache] = {}
options[:cache][:timeframe] = opts['timeframe'] unless opts['timeframe'].nil?
options[:cache][:storage_dir] = opts['storage_dir'] unless opts['storage_dir'].nil?
unless opts['typhoeus_config'].nil?
options[:typhoeus] = HTMLProofer::Configuration.parse_json_option('typhoeus_config', opts['typhoeus_config'])
end

unless opts['timeframe'].nil?
options[:cache] ||= {}
options[:cache][:timeframe] = opts['timeframe'] unless opts['timeframe'].nil?
end

unless opts['storage_dir'].nil?
options[:cache] ||= {}
options[:cache][:storage_dir] = opts['storage_dir'] unless opts['storage_dir'].nil?
end

options[:http_status_ignore] = Array(options[:http_status_ignore]).map(&:to_i)
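The new `--report-*` handling starts from a copy of `VALIDATION_DEFAULTS` and overrides only the flags that were actually passed, so an unset flag (`nil`) no longer overwrites a default. Below is a standalone sketch of the same pattern. The default values are assumptions, not copied from `HTMLProofer::Configuration`.

```ruby
# Assumed defaults for illustration; HTMLProofer::Configuration's real values may differ.
VALIDATION_DEFAULTS = {
  report_script_embeds: false,
  report_missing_names: false,
  report_invalid_tags: false,
  report_missing_doctype: false
}.freeze

# Start from an unfrozen copy of the defaults (dup drops the frozen state)
# and apply only the CLI values that are not nil.
def validation_options(cli_opts)
  options = VALIDATION_DEFAULTS.dup
  VALIDATION_DEFAULTS.each_key do |key|
    value = cli_opts[key.to_s]
    options[key] = value unless value.nil?
  end
  options
end
```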

7 changes: 5 additions & 2 deletions html-proofer.gemspec
@@ -1,3 +1,5 @@
# frozen_string_literal: true

$LOAD_PATH.push File.expand_path('../lib', __FILE__)
require 'html-proofer/version'

@@ -20,15 +22,16 @@ Gem::Specification.new do |gem|
gem.add_dependency 'mercenary', '~> 0.3.2'
gem.add_dependency 'nokogumbo', '>= 2.0.0.alpha', '< 3'
gem.add_dependency 'colorize', '~> 0.8'
gem.add_dependency 'rainbow', '~> 3.0'
gem.add_dependency 'typhoeus', '~> 1.3'
gem.add_dependency 'yell', '~> 2.0'
gem.add_dependency 'parallel', '~> 1.3'
gem.add_dependency 'addressable', '~> 2.3'
gem.add_dependency 'activesupport', '>= 4.2', '< 6.0'

gem.add_development_dependency 'redcarpet'
gem.add_development_dependency 'rubocop'
gem.add_development_dependency 'rubocop-github'
gem.add_development_dependency 'rubocop-standard'
gem.add_development_dependency 'rubocop-performance'
gem.add_development_dependency 'codecov'
gem.add_development_dependency 'rspec', '~> 3.1'
gem.add_development_dependency 'rake'
7 changes: 7 additions & 0 deletions lib/html-proofer.rb
@@ -1,3 +1,5 @@
# frozen_string_literal: true

def require_all(path)
dir = File.join(File.dirname(__FILE__), path)
Dir[File.join(dir, '*.rb')].each do |f|
@@ -19,13 +21,15 @@ def require_all(path)
module HTMLProofer
def check_file(file, options = {})
raise ArgumentError unless file.is_a?(String)
raise ArgumentError, "#{file} does not exist" unless File.exist?(file)
options[:type] = :file
HTMLProofer::Runner.new(file, options)
end
module_function :check_file

def check_directory(directory, options = {})
raise ArgumentError unless directory.is_a?(String)
raise ArgumentError, "#{directory} does not exist" unless Dir.exist?(directory)
options[:type] = :directory
HTMLProofer::Runner.new([directory], options)
end
@@ -34,6 +38,9 @@ def check_directory(directory, options = {})
def check_directories(directories, options = {})
raise ArgumentError unless directories.is_a?(Array)
options[:type] = :directory
directories.each do |directory|
raise ArgumentError, "#{directory} does not exist" unless Dir.exist?(directory)
end
HTMLProofer::Runner.new(directories, options)
end
module_function :check_directories
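With these guards, `check_file`, `check_directory` and `check_directories` raise `ArgumentError` and name the missing path before a `Runner` is built. Below is a hypothetical standalone version of the `check_directories` guard, so you can see the failure mode on its own. The helper name is illustrative.

```ruby
# Hypothetical stand-in for the guard clauses at the top of check_directories.
def validate_directories!(directories)
  raise ArgumentError unless directories.is_a?(Array)

  # Fail on the first missing directory, naming it in the message.
  directories.each do |directory|
    raise ArgumentError, "#{directory} does not exist" unless Dir.exist?(directory)
  end
  directories
end
```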
39 changes: 27 additions & 12 deletions lib/html-proofer/cache.rb
@@ -1,9 +1,8 @@
require_relative 'utils'
# frozen_string_literal: true

require_relative 'utils'
require 'date'
require 'json'
require 'active_support/core_ext/string'
require 'active_support/core_ext/date'
require 'active_support/core_ext/numeric/time'

module HTMLProofer
class Cache
@@ -18,19 +17,20 @@ def initialize(logger, options)
@logger = logger
@cache_log = {}

@cache_datetime = DateTime.now
@cache_time = @cache_datetime.to_time

if options.nil? || options.empty?
define_singleton_method('use_cache?') { false }
else
define_singleton_method('use_cache?') { true }
setup_cache!(options)
@parsed_timeframe = parsed_timeframe(options[:timeframe])
end

@cache_time = Time.now
end

def within_timeframe?(time)
(@parsed_timeframe..@cache_time).cover?(time)
(@parsed_timeframe..@cache_time).cover?(Time.parse(time))
end

def urls
@@ -43,16 +43,16 @@ def size

def parsed_timeframe(timeframe)
time, date = timeframe.match(/(\d+)(\D)/).captures
time = time.to_f
time = time.to_i
case date
when 'M'
time.months.ago
time_ago(time, :months)
when 'w'
time.weeks.ago
time_ago(time, :weeks)
when 'd'
time.days.ago
time_ago(time, :days)
when 'h'
time.hours.ago
time_ago(time, :hours)
else
raise ArgumentError, "#{date} is not a valid timeframe!"
end
@@ -162,5 +162,20 @@ def setup_cache!(options)
contents = File.read(cache_file)
@cache_log = contents.empty? ? {} : JSON.parse(contents)
end

private

def time_ago(measurement, unit)
case unit
when :months
@cache_datetime >> -measurement
when :weeks
@cache_datetime - measurement * 7
when :days
@cache_datetime - measurement
when :hours
@cache_datetime - Rational(measurement/24.0)
end.to_time
end
end
end
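With the ActiveSupport requires removed, `parsed_timeframe` and `time_ago` rely on stdlib `DateTime` arithmetic:

* `>>` shifts by calendar months.
* Subtracting an Integer subtracts days.
* Subtracting a `Rational` subtracts a fraction of a day.

Below is a self-contained sketch of the same idea. It takes a fixed `now` so the results are reproducible. `timeframe_start` and `within_timeframe?` here are illustrative. Unlike `parsed_timeframe`, this version anchors the pattern and accepts only the four known units. `Time.parse` is provided by the stdlib `time` library, so the sketch requires it explicitly.

```ruby
require 'date'
require 'time' # Time.parse is provided by the stdlib 'time' library

# Hypothetical: turn "<n><unit>" (M, w, d, h) into the Time at the start of the window.
def timeframe_start(timeframe, now = DateTime.now)
  match = /\A(\d+)([Mwdh])\z/.match(timeframe)
  raise ArgumentError, "#{timeframe} is not a valid timeframe!" if match.nil?

  amount = match[1].to_i
  start =
    case match[2]
    when 'M' then now >> -amount             # calendar months back
    when 'w' then now - amount * 7           # DateTime - Integer subtracts days
    when 'd' then now - amount
    when 'h' then now - Rational(amount, 24) # hours as an exact fraction of a day
    end
  start.to_time
end

# Hypothetical: is a cached timestamp string inside the window ending at `now`?
def within_timeframe?(timestamp, timeframe, now = DateTime.now)
  (timeframe_start(timeframe, now)..now.to_time).cover?(Time.parse(timestamp))
end
```

Using `Rational(amount, 24)` keeps hour offsets exact, whereas a Float fraction of a day can introduce rounding.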
2 changes: 2 additions & 0 deletions lib/html-proofer/check.rb
@@ -1,3 +1,5 @@
# frozen_string_literal: true

module HTMLProofer
# Mostly handles issue management and collecting of external URLs.
class Check
Expand Down
4 changes: 3 additions & 1 deletion lib/html-proofer/check/favicon.rb
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# frozen_string_literal: true

class FaviconCheck < ::HTMLProofer::Check
def run
found = false
@@ -19,7 +21,7 @@ def run

def is_immediate_redirect?
# allow any instant-redirect meta tag
@html.xpath("//meta[@http-equiv='refresh']").attribute('content').value.starts_with? '0;' rescue false
@html.xpath("//meta[@http-equiv='refresh']").attribute('content').value.start_with? '0;' rescue false
end

end
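The change from `starts_with?` to `start_with?` matters because `starts_with?` is an ActiveSupport alias; core Ruby only defines `String#start_with?`. When ActiveSupport's string extensions are not loaded, calling the alias raises `NoMethodError`. The trailing `rescue false` swallows that error, so the check would quietly return `false` instead of failing. The small stdlib-only illustration below uses hypothetical helper names and passes the meta tag's `content` value in directly instead of reading it through Nokogiri.

```ruby
# Core-Ruby spelling: true for "0; url=..." style instant redirects.
def immediate_redirect?(content)
  content.start_with?('0;') rescue false
end

# ActiveSupport spelling: without ActiveSupport loaded, String has no
# #starts_with?, so the NoMethodError is swallowed by `rescue false`.
def immediate_redirect_activesupport_style?(content)
  content.starts_with?('0;') rescue false
end
```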
2 changes: 2 additions & 0 deletions lib/html-proofer/check/html.rb
@@ -1,3 +1,5 @@
# frozen_string_literal: true

class HtmlCheck < ::HTMLProofer::Check
# tags embedded in scripts are used in templating languages: http://git.io/vOovv
SCRIPT_EMBEDS_MSG = /Element script embeds close tag/
2 changes: 2 additions & 0 deletions lib/html-proofer/check/images.rb
@@ -1,3 +1,5 @@
# frozen_string_literal: true

class ImageCheck < ::HTMLProofer::Check
SCREEN_SHOT_REGEX = /Screen(?: |%20)Shot(?: |%20)\d+-\d+-\d+(?: |%20)at(?: |%20)\d+.\d+.\d+/
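Judging by its name and shape, `SCREEN_SHOT_REGEX` targets default macOS screenshot filenames and accepts `%20` wherever a space may have been URL-encoded. The check below shows what the pattern (copied verbatim) does and does not match. The sample filenames are made up.

```ruby
SCREEN_SHOT_REGEX = /Screen(?: |%20)Shot(?: |%20)\d+-\d+-\d+(?: |%20)at(?: |%20)\d+.\d+.\d+/

samples = [
  'images/Screen Shot 2019-10-07 at 10.15.32 AM.png',
  'images/Screen%20Shot%202019-10-07%20at%2010.15.32%20AM.png',
  'images/logo.png'
]
# The unescaped `.` between the time parts matches any character,
# so "10:15:32" would match as well as "10.15.32".
matches = samples.map { |src| SCREEN_SHOT_REGEX.match?(src) }
```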
