diff --git a/CHANGELOG.md b/CHANGELOG.md
index fed94ae4..7b26a6d9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,7 +2,88 @@
 
 ## [Unreleased](https://github.com/gjtorikian/html-proofer/tree/HEAD)
 
-[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.2.0...HEAD)
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.4.2...HEAD)
+
+**Merged pull requests:**
+
+- Revert "Validate options" [\#774](https://github.com/gjtorikian/html-proofer/pull/774) ([gjtorikian](https://github.com/gjtorikian))
+
+## [v4.4.2](https://github.com/gjtorikian/html-proofer/tree/v4.4.2) (2022-10-07)
+
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.4.1...v4.4.2)
+
+**Closed issues:**
+
+- `erstiebegrüßung.html` causing problems on macOS [\#771](https://github.com/gjtorikian/html-proofer/issues/771)
+- HTMLProofer times out [\#768](https://github.com/gjtorikian/html-proofer/issues/768)
+
+**Merged pull requests:**
+
+- Create erstiebegrüßung.html from code [\#772](https://github.com/gjtorikian/html-proofer/pull/772) ([asbjornu](https://github.com/asbjornu))
+- Validate options [\#767](https://github.com/gjtorikian/html-proofer/pull/767) ([asbjornu](https://github.com/asbjornu))
+
+## [v4.4.1](https://github.com/gjtorikian/html-proofer/tree/v4.4.1) (2022-09-25)
+
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.4.0...v4.4.1)
+
+**Closed issues:**
+
+- Custom `Checker` class is not executed [\#764](https://github.com/gjtorikian/html-proofer/issues/764)
+- `--cache` unvailable in CLI [\#763](https://github.com/gjtorikian/html-proofer/issues/763)
+- `--parallel` unavailable in CLI [\#762](https://github.com/gjtorikian/html-proofer/issues/762)
+- HTMLproofer does not properly ignore links [\#756](https://github.com/gjtorikian/html-proofer/issues/756)
+- Mailto check failed in some cases [\#754](https://github.com/gjtorikian/html-proofer/issues/754)
+
+**Merged pull requests:**
+
+- Optimize checking internal link hashes in target files [\#770](https://github.com/gjtorikian/html-proofer/pull/770) ([riccardoporreca](https://github.com/riccardoporreca))
+- Fix `--swap-attributes` CLI argument in README [\#765](https://github.com/gjtorikian/html-proofer/pull/765) ([mark-monteiro](https://github.com/mark-monteiro))
+- Fix and improve swap\_attribute README example [\#755](https://github.com/gjtorikian/html-proofer/pull/755) ([riccardoporreca](https://github.com/riccardoporreca))
+
+## [v4.4.0](https://github.com/gjtorikian/html-proofer/tree/v4.4.0) (2022-08-13)
+
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.3.2...v4.4.0)
+
+**Closed issues:**
+
+- `--assume_extension` unexpected behaviour [\#751](https://github.com/gjtorikian/html-proofer/issues/751)
+- Protocol-relative \(no `http(s):`\) URL issue: Script cache issue and anti-pattern consideration [\#750](https://github.com/gjtorikian/html-proofer/issues/750)
+- Questions on command-line options in 4.x [\#749](https://github.com/gjtorikian/html-proofer/issues/749)
+
+**Merged pull requests:**
+
+- Fail on protocol-relative urls [\#752](https://github.com/gjtorikian/html-proofer/pull/752) ([riccardoporreca](https://github.com/riccardoporreca))
+
+## [v4.3.2](https://github.com/gjtorikian/html-proofer/tree/v4.3.2) (2022-08-03)
+
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.3.1...v4.3.2)
+
+**Closed issues:**
+
+- 4.3.1 Ignoring options [\#748](https://github.com/gjtorikian/html-proofer/issues/748)
+- Link checker triggered for href="" [\#746](https://github.com/gjtorikian/html-proofer/issues/746)
+- Passing RegExp to `--ignore-url` broken after v4.2.0 release [\#745](https://github.com/gjtorikian/html-proofer/issues/745)
+
+**Merged pull requests:**
+
+- Switch 'source' elements to use image check code path [\#747](https://github.com/gjtorikian/html-proofer/pull/747) ([fallax](https://github.com/fallax))
+
+## [v4.3.1](https://github.com/gjtorikian/html-proofer/tree/v4.3.1) (2022-07-29)
+
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.3.0...v4.3.1)
+
+**Closed issues:**
+
+- ignore\_files no longer works with regex [\#743](https://github.com/gjtorikian/html-proofer/issues/743)
+- Empty mailto is generating undesired errors. [\#742](https://github.com/gjtorikian/html-proofer/issues/742)
+
+**Merged pull requests:**
+
+- Srcsets - better handling of multiple srcsets [\#744](https://github.com/gjtorikian/html-proofer/pull/744) ([fallax](https://github.com/fallax))
+
+## [v4.3.0](https://github.com/gjtorikian/html-proofer/tree/v4.3.0) (2022-07-26)
+
+[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.2.0...v4.3.0)
 
 **Closed issues:**
 
diff --git a/README.md b/README.md
index b10d554d..7d0a5f64 100644
--- a/README.md
+++ b/README.md
@@ -198,7 +198,7 @@ values. The escape sequences `\:` should be used to produce literal
 htmlproofer --swap-urls "wow:cow,mow:doh" --extensions .html.erb --ignore-urls www.github.com ./out
 ```
 
-Some configuration options--such as `--typheous`, `--cache`, or `--attribute-swap`--require well-formatted JSON.
+Some configuration options, such as `--typheous`, `--cache`, or `--swap-attributes`, require well-formatted JSON.
 
 #### Adjusting for a `baseurl`
 
diff --git a/lib/html_proofer/url_validator/internal.rb b/lib/html_proofer/url_validator/internal.rb
index 5d02b87d..4af5cc1d 100644
--- a/lib/html_proofer/url_validator/internal.rb
+++ b/lib/html_proofer/url_validator/internal.rb
@@ -22,22 +22,39 @@ def validate
       end
 
       def run_internal_link_checker(links)
+        # collect urls and metadata for hashes to be checked in the same target file
+        file_paths_hashes_to_check = {}
         to_add = []
-        links.each_pair do |link, matched_files|
+        links.each_with_index do |(link, matched_files), i|
+          matched_count_to_log = pluralize(matched_files.count, "reference", "references")
+          @logger.log(:debug, "(#{i + 1} / #{links.count}) Internal link #{link}: Checking #{matched_count_to_log}")
           matched_files.each do |metadata|
             url = HTMLProofer::Attribute::Url.new(@runner, link, base_url: metadata[:base_url])
 
             @runner.current_source = metadata[:source]
             @runner.current_filename = metadata[:filename]
 
-            unless file_exists?(url)
+            target_file_path = url.absolute_path
+            unless file_exists?(target_file_path)
               @failed_checks << Failure.new(@runner.current_filename, "Links > Internal",
                 "internally linking to #{url}, which does not exist", line: metadata[:line], status: nil, content: nil)
               to_add << [url, metadata, false]
               next
             end
 
-            unless hash_exists?(url)
+            hash_exists = hash_exists_for_url?(url)
+            if hash_exists.nil?
+              # the hash needs to be checked in the target file, we collect the url and metadata
+              unless file_paths_hashes_to_check.key?(target_file_path)
+                file_paths_hashes_to_check[target_file_path] = {}
+              end
+              unless file_paths_hashes_to_check[target_file_path].key?(url.hash)
+                file_paths_hashes_to_check[target_file_path][url.hash] = []
+              end
+              file_paths_hashes_to_check[target_file_path][url.hash] << [url, metadata]
+              next
+            end
+            unless hash_exists
               @failed_checks << Failure.new(@runner.current_filename, "Links > Internal",
                 "internally linking to #{url}; the file exists, but the hash '#{url.hash}' does not", line: metadata[:line], status: nil, content: nil)
               to_add << [url, metadata, false]
@@ -48,6 +65,24 @@ def run_internal_link_checker(links)
           end
         end
 
+        # check hashes by target file
+        @logger.log(:info, "Checking internal link hashes in #{pluralize(file_paths_hashes_to_check.count, "file", "files")}")
+        file_paths_hashes_to_check.each_with_index do |(file_path, hashes_to_check), i|
+          hash_count_to_log = pluralize(hashes_to_check.count, "hash", "hashes")
+          @logger.log(:debug, "(#{i + 1} / #{file_paths_hashes_to_check.count}) Checking #{hash_count_to_log} in #{file_path}")
+          html = create_nokogiri(file_path)
+          hashes_to_check.each_pair do |href_hash, url_metadata|
+            exists = hash_exists_in_html?(href_hash, html)
+            url_metadata.each do |(url, metadata)|
+              unless exists
+                @failed_checks << Failure.new(metadata[:filename], "Links > Internal",
+                  "internally linking to #{url}; the file exists, but the hash '#{href_hash}' does not", line: metadata[:line], status: nil, content: nil)
+              end
+              to_add << [url, metadata, exists]
+            end
+          end
+        end
+
         # adding directly to the cache above results in an endless loop
         to_add.each do |(url, metadata, exists)|
           @cache.add_internal(url.to_s, metadata, exists)
@@ -56,15 +91,15 @@ def run_internal_link_checker(links)
         @failed_checks
       end
 
-      private def file_exists?(url)
-        absolute_path = url.absolute_path
-        return @runner.checked_paths[url.absolute_path] if @runner.checked_paths.key?(absolute_path)
+      private def file_exists?(absolute_path)
+        return @runner.checked_paths[absolute_path] if @runner.checked_paths.key?(absolute_path)
 
-        @runner.checked_paths[url.absolute_path] = File.exist?(absolute_path)
+        @runner.checked_paths[absolute_path] = File.exist?(absolute_path)
       end
 
-      # verify the target hash
-      private def hash_exists?(url)
+      # verify the hash w/o just based on the URL, w/o looking at the target file
+      # => returns nil if the has could not be verified
+      private def hash_exists_for_url?(url)
         href_hash = url.hash
         return true if blank?(href_hash)
         return true unless @runner.options[:check_internal_hash]
@@ -76,10 +111,18 @@ def run_internal_link_checker(links)
         decoded_href_hash = Addressable::URI.unescape(href_hash)
         fragment_ids = [href_hash, decoded_href_hash]
         # https://www.w3.org/TR/html5/single-page.html#scroll-to-fragid
-        fragment_ids.include?("top") || !find_fragments(fragment_ids, url).empty?
+        return true if fragment_ids.include?("top")
+
+        nil
+      end
+
+      private def hash_exists_in_html?(href_hash, html)
+        decoded_href_hash = Addressable::URI.unescape(href_hash)
+        fragment_ids = [href_hash, decoded_href_hash]
+        !find_fragments(fragment_ids, html).empty?
       end
 
-      private def find_fragments(fragment_ids, url)
+      private def find_fragments(fragment_ids, html)
         xpaths = fragment_ids.uniq.flat_map do |frag_id|
           escaped_frag_id = "'#{frag_id.split("'").join("', \"'\", '")}', ''"
           [
@@ -89,7 +132,6 @@ def run_internal_link_checker(links)
         end
         xpaths << XpathFunctions.new
 
-        html = create_nokogiri(url.absolute_path)
         html.xpath(*xpaths)
       end
     end
diff --git a/lib/html_proofer/version.rb b/lib/html_proofer/version.rb
index 838da696..1c6561d2 100644
--- a/lib/html_proofer/version.rb
+++ b/lib/html_proofer/version.rb
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module HTMLProofer
-  VERSION = "4.4.0"
+  VERSION = "4.4.3"
 end
diff --git a/script/changelog b/script/changelog
index 2d13bea2..8814ace5 100755
--- a/script/changelog
+++ b/script/changelog
@@ -1,3 +1,3 @@
 #!/bin/sh
 
-CHANGELOG_GITHUB_TOKEN="$PUBLIC_GITHUB_TOKEN" github_changelog_generator -u gjtorikian -p html-proofer
+CHANGELOG_GITHUB_TOKEN="$GITHUB_CHANGELOG_TOKEN" github_changelog_generator -u gjtorikian -p html-proofer
diff --git "a/spec/html-proofer/fixtures/links/erstiebegr\303\274\303\237ung.html" "b/spec/html-proofer/fixtures/links/erstiebegr\303\274\303\237ung.html"
deleted file mode 100644
index e69de29b..00000000