Merge branch 'main' into v5
gjtorikian committed Oct 8, 2022
2 parents 2d22272 + 6f8a324 commit 4e7a26b
Showing 6 changed files with 139 additions and 16 deletions.
83 changes: 82 additions & 1 deletion CHANGELOG.md
@@ -2,7 +2,88 @@

## [Unreleased](https://github.com/gjtorikian/html-proofer/tree/HEAD)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.2.0...HEAD)
[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.4.2...HEAD)

**Merged pull requests:**

- Revert "Validate options" [\#774](https://github.com/gjtorikian/html-proofer/pull/774) ([gjtorikian](https://github.com/gjtorikian))

## [v4.4.2](https://github.com/gjtorikian/html-proofer/tree/v4.4.2) (2022-10-07)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.4.1...v4.4.2)

**Closed issues:**

- `erstiebegrüßung.html` causing problems on macOS [\#771](https://github.com/gjtorikian/html-proofer/issues/771)
- HTMLProofer times out [\#768](https://github.com/gjtorikian/html-proofer/issues/768)

**Merged pull requests:**

- Create erstiebegrüßung.html from code [\#772](https://github.com/gjtorikian/html-proofer/pull/772) ([asbjornu](https://github.com/asbjornu))
- Validate options [\#767](https://github.com/gjtorikian/html-proofer/pull/767) ([asbjornu](https://github.com/asbjornu))

## [v4.4.1](https://github.com/gjtorikian/html-proofer/tree/v4.4.1) (2022-09-25)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.4.0...v4.4.1)

**Closed issues:**

- Custom `Checker` class is not executed [\#764](https://github.com/gjtorikian/html-proofer/issues/764)
- `--cache` unvailable in CLI [\#763](https://github.com/gjtorikian/html-proofer/issues/763)
- `--parallel` unavailable in CLI [\#762](https://github.com/gjtorikian/html-proofer/issues/762)
- HTMLproofer does not properly ignore links [\#756](https://github.com/gjtorikian/html-proofer/issues/756)
- Mailto check failed in some cases [\#754](https://github.com/gjtorikian/html-proofer/issues/754)

**Merged pull requests:**

- Optimize checking internal link hashes in target files [\#770](https://github.com/gjtorikian/html-proofer/pull/770) ([riccardoporreca](https://github.com/riccardoporreca))
- Fix `--swap-attributes` CLI argument in README [\#765](https://github.com/gjtorikian/html-proofer/pull/765) ([mark-monteiro](https://github.com/mark-monteiro))
- Fix and improve swap\_attribute README example [\#755](https://github.com/gjtorikian/html-proofer/pull/755) ([riccardoporreca](https://github.com/riccardoporreca))

## [v4.4.0](https://github.com/gjtorikian/html-proofer/tree/v4.4.0) (2022-08-13)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.3.2...v4.4.0)

**Closed issues:**

- `--assume_extension` unexpected behaviour [\#751](https://github.com/gjtorikian/html-proofer/issues/751)
- Protocol-relative \(no `http(s):`\) URL issue: Script cache issue and anti-pattern consideration [\#750](https://github.com/gjtorikian/html-proofer/issues/750)
- Questions on command-line options in 4.x [\#749](https://github.com/gjtorikian/html-proofer/issues/749)

**Merged pull requests:**

- Fail on protocol-relative urls [\#752](https://github.com/gjtorikian/html-proofer/pull/752) ([riccardoporreca](https://github.com/riccardoporreca))

## [v4.3.2](https://github.com/gjtorikian/html-proofer/tree/v4.3.2) (2022-08-03)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.3.1...v4.3.2)

**Closed issues:**

- 4.3.1 Ignoring options [\#748](https://github.com/gjtorikian/html-proofer/issues/748)
- Link checker triggered for href="" [\#746](https://github.com/gjtorikian/html-proofer/issues/746)
- Passing RegExp to `--ignore-url` broken after v4.2.0 release [\#745](https://github.com/gjtorikian/html-proofer/issues/745)

**Merged pull requests:**

- Switch 'source' elements to use image check code path [\#747](https://github.com/gjtorikian/html-proofer/pull/747) ([fallax](https://github.com/fallax))

## [v4.3.1](https://github.com/gjtorikian/html-proofer/tree/v4.3.1) (2022-07-29)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.3.0...v4.3.1)

**Closed issues:**

- ignore\_files no longer works with regex [\#743](https://github.com/gjtorikian/html-proofer/issues/743)
- Empty mailto is generating undesired errors. [\#742](https://github.com/gjtorikian/html-proofer/issues/742)

**Merged pull requests:**

- Srcsets - better handling of multiple srcsets [\#744](https://github.com/gjtorikian/html-proofer/pull/744) ([fallax](https://github.com/fallax))

## [v4.3.0](https://github.com/gjtorikian/html-proofer/tree/v4.3.0) (2022-07-26)

[Full Changelog](https://github.com/gjtorikian/html-proofer/compare/v4.2.0...v4.3.0)

**Closed issues:**

2 changes: 1 addition & 1 deletion README.md
@@ -198,7 +198,7 @@ values. The escape sequences `\:` should be used to produce literal
htmlproofer --swap-urls "wow:cow,mow:doh" --extensions .html.erb --ignore-urls www.github.com ./out
```

Some configuration options--such as `--typheous`, `--cache`, or `--attribute-swap`--require well-formatted JSON.
Some configuration options, such as `--typhoeus`, `--cache`, or `--swap-attributes`, require well-formatted JSON.
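
As a rough illustration of what "well-formatted JSON" means for such an option, the sketch below builds the argument in Ruby instead of typing it by hand. The shape of the `--swap-attributes` value (element name mapped to pairs of attributes) is an assumption made for this example and should be checked against the README; `SWAP_ATTRIBUTES` and `COMMAND` are names invented here.

```ruby
require "json"
require "shellwords"

# Assumed option shape (not confirmed by this diff):
# element name => list of [attribute_to_read, attribute_to_treat_it_as] pairs.
SWAP_ATTRIBUTES = { "img" => [["data-src", "src"]] }.freeze

# JSON.generate guarantees syntactically valid JSON (double quotes, no trailing commas).
json = JSON.generate(SWAP_ATTRIBUTES)

# Shellwords.join escapes each argument so the shell passes the JSON through as one word.
COMMAND = Shellwords.join(["htmlproofer", "--swap-attributes", json, "./out"])

puts COMMAND
```

Generating the string this way sidesteps the usual hand-quoting mistakes, such as single-quoted keys or unescaped double quotes inside a double-quoted shell argument.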

#### Adjusting for a `baseurl`

66 changes: 54 additions & 12 deletions lib/html_proofer/url_validator/internal.rb
@@ -22,22 +22,39 @@ def validate
end

def run_internal_link_checker(links)
# collect urls and metadata for hashes to be checked in the same target file
file_paths_hashes_to_check = {}
to_add = []
links.each_pair do |link, matched_files|
links.each_with_index do |(link, matched_files), i|
matched_count_to_log = pluralize(matched_files.count, "reference", "references")
@logger.log(:debug, "(#{i + 1} / #{links.count}) Internal link #{link}: Checking #{matched_count_to_log}")
matched_files.each do |metadata|
url = HTMLProofer::Attribute::Url.new(@runner, link, base_url: metadata[:base_url])

@runner.current_source = metadata[:source]
@runner.current_filename = metadata[:filename]

unless file_exists?(url)
target_file_path = url.absolute_path
unless file_exists?(target_file_path)
@failed_checks << Failure.new(@runner.current_filename, "Links > Internal",
"internally linking to #{url}, which does not exist", line: metadata[:line], status: nil, content: nil)
to_add << [url, metadata, false]
next
end

unless hash_exists?(url)
hash_exists = hash_exists_for_url?(url)
if hash_exists.nil?
# the hash needs to be checked in the target file, we collect the url and metadata
unless file_paths_hashes_to_check.key?(target_file_path)
file_paths_hashes_to_check[target_file_path] = {}
end
unless file_paths_hashes_to_check[target_file_path].key?(url.hash)
file_paths_hashes_to_check[target_file_path][url.hash] = []
end
file_paths_hashes_to_check[target_file_path][url.hash] << [url, metadata]
next
end
unless hash_exists
@failed_checks << Failure.new(@runner.current_filename, "Links > Internal",
"internally linking to #{url}; the file exists, but the hash '#{url.hash}' does not", line: metadata[:line], status: nil, content: nil)
to_add << [url, metadata, false]
@@ -48,6 +65,24 @@ def run_internal_link_checker(links)
end
end

# check hashes by target file
@logger.log(:info, "Checking internal link hashes in #{pluralize(file_paths_hashes_to_check.count, "file", "files")}")
file_paths_hashes_to_check.each_with_index do |(file_path, hashes_to_check), i|
hash_count_to_log = pluralize(hashes_to_check.count, "hash", "hashes")
@logger.log(:debug, "(#{i + 1} / #{file_paths_hashes_to_check.count}) Checking #{hash_count_to_log} in #{file_path}")
html = create_nokogiri(file_path)
hashes_to_check.each_pair do |href_hash, url_metadata|
exists = hash_exists_in_html?(href_hash, html)
url_metadata.each do |(url, metadata)|
unless exists
@failed_checks << Failure.new(metadata[:filename], "Links > Internal",
"internally linking to #{url}; the file exists, but the hash '#{href_hash}' does not", line: metadata[:line], status: nil, content: nil)
end
to_add << [url, metadata, exists]
end
end
end

# adding directly to the cache above results in an endless loop
to_add.each do |(url, metadata, exists)|
@cache.add_internal(url.to_s, metadata, exists)
@@ -56,15 +91,15 @@ def run_internal_link_checker(links)
@failed_checks
end

private def file_exists?(url)
absolute_path = url.absolute_path
return @runner.checked_paths[url.absolute_path] if @runner.checked_paths.key?(absolute_path)
private def file_exists?(absolute_path)
return @runner.checked_paths[absolute_path] if @runner.checked_paths.key?(absolute_path)

@runner.checked_paths[url.absolute_path] = File.exist?(absolute_path)
@runner.checked_paths[absolute_path] = File.exist?(absolute_path)
end

# verify the target hash
private def hash_exists?(url)
# verify the hash based only on the URL, without looking at the target file
# => returns nil if the hash could not be verified
private def hash_exists_for_url?(url)
href_hash = url.hash
return true if blank?(href_hash)
return true unless @runner.options[:check_internal_hash]
@@ -76,10 +111,18 @@ def run_internal_link_checker(links)
decoded_href_hash = Addressable::URI.unescape(href_hash)
fragment_ids = [href_hash, decoded_href_hash]
# https://www.w3.org/TR/html5/single-page.html#scroll-to-fragid
fragment_ids.include?("top") || !find_fragments(fragment_ids, url).empty?
return true if fragment_ids.include?("top")

nil
end

private def hash_exists_in_html?(href_hash, html)
decoded_href_hash = Addressable::URI.unescape(href_hash)
fragment_ids = [href_hash, decoded_href_hash]
!find_fragments(fragment_ids, html).empty?
end

private def find_fragments(fragment_ids, url)
private def find_fragments(fragment_ids, html)
xpaths = fragment_ids.uniq.flat_map do |frag_id|
escaped_frag_id = "'#{frag_id.split("'").join("', \"'\", '")}', ''"
[
@@ -89,7 +132,6 @@ def run_internal_link_checker(links)
end
xpaths << XpathFunctions.new

html = create_nokogiri(url.absolute_path)
html.xpath(*xpaths)
end
end
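
The change above defers fragment checks so that each target file is parsed once rather than once per link. A minimal, dependency-free sketch of that grouping idea follows. It is not the library's code: `parse_ids`, `FAKE_FILES`, `PARSE_COUNTS`, and `check_fragments` are hypothetical stand-ins, and the "parse" step simply looks up a list of ids where the real code builds a Nokogiri document and runs XPath queries.

```ruby
# Counts how often each target file is "parsed", to make the saving visible.
PARSE_COUNTS = Hash.new(0)

# Illustrative in-memory "files": path => ids present in that document.
FAKE_FILES = {
  "out/a.html" => ["intro", "usage"],
  "out/b.html" => ["faq"],
}.freeze

# Hypothetical stand-in for the expensive parse step (create_nokogiri in the diff).
def parse_ids(path)
  PARSE_COUNTS[path] += 1
  FAKE_FILES.fetch(path)
end

# links: array of [target_path, fragment, source_filename]
def check_fragments(links)
  # Phase 1: group pending checks by target file, then by fragment.
  pending = Hash.new { |files, path| files[path] = Hash.new { |frags, frag| frags[frag] = [] } }
  links.each { |path, frag, source| pending[path][frag] << source }

  # Phase 2: parse each target file once and test every fragment against it.
  failures = []
  pending.each do |path, frags|
    ids = parse_ids(path)
    frags.each do |frag, sources|
      next if ids.include?(frag)

      # One failure per referencing source, as in the diff's inner loop.
      sources.each { |source| failures << [source, path, frag] }
    end
  end
  failures
end

FAILURES = check_fragments([
  ["out/a.html", "intro", "index.html"],
  ["out/a.html", "intro", "about.html"],
  ["out/a.html", "missing", "about.html"],
  ["out/b.html", "faq", "index.html"],
])
```

With three links pointing into `out/a.html`, the file is still parsed a single time, and the result for each distinct fragment is reused for every link that references it.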
2 changes: 1 addition & 1 deletion lib/html_proofer/version.rb
@@ -1,5 +1,5 @@
# frozen_string_literal: true

module HTMLProofer
VERSION = "4.4.0"
VERSION = "4.4.3"
end
2 changes: 1 addition & 1 deletion script/changelog
@@ -1,3 +1,3 @@
#!/bin/sh

CHANGELOG_GITHUB_TOKEN="$PUBLIC_GITHUB_TOKEN" github_changelog_generator -u gjtorikian -p html-proofer
CHANGELOG_GITHUB_TOKEN="$GITHUB_CHANGELOG_TOKEN" github_changelog_generator -u gjtorikian -p html-proofer
Empty file.
