Refactor: condense Tokenizer#tokenize_urls!
- Extracted `maybe_parse_url` to encapsulate the fact that Strings matched by
  `gsub` might not actually be valid URLs.
- Condensed the `var = (uri part).to_s; var.tr!()` logic, which was needed
  because `String#tr!` returns `nil` instead of `self` when it makes no
  changes, into a `tap {}` block. I'm not in love with the solution, but
  it's a minor improvement over the previous one: each line now maps to one
  part of the URL.
Narnach committed Feb 28, 2022
1 parent 7fd1ad7 commit c48148e
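The second bullet of the commit message hinges on two pieces of core Ruby behavior: `String#tr!` returns `nil` when it has nothing to translate, while `Object#tap` always returns its receiver regardless of what the block evaluates to. The snippet below is an illustrative sketch, not code from the repository; `normalize_part` is a hypothetical helper name.

```ruby
# String#tr! returns the (modified) receiver when it translates something,
# but nil when there was nothing to translate.
with_underscore = +'foo_bar'
without_underscore = +'foobar'

with_underscore.tr!('_', ' ')     # => "foo bar"
without_underscore.tr!('_', ' ')  # => nil, so `x = str.tr!(...)` is unsafe

# Object#tap yields the receiver to the block and then returns the receiver,
# ignoring the block's value. Combined with `&.` (which skips the call when
# the receiver is nil), this gives a one-line, nil-tolerant normalization.
# `normalize_part` is a hypothetical name used only for this illustration.
def normalize_part(part)
  part.tap { |str| str&.tr!('/_\-', ' ') }
end

normalize_part(+'/foo_bar')  # => " foo bar"
normalize_part(+'foobar')    # => "foobar" (tr! returned nil, tap ignored it)
normalize_part(nil)          # => nil (a missing URI part, e.g. no query)
```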
1 changed file: lib/groupie/tokenizer.rb (29 changes: 18 additions, 11 deletions)
```diff
@@ -50,17 +50,13 @@ def strip_html_tags!
     # Intelligently split URLs into their component parts
     def tokenize_urls!
       @raw.gsub!(%r{http[\w\-\#:/_.?&=]+}) do |url|
-        uri = URI.parse(url)
-      rescue URI::InvalidURIError
-        url
-      else
-        path = uri.path.to_s
-        path.tr!('/_\-', ' ')
-        query = uri.query.to_s
-        query.tr!('?=&#_\-', ' ')
-        fragment = uri.fragment.to_s
-        fragment.tr!('#_/\-', ' ')
-        "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
+        maybe_parse_url(url) do |uri|
+          path = uri.path.tap { |str| str&.tr!('/_\-', ' ') }
+          query = uri.query.tap { |str| str&.tr!('?=&#_\-', ' ') }
+          fragment = uri.fragment.tap { |str| str&.tr!('#_/\-', ' ') }
+
+          "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
+        end
       end
     end
```
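Because this commit is meant to be a pure refactor, the old and new bodies of `tokenize_urls!` should produce identical output. A minimal standalone check, assuming free-standing methods instead of the real `Tokenizer` instance methods (which mutate `@raw` with `gsub!`); `URL_PATTERN`, `tokenize_urls_before` and `tokenize_urls_after` are hypothetical names introduced here:

```ruby
require 'uri'

# Same pattern as in tokenize_urls!; the constant name is hypothetical.
URL_PATTERN = %r{http[\w\-\#:/_.?&=]+}

# Before: rescue/else inside the block, `.to_s` plus `tr!` on separate lines.
def tokenize_urls_before(text)
  text.gsub(URL_PATTERN) do |url|
    uri = URI.parse(url)
  rescue URI::InvalidURIError
    url
  else
    path = uri.path.to_s
    path.tr!('/_\-', ' ')
    query = uri.query.to_s
    query.tr!('?=&#_\-', ' ')
    fragment = uri.fragment.to_s
    fragment.tr!('#_/\-', ' ')
    "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
  end
end

# After: parsing delegated to maybe_parse_url, one `tap` per URI part.
def maybe_parse_url(input)
  uri = URI.parse(input)
  yield uri
rescue URI::InvalidURIError
  input
end

def tokenize_urls_after(text)
  text.gsub(URL_PATTERN) do |url|
    maybe_parse_url(url) do |uri|
      path = uri.path.tap { |str| str&.tr!('/_\-', ' ') }
      query = uri.query.tap { |str| str&.tr!('?=&#_\-', ' ') }
      fragment = uri.fragment.tap { |str| str&.tr!('#_/\-', ' ') }

      "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
    end
  end
end

sample = 'docs at http://example.com/foo_bar?a=1#top and http://example.com'
tokenize_urls_after(sample) == tokenize_urls_before(sample)  # => true
```

The outputs match even for URIs without a query or fragment: the old code turned `nil` into `""` with `to_s`, while the new code keeps `nil`, and string interpolation renders `nil` as an empty string anyway.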

```diff
@@ -74,5 +70,16 @@ def remove_interpunction!(str)
       str.gsub!(/\A['"]+|[!,."']+\Z/, '')
       str
     end
+
+    # Sometimes a String looks like a URL, but it's not.
+    # This method attempts to parse the input string into a URI.
+    # If it's successful, yield it to the block and return its response.
+    # In case of failure, return the original string.
+    def maybe_parse_url(input)
+      uri = URI.parse(input)
+      yield uri
+    rescue URI::InvalidURIError
+      input
+    end
   end
 end
```
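The contract described in the `maybe_parse_url` comment can be exercised directly. A small sketch that copies the helper to the top level so it runs on its own; the example URLs are made up:

```ruby
require 'uri'

# Same helper as in the diff, defined at top level so it runs standalone.
def maybe_parse_url(input)
  uri = URI.parse(input)
  yield uri
rescue URI::InvalidURIError
  input
end

# Parsable input: the block runs and its value is returned.
maybe_parse_url('http://example.com/a') { |uri| uri.host }  # => "example.com"

# Unparsable input (URIs cannot contain a raw space): URI.parse raises,
# the block is skipped, and the original String is returned unchanged.
block_ran = false
maybe_parse_url('http://exa mple.com') { block_ran = true }
# => "http://exa mple.com", and block_ran is still false
```

One consequence of the method-level `rescue`: it also wraps the `yield`, so a `URI::InvalidURIError` raised inside the block itself would be swallowed too, and the original input returned.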