From c48148ea2a2b87709f813ab259c988303ffe69f1 Mon Sep 17 00:00:00 2001
From: Wes Oldenbeuving
Date: Mon, 28 Feb 2022 10:48:10 +0100
Subject: [PATCH] Refactor: condense `Tokenizer#tokenize_urls!`

- Extracted `maybe_parse_url` to encapsulate that Strings matched by gsub
  might not in fact be valid URLs.
- Condensed the `var = (uri part).to_s; var.tr!()` logic, which was
  required because `String#tr!` does not return `self` in case of a
  no-op, into a `tap {}` block instead. I'm not in love with the
  solution, but it's a minor improvement over the previous one. One line
  now corresponds to one part of the URL.
---
 lib/groupie/tokenizer.rb | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/lib/groupie/tokenizer.rb b/lib/groupie/tokenizer.rb
index 77caa87..ba3528c 100644
--- a/lib/groupie/tokenizer.rb
+++ b/lib/groupie/tokenizer.rb
@@ -50,17 +50,13 @@ def strip_html_tags!
     # Intelligently split URLs into their component parts
     def tokenize_urls!
       @raw.gsub!(%r{http[\w\-\#:/_.?&=]+}) do |url|
-        uri = URI.parse(url)
-      rescue URI::InvalidURIError
-        url
-      else
-        path = uri.path.to_s
-        path.tr!('/_\-', ' ')
-        query = uri.query.to_s
-        query.tr!('?=&#_\-', ' ')
-        fragment = uri.fragment.to_s
-        fragment.tr!('#_/\-', ' ')
-        "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
+        maybe_parse_url(url) do |uri|
+          path = uri.path.tap { |str| str&.tr!('/_\-', ' ') }
+          query = uri.query.tap { |str| str&.tr!('?=&#_\-', ' ') }
+          fragment = uri.fragment.tap { |str| str&.tr!('#_/\-', ' ') }
+
+          "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
+        end
       end
     end
 
@@ -74,5 +70,16 @@ def remove_interpunction!(str)
       str.gsub!(/\A['"]+|[!,."']+\Z/, '')
       str
     end
+
+    # Sometimes a String looks like a URL, but it's not.
+    # This method attempts to parse the input string into a URI.
+    # If it's successful, yield it to the block and return its response.
+    # In case of failure, return the original string.
+    def maybe_parse_url(input)
+      uri = URI.parse(input)
+      yield uri
+    rescue URI::InvalidURIError
+      input
+    end
   end
 end
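
Note for reviewers: the sketch below is not part of the patch. It is a minimal, standalone illustration of the two behaviours the commit message relies on: `String#tr!` returning `nil` on a no-op, and the `tap` plus `&.` combination that works around it. The `split_url` helper and the sample URLs are hypothetical and exist only for this illustration; `maybe_parse_url` is copied from the diff.

```ruby
require 'uri'

# String#tr! returns nil when nothing was replaced, so its return value
# cannot be used directly as "the translated string".
unchanged = String.new('abc').tr!('/_\-', ' ') # nothing to replace
changed   = String.new('a/b').tr!('/_\-', ' ') # '/' becomes ' '

# Same helper as in the patch: yield the parsed URI, or fall back to the
# original input when it cannot be parsed.
def maybe_parse_url(input)
  uri = URI.parse(input)
  yield uri
rescue URI::InvalidURIError
  input
end

# Hypothetical wrapper mirroring the block body of tokenize_urls!.
# tap returns its receiver (the mutated String, or nil), and &. skips
# tr! entirely when the URI part is nil.
def split_url(url)
  maybe_parse_url(url) do |uri|
    path = uri.path.tap { |str| str&.tr!('/_\-', ' ') }
    query = uri.query.tap { |str| str&.tr!('?=&#_\-', ' ') }
    fragment = uri.fragment.tap { |str| str&.tr!('#_/\-', ' ') }

    "#{uri.scheme} #{uri.host} #{path} #{query} #{fragment}"
  end
end

p unchanged
p changed
p split_url('http://example.com/foo_bar/baz?x=1&y=2#top-section')
p split_url('http://exa mple.com') # contains a space, so parsing fails
```

A missing query or fragment is `nil`, which interpolates as an empty string, so the result can carry trailing spaces, just as the `.to_s` version did.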