Deduplicate parsed URLs to reduce crawl requests
bazelbuild#14700 added a couple more URLs for fetching the JDK package, which seems to be causing some infrastructure issues, as discussed in bazelbuild#14700.

This patch works around the issue by deduplicating the parsed URLs, reducing the number of crawl requests.

Closes bazelbuild#14763.

PiperOrigin-RevId: 427464876
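
In a minimal sketch, the deduplication idea looks like this (using a bash 4+ associative array rather than the membership test the patch itself adopts; the URLs here are placeholders, not the ones in the WORKSPACE files):

#!/usr/bin/env bash
# Sketch: visit each distinct URL once, skipping duplicates.
# Requires bash 4+ for associative arrays; sample URLs are placeholders.
urls=(
  "https://example.com/jdk.tar.gz"
  "https://mirror.example.com/jdk.tar.gz"
  "https://example.com/jdk.tar.gz"   # duplicate, skipped below
)
declare -A seen
for url in "${urls[@]}"; do
  if [[ -n "${seen[$url]:-}" ]]; then
    continue  # already crawled once
  fi
  seen["$url"]=1
  echo "Checking ${url} ..."
done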
niyas-sait committed Feb 11, 2022
1 parent 522efb7 commit 44a8e2b
Showing 1 changed file with 8 additions and 4 deletions.
--- a/src/test/shell/bazel/verify_workspace.sh
+++ b/src/test/shell/bazel/verify_workspace.sh
@@ -46,14 +46,18 @@ function test_verify_urls() {
   # Find url-shaped lines, skipping jekyll-tree (which isn't a valid URL), and
   # skipping comments.
   invalid_urls=()
+  checked_urls=()
   for file in "${WORKSPACE_FILES[@]}"; do
     for url in $(grep -E '"https://|http://' "${file}" | \
       sed -e '/jekyll-tree/d' -e '/^#/d' -r -e 's#^.*"(https?://[^"]+)".*$#\1#g' | \
       sort -u); do
-      #echo "Checking ${url}"
-      if ! curl --head -silent --fail --output /dev/null --retry 3 "${url}"; then
-        #fail "URL ${url} is invalid."
-        invalid_urls+=("${url}")
+      # Only check each unique URL once.
+      if [[ ${#checked_urls[@]} == 0 ]] || [[ ! " ${checked_urls[@]} " =~ " ${url} " ]]; then
+        checked_urls+=("${url}")
+        # echo "Checking ${url} ..."
+        if ! curl --head --silent --show-error --fail --output /dev/null --retry 3 "${url}"; then
+          invalid_urls+=("${url}")
+        fi
       fi
     done
   done
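
The uniqueness check relies on a common bash idiom: flatten the array with spaces, pad both ends, and test the quoted (hence literal) candidate against it, so only whole entries can match. A small standalone demonstration with placeholder URLs:

#!/usr/bin/env bash
# The quoted right-hand side of =~ is matched literally, and the
# surrounding spaces keep "/a" from matching inside "/ab".
checked_urls=("https://example.com/a" "https://example.com/ab")
url="https://example.com/a"
if [[ ${#checked_urls[@]} == 0 ]] || [[ ! " ${checked_urls[@]} " =~ " ${url} " ]]; then
  echo "new URL: ${url}"
else
  echo "already checked: ${url}"   # printed for this sample
fi

One caveat of the idiom is that it joins entries with spaces, so it assumes entries contain no whitespace, which holds for URLs.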