[bunkr] Intermittent ValueError on redirects #6790

unrealtournament · 2025-01-08T23:03:41Z

$ gallery-dl https://bunkr.cr/a/4oCihUYU -v

[gallery-dl][debug] Version 1.28.4-dev
[gallery-dl][debug] Python 3.11.2 - Linux-6.1.0-28-amd64-x86_64-with-glibc2.36
[gallery-dl][debug] requests 2.32.3 - urllib3 2.2.3
[gallery-dl][debug] Configuration Files ['${HOME}/.gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bunkr.cr/a/4oCihUYU'
[bunkr][debug] Using BunkrAlbumExtractor for 'https://bunkr.cr/a/4oCihUYU'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkr.cr:443
[urllib3.connectionpool][debug] https://bunkr.cr:443 "GET /a/4oCihUYU HTTP/11" 200 None
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkrrr.org:443
[urllib3.connectionpool][debug] https://bunkrrr.org:443 "GET /v/rEeTUL8MXR17A HTTP/11" 307 68
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkr.ph:443
[urllib3.connectionpool][debug] https://bunkr.ph:443 "GET /v/rEeTUL8MXR17A HTTP/11" 301 51
[bunkr][error] ValueError: substring not found
[bunkr][debug]
Traceback (most recent call last):
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 135, in _extract_files
    file = self._extract_file(text.unescape(url))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 151, in _extract_file
    response = self.request(webpage_url)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 84, in request
    root, path = self._split(url)
                 ^^^^^^^^^^^^^^^^
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 175, in _split
    pos = url.index("/", 8)
          ^^^^^^^^^^^^^^^^^
ValueError: substring not found

The issue is that there seems to be an extra 301 redirect sometimes, and its Location header is a relative path such as /f/rEeTUL8MXR17A instead of an absolute URL such as https://bunkr.ph/v/rEeTUL8MXR17A. The url variable here then holds that relative path, so looking up the index of the character / starting at position 8 fails.
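To make the failure easy to reproduce outside gallery-dl, here is a minimal sketch. `split_url` is a hypothetical stand-in for the extractor's `_split`: only the `url.index("/", 8)` line is taken from the traceback, and the return value is an assumption based on how `root, path = self._split(url)` is used.

```python
def split_url(url):
    """Hypothetical stand-in for _split(): cut a URL into (root, path)."""
    # Searching from index 8 skips the two slashes of an "https://" prefix.
    pos = url.index("/", 8)
    # Assumed return shape, inferred from `root, path = self._split(url)`.
    return url[:pos], url[pos:]


# Absolute Location value: the first "/" after the scheme marks the path.
print(split_url("https://bunkr.ph/v/rEeTUL8MXR17A"))

# Relative Location value: there is no "/" at or after index 8.
try:
    split_url("/f/rEeTUL8MXR17A")
except ValueError as exc:
    print("ValueError:", exc)
```

The relative value is only 16 characters long and its last slash sits at index 2, so the search from index 8 finds nothing.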

It only happens intermittently (running the same command a few times for the same album will eventually retrieve the right value), but it would be more robust to check explicitly for an http:// or https:// prefix instead of assuming the scheme ends before position 8 in the string.

Also, since Location values of the form /f/rEeTUL8MXR17A do not contain the root domain, the root would have to be retrieved from somewhere else (reusing the domain of the current request URL should work); currently it is just the first element of the split.
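One way to cover both points without counting characters is to let the standard library resolve the Location value against the URL that was just requested. This is only a sketch of the idea, not the extractor's code; `resolve_redirect` and its parameter names are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect(request_url, location):
    """Return (root, path) for a redirect target, absolute or relative.

    Hypothetical helper: request_url is the URL that produced the redirect,
    location is the raw value of its Location header.
    """
    # urljoin returns absolute URLs as-is and resolves relative ones
    # against request_url, so the root domain is never missing.
    target = urljoin(request_url, location)
    parts = urlsplit(target)
    root = f"{parts.scheme}://{parts.netloc}"
    # Everything after the root (path plus any query string) is the path.
    return root, target[len(root):]


# Relative Location: the root comes from the request URL.
print(resolve_redirect("https://bunkr.ph/v/rEeTUL8MXR17A", "/f/rEeTUL8MXR17A"))

# Absolute Location: the root comes from the Location value itself.
print(resolve_redirect("https://bunkrrr.org/v/rEeTUL8MXR17A",
                       "https://bunkr.ph/v/rEeTUL8MXR17A"))
```

`urljoin` also handles http:// versus https:// and other relative forms, which is why it is used here instead of a fixed offset.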


unrealtournament · 2025-01-09T00:00:45Z

I wrote a (very) dirty patch to fix it in the meantime, since it only seems to happen with 301 codes. IMO this should be handled properly by checking the value of the Location header instead of relying on the response code:

@@ -13,6 +13,8 @@
 from .. import text, config, exception
 import random

+from urllib.parse import urlparse
+
 if config.get(("extractor", "bunkr"), "tlds"):
     BASE_PATTERN = (
         r"(?:bunkr:(?:https?://)?([^/?#]+)|"
@@ -78,6 +80,12 @@
                 if response.status_code < 300:
                     return response

+                if response.status_code == 301:
+                    urlp = urlparse(url)
+                    root = f"https://{urlp.netloc}"
+                    url = root + response.headers["Location"]
+                    continue
+
                 # redirect
                 url = response.headers["Location"]
                 root, path = self._split(url)
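To show why keying on the status code is fragile, here is a small standalone comparison. Both function names are hypothetical, and the case of a 301 carrying an absolute Location is an assumption (I have not observed it), but nothing rules it out.

```python
from urllib.parse import urlparse


def next_url_by_status(url, status_code, location):
    """Mirror of the stopgap above: treat every 301 as a relative redirect."""
    if status_code == 301:
        return f"https://{urlparse(url).netloc}" + location
    return location


def next_url_by_location(url, location):
    """Decide by inspecting the Location value, whatever the status code."""
    if location.startswith("/"):
        return f"https://{urlparse(url).netloc}" + location
    return location


request_url = "https://bunkrrr.org/v/rEeTUL8MXR17A"
absolute = "https://bunkr.ph/v/rEeTUL8MXR17A"

# A 301 with an absolute Location gets the root glued in front of it.
print(next_url_by_status(request_url, 301, absolute))

# Checking the value itself leaves the absolute URL alone.
print(next_url_by_location(request_url, absolute))
```

The first call produces a malformed URL with two schemes in it, while the second returns the Location unchanged.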

mikf · 2025-01-09T09:30:59Z

Thank you for the detailed error report and the reproducible example URL for bunkr.

This patch will also fix it:

diff --git a/gallery_dl/extractor/bunkr.py b/gallery_dl/extractor/bunkr.py
index 3e124525..7b8294cc 100644
--- a/gallery_dl/extractor/bunkr.py
+++ b/gallery_dl/extractor/bunkr.py
@@ -80,6 +80,9 @@ class BunkrAlbumExtractor(LolisafeAlbumExtractor):
 
                 # redirect
                 url = response.headers["Location"]
+                if url[0] == "/":
+                    url = text.root_from_url(response.url) + url
+                    continue
                 root, path = self._split(url)
                 if root not in CF_DOMAINS:
                     continue

marlekal · 2025-01-09T23:18:50Z

It still fails with albums:

gallery-dl https://bunkr.cr/a/jQWEsdwb -v
[gallery-dl][debug] Version 1.28.3
[gallery-dl][debug] Python 3.11.8 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 2.2.0
[gallery-dl][debug] Configuration Files ['%USERPROFILE%\\gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bunkr.cr/a/jQWEsdwb'
[bunkr][debug] Using BunkrAlbumExtractor for 'https://bunkr.cr/a/jQWEsdwb'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkr.cr:443
[urllib3.connectionpool][debug] https://bunkr.cr:443 "GET /a/jQWEsdwb HTTP/1.1" 200 None

mikf · 2025-01-10T08:47:33Z

~~The patch from #6790 (comment) hasn't been pushed to GitHub yet, let alone been included in a release.~~

edit: Same issue as #6798
edit2: Fixed in af9c06f

mikf added the site:bug label Jan 9, 2025

mikf added a commit that referenced this issue Jan 10, 2025

[bunkr] fix ValueError on relative redirects (#6790)

ba04431

mikf closed this as completed Jan 10, 2025
