[bunkr] Intermittent ValueError on redirects #6790

Closed
unrealtournament opened this issue Jan 8, 2025 · 4 comments

@unrealtournament

unrealtournament commented Jan 8, 2025

$ gallery-dl https://bunkr.cr/a/4oCihUYU -v

[gallery-dl][debug] Version 1.28.4-dev
[gallery-dl][debug] Python 3.11.2 - Linux-6.1.0-28-amd64-x86_64-with-glibc2.36
[gallery-dl][debug] requests 2.32.3 - urllib3 2.2.3
[gallery-dl][debug] Configuration Files ['${HOME}/.gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bunkr.cr/a/4oCihUYU'
[bunkr][debug] Using BunkrAlbumExtractor for 'https://bunkr.cr/a/4oCihUYU'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkr.cr:443
[urllib3.connectionpool][debug] https://bunkr.cr:443 "GET /a/4oCihUYU HTTP/11" 200 None
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkrrr.org:443
[urllib3.connectionpool][debug] https://bunkrrr.org:443 "GET /v/rEeTUL8MXR17A HTTP/11" 307 68
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkr.ph:443
[urllib3.connectionpool][debug] https://bunkr.ph:443 "GET /v/rEeTUL8MXR17A HTTP/11" 301 51
[bunkr][error] ValueError: substring not found
[bunkr][debug]
Traceback (most recent call last):
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 135, in _extract_files
    file = self._extract_file(text.unescape(url))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 151, in _extract_file
    response = self.request(webpage_url)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 84, in request
    root, path = self._split(url)
                 ^^^^^^^^^^^^^^^^
  File "/home/user/gallery-dl-env/lib/python3.11/site-packages/gallery_dl/extractor/bunkr.py", line 175, in _split
    pos = url.index("/", 8)
          ^^^^^^^^^^^^^^^^^
ValueError: substring not found

The issue is that there is sometimes an extra 301 redirect whose Location header value is a relative path such as /f/rEeTUL8MXR17A rather than an absolute URL such as https://bunkr.ph/v/rEeTUL8MXR17A. In that case the url variable contains no / at or after position 8, so url.index("/", 8) raises the ValueError.
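A minimal sketch of the failure, outside of gallery-dl. `split_naive` is a hypothetical stand-in that reuses only the `url.index("/", 8)` line from the traceback; the assumption that it returns a `(root, path)` pair is inferred from the `root, path = self._split(url)` call and is not copied from the real `_split`.

```python
def split_naive(url):
    # Assumes the URL starts with "https://" (8 characters), so the first
    # "/" at or after index 8 marks the end of the domain.
    pos = url.index("/", 8)
    return url[:pos], url[pos:]


# Absolute URL: the search finds the "/" that follows the domain.
print(split_naive("https://bunkr.ph/v/rEeTUL8MXR17A"))

# Relative Location value: there is no "/" at or after index 8.
try:
    split_naive("/f/rEeTUL8MXR17A")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

The second call fails because both slashes of /f/rEeTUL8MXR17A sit before index 8, which matches the "substring not found" message in the log above.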

It only happens intermittently (running the same command a few times for the same album eventually retrieves the right value), but it would be more robust to check explicitly for an http:// or https:// prefix instead of assuming that the scheme ends before position 8 of the string.

Also, since Location values of the form /f/rEeTUL8MXR17A do not contain the root domain, the root would have to be retrieved from somewhere else (reusing the domain of the URL that was just requested should work); currently it is simply the first element of the split.
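One way to do both things with the standard library, as a sketch rather than a proposed patch: `urljoin` resolves a relative Location value against the URL that was requested, and `urlsplit` finds the root without counting characters. The helper names `resolve_location` and `split_root` are hypothetical and do not exist in gallery-dl.

```python
from urllib.parse import urljoin, urlsplit


def resolve_location(request_url, location):
    """Resolve a Location header value against the requested URL.

    Absolute values are returned unchanged; relative ones inherit
    the scheme and domain of request_url.
    """
    return urljoin(request_url, location)


def split_root(url):
    """Split an absolute URL into (root, rest) without assuming offsets."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    root = f"{parts.scheme}://{parts.netloc}"
    return root, url[len(root):]


requested = "https://bunkr.ph/v/rEeTUL8MXR17A"
target = resolve_location(requested, "/f/rEeTUL8MXR17A")
print(target)
print(split_root(target))
```

Resolving against the requested URL also covers the case where the Location value is already absolute, so the redirect handling does not need to branch on the status code.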

@unrealtournament
Author

I wrote a (very) dirty patch to work around it in the meantime, since it only seems to happen with 301 responses, but IMO this should be handled properly by checking the value of the Location header instead of relying on the response code:

@@ -13,6 +13,8 @@
 from .. import text, config, exception
 import random

+from urllib.parse import urlparse
+
 if config.get(("extractor", "bunkr"), "tlds"):
     BASE_PATTERN = (
         r"(?:bunkr:(?:https?://)?([^/?#]+)|"
@@ -78,6 +80,12 @@
                 if response.status_code < 300:
                     return response

+                if response.status_code == 301:
+                    urlp = urlparse(url)
+                    root = f"https://{urlp.netloc}"
+                    url = root + response.headers["Location"]
+                    continue
+
                 # redirect
                 url = response.headers["Location"]
                 root, path = self._split(url)

mikf added the site:bug label Jan 9, 2025
@mikf
Owner

mikf commented Jan 9, 2025

Thank you for the detailed error report and the reproducible example URL for bunkr.

This patch will also fix it:

diff --git a/gallery_dl/extractor/bunkr.py b/gallery_dl/extractor/bunkr.py
index 3e124525..7b8294cc 100644
--- a/gallery_dl/extractor/bunkr.py
+++ b/gallery_dl/extractor/bunkr.py
@@ -80,6 +80,9 @@ class BunkrAlbumExtractor(LolisafeAlbumExtractor):
 
                 # redirect
                 url = response.headers["Location"]
+                if url[0] == "/":
+                    url = text.root_from_url(response.url) + url
+                    continue
                 root, path = self._split(url)
                 if root not in CF_DOMAINS:
                     continue

@marlekal

marlekal commented Jan 9, 2025

It still fails with albums:

gallery-dl https://bunkr.cr/a/jQWEsdwb -v
[gallery-dl][debug] Version 1.28.3
[gallery-dl][debug] Python 3.11.8 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 2.2.0
[gallery-dl][debug] Configuration Files ['%USERPROFILE%\\gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bunkr.cr/a/jQWEsdwb'
[bunkr][debug] Using BunkrAlbumExtractor for 'https://bunkr.cr/a/jQWEsdwb'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bunkr.cr:443
[urllib3.connectionpool][debug] https://bunkr.cr:443 "GET /a/jQWEsdwb HTTP/1.1" 200 None

@mikf
Owner

mikf commented Jan 10, 2025

The patch from #6790 (comment) hasn't been pushed to GitHub yet, let alone been included in a release.

edit: Same issue as #6798
edit2: Fixed in af9c06f
