Not resuming images downloading after errored out #15

Closed
rickyzhangca opened this issue Jul 15, 2022 · 3 comments · Fixed by #88

@rickyzhangca

I am trying to create a dump of a Fandom wiki. The download errored out after a while. When I try to resume the download, the script starts downloading the images from the beginning, reporting that no images were found from the previous session.

Error:

...
    Downloaded 6500 images
    Downloaded 6510 images
    Downloaded 6520 images
    Downloaded 6530 images
    Downloaded 6540 images
HTTP Error 503.
Server error, max retries exceeded.
Please resume the dump later.
https://hitman.fandom.com/index.php?title=Special%3AExport&pages=Image%3AHitmanbm12.jpg&action=submit&curonly=1&limit=1

Resuming with:

dumpgenerator https://hitman.fandom.com/wiki/Apex_Predator --xml --curonly --images --resume --path=C:/Users/[my name]/Downloads/wikiteam3-python3/hitmanfandomcom-20220714-wikidump

Gives

...
Analysing https://hitman.fandom.com/api.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
0 images were found in the directory from a previous session
Retrieving images from "start"
...

The images from the previous session do exist in the folder.
[Attached image]

@elsiehupp
Member

Problems resuming past dumps are unfortunately a little bit of a known issue (though we thought we had fixed it).

The main workaround at this point would probably be re-running the dump from the start with the parameter --delay=0.5 (a 0.5-second delay between calls—though you can choose a different value—in order to avoid getting timed out).

In the meantime, apologies for the inconvenience, and thank you for bringing this to our attention!
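
For readers unfamiliar with the idea behind a per-call delay, here is a minimal Python sketch of pausing a fixed interval between consecutive requests. It is not the project's code: `run_with_delay` and its parameters are hypothetical names chosen for illustration, and the only detail taken from this thread is the 0.5-second value.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_with_delay(
    tasks: Iterable[Callable[[], T]],
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run each task in order, pausing `delay` seconds between calls.

    Hypothetical helper: `tasks` stands in for individual HTTP requests,
    and `sleep` is injectable so the pause can be observed in tests.
    """
    results: List[T] = []
    for index, task in enumerate(tasks):
        if index > 0:
            # No pause before the first call, one pause before each later call.
            sleep(delay)
        results.append(task())
    return results
```

Spacing the calls out lowers the request rate the server sees, which is why a delay can help avoid errors such as the HTTP 503 reported above; it does not by itself fix resuming.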

@elsiehupp elsiehupp added the bug Something isn't working label Jul 15, 2022
@rickyzhangca
Author

no worries! I just decided to bring it up in case it signals some other underlying issues.

@elsiehupp
Member

I changed the default delay from 0s to 0.5s, which should help mitigate the problem for other users. With regard to fixing the underlying bug, though, I have a half-complete drastic rewrite of the entire project, and that has its own issues, lol.

@yzqzss yzqzss self-assigned this Jan 14, 2023
@yzqzss yzqzss moved this to In Progress in MediaWiki Scraper Development Jan 15, 2023
robkam pushed a commit that referenced this issue Jan 15, 2023
---

- Introduce `sha1File`
- save more metadata (`size`, `sha1`) into `images.txt`
- feat: better file dump:
  - validate image's size and sha1
  - show progress
  - better resume
    > Improved the resume mechanism. (fix: #15)
    > First check whether the `file` and `file.desc` exist, and then check
    > whether the `size` and `sha1` of the file correspond to `images.txt`. If
    > any check fails, the file and the `.desc` of the file will be downloaded
    > again. If all pass, the download of this file is skipped.
    > You can even delete random pictures and `.desc` files and try to resume
    > again.
  - pre-work for incremental image dump
  - remove `start` param from `generatorImageDump()`
    > The images resume mechanism has changed. We don't need `start` for
    > resuming anymore.
- other minor improvements
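
The resume check described in the commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the commit: `sha1_of_file` and `needs_download` are hypothetical names, the `<filename>.desc` naming is assumed from the `file.desc` wording above, and the expected `size` and `sha1` are assumed to have already been read from `images.txt` (whose exact format is not shown here).

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(image_dir: Path, filename: str, size: int, sha1: str) -> bool:
    """Return True if the image and its .desc should be downloaded again.

    Hypothetical helper: `size` and `sha1` are the expected values for this
    file, assumed to come from the corresponding entry in images.txt.
    """
    image_path = image_dir / filename
    desc_path = image_dir / (filename + ".desc")  # assumed naming scheme

    # 1. Both the image and its description file must exist.
    if not image_path.is_file() or not desc_path.is_file():
        return True
    # 2. Cheap check first: the size on disk must match the recorded size.
    if image_path.stat().st_size != size:
        return True
    # 3. Finally compare content hashes.
    return sha1_of_file(image_path) != sha1.lower()
```

Because each file is validated independently, a resume under this scheme would not depend on a `start` position: deleted or truncated files are simply detected and fetched again.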
@github-project-automation github-project-automation bot moved this from In Progress to Done in MediaWiki Scraper Development Jan 15, 2023