Questions, Feedback and Suggestions #11
This is actually a really good idea, especially since I'm very hesitant/lazy about documenting things or writing text in general. edit: The more I think about it, the less satisfied I am with the previous explanation, so here is version 2.
(edit end) If something still doesn't make sense, just tell me and I will try to explain this a bit better. |
Very good to know, thank you. Checked some profiles with I realized what caused the slight confusion (for me): The default format string set by the extractor gets overwritten by the output format defined in What put me a bit off was this: Because pixiv seems to be a bit of a special case here. I think these are called objects in JSON parlance.. Now, if I want to use my own
I put these two definitions into the |
What you have discovered here is true for all extractors, not just pixiv, and especially for modules with more than one extractor. In general, the configuration value located the "deepest" inside the dictionary- or object-tree is used. If none is found, the config system falls back to the default value. An example:

{
"extractor":
{
"pixiv":
{
"user": { "filename": "A" },
"filename": "B"
},
"deviantart":
{
"image": { "filename": "C"}
},
"filename": "D"
}
}

With a configuration file like the one above, the following is going to happen:
Yes, if you have those two definitions at this place, then all pixiv extractors (there are 4 in total) will use these instead of their default format strings. If you want to dig even deeper, take a look at the inner loop of the config.interpolate function. For example for the
This function first searches the top-most level for a value with key |
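The "deepest value wins" lookup described above can be sketched in plain Python, using the example configuration file from this thread. Note that `lookup` and its signature are invented for illustration and are not the actual `config.interpolate` implementation:

```python
# Sketch of "the deepest configuration value wins", using the example
# file above. `lookup` is illustrative, not the real config.interpolate.

def lookup(conf, path, key, default=None):
    """Walk `path` into `conf`, keeping the last value seen for `key`.

    A value found deeper in the tree overrides shallower ones; if no
    level defines `key`, `default` is returned.
    """
    value = conf.get(key, default)
    node = conf
    for segment in path:
        node = node.get(segment)
        if node is None:
            break
        if key in node:
            value = node[key]
    return value

conf = {
    "extractor": {
        "pixiv": {"user": {"filename": "A"}, "filename": "B"},
        "deviantart": {"image": {"filename": "C"}},
        "filename": "D",
    }
}

lookup(conf, ["extractor", "pixiv", "user"], "filename")        # "A"
lookup(conf, ["extractor", "pixiv", "work"], "filename")        # "B"
lookup(conf, ["extractor", "deviantart", "image"], "filename")  # "C"
lookup(conf, ["extractor", "flickr"], "filename")               # "D"
```

The last call shows the fallback: no flickr-specific value exists, so the `"filename": "D"` one level up is used.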
Okay, got it. Also, found all 4 pixiv extractors ;-) Very nice, and very flexible. Ultimately, every possible variant can be customized. Excellent. Just threw some pixiv URLs at the program, can confirm everything works indeed as described! (Including these multiple images per entry/"work", I did a manual recount ;-) On to the next one.. Okay, this probably is a newbie question, but it looks like exhentai isn't a real site? There is e-hentai, seems like they are related (sister sites?). And you apparently need an e-hentai account first (and some dark magic, probably) before you can use exhentai. I will read a bit into this first. Pretty sure that is the first time I've ever encountered something like this. But this theory has a little flaw: If these two sites are indeed related, I'd assume that they don't differ much on the technical side, if at all. But trying some gallery links from e-hentai got me this:
Or is there another specific reason for this? |
exhentai is basically the "dark" version of e-hentai, with all the non-advertiser-friendly stuff enabled. You should be able to access this site by doing this:
In the past the domain of the regular site was |
Okay, made an account, will frequent the site a bit and see how it works out then... Can't test it before, because b603b59 changed the expression pattern, and that part works, but it's still the I will test some other sites in the meantime, and will update my initial post accordingly. |
I don't know if visiting the regular site and so on is even necessary; that is just what I did when I created an account for unit testing and couldn't access exhentai immediately. Speaking of which: I didn't want to make my unit-testing accounts any more public than necessary (for, I hope, obvious reasons), but I should probably just share them with you. Take a look at this. |
I'm not sure, but other random sources on the Internet indicate that this is actually the case.
Yes, obviously. That is nice, but it won't be necessary, I've already made an account and started using it a bit. Besides, creating and using different accounts for different sites and services doesn't really bother me at all. If there is some longer gap between my responses, it's only because I'm busy with something else ;-) Another thing, which I think belongs here, because it's not an issue or bug, but maybe a possible suggestion: There is another feature on DeviantArt I wasn't aware of before: The Journal. I noticed it while using gallery-dl with this profile: http://inkveil-matter.deviantart.com/ The site states: 190 deviations. Luckily, there is a statistics page which explains this:
35 Journal entries, so 190 in total. Shamelessly copied from the DeviantArt Wikipedia page:
Not sure if that is useful at all. I clicked around a bit and saw nothing I would consider missing. I don't know, not sure if I even really understand this feature yet. Anyway, forgive my wall of text here, I just wanted to let you know, just in case this is news to you as well ;-) |
Thank you for the suggestion but I'm going to stay with my trusty GPG-encrypted plain text file :)
Even if this platform here is called an issue tracker, feel free (and even encouraged) to create new "issues" if you want to suggest or request a feature or support for a new site.
This seems to be just a collection of blog posts, which might contain references to other deviantart- or sta.sh images. There shouldn't be any images missing: 190 deviations consisting of 155 real deviations and 35 journal entries seems about right to me.
No worries, I don't mind walls of text and actually wasn't aware of the journal or muro, so thanks for telling me. |
Sorry for asking, but does this tool have a feature to remember which images have already been downloaded, without checking the local directory? IIRC the package already ships a sqlite DLL, right? Thank you. |
No, I am sorry, but such a feature does not currently exist. Feel free to open a separate issue if you want a feature like this to be implemented, but please explain in greater detail what you actually want to do and/or need this feature for. |
Just saw the new commit adding options for skipping files. A change from fc9223c#diff-283aceda91c5f7f10981253611f9f950:

def _exists_abort(self):
    if self.has_extension and os.path.exists(self.realpath):
        raise exception.StopExtraction()
    return False

Current extractor run, in this context, means just the 'active' URL, right? Maybe a case for an additional option. Or rather not, I'm still not sure about it, need to make up my mind first probably. |
Yes.
The
An |
Yes, for example. I think the current behaviour is just right as the default, we'll see when someone asks for other variants. |
Do you plan to add a graphical interface for the program? At least input fields and pause/continue buttons. I am also interested in the possibility of multi-threading and of queuing downloads one by one via the GUI. Yes, I know that it can be done through the console, but still ... |
Well, I don't know, but if I may, let me add just this: |
No, there are no plans for a graphical user interface, mainly because of the reasons @Hrxn listed. |
Hmmm... I can try to build a graphical shell in C# for the Windows version that passes commands to and from gallery-dl, but I'm not sure it will take only a little time. But I will try my best. |
Yes, I think that's a good idea. I wouldn't even know what to use a GUI program for, to be honest. If the program is running, there isn't much to see, because what actually takes the most time is just transferring data across the net, aka downloading. You could add some fancy progress bars, but this doesn't really change anything, in my opinion. Besides, progress-bar support can also be done in a CLI, via simple text output written to the terminal, like wget and curl do, for example. The only thing I can really think of right now is managing your personal usage history of the program, so to speak. That means having everything in one central place: a queue for all URLs that are yet to be processed, and an archive of all URLs that have already been done. This would be more of a meta-program, if you think about it, because all this can be done completely independently of gallery-dl. You could also use this program to write the script files for the CLI then 😄 As a starting point, writing processed URLs to archive file(s) would be a good idea, I think. Something along the lines of the |
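For anyone wanting to experiment with the archive-file idea before it exists in gallery-dl, here is a hypothetical sketch using the standard-library sqlite3 module. The class name, table schema, and entry format are all invented for illustration; this is not gallery-dl's API:

```python
# Hypothetical sketch of a download archive, similar in spirit to
# youtube-dl's --download-archive: record an ID for every finished
# download and skip anything already recorded. Class name, schema,
# and entry format are invented; this is not gallery-dl's code.
import sqlite3

class DownloadArchive:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def check(self, entry):
        """Return True if `entry` has already been downloaded."""
        cur = self.db.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (entry,))
        return cur.fetchone() is not None

    def add(self, entry):
        """Mark `entry` as downloaded."""
        self.db.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,))
        self.db.commit()

archive = DownloadArchive()
archive.add("pixiv12345_p0")
archive.check("pixiv12345_p0")  # True
archive.check("pixiv12345_p1")  # False
```

With a real file path instead of `:memory:`, the archive would persist across runs, which is the point of the feature request above.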
Interesting; although these sites seem so similar (the Gelbooru site even states "concept by Danbooru"), they are yet so different in terms of implementation and functionality. I just checked again: Gelbooru support for pools may be pretty much irrelevant, at least for now. Because unlike on Danbooru, where pools are used quite extensively, Gelbooru only seems to have 25 pools in total right now, and there is not much activity. At least that is what I see here, even with an account on Gelbooru. Although an account enables you to create your own pools (public and private), allowing you to collect different posts there which could then be downloaded. So this might be relevant to potential gallery-dl users, maybe. |
There seem to be up to 44500 pools if you take a look at the |
What made you check that? 25 inconceivably low for a big site? 😉 |
This one: https://gelbooru.com/index.php?page=tags&s=list AdblockPlus + filter list seems to be causing the same issue. |
Ah, okay. Yes, pagination is also broken for me on that listing. |
Small suggestion: Add a column to Supported Sites to indicate status of user authentication. Not sure, just a simple "Supported" Yes or No. By the way, does this case even exist currently? |
Thanks for the suggestion -> done fb1904d And yes, there are two modules with optional authentication: |
Seems that way. The /g/ threads on warosu even link to their counterparts on archived.moe and rbt.asia, and they all share the same thread-id. |
There’s a good deal of keyword:value info, 👍 on that. |
Question regarding f3fbaa5
For setting |
@llelf you can currently store any metadata in JSON format by passing

@Hrxn they state that every user-agent string should look something like |
Guys, first of all, thanks for a great project, it helps me a lot! The question is: I have used Please help me |
There is currently no way to add a delay between downloads or limit download speeds, but I guess I will be looking into that next.
In the meantime you could collect a few image URLs from safebooru by using
(The |
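Until a built-in delay option exists, the collected image URLs could be fetched by a small script that pauses between requests. This is only a sketch; the helper names, the `urls.txt` file name, and the two-second delay are all arbitrary:

```python
# Sketch: download a list of pre-collected image URLs with a fixed
# pause between requests. Names and the delay value are illustrative.
import time
import urllib.request

def target_name(url):
    """Derive a local file name from the URL's last path segment."""
    return url.rsplit("/", 1)[-1] or "index"

def download_with_delay(urls, delay=2.0):
    for url in urls:
        urllib.request.urlretrieve(url, target_name(url))
        time.sleep(delay)  # pause between downloads to go easy on the site

# Usage (assuming one URL per line in urls.txt):
# with open("urls.txt") as f:
#     download_with_delay(f.read().split())
```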
@mikf Some extractors don't specify What to do? Manually setting another value for Which allows using some sub-dirs, i.e. |
If you want to change an extractor's target directory, you should set its (Extractor) classes will use the values specified in their base class if these aren't specified in the class itself, which in this case means that gfycat extractors are using the value set in the It is also not possible to overwrite an extractor's |
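The inheritance rule described here is ordinary Python class-attribute lookup. A minimal illustration with simplified stand-in classes; the attribute name `directory_fmt` matches the discussion, but the class bodies and values are invented, not gallery-dl's real extractor definitions:

```python
# Why all gfycat extractors share one directory_fmt: Python class
# attributes are inherited unless a subclass overrides them. The
# class bodies here are simplified stand-ins, not gallery-dl's code.

class GfycatExtractor:
    directory_fmt = ["gfycat"]                 # defined once in the base class

class GfycatImageExtractor(GfycatExtractor):
    pass                                       # inherits ["gfycat"]

class GfycatUserExtractor(GfycatExtractor):
    directory_fmt = ["gfycat", "{username}"]   # hypothetical override

GfycatImageExtractor.directory_fmt  # ["gfycat"]
GfycatUserExtractor.directory_fmt   # ["gfycat", "{username}"]
```

Overriding the attribute in a subclass changes only that extractor; leaving it out means the base-class value is used, which is the behaviour observed above.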
Thanks, got it. |
@mikf There's some unusual behaviour, although I don't think it's a real issue, maybe a cosmetic one, and I assume something like this is specific to Windows as well. I hope it's not too much of a nitpick, probably just a question of different ways to implement it in detail.. For each
But here's the thing: It does not work in the same way for the base directory.
Alternatively, setting
Okay, so it appears that, and please correct me if my conclusion is wrong, the I did a quick code search, I think this is the relevant result: Lines 356 to 359 in a1980b1
Or maybe these functions? Line 410 in a1980b1
Line 379 in a1980b1
|
The value for The full path gets built by something like You can actually use a list of strings as directory segments for As for a reason why it works the way it does: in the earlier versions of this project I wanted a way to direct all downloads to a common base path, which is how this option came to be; and it has stayed like this ever since. There is a static part + a dynamic part + a filename, which seems reasonable to me. To solve your "slash" problem: I guess I could just replace all forward slashes with backward slashes on Windows, which should result in a consistent use of |
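The static-plus-dynamic path assembly described above can be demonstrated with `os.path.join`; the concrete directory names below are invented for the example:

```python
# How the final target path is assembled from its three parts:
# a static base directory, a dynamic per-extractor directory, and
# the file name. All concrete values here are illustrative.
import os

base_directory = "downloads"                # static part (the base directory)
directory = os.path.join("pixiv", "12345")  # dynamic part, per extractor
filename = "illustration.jpg"               # from the filename format string

path = os.path.join(base_directory, directory, filename)
# On POSIX this yields downloads/pixiv/12345/illustration.jpg; on
# Windows os.path.join inserts backslashes, so a base directory written
# with forward slashes ends up with mixed separators in the result.
```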
Interesting to know, thank you for the explanation. In summary, we could say the true cause of this "issue" is the Python interpreter and its implementation itself, right? Depending on the OS, of course, but apparently the functionality of Thank you anyway for addressing this very specific nuisance... 😄 But with the latest commit, what is the one true way to write my |
Well, not really. The functionality is well documented, so I could have somehow worked around this, but I didn't realize that forward slashes in Windows could be an issue ... doesn't help that I'm not using Windows myself.
As you said yourself, it doesn't really matter. Both work ( |
@mikf If I may inquire, are you currently planning on adding support for some new sites? Or already something in the pipeline? Other plans in that regard? Because I'd like to make a suggestion, basically, and maybe get some other opinions and feedback in here 😄 |
Happy New Year to you, too. There are no plans on adding support for new sites from my side, but I have been thinking about adding a few features - an equivalent of YoutubeDL's If you have an idea or suggestion about improvements, (new) features, site support, etc., just open a new issue and let me know. |
I presume that something like

Not sure if templates for GitHub are really that necessary, considering the rather low number of opened issues. If the tracker gets flooded with new issues, this would be a different story. But if you think that the repository would feel like something's missing, for lack of a better description right now, don't mind my comment on this 😉 I will definitely open a new issue for a new site, but I wanted to gather some feedback first, and since this thread already exists [1], I thought it would be a good idea to simply ask first. Dunno, I would really like to see some other users chiming in here, but so far there aren't that many, unfortunately. Okay, everyone reading this, please let me know: What do you think of adding support for ArtStation, for example? [1] Maybe this Projects feature on GitHub would be a good alternative? |
Anyone? Please? |
The functions covered by GH Projects and the issue tracker are virtually the same. The only important factor here is the personal preference of the main maintainer. I think that a common tracker is much more straightforward. |
The projects page doesn't seem particularly suited to fulfill a similar role as this meta-issue here does. Having an issue for general discussion is a lot more accessible/visible than a "meta-project" on the projects page and, as Bfgeshka said, much more straightforward for the average user. But you are right, there should probably be another way and place for general questions and discussion. An IRC channel (on freenode?) would be nice and all, but it would most likely require some sort of logging bot to be useful. Another alternative might be Gitter, which is used by quite a few other GitHub projects. I've played around with it a bit and registered a "community" and room there: https://gitter.im/gallery-dl/main . Maybe that is something to use. |
This Gitter thing is pretty nice.. especially the integration with GitHub, definite advantage over a normal vanilla IRC channel. As I understand it, the projects feature offers better visualization and organization of all related matters, in the form of boards, kanban style. I personally like these, but it might take some time to get used to it for any novices, and at the current state of the project in general, primarily activity, it might be a bit overkill right now. And you are right, accessibility and visibility should be the main concern here. I mean, any board/notes whatever in Project can be mentioned (and linked) in In the meantime, the meta-issue is definitely fine with me, no complaints here. Although on my end, not sure if you are affected as well, I can notice a small delay when opening this issue, it's not slow or anything, but noticeable, in my opinion. And as #11 here continues to grow, I guess at some point we'd have to close it and open a new one 😄 But okay, I think we're already in bike-shedding territory here. |
@mikf can you recommend a way to cache the result of the extractor?

j = job.UrlJob("http://example.org/image")
j.run() # prints "http://example.org/img.jpg"
print(j.extractor)

This will take a long time for, say, the link of a reddit thread, where it will find other links and extract them directly. So I'm trying a custom UrlJob, which handles messages of type Message.Queue like Message.Url:

class CustomUrlJob(job.UrlJob):
    def run(self):
        try:
            log = self.extractor.log
            for msg in self.extractor:
                if msg[0] == Message.Queue:
                    _, url, kwds = msg
                    self.update_kwdict(kwds)
                    self.handle_url(url, kwds)
                else:
                    self.dispatch(msg)
            ...

Is there a better way to do it? |
Caching extractor results (and a bit more) is what the DataJob class does, but you can do this a lot more easily than that.
To just copy all of these tuples for later use, try this: https://gist.github.com/mikf/052916c25a9bda7d6876a355cacbe88f And the UrlJob thing is a bit of a mistake on my part and will be fixed in one of the next commits. For the time being, set edit: updated the gist code to use |
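The caching idea boils down to storing every message tuple the extractor yields. A minimal sketch in the spirit of the linked gist; `CachingJob` and `fake_messages` are invented stand-ins, since a real gallery-dl extractor (which likewise yields tuples whose first element identifies the message type) needs a matching URL to construct:

```python
# Sketch of caching every message an extractor yields so it can be
# replayed later without re-fetching. `fake_messages` is a stand-in
# for a real gallery-dl extractor's message stream.

class CachingJob:
    def __init__(self, extractor):
        self.extractor = extractor
        self.messages = []

    def run(self):
        for msg in self.extractor:
            self.messages.append(msg)   # keep every tuple for later reuse

fake_messages = [
    ("Version", 1),
    ("Directory", {"category": "example"}),
    ("Url", "http://example.org/img.jpg", {"extension": "jpg"}),
]

cjob = CachingJob(iter(fake_messages))
cjob.run()
cjob.messages[2][1]  # "http://example.org/img.jpg"
```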
@Hrxn: before I forget, I'm also noticing a considerable delay when opening this issue, so closing this and creating a new one might be in order. |
Roger that, closing this and opening issue for new site soon. |
A central place for these things might be a good idea.
This thread could serve as a starting point, results will eventually be collected in the project wiki, if appropriate and useful.
Edited 2017-04-15
For conciseness
Edited 2017-05-04
Removed nonsensical checklist thing