Where did the CSV of all danbooru tags come from? #194
Edit: I did find https://danbooru.donmai.us/forum_topics/12774, but I neither have nor want a Google account, so I'll stick with the CSV that comes with the extension unless there's another option.
I want a CSV copy for personal use. I've been using the copy that is installed in the extension's folder, but it seems to stop cleanly at 100k tags, which makes me suspect that some tags are missing (in fact, I know there have been tags on Danbooru that weren't searchable in the extension).
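To check the 100k cap and whether a particular tag (or one of its aliases) is present in the bundled CSV, a small lookup can be built with Python's `csv` module. This is a sketch: the column layout `name,category,post_count,"alias1,alias2"` is an assumption about the bundled file, and the helper name `load_tag_index` is hypothetical.

```python
import csv
import io


def load_tag_index(lines):
    """Return (row_count, index), where index maps each tag and alias to its tag.

    ASSUMPTION: rows look like name,category,post_count,"alias1,alias2".
    """
    index = {}
    count = 0
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        count += 1
        name = row[0]
        index[name] = name
        # The optional fourth column holds comma-separated aliases.
        if len(row) > 3 and row[3]:
            for alias in row[3].split(","):
                index.setdefault(alias, name)
    return count, index


if __name__ == "__main__":
    # Toy data standing in for the real file.
    sample = io.StringIO('1girl,0,500,"1girls,one_girl"\nsolo,0,400,\n')
    count, index = load_tag_index(sample)
    print(count)                      # 2
    print(index.get("one_girl"))      # 1girl
    print(index.get("missing_tag"))   # None
```

With the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the extension keeps its CSV. A `count` of exactly 100000 would be consistent with a top-100k cut-off.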
For a long time, Danbooru had a public daily dump at https://danbooru.donmai.us/cache/tags.json, and a similar one for aliases.
However, it was removed at some point toward the end of 2022, probably to discourage data scraping. That is where my copy came from; I think my version is a snapshot from November. As far as I know, it contained every tag on Danbooru at that point, including deprecated ones.
I then converted the JSON file to an SQLite database using https://github.com/dcmoura/spyql so I could query and export it more easily. I created a view with the top 100k tags sorted by post count, joined with their aliases, and exported that to CSV.
You are correct that a few tags are not included, but these should only be tags added after the snapshot was taken or tags with a very low post count. Deprecated tags are included, as mentioned above, because a few old tags accumulated a lot of posts before the tag was changed.
I can send you the full SQLite DB if you want, which is probably a lot easier to work with than the CSVs. It doesn't contain everything that was in the original JSON, but it has most of it.
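The JSON → SQLite → top-100k CSV workflow can be sketched with Python's standard library. This is not the spyql-based process used for the actual file: it swaps in the built-in `sqlite3` module, and the record fields (`name`, `category`, `post_count`, `antecedent_name`, `consequent_name`) are assumptions about the dump's shape, not a documented schema. Aliases are assumed to point from an old name (antecedent) to the tag it resolves to (consequent), and the function names are hypothetical.

```python
import csv
import io
import json
import sqlite3

# ASSUMPTION: record shapes below are illustrative, not the documented dump schema.
#   tags:    [{"name": ..., "category": ..., "post_count": ...}, ...]
#   aliases: [{"antecedent_name": ..., "consequent_name": ...}, ...]


def load_records(path):
    """Read a JSON array of records from a saved dump file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_db(tags, aliases):
    """Load tag and alias records into an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tags (name TEXT PRIMARY KEY, category INTEGER, post_count INTEGER)"
    )
    con.execute("CREATE TABLE aliases (antecedent_name TEXT, consequent_name TEXT)")
    # Named placeholders read the matching keys from each dict.
    con.executemany("INSERT INTO tags VALUES (:name, :category, :post_count)", tags)
    con.executemany(
        "INSERT INTO aliases VALUES (:antecedent_name, :consequent_name)", aliases
    )
    return con


def create_top_view(con, limit=100_000):
    """Create a view of the `limit` most-used tags, each with its aliases joined in."""
    # Views cannot take bound parameters, so the limit is formatted in as an int.
    con.execute(f"""
        CREATE VIEW top_tags AS
        SELECT t.name, t.category, t.post_count,
               COALESCE(group_concat(a.antecedent_name, ','), '') AS aliases
        FROM (SELECT * FROM tags ORDER BY post_count DESC LIMIT {int(limit)}) AS t
        LEFT JOIN aliases AS a ON a.consequent_name = t.name
        GROUP BY t.name, t.category, t.post_count
        ORDER BY t.post_count DESC
    """)


def export_csv(con, out):
    """Write name, category, post_count, aliases rows from the view as CSV."""
    writer = csv.writer(out)
    # csv.writer quotes the aliases field automatically when it contains commas.
    writer.writerows(
        con.execute("SELECT name, category, post_count, aliases FROM top_tags")
    )


if __name__ == "__main__":
    # Toy records standing in for the real dumps.
    tags = [
        {"name": "1girl", "category": 0, "post_count": 500},
        {"name": "solo", "category": 0, "post_count": 400},
        {"name": "rare_tag", "category": 0, "post_count": 3},
    ]
    aliases = [{"antecedent_name": "1girls", "consequent_name": "1girl"}]
    con = build_db(tags, aliases)
    create_top_view(con, limit=2)  # keep only the 2 most-used tags in this demo
    buf = io.StringIO()
    export_csv(con, buf)
    print(buf.getvalue(), end="")
```

With real data you would call `load_records` on saved copies of the tag and alias dumps (file names are up to you) and pass a file opened with `newline=""` to `export_csv`. Keeping the full table in SQLite and cutting at `limit` only in the view is what makes the CSV a top-N subset, which matches why low-count tags drop out of a 100k export.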