javr: enable multiple scrapers, and add javlibrary and javbus #1100

Merged: 9 commits into xbapps:master on Jan 17, 2023

thebrnd
Contributor

thebrnd commented Jan 7, 2023

Two additional javr scene importers, because one just isn't enough. See discussion in pull request #1067 for details. Is anybody willing to test or comment on this?

@vt-idiot
Contributor

vt-idiot commented Jan 7, 2023

I can give it about as thorough a look as I gave the previous PR! I probably won't have the free time today to fumble with getting Gitpod to load and start XBVR properly: last time I had to install or build a few extra things inside it before it would actually launch, and I didn't take any notes. I'll give it a go by tomorrow evening, though!

@thebrnd
Contributor Author

thebrnd commented Jan 8, 2023

Thanks vt! I also struggled with the build system for a while when I started on this, because I didn't feel like using Gitpod and prefer a local development environment over a cloud one. That meant I first had to create a standalone Dockerfile and docker-compose file, so I could build and spin up a development copy with docker compose up --build -d. I never included them in a pull request because I figured the XBVR devs would prefer Gitpod. I've now pushed them to a separate branch for you, though, so if you're familiar with Docker Compose you can use those build files instead. You don't need the whole branch; these two files should be enough.

https://github.com/thebrnd/xbvr/blob/standalone-build/Dockerfile-standalone
https://github.com/thebrnd/xbvr/blob/standalone-build/docker-compose.yml

@vt-idiot
Contributor

vt-idiot commented Jan 9, 2023

I'm on Windows (Windows Server 2019, which has WSL and Docker, but not as "good" as in 2022), so I haven't the faintest idea what to do with Docker.

I'm going to try to remember what I did to get it to launch in Gitpod, beyond opening https://gitpod.io/#https://github.com/thebrnd/xbvr/tree/javr-javlibrary

@vt-idiot
Contributor

vt-idiot commented Jan 9, 2023

I focused on 3DSVR since it was the only one where I had "thought of a fix" earlier, and just picked some random unmatched scenes I actually had for testing.

Anyway, all three new scrapers seem to work. I'm not sure whether you want to add workarounds for 3DSVR scraping as "DSVR" on both javlibrary and jav.land, but on both of them the scenes show up in the library exactly the way they did from javdatabase before the bodge/fix, as DSVR-\d{5}. That applies to all scenes, not just the newer ones as it did on javdatabase.
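A correction like this could live in a small post-processing step shared by every JAV source. The Go sketch below assumes scene IDs arrive as strings such as DSVR-01234 and that the fix is simply to restore the dropped "3" prefix. The names `misparsed3DSVR` and `fix3DSVRID` are hypothetical and not taken from the XBVR codebase, and the real correction may treat the digits differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// misparsed3DSVR matches IDs such as "DSVR-01234": the bare "DSVR-" prefix
// followed by exactly five digits, as reported for 3DSVR scenes above.
var misparsed3DSVR = regexp.MustCompile(`^DSVR-\d{5}$`)

// fix3DSVRID is a hypothetical helper: it restores the missing "3" on a
// misparsed 3DSVR ID and returns every other ID unchanged.
// Assumption: the digits are kept exactly as scraped.
func fix3DSVRID(id string) string {
	if misparsed3DSVR.MatchString(id) {
		return "3" + id
	}
	return id
}

func main() {
	for _, id := range []string{"DSVR-01234", "3DSVR-01234", "WPVR-123"} {
		fmt.Printf("%s -> %s\n", id, fix3DSVRID(id))
	}
}
```

Anchoring the pattern with `^` and `$` keeps IDs that already carry the "3", or that use a different digit count, untouched.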

I didn't scrape many scenes, but I did notice that javlibrary, for example, returned "studio": "SOD Create  " (note the trailing spaces). I'm not sure whether that comes from the scraper or whether they actually have it entered that way on their end. It did the same for a CJVR scene: "studio": "Bi  ".

JAV.land also returned "studio": "Sod Create  \n\t\t\t\t\t\t\t\t\t\t\t\t" and, for a WPVR scene, "studio": "Warp Entertainment  \n\t\t\t\t\t\t\t\t\t\t\t\t", which is probably very much not good.
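Values like these can be normalized before they are stored. A minimal Go sketch, assuming the scraper has the raw studio string in hand; `cleanField` is a hypothetical name. It uses `strings.Fields` plus `strings.Join`, which is slightly stricter than plain trimming: besides stripping the leading/trailing spaces, tabs and newlines, it also collapses internal runs of whitespace to one space.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanField removes leading/trailing whitespace (spaces, tabs, newlines) and
// collapses any internal run of whitespace to a single space.
func cleanField(raw string) string {
	return strings.Join(strings.Fields(raw), " ")
}

func main() {
	raws := []string{
		"SOD Create  ",
		"Bi  ",
		"Warp Entertainment  \n\t\t\t\t",
	}
	for _, r := range raws {
		fmt.Printf("%q -> %q\n", r, cleanField(r))
	}
}
```

If internal spacing should be preserved exactly, `strings.TrimSpace(raw)` alone is enough for the cases reported here.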

The other peculiarities, like differences in actress names, JAVBUS returning Japanese studio names such as SODクリエイト, or absolutely hilarious nonsensical tags from JAV.land like miss hippopotamus, can IMO be left alone or left up to users. A match/replace list for JAVBUS studios is technically possible but a lot of work. I'm not in any way suggesting you do that, but if you started it off "the right way" with that one entry, I might update it myself over time. It might be moot either way, since AFAIK "studio" isn't something we can filter by anyway.
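One way to start such a match/replace list "the right way" is a plain lookup table applied after trimming. This Go sketch is hypothetical: `studioAliases` and `normalizeStudio` are illustrative names, and the table holds only the single JAVBUS example from this thread, with the English spelling javlibrary returned.

```go
package main

import (
	"fmt"
	"strings"
)

// studioAliases maps studio names as scraped to the names to store instead.
// Hypothetical table seeded with the single JAVBUS example from this thread.
var studioAliases = map[string]string{
	"SODクリエイト": "SOD Create",
}

// normalizeStudio trims the scraped name, then swaps in a known alias if one exists.
func normalizeStudio(raw string) string {
	name := strings.TrimSpace(raw)
	if alias, ok := studioAliases[name]; ok {
		return alias
	}
	return name
}

func main() {
	fmt.Println(normalizeStudio("SODクリエイト"))
	fmt.Println(normalizeStudio("Warp Entertainment  "))
}
```

New entries can then be added one line at a time as mismatches turn up.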

Actress name differences aren't a huge deal: users can either change them on their own or use AKAs.

Also, here's another tag to add to the "drop these useless annoying tags" list. This one was from jav.land:
"name": "solo work"
and this one from javlibrary:
"name": "solowork"
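A drop list like this can be modeled as a set that scraped tags are checked against. A minimal Go sketch, assuming tags arrive as a string slice; `droppedTags` and `filterTags` are hypothetical names, and the entries are the "solo" variants discussed in this thread. Case-insensitive matching is an added assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// droppedTags holds tag names to discard. The entries are the ones named in
// this thread; storing them as a set is an assumption.
var droppedTags = map[string]struct{}{
	"solo":      {},
	"solo work": {},
	"solowork":  {},
}

// filterTags returns the tags that are not on the drop list, comparing
// case-insensitively after trimming whitespace.
func filterTags(tags []string) []string {
	kept := make([]string, 0, len(tags))
	for _, t := range tags {
		if _, drop := droppedTags[strings.ToLower(strings.TrimSpace(t))]; drop {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	fmt.Println(filterTags([]string{"solo work", "Solowork", "kiss kiss"}))
}
```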

Final thought: just as the "JAVR" tag is added to anything scraped with this scraper, it might be worth adding one more tag for the actual source used, but only if it's easy to implement. Something simple would do: javbus, javdatabase, jav.land or javlibrary alongside the JAVR tag.
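A source tag could be appended in the same place the JAVR tag is added. This Go sketch assumes a scene's tags are a string slice and that the source name is passed in by whichever scraper ran; `withSourceTags` is a hypothetical helper, and the tag spellings follow the names used in this thread.

```go
package main

import "fmt"

// withSourceTags returns a copy of tags with the generic "JAVR" tag and the
// scraper's source name appended, skipping any that are already present.
func withSourceTags(tags []string, source string) []string {
	out := append([]string(nil), tags...)
	seen := make(map[string]bool, len(out))
	for _, t := range out {
		seen[t] = true
	}
	for _, extra := range []string{"JAVR", source} {
		if !seen[extra] {
			out = append(out, extra)
			seen[extra] = true
		}
	}
	return out
}

func main() {
	fmt.Println(withSourceTags([]string{"vr"}, "javlibrary"))
}
```

Copying the input slice first keeps the caller's tag list unmodified.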

let me know if you'd like me to look at anything else!


unrelated gitpod rambling

IDK if it's just me or what, but in Gitpod, yarn dev errors out because concurrently isn't installed, and then because vue-cli-service isn't either, so I ran:

npm i -g concurrently
npm i -g vue
npm i -g @vue/cli-service
npm i -g go
npm audit fix

Only then does yarn dev actually work in the web interface Gitpod shows me, but it seems to freeze. After Ctrl-C, I wasn't sure whether I needed yarn build or yarn serve, so I ran both, and then npm run dev finally gave me a working link on port 9999?

@vt-idiot
Contributor

vt-idiot commented Jan 9, 2023

Also, I just figured out how to PR directly to your fork, so I'll take care of these myself!!!

  1. Adding the scraper name as a tag
  2. Adding the extra tags to the drop list

Add `javdatabase` as a tag
Always add `javbus` as a tag
Always add 'jav.land' as a tag
Always add javlibrary as a tag
Changed both of the skip/re-map lists to tab separation.

Moved "solo/solo work/solowork" to the "drop" list. I forget what the R18 tag was, but unless I'm mistaken, it was a tag they (and FANZA by extension) automatically add(ed) to titles that aren't part of an overarching "series". It's meaningless to us, since we don't scrape R18/DMM's "series" listings, nor would we have a way to filter by them in XBVR.

Changed "kiss kiss" to be the retained tag, since that was the tag R18 used and most users will already have plenty of it in their databases. It's probably best to maintain continuity with the old R18 tags whenever possible if this is to be done.
Similarly, `suntan` was the tag R18 used; I have 12 entries in my library that pre-date the manifests I started writing myself.
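The commits above switch the re-map list to tab separation. One possible way to read such a list is sketched below in Go, assuming one `scraped<TAB>kept` pair per line; the layout, the names `remapList`, `parseRemap` and `remapTag`, and the left-hand spellings are all hypothetical, and only the targets `kiss kiss` and `suntan` come from the commits.

```go
package main

import (
	"fmt"
	"strings"
)

// remapList is a hypothetical tab-separated re-map list: one "scraped<TAB>kept"
// pair per line. The left-hand spellings are invented for illustration; the
// right-hand targets are the old R18 tags named in the commits.
const remapList = "kissing\tkiss kiss\ntanned\tsuntan\n"

// parseRemap turns tab-separated lines into a lookup map, skipping blank or
// malformed lines.
func parseRemap(list string) map[string]string {
	m := make(map[string]string)
	for _, line := range strings.Split(list, "\n") {
		parts := strings.SplitN(line, "\t", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			continue
		}
		m[parts[0]] = parts[1]
	}
	return m
}

// remapTag returns the retained tag for a scraped tag, or the tag itself if
// it has no mapping.
func remapTag(m map[string]string, tag string) string {
	if kept, ok := m[tag]; ok {
		return kept
	}
	return tag
}

func main() {
	m := parseRemap(remapList)
	for _, t := range []string{"kissing", "tanned", "vr"} {
		fmt.Printf("%s -> %s\n", t, remapTag(m, t))
	}
}
```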
@vt-idiot
Contributor

vt-idiot commented Jan 9, 2023

I made some edits to the tag replacement rules in the cases where I know what the old R18 tag used to be, moved the "solo"-related tags to the drop list, and added the extra tag for each scraper source. I also switched to tab separation, but without lint running on the fork I don't know whether I broke anything, because gofmt really doesn't like stray spaces.

@thebrnd
Contributor Author

thebrnd commented Jan 9, 2023

Patched the following:

  • Apply the 3DSVR correction in all of the JAV parsers, instead of only the one
  • Trim whitespace on the Studio and Synopsis
  • Move some duplicate code into javutils.go for shared use between all of the javr sources

> Also added tab separation, but without lint running on the fork IDK if I broke anything or not because gofmt really doesn't like stray spaces.

My code editor does its own thing with whitespace whenever I save, so I wasn't able to preserve your whitespace changes.

Are you planning to flesh out the tag mapping further, or is it finished?

@vt-idiot
Contributor

vt-idiot commented Jan 9, 2023

I just corrected the ones I saw in there. I imagine the mapping will stay a WIP as time goes on and I notice differences or similarities between the old R18 tags and the tags from the new scrapers.

crwxaj merged commit e676c1b into xbapps:master on Jan 17, 2023
balckpixie pushed a commit to balckpixie/xbvr that referenced this pull request Dec 27, 2024
…ps#1100)

* javr: enable multiple scraper options, and add javlibrary and javbus

* javr: add jav.land as well

* Update javdatabase.go

Add `javdatabase` as a tag

* Update javbus.go

Always add `javbus` as a tag

* Update javland.go

Always add 'jav.land' as a tag

* Update javlibrary.go

Always add javlibrary as a tag

* Update javtags.go

Changed both of the skip/re-map lists to tab separation.

Moved "solo/solo work/solowork" to the "drop" list. I forget what the R18 tag was, but unless I'm mistaken, it was a tag they (and FANZA by extension) automatically add(ed) to titles that aren't part of an overarching "series". It's meaningless to us, since we don't scrape R18/DMM's "series" listings, nor would we have a way to filter by them in XBVR.

Changed "kiss kiss" to be the retained tag, since that was the tag R18 used and most users will already have plenty of it in their databases. It's probably best to maintain continuity with the old R18 tags whenever possible if this is to be done.

* Update javtags.go

Similarly, `suntan` was the tag R18 used; I have 12 entries in my library that pre-date the manifests I started writing myself.

* javr: more error-correcting code, and less code-reuse between scrapers

Co-authored-by: vt-idiot <81622808+vt-idiot@users.noreply.github.com>