Full Content support for Phoronix.com #623

Closed
privacyadmin opened this issue Dec 23, 2024 · 18 comments

@privacyadmin

Feed URL

https://www.phoronix.com/news/Raspberry-Pi-HEVC-H265-Decode

Add any details, links, or screenshots about the article layout that's missing or wrong

Text and images are missing/incomplete

@jocmp
Owner

jocmp commented Dec 23, 2024

Thanks @privacyadmin. The update for phoronix.com is available as of 2024.12.1086.

@jocmp jocmp closed this as completed Dec 23, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Capy Reader Dec 23, 2024
@privacyadmin
Author

Hi. I would like to reopen this as it doesn't seem to fix the site. I have already updated to the latest version as mentioned.

e.g. https://www.phoronix.com/review/amd-epyc-9005-hpc-tuning

@jocmp
Owner

jocmp commented Dec 25, 2024

hm, looks like they use a different format for these articles. I'll take a look!

@jocmp jocmp reopened this Dec 25, 2024
@privacyadmin
Author

Thank you for taking the time to look into this again. Much appreciated and Merry Xmas!

@jocmp jocmp moved this from Done to In Progress in Capy Reader Dec 26, 2024
@jocmp
Owner

jocmp commented Dec 27, 2024

@privacyadmin I added a fix for this in 2024.12.1087. Feel free to reopen this issue if you're still seeing problems after that!

@jocmp jocmp closed this as completed Dec 27, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Capy Reader Dec 27, 2024
@privacyadmin
Author

Just received another Phoronix article with the same issue.

https://www.phoronix.com/news/Deadline-DRM-Scheduler-RFC

Hope this helps. Thanks and Happy New Year

@jocmp
Owner

jocmp commented Dec 31, 2024

Happy New Year!

I tried out that article on version 2024.12.1087 but I wasn't able to reproduce. Is it completely blank for you? Here's what I'm seeing.

@privacyadmin
Author

image

This is what I see even after trying to fetch full content. I tried on another article and it is the same too.

image

This is Capy Reader version 2024.12.1088.dev showing another Phoronix article.

@jocmp
Owner

jocmp commented Jan 1, 2025

@privacyadmin the unfilled full content icon makes it look like the article isn't being parsed at all.

I added some error handling in 2025.01.1089 that will at least make this more visible if it's a fetch error.

@privacyadmin
Author

Yes, it seems that when I refresh my Phoronix feeds, it says "failed to fetch article". The site is definitely reachable, since I can access it by tapping the article title or by using another RSS reader. Any idea why this happens and whether it's fixable?

@jocmp
Owner

jocmp commented Jan 3, 2025

The "failed to fetch article" comes back when there's no response from the site. The exact reason should be in the crash log file under Settings > General > Share crash logs. The log entries that have cr.full_content will have the exact error code.

@jocmp jocmp reopened this Jan 3, 2025
@privacyadmin
Author

These are the error messages from the latest CapyReader.

01-04 14:08:03.919 20167 20357 I cr.full_content: error_type=MissingBodyError error_message=403
01-04 14:08:05.368 20167 20357 I cr.full_content: error_type=MissingBodyError error_message=403
01-04 14:08:10.112 20167 20167 W WindowOnBackDispatcher: sendCancelIfRunning: isInProgress=false callback=c.y@ed9912a
01-04 14:08:18.240 20407 20407 W libc : Access denied finding property "ro.debuggable"
01-04 14:08:18.239 20407 20407 W logcat : type=1400 audit(0.0:37955): avc: denied { read } for name="u:object_r:userdebug_or_eng_prop:s0" dev="tmpfs" ino=456 scontext=u:r:untrusted_app:s0:c193,c256,c512,c768 tcontext=u:object_r:userdebug_or_eng_prop:s0 tclass=file permissive=0 app=com.capyreader.app
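
For anyone else digging through a shared crash log, the cr.full_content entries can be pulled out with standard tools. This is only a minimal sketch: it assumes the log was saved to a file with one entry per line (as logcat normally writes it) and that the entries keep the error_type=... error_message=... layout seen in this log. The function name full_content_errors is made up for illustration and is not part of Capy Reader.

```shell
#!/bin/sh
# Print "error_type error_message" for every cr.full_content entry in a log file.
# Assumption: one log entry per line, with the key=value layout shown in this thread.
# full_content_errors is a hypothetical helper name.
full_content_errors() {
  grep 'cr\.full_content:' "$1" |
    sed -n 's/.*error_type=\([^ ]*\) error_message=\([^ ]*\).*/\1 \2/p'
}

# Example (the file name is an assumption):
#   full_content_errors capy-crash.log
```

Each output line pairs the error type with the logged message, which in this case is the HTTP status code.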

@jocmp
Owner

jocmp commented Jan 4, 2025

Thanks @privacyadmin. Those are 403 forbidden errors then. There's not much I can do in the app when a website comes back with that. Based on some chatter in Phoronix's forums, it looks like this might be a problem that happens semi-regularly with that site.

@privacyadmin
Author

privacyadmin commented Jan 6, 2025

Hmm... I just tested a Phoronix article with another RSS reader and have some findings.

  1. A 403 error seems unlikely, as the article is reachable by clicking the article header and being redirected to it in the browser. If it were a 403, wouldn't clicking the header yield the same error (article not reachable)?

  2. The speed at which it reports the failure seems a bit too fast, almost as if it weren't even trying. Usually, when I request full content, it takes a second or two while it fetches from the site, but for Phoronix it failed instantaneously. This brings me to point 3.

  3. I tested the same Phoronix article with FeedMe, another RSS reader on Android. It works both with its built-in mobilizer and via the browser. This should at least rule out an issue with my device or IP (in case it was blocked or restricted by Cloudflare, maybe?).

Screenshot_20250106-081809
Screenshot_20250106-081828

Is it possible that my settings for the Phoronix site were corrupted or something? Is there any way to reset them for Phoronix while leaving the other sites' preferences intact?

@jocmp
Owner

jocmp commented Jan 6, 2025

Short answer

These are all good points. The full content mode is just an on/off switch so there's no specific site data for Phoronix compared to any other site. I published another version just now, 2025.01.1092-dev, that I hope will solve this. Let me know!

Long answer

As to your other points, I mentioned 403 because the HTTP status code is logged alongside the MissingBodyError error, cr.full_content: error_type=MissingBodyError error_message=403.

Result.failure(MissingBodyError(message = response.code.toString()))

In my experience failures like 403 errors and 4xx errors in general will always be much faster because there's less data to transmit, and no article data to parse, making it feel instant.

To your third point, I think some sites may block Capy based on its user agent, which is why opening the article from the header (which uses the browser's user agent) or through FeedMe produces different results. To my understanding, headers, location, and IP all act as indicators for Cloudflare's bot scores. These indicators don't work the same across all regions.

As an example, I tried accessing another feed while testing hardwarezone.com.sg (#622). Full content works on a different RSS reader called Reeder for Mac, but not for Capy. The following returns a 200 response.

curl 'https://www.hardwarezone.com.sg/review-microsoft-surface-laptop-7-snapdragon-x-series-mobile-processors' \
-H 'Host: www.hardwarezone.com.sg' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'User-Agent: Reeder/5050001 CFNetwork/1568.200.51 Darwin/24.1.0'

But if I change the user agent to Capy's user agent, CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0, it fails with a 403 error.

curl 'https://www.hardwarezone.com.sg/review-microsoft-surface-laptop-7-snapdragon-x-series-mobile-processors' \
-H 'Host: www.hardwarezone.com.sg' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'User-Agent: CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0'

Partial output, 403 response code:

* Request completely sent off
< HTTP/2 403
< content-type: text/html
< content-length: 12223
< last-modified: Wed, 21 Aug 2024 07:22:41 GMT
< accept-ranges: bytes
< server: AmazonS3
< x-cache: Error from cloudfront
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< referrer-policy: strict-origin-when-cross-origin
< x-content-type-options: nosniff
< vary: Origin

However, if I exclude okhttp/4.12.0 in Capy's user agent, it works! (#685) I omitted that in the latest version. I'm hoping it will solve phoronix.com for you like it fixed hardwarezone.com.sg for me.

curl 'https://www.hardwarezone.com.sg/review-microsoft-surface-laptop-7-snapdragon-x-series-mobile-processors' \
-H 'Host: www.hardwarezone.com.sg' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'User-Agent: CapyReader (RSS Reader; https://capyreader.com/)'

Partial output, 200 response code:

< HTTP/2 200
< content-type: text/html; charset=UTF-8
< date: Mon, 06 Jan 2025 02:46:48 GMT
< server: WEB
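
To repeat this comparison against any URL, the status codes can be collected side by side. The following is a rough sketch rather than anything the app does: the function names status_for_user_agent and compare_user_agents are made up for illustration, and since the result depends on IP, region, and the site's bot rules, the codes you see may differ from the ones above.

```shell
#!/bin/sh
# Print only the HTTP status code a URL returns for a given User-Agent.
# -s hides progress output, -o /dev/null discards the body,
# -w '%{http_code}' writes the status code to stdout.
# Both function names are hypothetical helpers, not part of Capy Reader.
status_for_user_agent() {
  curl -s -o /dev/null -w '%{http_code}' -H "User-Agent: $2" "$1"
}

# Print one "status<TAB>user agent" row per user agent for the same URL.
compare_user_agents() {
  target_url=$1
  shift
  for agent in "$@"; do
    printf '%s\t%s\n' "$(status_for_user_agent "$target_url" "$agent")" "$agent"
  done
}

# Example, using the user agents from this thread:
#   compare_user_agents 'https://www.phoronix.com/news/Deadline-DRM-Scheduler-RFC' \
#     'CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0' \
#     'CapyReader (RSS Reader; https://capyreader.com/)' \
#     'Reeder/5050001 CFNetwork/1568.200.51 Darwin/24.1.0'
```

A row that shows 403 for one user agent and 200 for another on the same network points at user-agent filtering rather than an IP block.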

@privacyadmin
Author

Thank you for the investigation into Phoronix. I just updated to the latest 2025.01.1092-dev and confirmed that #622 is working and the user agent fixed the issue.

I tested Phoronix again and still no joy. What I find particularly interesting is that Phoronix seems to be working on your end (based on your response from 5 days ago) but not for me.

At this point, let me reset my Capy Reader installation and try again. I will report back on this.

@privacyadmin
Author

Short update:

  1. I cleared Capy Reader's storage and uninstalled 2025.01.1092-dev for a fresh start. I reinstalled it, linked it to my FreshRSS instance, and tested Phoronix. Still the same problem.

  2. I again cleared Capy Reader's storage and uninstalled it. I reinstalled again, but this time I selected Local instead of FreshRSS (just in case) and loaded only https://www.phoronix.com/rss.php and no other feeds. Still the same problem.

  3. In a last-ditch attempt to confirm whether it's an Android issue (I know it's not, but I wanted to see if I could replicate the issue on another platform), I loaded the Phoronix RSS feed on my iOS device with lire and confirmed that Phoronix works fine with full content there.

At this point, I can only assume that some combination of factors (IP, location, user agent, etc.) results in the full content error on Capy Reader, since you are unable to replicate it on your end. I'm not sure whether you would prefer to leave this ticket open for now or to close it.

I am fine either way for now.

@jocmp
Owner

jocmp commented Jan 7, 2025

Sorry that didn't fix this feed for you. Depending on whether or not you want to continue trying, you could see if the Phoronix forum mods are willing to unblock the user-agent CapyReader (RSS Reader; https://capyreader.com/) for your region. I'll go ahead and close this out for now.

@jocmp jocmp closed this as completed Jan 7, 2025