Full Content support for Phoronix.com #623

privacyadmin · 2024-12-23T15:18:13Z

Feed URL

https://www.phoronix.com/news/Raspberry-Pi-HEVC-H265-Decode

Add any details, links, or screenshots about the article layout that's missing or wrong

Text and images are missing/incomplete

jocmp · 2024-12-23T21:01:15Z

Thanks @privacyadmin. The update for phoronix.com is available as of 2024.12.1086.

privacyadmin · 2024-12-25T04:52:12Z

Hi. I would like to reopen this, as the update doesn't seem to fix the site. I have already updated to the latest version as mentioned.

e.g. https://www.phoronix.com/review/amd-epyc-9005-hpc-tuning

jocmp · 2024-12-25T05:06:53Z

hm, looks like they use a different format for these articles. I'll take a look!

privacyadmin · 2024-12-25T05:20:26Z

Thank you for taking the time to look into this again. Much appreciated, and Merry Xmas!

jocmp · 2024-12-27T23:31:24Z

@privacyadmin I added a fix for this in 2024.12.1087. Feel free to reopen this issue if you're still seeing problems after that!

privacyadmin · 2024-12-31T11:26:48Z

Just received another Phoronix article with the same issue.

https://www.phoronix.com/news/Deadline-DRM-Scheduler-RFC

Hope this helps. Thanks and Happy New Year

jocmp · 2024-12-31T21:12:39Z

Happy New Year!

I tried out that article on version 2024.12.1087 but wasn't able to reproduce the issue. Is it completely blank for you? Here's what I'm seeing.

privacyadmin · 2025-01-01T02:35:39Z

This is what I see even after trying to fetch full content. I tried on another article and it is the same too.

I'm on CapyReader version 2024.12.1088.dev, and another Phoronix article shows the same thing.

jocmp · 2025-01-01T21:44:36Z

@privacyadmin the unfilled full content icon makes it look like the article isn't being parsed at all.

I added some error handling in 2025.01.1089 that will at least make this more visible if it's a fetch error.

privacyadmin · 2025-01-03T11:34:05Z

Yes, it seems that when I refresh my Phoronix feeds, it says "failed to fetch article". The site is definitely reachable, since I can access it by tapping the article title or by using another RSS reader. Any idea why this happens and whether it's fixable?

jocmp · 2025-01-03T14:44:45Z

The "failed to fetch article" comes back when there's no response from the site. The exact reason should be in the crash log file under Settings > General > Share crash logs. The log entries that have cr.full_content will have the exact error code.
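For anyone digging through a shared log, a minimal sketch for pulling out just those entries, assuming the log has been saved locally as a plain-text file (the function name and file name are hypothetical). `grep -o` prints every match on its own line, so it also copes with entries that have been run together on one line when pasted.

```shell
# Print "error_type=... error_message=..." for every cr.full_content entry
# in a saved Capy crash log. Assumes a plain-text file passed as $1.
full_content_errors() {
  # -o prints each match separately, even if several share one line.
  grep -o 'cr\.full_content: error_type=[A-Za-z]* error_message=[^ ]*' "$1" |
    sed 's/^cr\.full_content: //'
}

# Example (hypothetical file name):
# full_content_errors capy-crash-log.txt
```

For a 403 like the ones reported in this thread, each failed fetch shows up as `error_type=MissingBodyError error_message=403`.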

privacyadmin · 2025-01-04T06:12:23Z

These are the error messages from the latest CapyReader:

01-04 14:08:03.919 20167 20357 I cr.full_content: error_type=MissingBodyError error_message=403
01-04 14:08:05.368 20167 20357 I cr.full_content: error_type=MissingBodyError error_message=403
01-04 14:08:10.112 20167 20167 W WindowOnBackDispatcher: sendCancelIfRunning: isInProgress=false callback=c.y@ed9912a
01-04 14:08:18.240 20407 20407 W libc : Access denied finding property "ro.debuggable"
01-04 14:08:18.239 20407 20407 W logcat : type=1400 audit(0.0:37955): avc: denied { read } for name="u:object_r:userdebug_or_eng_prop:s0" dev="tmpfs" ino=456 scontext=u:r:untrusted_app:s0:c193,c256,c512,c768 tcontext=u:object_r:userdebug_or_eng_prop:s0 tclass=file permissive=0 app=com.capyreader.app

jocmp · 2025-01-04T21:30:49Z

Thanks @privacyadmin. Those are 403 forbidden errors then. There's not much I can do in the app when a website comes back with that. Based on some chatter in Phoronix's forums, it looks like this might be a problem that happens semi-regularly with that site.

privacyadmin · 2025-01-06T00:28:49Z

Hmm... I just tested a Phoronix article with another RSS reader and have some findings:

1. A 403 error seems unlikely, as the article is reachable by clicking the article header, which opens it in the browser. If it were a 403, wouldn't clicking the header yield the same error (article not reachable)?
2. The speed at which it reports failing to reach the article seems a bit too fast, almost as if it weren't even trying. Usually, when I request full content, it takes a second or two while it fetches from the site, but for Phoronix it failed instantaneously. This brings me to point 3.
3. I tested the same Phoronix article with FeedMe, another RSS reader on Android. It works both with its built-in mobilizer and via the browser. This at least rules out an issue with my device and IP (in case it was blocked or restricted by Cloudflare, maybe?).

Is it possible that my settings for the Phoronix site were corrupted somehow? Any chance to reset them for Phoronix while leaving the other sites' preferences intact?

jocmp · 2025-01-06T02:56:01Z

Short answer

These are all good points. The full content mode is just an on/off switch so there's no specific site data for Phoronix compared to any other site. I published another version just now, 2025.01.1092-dev, that I hope will solve this. Let me know!

Long answer

As to your other points, I mentioned 403 because the HTTP status code is logged alongside the MissingBodyError error, cr.full_content: error_type=MissingBodyError error_message=403.

capyreader/capy/src/main/java/com/jocmp/capy/articles/ArticleContent.kt

Line 26 in 91915b0

Result.failure(MissingBodyError(message = response.code.toString()))

In my experience, failures like 403s (and 4xx errors in general) come back much faster, because there's less data to transmit and no article data to parse, which makes them feel instant.
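One way to check the "too fast to be trying" observation is to have curl report both the status code and the total request time. This is a sketch using curl's `--write-out` variables `%{http_code}` and `%{time_total}`; the function name is hypothetical, and the user agent is whichever one you want to test.

```shell
# Print "<HTTP status> <total seconds>" for one request sent with the given
# User-Agent. The response body is discarded.
status_and_time() {
  url=$1
  ua=$2
  curl -s -o /dev/null -A "$ua" \
    -w '%{http_code} %{time_total}\n' "$url"
}

# Example (makes a real network request):
# status_and_time 'https://www.phoronix.com/news/Deadline-DRM-Scheduler-RFC' \
#   'CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0'
```

A 4xx status returned in a fraction of a second would fit the explanation above: the site answered, it just refused to serve the article.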

To your third point, I think some sites may block Capy based on its user agent which is why opening it in the article header (which uses the browser's user agent) or FeedMe render different results. To my understanding, headers, location and IP all act as indicators for Cloudflare's bot scores. These indicators don't work the same across all regions.

As an example, I ran into the same thing with another feed while testing hardwarezone.com.sg (#622). Full content works for it in a different RSS reader, Reeder for Mac, but not in Capy. The following returns a 200 response:

curl 'https://www.hardwarezone.com.sg/review-microsoft-surface-laptop-7-snapdragon-x-series-mobile-processors' \
-H 'Host: www.hardwarezone.com.sg' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'User-Agent: Reeder/5050001 CFNetwork/1568.200.51 Darwin/24.1.0'

But if I change the user agent to Capy's user agent, CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0, it fails with a 403 error.

curl 'https://www.hardwarezone.com.sg/review-microsoft-surface-laptop-7-snapdragon-x-series-mobile-processors' \
-H 'Host: www.hardwarezone.com.sg' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'User-Agent: CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0'

Partial output, 403 response code:

* Request completely sent off
< HTTP/2 403
< content-type: text/html
< content-length: 12223
< last-modified: Wed, 21 Aug 2024 07:22:41 GMT
< accept-ranges: bytes
< server: AmazonS3
< x-cache: Error from cloudfront
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< referrer-policy: strict-origin-when-cross-origin
< x-content-type-options: nosniff
< vary: Origin

However, if I exclude okhttp/4.12.0 from Capy's user agent, it works! (#685) I omitted it in the latest version. I'm hoping it will fix phoronix.com for you like it fixed hardwarezone.com.sg for me.

curl 'https://www.hardwarezone.com.sg/review-microsoft-surface-laptop-7-snapdragon-x-series-mobile-processors' \
-H 'Host: www.hardwarezone.com.sg' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'User-Agent: CapyReader (RSS Reader; https://capyreader.com/)'

Partial output, 200 response code:

< HTTP/2 200
< content-type: text/html; charset=UTF-8
< date: Mon, 06 Jan 2025 02:46:48 GMT
< server: WEB
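The three requests above can be folded into a small loop for anyone who wants to repeat the comparison against another site, for example a Phoronix article from the affected region. A sketch, assuming a POSIX shell and curl; the function name is hypothetical, and it sends only the Accept and Accept-Language headers from the commands above plus the User-Agent under test. If only Capy's user agents get a 403, that would suggest UA-based blocking; a 403 for every user agent would point more toward IP or location.

```shell
# Request one URL once per User-Agent and print "<status><TAB><user agent>".
# A status of 000 means curl received no HTTP response at all.
compare_user_agents() {
  url=$1
  shift
  for ua in "$@"; do
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H 'Accept: */*' \
      -H 'Accept-Language: en-US,en;q=0.9' \
      -A "$ua" "$url")
    printf '%s\t%s\n' "$code" "$ua"
  done
}

# Example (makes real network requests):
# compare_user_agents 'https://www.phoronix.com/news/Deadline-DRM-Scheduler-RFC' \
#   'CapyReader (RSS Reader https://capyreader.com/) okhttp/4.12.0' \
#   'CapyReader (RSS Reader; https://capyreader.com/)' \
#   'Reeder/5050001 CFNetwork/1568.200.51 Darwin/24.1.0'
```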

privacyadmin · 2025-01-06T03:15:36Z

Thank you for the investigation into Phoronix. I just updated to the latest 2025.01.1092-dev and confirmed that #622 is working and the user agent fixed the issue.

I tested Phoronix again and still no joy. What I find particularly interesting is that Phoronix seems to work on your end (based on your response from 5 days ago) but not for me.

At this point, let me reset my Capy Reader installation and try again. Will report back on this.

privacyadmin · 2025-01-06T04:00:19Z

Short update:

1. I cleared Capy Reader's storage and uninstalled 2025.01.1092-dev for a fresh start. I reinstalled it, linked it to my FreshRSS instance and tested Phoronix. Still the same problem.
2. I again cleared Capy Reader's storage and uninstalled it. I reinstalled it, but this time I selected Local instead of FreshRSS (just in case) and loaded only https://www.phoronix.com/rss.php and no other feeds. Still the same problem.
3. As a last-ditch attempt to rule out an Android issue (I know it's not, but I wanted to see whether I could replicate the issue on another platform), I loaded the Phoronix RSS feed into lire on iOS and confirmed that full content works fine for Phoronix there.

At this point, I can only assume that some combination of factors (IP, location, user agent, etc.) results in the full content error on Capy Reader, since you are unable to replicate it on your end. I'm not sure whether you'd prefer to leave this ticket open for now or close it.

I am fine either way for now.

jocmp · 2025-01-07T01:54:30Z

Sorry that didn't fix this feed for you. Depending on whether or not you want to continue trying, you could see if the Phoronix forum mods are willing to unblock the user-agent CapyReader (RSS Reader; https://capyreader.com/) for your region. I'll go ahead and close this out for now.

privacyadmin added the full content request label Dec 23, 2024

jocmp added this to Capy Reader Dec 23, 2024

jocmp self-assigned this Dec 23, 2024

jocmp moved this to In Progress in Capy Reader Dec 23, 2024

jocmp mentioned this issue Dec 23, 2024

Update phoronix.com custom parser #624

Merged

jocmp closed this as completed Dec 23, 2024

github-project-automation bot moved this from In Progress to Done in Capy Reader Dec 23, 2024

jocmp reopened this Dec 25, 2024

jocmp moved this from Done to In Progress in Capy Reader Dec 26, 2024

jocmp mentioned this issue Dec 26, 2024

Fix phoronix.com parser with cross-origin resources #633

Merged

jocmp closed this as completed Dec 27, 2024

github-project-automation bot moved this from In Progress to Done in Capy Reader Dec 27, 2024

jocmp reopened this Jan 3, 2025

jocmp closed this as completed Jan 7, 2025
