german news media #23

Open
schochastics opened this issue Oct 15, 2024 · 2 comments

@schochastics
Contributor

schochastics commented Oct 15, 2024

I'd be willing to contribute some German news outlets (I will add more to the list over time):

  • bild.de
  • spiegel.de
  • welt.de
  • tagesschau.de
  • focus.de
  • fr-online.de (redirects to fr.de)
  • heute.de (redirects to zdf.de/nachrichten)
  • stern.de
  • sueddeutsche.de
  • n-tv.de
  • n24.de (redirects to welt.de)
  • rtl.de
  • prosieben.de
  • rp-online.de
  • sat1.de
  • t-online.de
  • tagesspiegel.de
  • morgenpost.de
  • handelsblatt.com
  • berliner-zeitung.de
  • badische-zeitung.de
  • derwesten.de
  • tag24.de
  • heise.de
  • merkur.de
  • ndr.de
  • br.de
  • t3n.de
  • karlsruhe-insider.de
  • mdr.de
  • ruhr24.de
  • tz.de
  • swr.de
  • swp.de
  • augsburger-allgemeine.de
  • daserste.de (news links to tagesschau.de)
  • watson.de
  • wiwo.de
  • rnd.de
  • news.de
  • deutschlandfunk.de
  • businessinsider.de
  • nzz.ch
  • waz.de
  • finanzen.net
  • presseportal.de
  • wdr.de
  • hna.de
  • express.de
  • ksta.de
  • suedkurier.de
  • bz-berlin.de (access denied, needs investigation)
  • deutschlandfunkkultur.de
  • kreiszeitung.de
  • abendblatt.de
  • stuttgarter-zeitung.de
  • infranken.de
  • arte.tv (just videos)
  • rbb24.de
  • abendzeitung-muenchen.de
  • echo24.de
  • mopo.de
  • saechische.de
  • kurier.at
  • manager-magazin.de
  • bnn.de
  • nordkurier.de
  • rollingstone.de
  • berliner-kurier.de
  • vice.com
  • ruhrnachrichten.de
  • vox.de
  • der-postillon.com
  • heidelberg24.de
  • news-und-nachrichten.de
  • rbb-online.de (redirect of rbb24.de)
  • volksstimme.de
  • 3sat.de
  • derstandard.at
  • lvz.de
  • swrfernsehen.de
  • shz.de
  • fnp.de
  • freiepresse.de
  • wa.de
  • haz.de
  • blick.ch (access denied, needs investigation)
  • nw.de
  • noz.de
  • orf.at
  • srf.ch
  • epochtimes.de
  • ostsee-zeitung.de
  • swr3.de
  • newsflash24.de
  • jungefreiheit.de
  • kabeleins.de
  • thueringer-allgemeine.de
  • watson.ch
  • maz-online.de
  • taz.de
  • schwaebische.de
  • wz.de
  • dnn.de
  • frankenpost.de
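
The redirects noted in the list could be recorded in a small lookup, so that an old domain is mapped to the domain that actually serves the content. This is a minimal sketch in base R; `known_redirects` and `canonical_domain()` are hypothetical names, not part of paperboy, and only redirects stated in the list are included.

```r
# Hypothetical lookup of redirects mentioned in the list above
# (old domain = domain it redirects to).
known_redirects <- c(
  "fr-online.de" = "fr.de",
  "heute.de"     = "zdf.de",   # the list notes zdf.de/nachrichten as the target
  "n24.de"       = "welt.de"
)

# Replace every domain that has a known redirect; leave the rest untouched.
canonical_domain <- function(domain) {
  hit <- domain %in% names(known_redirects)
  domain[hit] <- unname(known_redirects[domain[hit]])
  domain
}

canonical_domain(c("n24.de", "bild.de", "fr-online.de"))
```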
@JBGruber
Owner

Excellent! The "for developers" vignette is unfortunately outdated. This is the approach I currently use:

# get some data to inspect and test
rss_url <- "http://www.bild.de/rssfeeds/rss3-20745882,feed=alles.bild.html"
test_df <- pb_collect(rss_url,
                      timeout = 60,
                      ignore_fails = TRUE)

# set up new parser
use_new_parser("https://www.bild.de/", 
               author = "[@schochastics](https://github.com/schochastics)", 
               issue = "[#23](https://github.com/JBGruber/paperboy/issues/23)", 
               rss = rss_url,
               test_data = test_df)

# see what has been collected in your browser
pb_inspect(test_df)

# when the parser looks roughly done, you can run the same function again to run
# the tests
use_new_parser("https://www.bild.de/", 
               author = "[@schochastics](https://github.com/schochastics)", 
               issue = "[#23](https://github.com/JBGruber/paperboy/issues/23)", 
               rss = rss_url,
               test_data = test_df)

@schochastics
Contributor Author

welt.de seems to time out users quickly when the RSS feed is accessed too many times.
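
One way to avoid hitting a feed repeatedly while developing a parser might be to cache the collected data on disk and reuse it on later runs. This is a minimal sketch: `cached_collect()` and the cache file name are hypothetical and not part of paperboy, while `pb_collect()` and its `timeout` and `ignore_fails` arguments are taken from the example earlier in this thread.

```r
# Hypothetical helper: fetch a feed once, then reuse the cached copy.
# `fetch` defaults to paperboy::pb_collect but can be swapped out (e.g. for tests).
cached_collect <- function(url, cache_file, fetch = paperboy::pb_collect, ...) {
  if (file.exists(cache_file)) {
    # Cache hit: no request is sent to the site.
    return(readRDS(cache_file))
  }
  result <- fetch(url, ...)
  saveRDS(result, cache_file)
  result
}
```

Usage would then look like `test_df <- cached_collect(rss_url, "welt_test.rds", timeout = 60, ignore_fails = TRUE)`. Deleting the cache file forces a fresh download.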

JBGruber added a commit that referenced this issue Dec 26, 2024