You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Title: Implement Scraping for Fox, CNN, and MSNBC at Article Level Description: Develop a web scraping solution to extract headlines from Fox News, CNN, and MSNBC. Data should be collected by date and at the article level. Leads To: #6 Tasks:
1. Select a web scraping tool (e.g., BeautifulSoup, Scrapy, Puppeteer) for efficient extraction.
2. Set up pipelines to collect article URLs, headlines, authors, and publication dates.
3. Ensure data collection is logged by date and stored at the article level.
4. Implement error handling and retries for failed attempts (up to three retries).
Acceptance Criteria:
• Scraping functions correctly for Fox News, CNN, and MSNBC.
• Extracted data includes URLs, headlines, authors, and publication dates.
• Data is stored in a structured format with clear logging.
• Priority: High
Labels: Backend, Scraping, Data Collection, MVP
The text was updated successfully, but these errors were encountered:
I could start taking a look at this. I think I would start by writing a technical spec. What tech stack would be appropriate? Most of my backend experience is with .NET in C#, so that would be my first impulse, but if there is a more preferred stack I could get up to speed on that.
I do not have a stack preference for this piece. My only concern would be thinking about how we scale this. It seems like this should be a durable pipeline that will work even as we start to manipulate the data differently. Please challenge me on that if I seem to be wrong.
Title: Implement Scraping for Fox, CNN, and MSNBC at Article Level
Description: Develop a web scraping solution to extract headlines from Fox News, CNN, and MSNBC. Data should be collected by date and at the article level.
Leads To: #6
Tasks:
1. Select a web scraping tool (e.g., BeautifulSoup, Scrapy, Puppeteer) for efficient extraction.
2. Set up pipelines to collect article URLs, headlines, authors, and publication dates.
3. Ensure data collection is logged by date and stored at the article level.
4. Implement error handling and retries for failed attempts (up to three retries).
Acceptance Criteria:
• Scraping functions correctly for Fox News, CNN, and MSNBC.
• Extracted data includes URLs, headlines, authors, and publication dates.
• Data is stored in a structured format with clear logging.
• Priority: High
Labels: Backend, Scraping, Data Collection, MVP
The text was updated successfully, but these errors were encountered: