Nightwatch web scraper

Task

First:
- Go to http://google.com.
- Search for "pizza".
- ~~Set the preferences to get 100 results~~ (feature removed).
- Save the HTML.

Then create a Go program that does the following:
- Based on CSS selector definitions read from a local JSON file, extract the following information:
  - URL rank type (organic, local, carousel, knowledge panel, featured snippet),
  - Website title, URL, description, and rank position within its own rank type group.
- Format the results in a standard format (JSON, YAML, or CSV).
- The use of well-maintained and mature external libraries is encouraged.
- Ensure optimal memory and CPU usage.
- Results should be consistent, even when one or more result types are missing.
- Bonus points for high performance.
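
The requirements on per-group rank position and consistent results can be sketched as follows. This is a minimal illustration, not the repository's implementation: the Item and Ranked types and the rank function are hypothetical names, and sorting group names alphabetically is an assumed way to keep the output stable, since Go does not guarantee map iteration order. A rank type with no matches simply contributes no rows.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical holder for one extracted search result.
type Item struct {
	Title, URL, Description string
}

// Ranked is a hypothetical output row: an item plus its rank type and
// its 1-based position within that rank type.
type Ranked struct {
	RankType string
	Position int
	Item     Item
}

// rank flattens the per-type items into one slice. Rank types are visited
// in sorted order so the result does not depend on map iteration order,
// and empty or missing types add nothing to the output.
func rank(byType map[string][]Item) []Ranked {
	types := make([]string, 0, len(byType))
	total := 0
	for t, items := range byType {
		types = append(types, t)
		total += len(items)
	}
	sort.Strings(types)

	out := make([]Ranked, 0, total) // allocate once for all rows
	for _, t := range types {
		for i, it := range byType[t] {
			out = append(out, Ranked{RankType: t, Position: i + 1, Item: it})
		}
	}
	return out
}

func main() {
	// Placeholder input, not real search results.
	byType := map[string][]Item{
		"organic":  {{Title: "first"}, {Title: "second"}},
		"local":    nil, // a rank type with no matches
		"carousel": {{Title: "only"}},
	}
	for _, r := range rank(byType) {
		fmt.Println(r.RankType, r.Position, r.Item.Title)
	}
}
```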

Usage

go run .

Input HTML is read from pizza.html.
Selector groups are read from group-selectors.json.
Output is saved to result.csv.

Selector groups

Expected JSON format:

{
   "<group name>": {
      "base": "<selector for group item>",
      "title": "<title selector, relative to the item matched by 'base'>",
      "url": "<URL selector, relative to the item matched by 'base'>",
      "description": "<description selector, relative to the item matched by 'base'>"
   }
}

Example:

<items-group1>
   <item>
      <title></title>
      <url></url>
      <description></description>
   </item>
   <item>
      ...
   </item>
</items-group1>

{
   "organic": {
      "base": "items-group1 > item",
      "title": "title",
      "url": "url",
      "description": "description"
   }
}

no-more-coffee/webscraper-for-nightwatch
