✨ Add multi files fetch
wesen committed Jan 26, 2025
1 parent 36ed60d commit 7173f13
Showing 19 changed files with 7,331 additions and 53 deletions.
15 changes: 15 additions & 0 deletions .vscode/launch.json
@@ -28,6 +28,21 @@
"program": "${workspaceFolder}/cmd/tools/simplify-html",
"args": ["simplify-html", "--file", "/tmp/foxp3.html", "--max-list-items", "5"],
"cwd": "${workspaceFolder}"
},
{
"name": "Test HTML Selector",
"type": "go",
"request": "launch",
"mode": "auto",
"program": "${workspaceFolder}/cmd/tools/test-html-selector",
"args": [
"test-html-selector",
"--config",
"cmd/tools/test-html-selector/examples/tutorial/01-basic-text.yaml",
"--files",
"cmd/tools/test-html-selector/examples/tutorial/01-basic-text.html"
],
"cwd": "${workspaceFolder}"
}
]
}
7 changes: 6 additions & 1 deletion changelog.md
@@ -650,4 +650,9 @@ Added support for processing multiple HTML files and URLs in a single run:
- Added --urls flag for processing multiple URLs
- Updated output format to include source information
- Added proper error handling for each source
- Improved template support to handle multiple sources

## Add VSCode launch configuration for test-html-selector
Added a new launch configuration to make it easier to debug the test-html-selector tool with example files.

- Added Test HTML Selector launch configuration in .vscode/launch.json
74 changes: 63 additions & 11 deletions cmd/tools/test-html-selector/README.md
@@ -5,6 +5,7 @@ A command-line tool for testing CSS and XPath selectors against HTML documents.
## Features

- Support for both CSS and XPath selectors
- Process multiple files and URLs in a single run
- Configurable sample count and context size
- YAML configuration for selectors
- DOM path visualization for matched elements
@@ -39,31 +40,37 @@ config:
sample_count: 5
context_chars: 100
template: | # Optional Go template for formatting output
{{- range $name, $matches := . }}
## {{ $name }}
{{- range . }}
# Results from {{ .Source }}
{{- range $selector, $matches := .Data }}
## {{ $selector }}
{{- range $matches }}
- {{ . }}
{{- end }}
{{ end }}
{{- end }}
{{- end }}
```
2. Run the tool:
```bash
# Basic usage with config file
test-html-selector --config config.yaml --input path/to/input.html
# Basic usage with config file and multiple sources
test-html-selector --config config.yaml --files file1.html file2.html

# Use individual selectors without config file
test-html-selector --input path/to/input.html \
# Process multiple URLs
test-html-selector --urls https://example.com https://example.org \
--select-css ".product-card h2" \
--select-xpath "//div[@class='price']"

# Extract all matches with template formatting
test-html-selector --config config.yaml --input path/to/input.html \
test-html-selector --config config.yaml \
--files file1.html file2.html \
--urls https://example.com \
--extract --extract-template template.tmpl

# Show context and customize output
test-html-selector --config config.yaml --input path/to/input.html \
test-html-selector --config config.yaml \
--files input1.html input2.html \
--show-context --sample-count 10 --context-chars 200
```

@@ -73,10 +80,12 @@ test-html-selector --config config.yaml --input path/to/input.html \

#### Basic Options
- `--config`: Path to YAML config file
- `--input`: Path to HTML input file (required)
- `--files`: HTML files to process (can be specified multiple times)
- `--urls`: URLs to fetch and process (can be specified multiple times)
- `--select-css`: CSS selectors to test (can be specified multiple times)
- `--select-xpath`: XPath selectors to test (can be specified multiple times)
- `--extract`: Extract all matches into a YAML map of selector name to matches (ignores sample-count limit)
- `--extract-data`: Extract raw data without applying templates
- `--extract-template`: Go template file to render with extracted data
- `--show-context`: Show context around matched elements (default: false)
- `--show-path`: Show path to matched elements (default: true)
@@ -111,6 +120,7 @@ config:
## Example Output
### Default Format (without --extract)
```yaml
- name: product_titles
selector: .product-card h2
@@ -130,4 +140,46 @@ config:
path: "html > body > div > div > div > h2" # Only shown with --show-path
```
When using `--extract` with a template, the output format will be determined by your template. The template has access to a map of selector names to their matches, containing ALL matches found (not limited by sample-count). The matches can be text content, markdown, or full document structures depending on your simplification settings.
### Extract Format (with --extract)
```yaml
- source: file1.html
data:
product_titles:
- "Awesome Product 1"
- "Awesome Product 2"
prices:
- "$19.99"
- "$29.99"
- source: https://example.com
data:
product_titles:
- "Example Product"
prices:
- "$9.99"
```
When using `--extract` with a template, the output format will be determined by your template. The template has access to a list of source results, each containing a map of selector names to their matches. The matches can be text content, markdown, or full document structures depending on your simplification settings.

### Template Example
```go
{{- range . }}
# Results from {{ .Source }}
{{- range $selector, $matches := .Data }}
## {{ $selector }}
{{- range $matches }}
- {{ . }}
{{- end }}
{{- end }}
{{- end }}
```

## Template Functions

The tool includes the full set of [Sprig template functions](http://masterminds.github.io/sprig/) for use in templates, including:

- String manipulation: `trim`, `upper`, `lower`, `replace`
- Math functions: `add`, `sub`, `mul`, `div`
- Date formatting: `now`, `date`, `dateModify`
- And many more...

For more detailed examples and best practices, see the [TUTORIAL.md](TUTORIAL.md) file.