Workflow to check for URL validity is needed #17

Open
smalers opened this issue Aug 16, 2022 · 1 comment
Assignees
Labels
enhancement high Priority: next release if possible S Size: day or less

Comments

@smalers
Contributor

smalers commented Aug 16, 2022

Some websites appear to block the TSTool WebGet command from downloading the page at the website URL. A reliable way to make these requests is needed so that a workflow can be implemented to check all of the URLs.

@smalers smalers added enhancement high Priority: next release if possible S Size: day or less labels Aug 16, 2022
@smalers smalers self-assigned this Aug 16, 2022
@smalers
Contributor Author

smalers commented Feb 13, 2023

I added the command file 00-check-data.tstool, which runs WebGet on each website URL. This requires setting the HTTP User-Agent property to mimic a web browser, because some websites refuse to serve the page to other software. I had to remove the commas from the User-Agent value because they caused a parse error, after which the command worked. The run reports 11 errors that need review. I don't have time to fix them right now.

For some reason, reading the spreadsheet is very slow and I have not been able to figure out why. Consequently, debugging and checking the data is more painful than for other datasets. For now, I am moving on, knowing that at least bad URLs can be detected.
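
For anyone who wants to cross-check the URLs outside TSTool, the approach described in this comment (send a browser-like User-Agent with the commas removed, then record which URLs fail) can be sketched in Python using only the standard library. This is an illustration, not the contents of 00-check-data.tstool. The User-Agent string and the function names `strip_commas`, `check_url`, and `check_urls` are assumptions made for the example, and stripping commas only mirrors the workaround above (plain HTTP does not require it).

```python
import urllib.error
import urllib.request

# Example browser-like User-Agent (an assumption; any current browser string should do).
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/110.0.0.0 Safari/537.36"
)


def strip_commas(user_agent: str) -> str:
    """Remove commas, mirroring the workaround described in the comment above."""
    return user_agent.replace(",", "")


def check_url(url: str, user_agent: str = BROWSER_USER_AGENT, timeout: float = 10.0):
    """Return (ok, detail); ok is True when the final response has a 2xx status."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": strip_commas(user_agent)}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError, ValueError) as err:
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False, f"{type(err).__name__}: {err}"


def check_urls(urls):
    """Check each URL and return the (url, detail) pairs that failed."""
    failures = []
    for url in urls:
        ok, detail = check_url(url)
        if not ok:
            failures.append((url, detail))
    return failures
```

Calling `check_urls` with the list of website URLs from the dataset would return only the entries that need review, each with the HTTP status or error that was observed.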
