do not allow urls in name and address fields #115
No attempt is made to correct the data programmatically. I've gone through and hand-edited a bunch of OSM data to fix the cases I am aware of, but I'm sure there are loads more.
For some reason the integration between Travis & GitHub seems to be broken. I can see that the tests passed here: https://travis-ci.org/pelias/model/builds/506714451, so I'm going to merge this using admin privileges.
👍 on this PR. I've been noticing lots of Travis/GitHub integration issues. I'm not sure what the cause is, but it's definitely annoying.
We have had numerous reports from Pelias users about a concerning error message during builds regarding the URL regex filter from pelias/model#115. While this filter is good, the resulting error message is alarming.

Looking today at the output of a planet build, it appears that many of these errors come from the polylines file created by Valhalla out of the OSM street network. Looking at the contents of the polylines file and the corresponding record on OSM, it seems that Valhalla puts the contents of the `ref` tag in the polylines file as an alternate name. The [ref tag](https://wiki.openstreetmap.org/wiki/Key:ref?uselang=en-US) will often contain a URL. This means that not only will the error happen frequently, but many records that are actually valid will be filtered out. An example of this is the [Iowa Women of Achievement bridge](https://www.openstreetmap.org/way/65066830), which is completely valid in terms of name, geometry, and tagging but contains a URL in the `ref` field.

The polylines importer currently selects a single name value from the list of names in the polylines file by choosing the longest. This PR adds an additional filter that first removes any URL-like values from consideration, and should completely eliminate these otherwise concerning errors while ensuring all valid records make it into Elasticsearch.

Fixes pelias/whosonfirst#456
Fixes #216
Fixes pelias/docker#89
Connects pelias/model#116
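As a rough illustration of the selection logic described above, here is a minimal sketch: drop URL-like values first, then pick the longest remaining name. The `URL_LIKE` pattern and the `selectName` function are hypothetical names chosen for this example, not the importer's actual code, and the sample values are made up.

```javascript
// Assumption: a value is "URL-like" if it contains http:// or https://.
// The real importer may use a different pattern.
const URL_LIKE = /https?:\/\//i;

// Hypothetical helper: choose the longest name that does not look like a URL.
// Returns null when no usable name remains.
function selectName(names) {
  const candidates = names.filter(
    (name) => typeof name === 'string' && !URL_LIKE.test(name)
  );
  if (candidates.length === 0) {
    return null;
  }
  // On ties, the earlier candidate is kept.
  return candidates.reduce((longest, name) =>
    name.length > longest.length ? name : longest
  );
}

// Made-up input: the URL is the longest value but is excluded up front.
const chosen = selectName([
  'Bridge',
  'Women of Achievement Bridge',
  'https://example.com/a/very/long/reference/value',
]);
console.log(chosen);
```

Filtering before the length comparison matters here: a URL is often the longest value in the list, so without the filter it would be selected and then rejected by the model, discarding an otherwise valid record.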
We recently discovered some open data which incorrectly contains URLs in the `name` & `housenumber` fields. Unfortunately, this seems to be a common problem in OpenStreetMap due to human error.

This PR enforces a regex check on `name` and `address` fields to ensure that they do not contain URLs.
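For readers unfamiliar with this kind of check, a minimal sketch of a regex-based validator might look like the following. The pattern and the names `URL_PATTERN`, `containsUrl`, and `assertNoUrl` are assumptions made for illustration; the regex and error handling actually used in this PR may differ.

```javascript
// Assumption: any value containing "http://" or "https://" counts as a URL.
// No global flag is used, so repeated .test() calls are stateless.
const URL_PATTERN = /https?:\/\//i;

// Hypothetical helper: true when the value is a string that contains a URL.
function containsUrl(value) {
  return typeof value === 'string' && URL_PATTERN.test(value);
}

// Hypothetical validator: throws when a field value contains a URL,
// otherwise returns the value unchanged.
function assertNoUrl(field, value) {
  if (containsUrl(value)) {
    throw new Error(`invalid ${field}: must not contain a URL: ${value}`);
  }
  return value;
}

// A plain street name passes through untouched.
console.log(assertNoUrl('name', 'Main Street'));

// A value with an embedded URL is rejected.
try {
  assertNoUrl('housenumber', '12 https://example.com');
} catch (err) {
  console.log(err.message);
}
```

A deliberately narrow pattern like this only catches values with an explicit scheme. Broader patterns (for example matching bare `www.` prefixes) would catch more bad data at the cost of more false positives on legitimate names.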