invalid regex test #89

Closed
jeremy-rutman opened this issue Apr 9, 2019 · 1 comment · Fixed by pelias/polylines#225

jeremy-rutman commented Apr 9, 2019

During pelias import all (or maybe the test run) I hit some 'invalid regex test' errors, shown below. I'm also looking for guidance on whether the number of misses vs. hits should look so skewed towards misses.

...
[pbf2json]: 2019/04/09 15:55:03 denormalize failed for way: 679144706 node not found: 6359467361
...
{"calls":33955360,"hits":325910,"misses":33629450}
info: [admin-lookup:worker] region worker process exiting, stats: {"calls":33629450,"hits":33035765,"misses":593685}
info: [admin-lookup:worker] localadmin worker process exiting, stats: {"calls":59668214,"hits":11729866,"misses":47938348}
info: [admin-lookup:worker] empire worker process exiting, stats: {"calls":119193,"hits":0,"misses":119193}
info: [admin-lookup:worker] ocean worker process exiting, stats: {"calls":34409,"hits":17605,"misses":16804}
info: [admin-lookup:worker] dependency worker process exiting, stats: {"calls":593685,"hits":22009,"misses":571676}
info: [admin-lookup:worker] neighbourhood worker process exiting, stats: {"calls":100315152,"hits":13602598,"misses":86712554}
info: [admin-lookup:worker] borough worker process exiting, stats: {"calls":100315152,"hits":3206852,"misses":97108300}
info: [admin-lookup:worker] locality worker process exiting, stats: {"calls":97108300,"hits":37440086,"misses":59668214}
info: [admin-lookup:worker] country worker process exiting, stats: {"calls":571676,"hits":452483,"misses":119193}
info: [admin-lookup:worker] continent worker process exiting, stats: {"calls":119193,"hits":36323,"misses":82870}
info: [admin-lookup:worker] macrocounty worker process exiting, stats: {"calls":34048540,"hits":93180,"misses":33955360}
info: [admin-lookup:worker] county worker process exiting, stats: {"calls":47938348,"hits":13889808,"misses":34048540}
info: [dbclient-openstreetmap]  paused=false, transient=0, current_length=0, indexed=100315152, batch_ok=200631, batch_retries=0, failed_records=0, venue=22752709, address=77562443, persec=165.2
...
info: [wof-pip-service:master] starting with layers neighbourhood,borough,locality,localadmin,county,macrocounty,macroregion,region,dependency,country,empire,continent,marinearea,ocean
info: [wof-pip-service:master] empire worker loaded 0 features in 0.052 seconds
info: [wof-pip-service:master] ocean worker loaded 7 features in 0.09 seconds
info: [wof-pip-service:master] dependency worker loaded 32 features in 0.643 seconds
info: [wof-pip-service:master] continent worker loaded 8 features in 0.793 seconds
info: [wof-pip-service:master] macrocounty worker loaded 23 features in 0.818 seconds
info: [wof-pip-service:master] macroregion worker loaded 25 features in 0.842 seconds
info: [wof-pip-service:master] marinearea worker loaded 305 features in 2.66 seconds
info: [wof-pip-service:master] borough worker loaded 138 features in 3.983 seconds
info: [wof-pip-service:master] country worker loaded 199 features in 9.17 seconds
info: [wof-pip-service:master] region worker loaded 4268 features in 51.517 seconds
info: [wof-pip-service:master] county worker loaded 24845 features in 155.089 seconds
info: [wof-pip-service:master] neighbourhood worker loaded 17726 features in 259.723 seconds
info: [wof-pip-service:master] localadmin worker loaded 99206 features in 412.208 seconds
info: [wof-pip-service:master] locality worker loaded 143249 features in 506.709 seconds
info: [wof-pip-service:master] PIP Service Loading Completed!!!
info: [dbclient-polylines]  paused=false, transient=1, current_length=204
...
info: [dbclient-polylines]  paused=false, transient=0, current_length=459, indexed=276000, batch_ok=552, batch_retries=0, failed_records=0, street=276000, persec=4000
error: [polyline] polyline document error message=invalid regex test, http://www.hembygd.se/lagunda/nysatra-kyrkstig/ should not match /https?:\/\//, stack=PeliasModelError: invalid regex test, http://www.hembygd.se/lagunda/nysatra-kyrkstig/ should not match /https?:\/\//
    at Object.nomatch (/code/pelias/polylines/node_modules/pelias-model/util/valid.js:117:13)
    at Document.setName (/code/pelias/polylines/node_modules/pelias-model/Document.js:258:18)
    at DestroyableTransform._transform (/code/pelias/polylines/stream/document.js:30:11)
    at DestroyableTransform.Transform._read (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:177:10)
    at DestroyableTransform.Readable.read (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:440:10)
    at flow (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:898:34)
    at ParallelTransform.pipeOnDrainFunctionResult (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:708:7)
    at ParallelTransform.emit (events.js:182:13)
    at onwriteDrain (/code/pelias/polylines/node_modules/parallel-transform/node_modules/readable-stream/lib/_stream_writable.js:501:12)
    at afterWrite (/code/pelias/polylines/node_modules/parallel-transform/node_modules/readable-stream/lib/_stream_writable.js:489:18), name=PeliasModelError
error: [polyline] polyline document error message=invalid regex test, http://www.hembygd.se/lagunda/nysatra-kyrkstig/ should not match /https?:\/\//, stack=PeliasModelError: invalid regex test, http://www.hembygd.se/lagunda/nysatra-kyrkstig/ should not match /https?:\/\//
    at Object.nomatch (/code/pelias/polylines/node_modules/pelias-model/util/valid.js:117:13)
    at Document.setName (/code/pelias/polylines/node_modules/pelias-model/Document.js:258:18)
    at DestroyableTransform._transform (/code/pelias/polylines/stream/document.js:30:11)
    at DestroyableTransform.Transform._read (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:177:10)
    at DestroyableTransform.Readable.read (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:440:10)
    at flow (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:898:34)
    at ParallelTransform.pipeOnDrainFunctionResult (/code/pelias/polylines/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:708:7)
    at ParallelTransform.emit (events.js:182:13)
    at onwriteDrain (/code/pelias/polylines/node_modules/parallel-transform/node_modules/readable-stream/lib/_stream_writable.js:501:12)
    at afterWrite (/code/pelias/polylines/node_modules/parallel-transform/node_modules/readable-stream/lib/_stream_writable.js:489:18), name=PeliasModelError
info: [dbclient-polylines]  paused=false, transient=1, current_length=7, indexed=316000, batch_ok=632, batch_retries=0, failed_records=0, street=316000, persec=4000
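
To compare hits against misses per layer, the stats lines above can be reduced to a hit rate. The following is a minimal sketch, not part of Pelias: `parseStatsLine` is a hypothetical helper, and the line format it expects is assumed from the log as pasted here. Note that in the pasted stats several layers' `calls` equal another layer's `misses` (for example, locality calls 97108300 match borough misses 97108300), which would be consistent with lookups falling through from one layer to the next; that reading is an inference from the numbers and is not confirmed in this thread.

```javascript
// Hypothetical helper (not a Pelias API): parse one admin-lookup stats line
// in the format seen in the log above and attach a hit rate.
function parseStatsLine(line) {
  const match = line.match(
    /\[admin-lookup:worker\] (\w+) worker process exiting, stats: (\{.*\})/
  );
  if (!match) return null; // not a per-layer stats line

  const stats = JSON.parse(match[2]);
  // Guard against division by zero for layers that were never called.
  const hitRate = stats.calls > 0 ? stats.hits / stats.calls : 0;
  return { layer: match[1], ...stats, hitRate };
}

// Example: print "layer: hit rate" for a list of log lines.
function summarize(lines) {
  return lines
    .map(parseStatsLine)
    .filter((entry) => entry !== null)
    .map((entry) => `${entry.layer}: ${(entry.hitRate * 100).toFixed(1)}%`);
}
```

A low hit rate for a layer with few loaded features (such as empire, which loaded 0 features) is expected under this reading, since most points simply do not fall inside any polygon of that layer.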

jeremy-rutman commented Apr 15, 2019

closing this and putting it in the polylines subrepo, tho the hits vs misses question is still relevant for me

orangejulius added a commit to pelias/polylines that referenced this issue Jul 3, 2019
We have had numerous reports from Pelias users about concerning error
messages during builds regarding the URL regex filter from
pelias/model#115.

While this filter is good, the resulting error message is alarming.
Looking today at the output of a planet build, it appears that many of
these errors come from the polylines file created by Valhalla out of the
OSM street network.

Looking at the contents of the polyline file and corresponding record on
OSM, it seems that Valhalla puts the contents of the `ref` tag in the
polyline file as an alternate name. The [ref tag](https://wiki.openstreetmap.org/wiki/Key:ref?uselang=en-US) will
often contain a URL.

This means that not only will the error happen frequently, but many
records that are actually valid will be filtered out.

An example of this is the [Iowa Women of Achievement
bridge](https://www.openstreetmap.org/way/65066830) which is completely
valid in terms of name, geometry, and tagging but contains a URL in the
`ref` field.

The polylines importer currently selects a single name value from the
list of names in the polylines file by choosing the longest.

This PR adds an additional filter that first removes any URL-like values
from consideration, and should completely eliminate any of the otherwise
concerning errors while ensuring all valid records make it into
Elasticsearch.
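
The selection logic described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `selectName` and `URL_LIKE` are hypothetical names, and the pattern is simply the regex quoted in the error message.

```javascript
// Hypothetical constant: the same pattern that appears in the
// "invalid regex test" error message.
const URL_LIKE = /https?:\/\//;

// Hypothetical helper: choose one name from the candidate names found in a
// single polyline row. URL-like values are dropped first, then the longest
// remaining name wins (the first one wins on a tie).
function selectName(names) {
  const candidates = names
    .map((name) => name.trim())
    .filter((name) => name.length > 0 && !URL_LIKE.test(name));

  if (candidates.length === 0) return null; // nothing usable in this row

  return candidates.reduce((longest, name) =>
    name.length > longest.length ? name : longest
  );
}
```

With this ordering, a row such as `['Main Street', 'http://example.com/a-very-long-reference']` yields `'Main Street'` instead of handing the URL to the model, where it would be rejected.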

Fixes pelias/whosonfirst#456
Fixes #216
Fixes pelias/docker#89
Connects pelias/model#116