
[Q] Very short line segments in combined.json? #63

Closed
pdh0710 opened this issue Jan 21, 2019 · 8 comments

Comments

@pdh0710

pdh0710 commented Jan 21, 2019

I found lots of very short line segments in combined.json. The minimum length of these segments is about 1/300 of an arc second, which is impractically small. Though I did not check all of them, some look like line glitches that could have been produced by erroneous software operations, such as careless conversion of rasterized lines to vector lines.
I tried to simplify those lines because they interfere with my look-up program, but the problem is not simple: if I simplify one timezone boundary, it can overlap with, or open gaps against, the adjacent timezone boundary.

How were these short lines generated, and how can I safely remove or simplify them?

@evansiroky
Owner

There are three steps in the boundary-building process that generate or manipulate the boundary lines.

  1. Most of the initial data comes from OpenStreetMap, so it is possible that some of those really short line segments come from OSM itself.
  2. This library uses the union and difference geometric operations to build boundaries, and very short line segments could be left behind in the boundaries those operations produce.
  3. To satisfy the request in #17 for a smaller version of combined.json, this library uses a precision reducer to chop off excessively precise coordinates. This probably has the least effect, since the line segments are likely already close together at this point for the reasons listed in 1 and 2.

As I mentioned in #17, I still think it is best for downstream users of the data produced in this project to pursue data simplification at their own risk and let this project have some of the artifacts of absurdly precise data so long as the data is not wrong.
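As a rough illustration of the precision-reduction step mentioned above (the project's actual reducer lives in its JavaScript build scripts; this pure-Python sketch only mimics the idea), rounding coordinates can itself collapse near-identical vertices:

```python
def reduce_precision(ring, decimals=6):
    """Sketch of a precision reducer: round each coordinate and drop
    consecutive vertices that the rounding makes identical. This is an
    illustration, not the library's real implementation."""
    rounded = [(round(x, decimals), round(y, decimals)) for x, y in ring]
    out = [rounded[0]]
    for pt in rounded[1:]:
        if pt != out[-1]:  # drop duplicates created by the rounding
            out.append(pt)
    return out

# Two vertices less than a tenth of a metre apart collapse into one
# when rounded to 5 decimal places (~1 m grid).
ring = [(4.6945512, 51.4536259), (4.6945517, 51.4536266), (4.70, 51.46)]
print(reduce_precision(ring, decimals=5))
```

Note that this also shows why the step probably has the least effect: at full precision (6+ decimals) such vertex pairs survive the rounding and remain as very short segments.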

@evansiroky
Owner

I just thought of one other boundary-manipulation step that occurs in this code. To resolve #11, the script inspects the resulting boundaries and removes any polygon holes that have a very small area. This should leave fewer of those very short line segments, since some of them may be removed altogether.
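The small-hole removal described above can be sketched in a few lines; the area threshold and helper names here are illustrative assumptions, not the project's actual code:

```python
def ring_area(ring):
    """Planar shoelace area of a closed ring (first vertex repeated last).
    Good enough for comparing tiny holes against a threshold."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def drop_small_holes(rings, min_area=1e-8):
    """rings = [outer, hole1, hole2, ...], as in one GeoJSON Polygon.
    Keeps the outer ring; drops interior rings below min_area (sq. degrees)."""
    outer, holes = rings[0], rings[1:]
    return [outer] + [h for h in holes if ring_area(h) >= min_area]
```

For example, a unit square with a microscopic interior ring comes back as a plain polygon with no holes, removing every short segment that made up the hole.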

@pdh0710
Author

pdh0710 commented Jan 22, 2019

Thank you @evansiroky.

One thing I found is that 209 line segments share the same minimum length of 9.999999974752427e-7 degrees, so I concluded they were generated by a software operation rather than by human input.
When I examined one of these minimum-length lines, it was just a glitch in the middle of an otherwise straight line. I have seen this kind of glitch produced by careless conversion of rasterized lines to vector lines, so I guessed the original OpenStreetMap boundary data was the likely cause. But I was not sure, which is why I asked.
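A scan like the one described above can be reproduced with a short Python sketch; it assumes combined.json parses to a GeoJSON FeatureCollection of Polygon/MultiPolygon features and, like the 9.999999974752427e-7 figure, measures segment length in plain degrees:

```python
import json
import math

def short_segments(fc, max_len_deg=1e-6):
    """Yield (feature_index, p1, p2, length_deg) for every boundary
    segment shorter than max_len_deg in a GeoJSON FeatureCollection."""
    for i, feat in enumerate(fc["features"]):
        geom = feat["geometry"]
        polys = geom["coordinates"]
        if geom["type"] == "Polygon":
            polys = [polys]  # normalize to MultiPolygon shape
        for rings in polys:
            for ring in rings:
                for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
                    d = math.hypot(x2 - x1, y2 - y1)
                    if 0 < d < max_len_deg:
                        yield i, (x1, y1), (x2, y2), d

# usage sketch:
# with open("combined.json") as f:
#     hits = list(short_segments(json.load(f)))
```

Note this uses a planar degree length, which matches how the minimum was reported; a metric scan would need a geodesic distance instead.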

By the way, the Natural Earth site also offers country boundary data. Although its updates seem a bit slow, the site appears to offer high-resolution, reliable data.
May I ask why you are not using the Natural Earth boundary data?

@evansiroky
Owner

I'm also not sure what causes the glitches you are describing, especially since it is unclear exactly where any of them occur.

With regards to the Natural Earth data, it is not used for three reasons.

  1. The boundary data is vastly inferior to OpenStreetMap data. See this example of the Natural Earth borders overlaid on an OpenStreetMap background (the Natural Earth data is the black line of borders between countries):

[Screenshot, 2019-01-21: Natural Earth country borders (black) overlaid on an OpenStreetMap background]

As you can see, the data in OpenStreetMap is much more precise and accurate.

  2. The Natural Earth data lacks information about territorial waters.
  3. The Natural Earth data does not appear to be immediately editable. If there is an issue with the OpenStreetMap data, I can go fix it myself; then, typically in less than a minute, I can re-run this project's download script and obtain the corrected data.

@pdh0710
Author

pdh0710 commented Jan 22, 2019

Oh my... the Natural Earth data is less reliable than I thought. Thank you @evansiroky for your great work.

@pdh0710
Author

pdh0710 commented Jan 22, 2019

The four screen-captured images below are examples of the glitch lines. The markers are placed at the start points of the glitch lines, and the marker coordinates are shown at the right side of the bottom status bar.
The first three images show glitches in the middle of straight lines; the last shows a glitch on a bending line.
The glitches cannot be identified by eye — they are too short.

  • Timezone : Europe/Amsterdam

[Screenshots t0, t1, t2: marked glitch start points in Europe/Amsterdam]

  • Timezone : Africa/Johannesburg

[Screenshot t3: marked glitch start point in Africa/Johannesburg]

@evansiroky
Owner

Thanks for posting these examples and coordinates. I'm rather busy with my day job right now, but I'm reopening this as a reminder to take a look before the next release.

@evansiroky reopened this Jan 29, 2019
@evansiroky
Owner

For the first Europe/Amsterdam example I'm not noticing any suspiciously close coordinates from OpenStreetMap. However, in the second example, there are two nodes as follows:

{
  "type": "node",
  "id": 2156620436,
  "lat": 51.4536259,
  "lon": 4.6945512
},
{
  "type": "node",
  "id": 2156620437,
  "lat": 51.4536266,
  "lon": 4.6945517
}

And the third example has the following extremely close together nodes:

{
  "type": "node",
  "id": 2156629391,
  "lat": 51.4427372,
  "lon": 4.6668953
},
{
  "type": "node",
  "id": 2156629393,
  "lat": 51.4427380,
  "lon": 4.6668952
}

The level of precision here is probably unnecessary, but it comes from OpenStreetMap, so I don't think this is a problem originating in this library. I'm going to go ahead and close this, because this project aims to use data that is as close as possible to what is found in OpenStreetMap.
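For scale, a standard haversine computation (a sketch, not part of this project's code) puts the quoted node pairs well under a metre apart:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (R = 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Nodes 2156620436 / 2156620437 from the second example: sub-metre apart.
print(haversine_m(51.4536259, 4.6945512, 51.4536266, 4.6945517))
```

Segments that short are invisible at any practical zoom level, which matches the report that the glitches cannot be identified by eye.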
