-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a geoparquet module and add dependencies #855
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for creating this draft so quickly. I just added 2 small questions to the PR.
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/data/FeatureGroup.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/test/java/org/apache/baremaps/geoparquet/GeoParquetReaderTest.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/data/GeoParquetGroup.java
Fixed
Show fixed
Hide fixed
...s-geoparquet/src/main/java/org/apache/baremaps/geoparquet/data/GeoParquetGroupConverter.java
Fixed
Show fixed
Hide fixed
Previous attempt #851 |
baremaps-geoparquet/src/test/java/org/apache/baremaps/geoparquet/GeoParquetReaderTest.java
Fixed
Show fixed
Hide fixed
2095d8d
to
0c476c6
Compare
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/data/GeoParquetGroupImpl.java
Fixed
Show fixed
Hide fixed
@sebr72 I just pushed the In terms of organisation, once this interface is stabilized, I can work on a postgis importer with mocks and you can work on the geopackage reader internals. Is that ok for you? @Drabble It would be nice to have your feedback as well. |
configuration.set("fs.s3a.aws.credentials.provider", | ||
"org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"); | ||
configuration.setBoolean("fs.s3a.path.style.access", true); | ||
configuration.setBoolean(AvroReadSupport.READ_INT96_AS_FIXED, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that INT96 is deprecated. https://stackoverflow.com/questions/55829202/unable-to-read-date-format-columns-int96-type-from-avro-parquet-schema-in-apac. Did you get some INT96 fields in the geoparquet files?
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably discuss organising these files further into sub-directories. FileInfo
and DoubleValue
don't really belong together.
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Outdated
Show resolved
Hide resolved
return configuration; | ||
} | ||
|
||
private static URI getRootUri(URI uri) throws URISyntaxException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a standard way to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a quick and dirty attempt at supporting wildcards. I will put a TODO ;)
super(file, writeSupport, compressionCodecName, blockSize, pageSize, | ||
pageSize, enableDictionary, enableValidation, writerVersion, conf); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note
ParquetWriter.ParquetWriter
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
...parquet/src/main/java/org/apache/baremaps/geoparquet/hadoop/GeoParquetGroupWriteSupport.java
Outdated
Show resolved
Hide resolved
I did a little bit of reading on the spec of geoparquet. https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md I have a feeling that in our library what we do is extract the data from a generic parquet file and provide in addition the Geoparquet metadata. We don't really do Geoparquet specifc processing except from the metadata parsing. We could provide extra functions to parse the WKB/GeoArrow binary into a Java class that loads up geometries and sets the correct CRS. For Baremaps, I don't think this is really useful as we store raw WKB in the database. But it could be useful for someone using the library. Maybe this should be considered in a second iteration? The big question is, should we return a Logical type called Geometry instead of the binary type? This could also be valid for other logical types like STRING or Date. And should we rename the Regarding the geoparquet metadata, I believe we should provide this metadata with a function like As for writing to Geoparquet files, I think this is not useful for Baremaps. We should maybe consider it as a second or third step. I think there are a lot of question about supporting the entire specification and validating schemas if we try to implement that. Some things to consider for support are:
|
7f321ee
to
455dccc
Compare
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
baremaps-core/src/main/java/org/apache/baremaps/storage/geoparquet/GeoParquetTable.java
Fixed
Show fixed
Hide fixed
import org.apache.baremaps.testing.TestFiles; | ||
import org.junit.jupiter.api.Test; | ||
|
||
class GeoParquetDataSchemaTest { |
Check notice
Code scanning / CodeQL
Unused classes and interfaces Note test
|
||
// Iterate over all the files in the path | ||
for (FileStatus file : fileSystem.globStatus(globPath)) { | ||
ParquetFileReader reader = ParquetFileReader.open(configuration, file.getPath()); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note
ParquetFileReader.open
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
baremaps-geoparquet/src/main/java/org/apache/baremaps/geoparquet/GeoParquetReader.java
Fixed
Show fixed
Hide fixed
...maps-core/src/main/java/org/apache/baremaps/storage/geoparquet/GeoParquetTypeConversion.java
Show resolved
Hide resolved
...maps-core/src/test/java/org/apache/baremaps/storage/geoparquet/GeoParquetToPostgresTest.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
public MessageType getParquetSchema() throws IOException, URISyntaxException { | ||
return files().values().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we will have a different Geoparquet DataTable for each subtheme?
I believe each sub theme will have a different schema. https://docs.overturemaps.org/schema/reference/
For example water inside the base theme has a is_salt property and land has an elevation property.
That would make a total of 16 tables with the current Overture Maps spec.
baremaps-core/src/main/java/org/apache/baremaps/storage/geoparquet/GeoParquetDataSchema.java
Outdated
Show resolved
Hide resolved
...maps-core/src/test/java/org/apache/baremaps/storage/geoparquet/GeoParquetToPostgresTest.java
Outdated
Show resolved
Hide resolved
02ef525
to
e05898f
Compare
0f33525
to
8eb6763
Compare
- Cleanup the code - Fix Sonar issues
Quality Gate failedFailed conditions See analysis details on SonarCloud Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
No description provided.