DOC: update pyogrio introduction #481

Merged
54 changes: 24 additions & 30 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,35 +1,29 @@
-# pyogrio - Vectorized spatial vector file format I/O using GDAL/OGR
-
-Pyogrio provides a
-[GeoPandas](https://github.com/geopandas/geopandas)-oriented API to OGR vector
-data sources, such as ESRI Shapefile, GeoPackage, and GeoJSON. Vector data sources
-have geometries, such as points, lines, or polygons, and associated records
-with potentially many columns worth of data.
-
-Pyogrio uses a vectorized approach for reading and writing GeoDataFrames to and
-from OGR vector data sources in order to give you faster interoperability. It
-uses pre-compiled bindings for GDAL/OGR so that the performance is primarily
-limited by the underlying I/O speed of data source drivers in GDAL/OGR rather
-than multiple steps of converting to and from Python data types within Python.
+# pyogrio - bulk-oriented spatial vector file I/O using GDAL/OGR
+
+Pyogrio provides fast, bulk-oriented read and write access to
+[GDAL/OGR](https://gdal.org/en/latest/drivers/vector/index.html) vector data
+sources, such as ESRI Shapefile, GeoPackage, GeoJSON, and several others.
+Vector data sources typically have geometries, such as points, lines, or
+polygons, and associated records with potentially many columns worth of data.
+
+The typical use is to read or write these data sources to/from
+[GeoPandas](https://github.com/geopandas/geopandas) `GeoDataFrames`. Because
+the geometry column is optional, reading or writing only non-spatial data is
+also possible. Hence, GeoPackage attribute tables, DBF files, or CSV files are
+also supported.
+
+Pyogrio is fast because it uses pre-compiled bindings for GDAL/OGR to read and
+write the data records in bulk. This approach avoids multiple steps of
+converting to and from Python data types within Python, so performance becomes
+primarily limited by the underlying I/O speed of data source drivers in
+GDAL/OGR.

 We have seen \>5-10x speedups reading files and \>5-20x speedups writing files
-compared to using non-vectorized approaches (Fiona and current I/O support in
-GeoPandas).
-
-You can read these data sources into
-`GeoDataFrames`, read just the non-geometry columns into Pandas `DataFrames`,
-or even read non-spatial data sources that exist alongside vector data sources,
-such as tables in a ESRI File Geodatabase, or antiquated DBF files.
-
-Pyogrio also enables you to write `GeoDataFrames` to at least a few different
-OGR vector data source formats.
+compared to using row-per-row approaches (e.g. Fiona).

 Read the documentation for more information:
 [https://pyogrio.readthedocs.io](https://pyogrio.readthedocs.io/en/latest/).
-
-WARNING: Pyogrio is still at an early version and the API is subject to
-substantial change. Please see [CHANGES](CHANGES.md).

 ## Requirements

 Supports Python 3.9 - 3.13 and GDAL 3.4.x - 3.9.x.
@@ -52,9 +46,9 @@ for more information.

 ## Supported vector formats

-Pyogrio supports some of the most common vector data source formats (provided
-they are also supported by GDAL/OGR), including ESRI Shapefile, GeoPackage,
-GeoJSON, and FlatGeobuf.
+Pyogrio supports most common vector data source formats (provided they are also
+supported by GDAL/OGR), including ESRI Shapefile, GeoPackage, GeoJSON, and
+FlatGeobuf.

 Please see the [list of supported formats](https://pyogrio.readthedocs.io/en/latest/supported_formats.html)
 for more information.
@@ -64,7 +58,7 @@ for more information.
 Please read the [introduction](https://pyogrio.readthedocs.io/en/latest/supported_formats.html)
 for more information and examples to get started using Pyogrio.

-You can also check out the the [API documentation](https://pyogrio.readthedocs.io/en/latest/api.html)
+You can also check out the [API documentation](https://pyogrio.readthedocs.io/en/latest/api.html)
 for full details on using the API.

 ## Credits