Commit

DOC: sync readme changes to docs (#482)
theroggy authored Oct 1, 2024
1 parent 247bf6f commit d4216e9
Showing 2 changed files with 21 additions and 28 deletions.
2 changes: 1 addition & 1 deletion docs/source/about.md
@@ -22,7 +22,7 @@ for working with OGR vector data sources. It is **awesome**, has highly-dedicate
 maintainers and contributors, and exposes more functionality than Pyogrio ever will.
 This project would not be possible without Fiona having come first.
 
-Pyogrio uses a vectorized (array-oriented) approach for reading and writing
+Pyogrio uses a bulk-oriented approach for reading and writing
 spatial vector file formats, which enables faster I/O operations. It borrows
 from the internal mechanics and lessons learned of Fiona. It uses a stateless
 approach to reading or writing data; all data are read or written in a single
47 changes: 20 additions & 27 deletions docs/source/index.md
@@ -1,32 +1,25 @@
-# pyogrio - Vectorized spatial vector file format I/O using GDAL/OGR
-
-Pyogrio provides a
-[GeoPandas](https://github.com/geopandas/geopandas)-oriented API to OGR vector
-data sources, such as ESRI Shapefile, GeoPackage, and GeoJSON. Vector data sources
-have geometries, such as points, lines, or polygons, and associated records
-with potentially many columns worth of data.
-
-Pyogrio uses a vectorized approach for reading and writing GeoDataFrames to and
-from OGR vector data sources in order to give you faster interoperability. It
-uses pre-compiled bindings for GDAL/OGR so that the performance is primarily
-limited by the underlying I/O speed of data source drivers in GDAL/OGR rather
-than multiple steps of converting to and from Python data types within Python.
+# pyogrio - bulk-oriented spatial vector file I/O using GDAL/OGR
+
+Pyogrio provides fast, bulk-oriented read and write access to
+[GDAL/OGR](https://gdal.org/en/latest/drivers/vector/index.html) vector data
+sources, such as ESRI Shapefile, GeoPackage, GeoJSON, and several others.
+Vector data sources typically have geometries, such as points, lines, or
+polygons, and associated records with potentially many columns worth of data.
+
+The typical use is to read or write these data sources to/from
+[GeoPandas](https://github.com/geopandas/geopandas) `GeoDataFrames`. Because
+the geometry column is optional, reading or writing only non-spatial data is
+also possible. Hence, GeoPackage attribute tables, DBF files, or CSV files are
+also supported.
+
+Pyogrio is fast because it uses pre-compiled bindings for GDAL/OGR to read and
+write the data records in bulk. This approach avoids multiple steps of
+converting to and from Python data types within Python, so performance becomes
+primarily limited by the underlying I/O speed of data source drivers in
+GDAL/OGR.
 
 We have seen \>5-10x speedups reading files and \>5-20x speedups writing files
-compared to using non-vectorized approaches (Fiona and current I/O support in
-GeoPandas).
-
-You can read these data sources into
-`GeoDataFrames`, read just the non-geometry columns into Pandas `DataFrames`,
-or even read non-spatial data sources that exist alongside vector data sources,
-such as tables in a ESRI File Geodatabase, or antiquated DBF files.
-
-Pyogrio also enables you to write `GeoDataFrames` to at least a few different
-OGR vector data source formats.
-
-```{warning}
-Pyogrio is still at an early version and the API is subject to substantial change.
-```
+compared to using row-per-row approaches (e.g. Fiona).
 
 ```{toctree}
 ---

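The typical use described in the updated `index.md` text (bulk reads into a `GeoDataFrame`, optional geometry, bulk writes) can be sketched roughly as follows. This is an illustration only and not part of the commit: it assumes pyogrio and GeoPandas are installed, and `roundtrip_layer` with its `src`, `dst`, and `driver` parameters is a hypothetical helper named here for the example. Only `read_dataframe` (with `read_geometry`) and `write_dataframe` (with `driver`) are pyogrio functions.

```python
from pathlib import Path


def roundtrip_layer(src: Path, dst: Path, driver: str = "GPKG"):
    """Hypothetical helper: read a vector data source in bulk, then write it out."""
    # Imported lazily so this sketch can be loaded even where the optional
    # pyogrio dependency is not installed.
    from pyogrio import read_dataframe, write_dataframe

    # A single call reads all records into a GeoPandas GeoDataFrame.
    gdf = read_dataframe(src)

    # The geometry column is optional: with read_geometry=False only the
    # attribute columns are read, giving a plain pandas DataFrame.
    attributes = read_dataframe(src, read_geometry=False)

    # A single call writes all records; the driver picks the output format.
    write_dataframe(gdf, dst, driver=driver)
    return gdf, attributes
```

Calling `roundtrip_layer(Path("input.shp"), Path("output.gpkg"))` would, under these assumptions, convert a Shapefile to a GeoPackage; both file names are placeholders.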