ETL tools for spatial data stored in Postgres.
This module exists to make life easier when working with geospatial data in a Postgres database.
You should have the following command-line tools installed, preferably on your system path:
psql
pg_dump
shp2pgsql
ogr2ogr
If you want to use the optional vector tile functions you'll also need:
tippecanoe
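If you're not sure these are all visible to your shell, here's a quick way to check from Python (a minimal sketch using only the standard library, not part of pg_data_etl):
>>> from shutil import which
>>> for tool in ["psql", "pg_dump", "shp2pgsql", "ogr2ogr", "tippecanoe"]:
...     print(tool, "found" if which(tool) else "MISSING")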
Install the package with pip:
pip install pg_data_etl
The following code block imports spatial data into Postgres and runs a spatial query:
>>> from pg_data_etl import Database
>>> credentials = {
... "db_name": "sample_database",
... "host": "localhost",
... "un": "username",
... "pw": "my-password",
... "super_un": "postgres",
... "super_pw": "superuser-password"
... }
>>> db = Database.from_parameters(**credentials)
>>> # Create the database (runs as the superuser)
>>> db.admin("CREATE")
>>> data_to_import = [
... ("philly.high_injury_network", "https://phl.carto.com/api/v2/sql?filename=high_injury_network_2020&format=geojson&skipfields=cartodb_id&q=SELECT+*+FROM+high_injury_network_2020"),
... ("philly.playgrounds", "https://opendata.arcgis.com/datasets/899c807e205244278b3f39421be8489c_0.geojson")
... ]
>>> for sql_tablename, source_url in data_to_import:
... kwargs = {
... "filepath": source_url,
... "sql_tablename": sql_tablename,
... "gpd_kwargs": {"if_exists":"replace"}
... }
... db.import_gis(**kwargs)
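>>> # Note: gpd_kwargs is passed through to geopandas, so a local file that
>>> # geopandas can read may work here too (an assumption; hypothetical path):
>>> db.import_gis(
...     filepath="data/my_local_layer.geojson",
...     sql_tablename="philly.my_local_layer",
...     gpd_kwargs={"if_exists": "replace"},
... )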
>>> # Define a SQL query as a string in Python
>>> query = """
... select * from philly.high_injury_network
... where st_dwithin(
... st_transform(geom, 26918),
... (select st_transform(st_collect(geom), 26918) from philly.playgrounds),
... 100
... )
... order by st_length(geom) DESC """
>>> # Get a geodataframe from the db using the query
>>> gdf = db.gdf(query)
>>> gdf.head()
index objectid street_name buffer geom uid
0 234 189 BUSTLETON AVE 75 feet LINESTRING (-75.07081 40.03528, -75.07052 40.0... 236
1 65 38 5TH ST 50 feet LINESTRING (-75.14528 39.96913, -75.14502 39.9... 66
2 223 179 ARAMINGO AVE 75 feet LINESTRING (-75.12212 39.97449, -75.12132 39.9... 224
3 148 215 KELLY DR 75 feet LINESTRING (-75.18470 39.96934, -75.18513 39.9... 150
4 156 224 MARTIN LUTHER KING DR 75 feet LINESTRING (-75.17713 39.96327, -75.17775 39.9... 159
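Because db.gdf() returns a geopandas GeoDataFrame, the result plugs directly into the rest of the geopandas ecosystem. For example, to save the query result to a file (standard geopandas API; the filename is just an illustration):
>>> gdf.to_file("hin_near_playgrounds.geojson", driver="GeoJSON")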
To save time and typing, database credentials can be stored in a text file. You can place this file wherever you want, but by default it lives at ~/.pg-data-etl/database_connections.cfg.
To generate one for the first time, run the following from a terminal prompt:
> pg make-config-file
This file uses the following format:
[DEFAULT]
pw = this-is-a-placeholder-password
port = 5432
super_db = postgres
super_un = postgres
super_pw = this-is-another-placeholder-password
[localhost]
host = localhost
un = postgres
pw = your-password-here
db_name = dbname  # optional; not included in the generated file, but you can add it so the database name comes from the config
Each entry in square brackets is a named connection, and any parameters not explicitly defined are inherited from DEFAULT.
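For example, a hypothetical second connection that only overrides the host and login; its port and superuser settings are all inherited from DEFAULT:
[remote_server]
host = 192.0.2.10
un = my_remote_user
pw = remote-password-here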
You can have as many connections defined as you'd like, and you can use them like this:
>>> from pg_data_etl import Database
>>> db = Database.from_config("sample_database", "localhost")
As noted in the db_name comment above, the database name has to come from one of two places: define it in the config file, or pass it as the first argument to from_config() as shown above.
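With an entry like the hypothetical [remote_server] above, switching connections is the same one-liner:
>>> db = Database.from_config("sample_database", "remote_server")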
To develop the package locally, clone or fork this repo:
git clone https://github.com/aaronfraint/pg-data-etl.git
cd pg-data-etl
Install an editable version with poetry:
poetry install
Windows users who prefer to use conda can use the included environment.yml file:
conda env create -f environment.yml