feat: multicorn2 (Postgres FDW) backend (#397)
* feat: multicorn2 (Postgres FDW) backend

* Adding tests

* Adding tests

* Optimizing SELECT

* Fix tests

* Write API

* Query cost

* Add docs

* Add integration test

* Different strategy

* Another approach

* Rebase

* Fix docker

* Remove entrypoint

* Fix tests
betodealmeida authored Sep 24, 2024
1 parent 57adc2d commit b43fc70
Showing 30 changed files with 1,352 additions and 29 deletions.
11 changes: 11 additions & 0 deletions .github/workflows/python-integration.yml
@@ -31,8 +31,19 @@ jobs:
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements/test.txt
    - name: Start the Postgres service
      run: |
        docker compose -f postgres/docker-compose.yml up --build -d
    - name: Wait for Postgres to become available
      run: |
        until docker run --network container:postgres-postgres-1 postgres-postgres pg_isready -h postgres -p 5432 -U shillelagh --timeout=90; do sleep 10; done
    - name: Test with pytest
      env:
        SHILLELAGH_ADAPTER_KWARGS: ${{ secrets.SHILLELAGH_ADAPTER_KWARGS }}
      run: |
        pytest --cov-fail-under=100 --cov=src/shillelagh -vv tests/ --doctest-modules src/shillelagh --with-integration --with-slow-integration
    - name: Stop the Postgres service
      if: always()
      run: |
        docker logs postgres-postgres-1
        docker compose -f postgres/docker-compose.yml down
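The "Wait for Postgres to become available" step polls `pg_isready` until the server answers. For local scripts that cannot run `pg_isready`, a rough equivalent can be sketched in Python. This is only an approximation under stated assumptions: the helper name `wait_for_port` is hypothetical, and an open TCP port is a weaker signal than `pg_isready`, which checks that the server is actually accepting connections.

```python
import socket
import time


def wait_for_port(
    host: str,
    port: int,
    timeout: float = 90.0,
    interval: float = 1.0,
) -> bool:
    """
    Poll a TCP port until it accepts connections.

    Returns True as soon as a connection succeeds, or False once ``timeout``
    seconds have passed without a successful connection. Hypothetical helper,
    not part of Shillelagh.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open, not that
            # Postgres has finished initializing.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would typically run something like `wait_for_port("localhost", 5432)` before starting the integration tests, and abort if it returns `False`.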
2 changes: 2 additions & 0 deletions .gitignore
@@ -105,3 +105,5 @@ ENV/
*.sqlite
*.db
*.swp

multicorn2
6 changes: 4 additions & 2 deletions CHANGELOG.rst
@@ -2,8 +2,10 @@
 Changelog
 =========
 
-Next
-====
+Next (1.3.0)
+============
+
+- New Postgres backend based on multicorn2 (#397)
 
 Version 1.2.28 - 2024-09-11
 ===========================
19 changes: 19 additions & 0 deletions README.rst
@@ -52,6 +52,25 @@ And a command-line utility:
$ shillelagh
sql> SELECT * FROM a_table
There is also an `experimental backend <https://shillelagh.readthedocs.io/en/latest/postgres.html>`__ that uses Postgres with the `Multicorn2 <http://multicorn2.org/>`__ extension:

.. code-block:: python

    from shillelagh.backends.multicorn.db import connect

    connection = connect(
        user="username",
        password="password",
        host="localhost",
        port=5432,
        database="examples",
    )

.. code-block:: python

    from sqlalchemy import create_engine

    engine = create_engine("shillelagh+multicorn2://username:password@localhost:5432/examples")

Why SQL?
========

1 change: 1 addition & 0 deletions docs/index.rst
@@ -10,6 +10,7 @@ Contents
Usage <usage>
Adapters <adapters>
Creating a new adapter <development>
Postgres backend <postgres>
License <license>
Authors <authors>
Changelog <changelog>
23 changes: 23 additions & 0 deletions docs/postgres.rst
@@ -0,0 +1,23 @@
.. _postgres:

================
Postgres backend
================

Since version 1.3 Shillelagh ships with an experimental backend that uses Postgres instead of SQLite. The backend implements a custom `psycopg2 <https://pypi.org/project/psycopg2/>`__ cursor that automatically registers a foreign data wrapper (FDW) whenever a supported table is accessed. It's based on the `multicorn2 <http://multicorn2.org/>`__ extension and Python package.

To use the backend you need to:

1. Install the `Multicorn2 <http://multicorn2.org/>`__ extension.
2. Install the multicorn2 Python package on the machine running Postgres. Note that this is not the "multicorn" package available on PyPI; you need to download the source and install it manually.
3. Install Shillelagh on the machine running Postgres.

The Python packages must be importable by the process running Postgres. You can either install them globally, or install them in a virtual environment that is activated in the process that starts Postgres.

The ``postgres/`` directory has a Docker configuration that can be used to test the backend, or as a basis for installation. To run it, execute:

.. code-block:: bash

    docker compose -f postgres/docker-compose.yml up

You should then be able to run the example script in `examples/postgres.py`_ to test that everything works.
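Credentials that contain characters such as ``@`` or ``/`` need to be percent-encoded before they are placed in the connection URL. The following is a minimal sketch of that step using only the standard library. The helper name `multicorn2_url` is hypothetical and not part of Shillelagh; only the `shillelagh+multicorn2` scheme comes from this change.

```python
from urllib.parse import quote


def multicorn2_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """
    Build a connection URL for the ``shillelagh+multicorn2`` scheme.

    Hypothetical helper: it percent-encodes the user, password, and database
    name so that reserved characters do not break URL parsing.
    """
    return (
        f"shillelagh+multicorn2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Credentials matching the Docker configuration in ``postgres/``.
url = multicorn2_url("shillelagh", "shillelagh123", "localhost", 5432, "shillelagh")
```

The resulting string can be passed to SQLAlchemy's `create_engine`, as in the README example.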
31 changes: 31 additions & 0 deletions examples/postgres.py
@@ -0,0 +1,31 @@
"""
Simple multicorn2 test.

Multicorn2 is an extension for PostgreSQL that allows you to create foreign data wrappers
in Python. To use it, you need to install on the machine running Postgres the extension,
the multicorn2 package (not on PyPI), and the shillelagh package.

If you want to play with it, Shillelagh has a `docker-compose.yml` file that will run
Postgres with the extension and the Python packages. Just run:

    $ cd postgres/
    $ docker compose up --build -d

Then you can run this script.
"""

from sqlalchemy import create_engine

# the backend uses psycopg2 under the hood, so any valid connection string for it will
# work; just replace the scheme with `shillelagh+multicorn2`
engine = create_engine(
    "shillelagh+multicorn2://shillelagh:shillelagh123@localhost:5432/shillelagh",
)
connection = engine.connect()

SQL = (
    'SELECT * FROM "https://docs.google.com/spreadsheets/d/'
    '1LcWZMsdCl92g7nA-D6qGRqg1T5TiHyuKJUY1u9XAnsk/edit#gid=0"'
)
for row in connection.execute(SQL):
    print(row)
38 changes: 38 additions & 0 deletions postgres/Dockerfile
@@ -0,0 +1,38 @@
# Use the official Postgres image as a base
FROM postgres:13

WORKDIR /code
COPY . /code

# Use root for package installation
USER root

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-13 \
    python3 \
    python3-dev \
    python3-pip \
    python3-venv \
    wget

# Download, build, and install multicorn2
RUN wget https://github.com/pgsql-io/multicorn2/archive/refs/tags/v2.5.tar.gz && \
    tar -xvf v2.5.tar.gz && \
    cd multicorn2-2.5 && \
    make && \
    make install


# Create a virtual environment and install dependencies
RUN python3 -m venv /code/venv && \
    /code/venv/bin/pip install --upgrade pip && \
    /code/venv/bin/pip install -e '.[all]'

# Set environment variable for PostgreSQL to use the virtual environment
ENV PATH="/code/venv/bin:$PATH"

# Switch back to the default postgres user
USER postgres
19 changes: 19 additions & 0 deletions postgres/docker-compose.yml
@@ -0,0 +1,19 @@
version: '3.8'

services:
  postgres:
    build:
      context: ..
      dockerfile: postgres/Dockerfile
    environment:
      POSTGRES_PASSWORD: shillelagh123
      POSTGRES_USER: shillelagh
      POSTGRES_DB: shillelagh
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    ports:
      - "5432:5432"

volumes:
  db_data:
1 change: 1 addition & 0 deletions postgres/init.sql
@@ -0,0 +1 @@
CREATE EXTENSION IF NOT EXISTS multicorn;
7 changes: 7 additions & 0 deletions requirements/base.txt
@@ -19,6 +19,8 @@ certifi==2022.6.15
# via requests
charset-normalizer==2.1.0
# via requests
exceptiongroup==1.1.3
# via cattrs
greenlet==2.0.2
# via
# shillelagh
@@ -45,9 +47,14 @@ sqlalchemy==1.4.39
# via shillelagh
typing-extensions==4.3.0
# via shillelagh
# via
# cattrs
# shillelagh
url-normalize==1.4.3
# via requests-cache
urllib3==1.26.10
# via
# requests
# requests-cache
zipp==3.15.0
# via importlib-metadata
4 changes: 4 additions & 0 deletions requirements/test.txt
@@ -95,6 +95,8 @@ lazy-object-proxy==1.7.1
# via astroid
mccabe==0.7.0
# via pylint
multicorn @ git+https://github.com/pgsql-io/multicorn2.git@v2.5
# via shillelagh
multidict==6.0.2
# via
# aiohttp
@@ -137,6 +139,8 @@ psutil==5.9.1
# via shillelagh
pyarrow==16.0.0
# via shillelagh
psycopg2-binary==2.9.9
# via shillelagh
pyasn1==0.4.8
# via
# pyasn1-modules
10 changes: 10 additions & 0 deletions setup.cfg
@@ -80,13 +80,16 @@ testing =
google-auth>=1.23.0
holidays>=0.23
html5lib>=1.1
jsonpath-python>=1.0.5
multicorn @ git+https://github.com/pgsql-io/multicorn2.git@v2.5
pandas>=1.2.2
pip-tools>=6.4.0
pre-commit>=2.13.0
pip-compile-multi>=2.6.3
prison>=0.2.1
prompt_toolkit>=3
psutil>=5.8.0
psycopg2-binary>=2.9.9
pyarrow>=14.0.1
pyfakefs>=4.3.3
pygments>=2.8
@@ -111,10 +114,13 @@ all =
google-auth>=1.23.0
holidays>=0.23
html5lib>=1.1
jsonpath-python>=1.0.5
multicorn @ git+https://github.com/pgsql-io/multicorn2.git@v2.5
pandas>=1.2.2
prison>=0.2.1
prompt_toolkit>=3
psutil>=5.8.0
psycopg2-binary>=2.9.9
pyarrow>=14.0.1
pygments>=2.8
python-graphql-client>=0.4.3
@@ -153,6 +159,9 @@ htmltableapi =
beautifulsoup4>=4.11.1
html5lib>=1.1
pandas>=1.2.2
multicorn =
multicorn @ git+https://github.com/pgsql-io/multicorn2.git@v2.5
psycopg2-binary>=2.9.9
pandasmemory =
pandas>=1.2.2
s3selectapi =
@@ -184,6 +193,7 @@ sqlalchemy.dialects =
shillelagh.apsw = shillelagh.backends.apsw.dialects.base:APSWDialect
shillelagh.safe = shillelagh.backends.apsw.dialects.safe:APSWSafeDialect
gsheets = shillelagh.backends.apsw.dialects.gsheets:APSWGSheetsDialect
shillelagh.multicorn2 = shillelagh.backends.multicorn.dialects.base:Multicorn2Dialect
console_scripts =
shillelagh = shillelagh.console:main
# For example:
5 changes: 3 additions & 2 deletions src/shillelagh/backends/apsw/db.py
@@ -286,9 +286,10 @@ def _drop_table_uri(self, operation: str) -> Optional[str]:
         operation = "\n".join(
             line for line in operation.split("\n") if not line.strip().startswith("--")
         )
+        schema = re.escape(self.schema)
         regexp = re.compile(
-            rf"^\s*DROP\s+TABLE\s+(IF\s+EXISTS\s+)?"
-            rf'({self.schema}\.)?(?P<uri>(.*?)|(".*?"))\s*;?\s*$',
+            r"^\s*DROP\s+TABLE\s+(IF\s+EXISTS\s+)?"
+            rf'({schema}\.)?(?P<uri>(.*?)|(".*?"))\s*;?\s*$',
             re.IGNORECASE,
         )
         if match := regexp.match(operation):
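The change above passes the schema name through `re.escape` before interpolating it into the pattern. A small standalone sketch of why that matters is shown below. The helper names `build_drop_regexp` and `dropped_uri` and the schema name `a+b` are hypothetical, chosen only to show how a regex metacharacter in the schema changes the match; the pattern itself is copied from the diff.

```python
import re
from typing import Optional


def build_drop_regexp(schema: str, escape: bool = True) -> re.Pattern:
    """
    Compile the DROP TABLE pattern, optionally escaping the schema name.

    Hypothetical helper mirroring the pattern in ``_drop_table_uri``.
    """
    prefix = re.escape(schema) if escape else schema
    return re.compile(
        r"^\s*DROP\s+TABLE\s+(IF\s+EXISTS\s+)?"
        rf'({prefix}\.)?(?P<uri>(.*?)|(".*?"))\s*;?\s*$',
        re.IGNORECASE,
    )


def dropped_uri(operation: str, schema: str, escape: bool = True) -> Optional[str]:
    """Return the table URI captured from a DROP TABLE statement, if any."""
    match = build_drop_regexp(schema, escape).match(operation)
    return match.group("uri") if match else None


# With escaping, the "a+b." prefix is recognized as the schema and stripped.
escaped = dropped_uri("DROP TABLE a+b.t", "a+b")
# Without escaping, "a+b" is read as the regex "one or more a, then b", so the
# prefix is not recognized and ends up inside the captured URI.
unescaped = dropped_uri("DROP TABLE a+b.t", "a+b", escape=False)
```

For a plain schema name such as `main` both variants behave the same, which is why the unescaped version works in common cases.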
15 changes: 12 additions & 3 deletions src/shillelagh/backends/apsw/dialects/base.py
@@ -1,5 +1,8 @@
+"""
+A SQLAlchemy dialect.
+"""
+
 # pylint: disable=protected-access, abstract-method
-"""A SQLALchemy dialect."""

from typing import Any, Dict, List, Optional, Tuple, cast

@@ -102,7 +105,6 @@ def has_table( # pylint: disable=unused-argument
         connection: _ConnectionFairy,
         table_name: str,
         schema: Optional[str] = None,
-        info_cache: Optional[Dict[Any, Any]] = None,
         **kwargs: Any,
     ) -> bool:
         """
@@ -111,7 +113,14 @@ def has_table( # pylint: disable=unused-argument
         try:
             get_adapter_for_table_name(connection, table_name)
         except ProgrammingError:
-            return False
+            return bool(
+                super().has_table(
+                    connection,
+                    table_name,
+                    schema,
+                    **kwargs,  # pylint: disable=unused-argument
+                ),
+            )
         return True
 
     # needed for SQLAlchemy
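The `has_table` change replaces a hard `return False` with a fallback to the parent dialect, so that tables not handled by any adapter can still be found by the underlying database. The pattern can be sketched in isolation as follows. All names here (`AdapterNotFound`, `find_adapter`, `ParentDialect`, `FallbackDialect`) are hypothetical stand-ins, not Shillelagh or SQLAlchemy classes.

```python
class AdapterNotFound(Exception):
    """Hypothetical error raised when no adapter handles a table name."""


def find_adapter(table_name: str) -> str:
    """Hypothetical adapter lookup: only URL-like table names are handled."""
    if table_name.startswith("https://"):
        return "web-adapter"
    raise AdapterNotFound(table_name)


class ParentDialect:
    """Stand-in for the parent dialect, which knows about physical tables."""

    def __init__(self, physical_tables):
        self.physical_tables = set(physical_tables)

    def has_table(self, connection, table_name, schema=None, **kwargs):
        return table_name in self.physical_tables


class FallbackDialect(ParentDialect):
    """Check adapters first, then defer to the parent dialect."""

    def has_table(self, connection, table_name, schema=None, **kwargs):
        try:
            find_adapter(table_name)
        except AdapterNotFound:
            # No adapter claims this name; ask the parent instead of
            # unconditionally reporting that the table does not exist.
            return bool(super().has_table(connection, table_name, schema, **kwargs))
        return True


dialect = FallbackDialect({"users"})
```

With this arrangement, a virtual table backed by an adapter and a regular table stored in the database are both reported as existing.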
23 changes: 2 additions & 21 deletions src/shillelagh/backends/apsw/vt.py
@@ -42,8 +42,8 @@
     StringDuration,
     StringInteger,
 )
-from shillelagh.filters import Filter, Operator
-from shillelagh.lib import best_index_object_available, deserialize
+from shillelagh.filters import Operator
+from shillelagh.lib import best_index_object_available, deserialize, get_bounds
 from shillelagh.typing import (
     Constraint,
     Index,
@@ -245,25 +245,6 @@ def get_order(
     ]
 
 
-def get_bounds(
-    columns: Dict[str, Field],
-    all_bounds: DefaultDict[str, Set[Tuple[Operator, Any]]],
-) -> Dict[str, Filter]:
-    """
-    Combine all filters that apply to each column.
-    """
-    bounds: Dict[str, Filter] = {}
-    for column_name, operations in all_bounds.items():
-        column_type = columns[column_name]
-        operators = {operation[0] for operation in operations}
-        for class_ in column_type.filters:
-            if all(operator in class_.operators for operator in operators):
-                bounds[column_name] = class_.build(operations)
-                break
-
-    return bounds
-
-
 class VTModule: # pylint: disable=too-few-public-methods
     """
     A module used to create SQLite virtual tables.
Empty file.