Workspace support #3404

Closed
5 tasks
konstin opened this issue May 6, 2024 · 3 comments · Fixed by #3705
Labels
preview Experimental behavior

Comments

@konstin
Member

konstin commented May 6, 2024

Workspaces are a feature aimed at supporting multiple packages in the same repository. They make large projects easier to manage by splitting them into smaller packages with independent dependencies, and they handle editable installs and publishing automatically.

They are a commonly supported feature in other ecosystems; see the Review of large projects on GitHub and the Review of packaging tools. Our implementation will be modeled after cargo and rye.

Usage

There are two main usage patterns: a root package with helpers, and a flat workspace.

Root package and helpers:

albatross
├── packages
│   ├── provider_a
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── provider_a
│   │           ├── __init__.py
│   │           └── foo.py
│   └── provider_b
│       ├── pyproject.toml
│       └── src
│           └── provider_b
│               ├── __init__.py
│               └── bar.py
├── pyproject.toml
├── README.md
├── uv.lock
└── src
    └── albatross
        └── main.py

Flat workspace:

albatross
├── packages
│   ├── albatross
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── albatross
│   │           ├── __init__.py
│   │           └── foo.py
│   ├── provider_a
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── provider_a
│   │           ├── __init__.py
│   │           └── foo.py
│   └── provider_b
│       ├── pyproject.toml
│       └── src
│           └── provider_b
│               ├── __init__.py
│               └── bar.py
├── pyproject.toml
├── README.md
└── uv.lock

Structure

A workspace consists of a workspace root and members in the workspace.

The workspace root is a directory with a pyproject.toml; all members need to be below that directory. The workspace root defines members and exclusions. All packages below it must either be a member or be excluded. The workspace root can either be a package itself or be a virtual manifest.

Each member is a directory with a pyproject.toml that contains a [project] section. Each member is a Python package with a name, a version and dependencies. Workspace members can depend on other workspace members. You can consider the workspace another package source or index, similar to --find-links.

Workspace discovery

Currently in the uv pip interface, you have to explicitly provide the input files to use for dependencies. In the post-pip interface, uv will be based around workspaces with packages. For that, we need to find the relevant pyproject.toml files.

  • Go from the current directory up until we find a pyproject.toml. This is the current package.
  • If the package pyproject.toml contains a tool.uv.workspace section, it’s also the workspace root.
  • If not, go up again until we find a pyproject.toml
    • If it exists and contains a tool.uv.workspace section:
      • If we’re in the members, it’s the workspace root.
      • If we’re in the excludes, the package pyproject.toml is implicitly the workspace root, don’t go any further up.
      • If we’re in neither, error, that’s not allowed.
    • If it does not exist (we end up at the filesystem root), the package pyproject.toml is implicitly the workspace root (like cargo)
    • If it exists but does not contain a tool.uv.workspace, ignore it: our package is in the data or test files of some other package.
  • Check: Go up further and check that there isn’t another invalid workspace root above ours that includes us.
  • Walk through the directories in the members and collect all other workspace member packages
  • Check: Between our workspace root and any member package directory, there isn’t any other stray pyproject.toml.
  • Collect all entry points/scripts from the entire workspace, so we can run them no matter which directory we are in.
  • Lower the requirements of our target package using the information about the available workspace packages. By default, a required workspace package is installed as editable.

Features beyond MVP

konstin added the preview (Experimental behavior) label May 6, 2024
@hauntsaninja
Contributor

In our monorepo at work we have a fairly flexible layout, with more nesting / categorisation of projects. The rules we have for package discovery are that a) the name in pyproject.toml must match the folder name (limiting how much we have to read pyproject.toml files to do anything), and b) a package cannot have another package inside it (limiting how much directory walking we need to do).

@zanieb
Member

zanieb commented May 17, 2024

cc @potiuk regarding if this would work in Airflow

@potiuk

potiuk commented May 20, 2024

> cc @potiuk regarding if this would work in Airflow

Generally yes. There are of course some details - particularly about optionality of those providers.

  1. What we would really need here are the following use cases:

a) A power user (or CI) needs to install the "albatross" package and all "providers" together and be able to hack on them together.

b) A provider contributor (or CI) should be able to install one (or a few, or all) of the "providers" separately in editable mode, but with the "albatross" package installed from PyPI or a URL rather than from sources. This should allow, for example, running the tests for the main version of "provider_a" against a past released version of "albatross" (or a specific tag / branch of "albatross").

c) An albatross contributor should be able to install just "albatross" in editable mode and not care about all the other "providers".

  2. Another important use case for Airflow is that the providers might share the same top-level Python package with the main project, so part of their package structure overlaps:
albatross
├── packages
│   ├── provider_a
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── albatross
│   │           └── providers
│   │               └── provider_a
│   │                   ├── __init__.py
│   │                   └── foo.py
│   └── provider_b
│       ├── pyproject.toml
│       └── src
│           └── albatross
│               └── providers
│                   └── provider_b
│                       ├── __init__.py
│                       └── foo.py
├── pyproject.toml
├── README.md
├── uv.lock
└── src
    └── albatross
        └── main.py

I guess that will make things quite a bit more difficult, because it means that when you do something like this in provider_a:

from albatross.providers.provider_b import foo

It should import foo from provider_b.

  3. Another important thing is that this should work nicely with IDEs (free PyCharm for sure). The free PyCharm is limited to a single "project", with no modules within a project, so you will not be able to open and hack together on provider_a and provider_b, or on provider_a and albatross (being able to edit both is very useful).
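
For the shared top-level `albatross` package in the second point, one mechanism Python itself offers is implicit namespace packages (PEP 420): if `albatross` and `albatross.providers` have no `__init__.py` in any location, their portions from several `src` directories are merged at import time. The sketch below demonstrates this with plain `sys.path` entries standing in for editable installs; whether a given editable-install mechanism behaves the same way is not something this example shows. The helper names and the `NAME` constants are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(base: Path, provider: str, body: str) -> Path:
    """Create <base>/<provider>/src/albatross/providers/<provider>/foo.py."""
    src = base / provider / "src"
    package = src / "albatross" / "providers" / provider
    package.mkdir(parents=True)
    # Only the leaf package gets an __init__.py; `albatross` and
    # `albatross.providers` stay implicit namespace packages.
    (package / "__init__.py").write_text("")
    (package / "foo.py").write_text(body)
    return src


def import_across_portions() -> tuple[str, str]:
    """Import both providers through the shared `albatross` namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        src_a = make_portion(Path(tmp), "provider_a", "NAME = 'a'\n")
        src_b = make_portion(Path(tmp), "provider_b", "NAME = 'b'\n")
        # Stand-in for two editable installs: both src dirs on sys.path.
        sys.path[:0] = [str(src_a), str(src_b)]
        try:
            importlib.invalidate_caches()
            foo_a = importlib.import_module("albatross.providers.provider_a.foo")
            foo_b = importlib.import_module("albatross.providers.provider_b.foo")
            return foo_a.NAME, foo_b.NAME
        finally:
            sys.path.remove(str(src_a))
            sys.path.remove(str(src_b))


names = import_across_portions()
```

One caveat: if the main project's `src/albatross` contained an `__init__.py`, it would be a regular package, and a regular package found on the path takes precedence over namespace portions unless it extends its own `__path__`.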

konstin added a commit that referenced this issue May 28, 2024
Add workspace support when using `-r <path>/pyproject.toml` or `-e
<path>` in the pip interface. It is limited to all-editable
static-metadata workspaces, and tests only include a single main
workspace, ignoring path dependencies in another workspace. This can be
considered the MVP for workspace support: You can create a workspace,
you can install from it, but some options and conveniences are still
missing. I'll file follow-up tickets (support in lockfiles, support path
deps in other workspace, #3625)

There is also support in `uv run`, but we need
#3700 first to properly support
using different current projects in the bluejay interface, currently the
resolution and therefore the lockfile depends on the current project.
I'd do this change first (it's big enough already), then #3700, and then
add workspace support properly to bluejay.

Fixes #3404