
Adding svn validations basic flow #1

Closed · wants to merge 25 commits

Conversation

gopidesupavan (Owner)

No description provided.

gopidesupavan (Owner, Author)

The core driver for the action is the publish-config file; the action operates based on the rules and mode defined there.

README.md Outdated
The `publish-config.yml` file is composed of multiple rules to validate the artifacts and publish them to PyPI. The configuration file is structured as follows:

gopidesupavan (Owner, Author)

This config file needs to be configured in the repos that use the action.

README.md Outdated
publishers:
  name: providers
  url: https://dist.apache.org/repos/dist/dev/airflow/
gopidesupavan (Owner, Author)

The plan is to check out the project and use the path field to work out the validations.

gopidesupavan (Owner, Author)

cc: @potiuk

gopidesupavan (Owner, Author)

Here are some workflows that were triggered using this branch:

https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml
https://github.com/gopidesupavan/example-workspace/actions/runs/12104340868/job/33747628251

Not sure whether the publish config structure makes sense or not; if there is a better structure I can update it :)

action.yml Outdated
NAME: ${{ fromJSON(steps.config-parser.outputs.pub_config).publishers.rules.signature-check.name }}
SCRIPT: ${{ fromJSON(steps.config-parser.outputs.pub_config).publishers.rules.signature-check.script }}
KEYS: ${{ fromJSON(steps.config-parser.outputs.pub_config).publishers.rules.signature-check.keys }}
KEYS_FILE_LOCATION: ${{ github.workspace }}/${{ inputs.temp-dir }}/KEYS

We can download KEYS from SVN


Ah ... I see we do :) .
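For reference, fetching and importing the KEYS file could look roughly like this (a sketch only: the step name is hypothetical, the URL is the KEYS file referenced elsewhere in this thread, and the action may already do the equivalent):

```yaml
- name: "Fetch and import release signing keys"
  env:
    KEYS_URL: https://dist.apache.org/repos/dist/release/airflow/KEYS
    KEYS_FILE_LOCATION: ${{ github.workspace }}/${{ inputs.temp-dir }}/KEYS
  run: |
    # download the published KEYS file and import it into the local gpg keyring
    curl -sSf -o "$KEYS_FILE_LOCATION" "$KEYS_URL"
    gpg --import "$KEYS_FILE_LOCATION"
  shell: bash
```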

action.yml Outdated
run: |
  echo "Verifying $NAME"
  chmod +x $SCRIPT
  $SCRIPT "$type"

I am not sure if I like this approach with $SCRIPT. I think a much better (and more composable) approach would be to have separate - very small - actions for each of those scripts and compose this action out of smaller composable ones. This way we will not have to configure "scripts" in the configuration; we would configure particular "actions" to be enabled. While it's less flexible, it would be a better abstraction IMHO. I somehow feel that being able to provide a "script" to run is almost the same as writing your own action, where what we want is to provide the user a set of predefined "actions" they can use. We do not want them to write new scripts.

gopidesupavan (Owner, Author)

Ah, I see what you mean. That makes sense—using small, composable actions. So, I can split these steps into multiple actions within the same repository. What do you think?

As for the scripts I initially had in mind: the default script we provide would validate the rule. If someone wants to override it and use their custom script, they can specify the script location as a parameter in the config file. For example, in this workflow, the script is sourced from the workflow repository itself https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml#L26.

On the other hand, for svn-check, no script is explicitly defined in the workflow https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml#L18, so it falls back to the default script available in the action repository: https://github.com/gopidesupavan/gh-svn-pypi-publisher/blob/basic-feat/src/scripts/svn_checker.py.

Is this behavior okay to have?

potiuk commented Dec 1, 2024

> Ah, I see what you mean. That makes sense—using small, composable actions. So, I can split these steps into multiple actions within the same repository. What do you think?

Yep.

> As for the scripts I initially had in mind: the default script we provide would validate the rule. If someone wants to override it and use their custom script, they can specify the script location as a parameter in the config file. For example, in this workflow, the script is sourced from the workflow repository itself https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml#L26.

I see, but if we have small composable sub-actions, then it's even less effort to replace such a "sub-action" from "our" repo with your own "sub-action" you write - following the same pattern and writing your own hard-coded script. So in this case the users will get the "customizability" at the "sub-action" level, not the "script" level, which I think is much better.

potiuk commented Dec 1, 2024

Say (conceptually):

Our "composed" action:

Action "Publish to PyPI"
    Sub action read configuration
    Sub action verify SVN
    Sub action verify Checksum
    Sub action verify Signature
    Sub action Push To PyPI

And someone who would like to customize it, would copy our action to their repo and do this

Action "Publish to PyPI (my own)"
    Sub action read configuration
    Sub action verify SVN (my own)
    Sub action verify Checksum
    Sub action verify Signature
    Sub action Push To PyPI
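As a concrete sketch of that shape, the top-level action could simply chain the sub-actions as composite steps (the paths, input and output names below are illustrative assumptions, not the final layout):

```yaml
# Top-level "Publish to PyPI" composite action - a sketch only.
name: "Publish to PyPI"
inputs:
  publish-config:
    description: "Path to the publish config file"
    required: true
runs:
  using: "composite"
  steps:
    - name: "Read configuration"
      id: config
      uses: gopidesupavan/gh-svn-pypi-publisher/read-config@main
      with:
        publish-config: ${{ inputs.publish-config }}
    - name: "Verify SVN"
      uses: gopidesupavan/gh-svn-pypi-publisher/svn@main   # swap this line for your own sub-action to customize
      with:
        svn-config: ${{ steps.config.outputs.checks-svn }}
    - name: "Verify checksum"
      uses: gopidesupavan/gh-svn-pypi-publisher/checksum@main
      with:
        checksum-config: ${{ steps.config.outputs.checks-checksum }}
    - name: "Verify signature"
      uses: gopidesupavan/gh-svn-pypi-publisher/signature@main
      with:
        signature-config: ${{ steps.config.outputs.checks-signature }}
    - name: "Push to PyPI"
      uses: gopidesupavan/gh-svn-pypi-publisher/publish@main
      with:
        publish-config: ${{ inputs.publish-config }}
```

A customized copy would then just swap one `uses:` line for a reference to the user's own sub-action.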


Ideally the "composed" action will be very simple and easy to copy and replace parts of it.

gopidesupavan (Owner, Author)

Yes, small composable actions will definitely help here; they are a better way to handle these subsections and a lot easier for me to add as well :) Great suggestion, small composable actions 🚀

- .tar.gz
- .tar.gz.asc
- .tar.gz.sha512
- -py3-none-any.whl

We likely need to make it a bit more flexible and allow regexps here. While Airflow only publishes one wheel, many other projects that have binary components will have multiple wheel files for many architectures/ABIs - https://peps.python.org/pep-0491/#file-name-convention
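For example, the config could accept a pattern instead of a fixed suffix; something like this (sketch only, reusing the `identifiers`/`pattern` shape proposed further down in this thread) would match any wheel name regardless of python/ABI/platform tags:

```yaml
identifiers:
  - type: "regex"
    # any wheel filename per PEP 491: name-version(-build)?-python-abi-platform.whl
    pattern: '.*-[^-]+-[^-]+-[^-]+\.whl$'
```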

gopidesupavan (Owner, Author)

This is not a hardcoded one; any file extension can be provided from the config file. Sure, I agree it can be updated with regex support. :)


for i in *.asc
do
  echo -e "Checking $i\n"; gpg --verify "$i"
done

Nice to have... add colors.

@@ -0,0 +1,15 @@
#!/bin/bash

Also... one experience from the past... While it is seemingly easier to write BASH scripts in simple cases, they VERY quickly become too complex/difficult to understand and maintain. I'd rather rewrite all those scripts in Python and use uv to run them in GitHub Actions, with the nice support for standalone scripts that uv already has:

https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies

This is PEP 723 - https://peps.python.org/pep-0723/ - which uv already supports, and sooner or later it will be natively supported in the Python environment (because that PEP is already approved). I think it's the right time to start using it in our automation.


The lack of these "inline dependency declarations" was, in the past, the one reason I preferred to choose bash over Python. But since it is already there, and we have uv support for running such scripts and automatically creating and using the venv needed to do it, the last obstacle where Python scripts were not as good as bash scripts is gone. With this one, I have absolutely no reason to use bash for scripting.


(One thing it allows, for example, is to add colors by declaring rich as a dependency in such an inline declaration.)
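For illustration, such an inline declaration looks like this (a minimal sketch of a uv-runnable standalone script; the script body is just an example, with rich declared so colored output comes for free):

```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "rich",
# ]
# ///
"""Minimal PEP 723 example: `uv run check.py` creates the venv and installs rich automatically."""
from rich.console import Console

console = Console()
console.print("[green]Signature check passed[/green]")
```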

gopidesupavan (Owner, Author)

Yes, sure, I like using Python scripts for these tasks; this is an initial draft and I tried the existing process here :) Agree that bash is a bit complex to understand when it grows bigger 😄

@@ -0,0 +1,44 @@
import os

How about declaring the config structure via a JSON schema (and adding inline requirements to the script to do so)?

I think in this case, rather than parsing the file with a dedicated parser, we could add validation in the schema and convert the .yml file into outputs in a generic way:

x:
   y:
     z

=> outputs.x.y.z
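A minimal sketch of that generic conversion, assuming the flattened names use `-` as the separator (so they line up with outputs like `publisher-url` and `checks-svn` used in the sample workflow further down); file name and details are illustrative:

```python
# /// script
# dependencies = ["pyyaml"]
# ///
"""Flatten publish-config.yml into GitHub Actions outputs in a generic way (sketch)."""
import json
import os

import yaml


def flatten(node, prefix=""):
    """Yield ("x-y-z", value) pairs for every leaf of a nested mapping."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}-{key}" if prefix else key)
    else:
        yield prefix, node


with open("publish-config.yml") as f:
    config = yaml.safe_load(f)

with open(os.environ["GITHUB_OUTPUT"], "a") as out:
    for name, value in flatten(config):
        # non-scalar values are emitted as JSON so later steps can read them with fromJSON()
        out.write(f"{name}={json.dumps(value)}\n")
```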

gopidesupavan (Owner, Author)

Sure, that's possible, we can do that, and that makes it easy to extract the outputs in other steps as well :)

gopidesupavan commented Dec 2, 2024

I was scratching my head to come up with a config structure. With the sub-action approach we don't need to bother about script handling, which is a really good thing and simplifies it :) How about this config structure? Anyone who uses these actions would have to send an input config like this to the action. Any suggestions :) ?

```yaml
project:
    Name: some-project
    Description: Example project for publishing to PyPI

publisher:
    name: providers
    url: https://dist.apache.org/repos/dist/release/airflow/
    path: "airflow/providers/"

checks:
  svn-check:
    - id: "extension-check"
      description: "Extension check"
      identifiers:
        - type: "regex"
          pattern: '.*'
          
    - id: "package-name-check"
      description: "Package name check"
      identifiers:
          - type: "regex"
            pattern: 'apache-airflow-providers-(.*?)-python3-none-any.whl'

  checksum-check:
    - id: "checksum-check"
      description: "SHA512 Check"
      algorithm: "512"
      
  signature-check:
    - id: "signature-check"
      description: "Signature Check"
      method: "gpg"
      keys: "https://dist.apache.org/repos/dist/release/airflow/KEYS"
  
  publish:
    - id: "publish"
      description: "Publish to PyPI"
      exclude:
        - ".*\\.asc"
        - ".*\\.sha512"

gopidesupavan (Owner, Author)

I believe we need one more check in svn to validate the file count per package?
E.g. in our repo, the following files should be present (6 files):

- .tar.gz + .asc + .sha512 (one set of files per provider)
- -py3-none-any.whl + .asc + .sha512 (one set of files per provider)


  • Config looks good!
  • I'd say we should just check that each file has .asc and .sha512 - we do not need to fully verify the number of files; just making sure that all of them are signed and checksummed and that they are right should be enough, so likely what we have in the "checksum" and "signature" checks will be enough.
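That simpler check could be as small as this (illustrative sketch; the directory name is the temp checkout dir used elsewhere in this PR):

```python
"""Check that every artifact in the svn checkout has its .asc and .sha512 companions (sketch)."""
from pathlib import Path

svn_dir = Path("temp-svn-repo")
artifacts = [p for p in svn_dir.iterdir() if p.is_file() and p.suffix not in (".asc", ".sha512")]
missing = [
    p.name
    for p in artifacts
    if not (p.with_name(p.name + ".asc").exists() and p.with_name(p.name + ".sha512").exists())
]
if missing:
    raise SystemExit(f"Artifacts missing signature/checksum companions: {missing}")
print(f"All {len(artifacts)} artifacts have .asc and .sha512 companions")
```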

gopidesupavan (Owner, Author)

Cool :) Will use the same config then and use a JSON schema to validate it.
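A rough sketch of what that schema validation could look like (field names follow the config proposed above; pyyaml and jsonschema declared as inline dependencies, and only a fragment of the schema is shown):

```python
# /// script
# dependencies = ["pyyaml", "jsonschema"]
# ///
"""Validate publish-config.yml against a JSON schema (sketch, partial schema only)."""
import yaml
from jsonschema import validate

PUBLISH_CONFIG_SCHEMA = {
    "type": "object",
    "required": ["project", "publisher", "checks"],
    "properties": {
        "publisher": {
            "type": "object",
            "required": ["name", "url", "path"],
            "properties": {
                "name": {"type": "string"},
                "url": {"type": "string"},
                "path": {"type": "string"},
            },
        },
    },
}

with open("publish-config.yml") as f:
    config = yaml.safe_load(f)

validate(instance=config, schema=PUBLISH_CONFIG_SCHEMA)
print("publish-config.yml matches the schema")
```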

potiuk commented Dec 1, 2024

Good. Looks very close to what I thought functionally :) !

I added a few suggestions on how to make use of some newer features in the Python ecosystem that make Python scripting far more appealing than bash scripting, and also a suggestion on making the config validation more generic, which might help with both validating the config automatically/avoiding typos and generically converting such configuration to GitHub outputs. Also, I think splitting it into more composable actions used from the "main" action, rather than passing SCRIPTs, is a better (and more easily reusable) abstraction.

gopidesupavan (Owner, Author)

> Good. Looks very close to what I thought functionally :) !
>
> I added a few suggestions on how to make use of some newer features in the Python ecosystem that make Python scripting far more appealing than bash scripting, and also a suggestion on making the config validation more generic, which might help with both validating the config automatically/avoiding typos and generically converting such configuration to GitHub outputs. Also, I think splitting it into more composable actions used from the "main" action, rather than passing SCRIPTs, is a better (and more easily reusable) abstraction.

Yeah, those are really great suggestions and I will work those out. 😄

gopidesupavan commented Dec 7, 2024

There are a few things yet to be added: tests and the final publish action (this is pretty simple).

Does the changed structure look fine?

So now, if anyone wants to use this, here is the sample workflow they can configure.

```yaml
name: GH pypi publish
description: 'Publish to PyPI'

on:
  workflow_dispatch:
    inputs:
      publish-config:
        description: 'Path to the publish config file'
        required: true
        default: 'publish-config.yml'
      temp-dir:
        description: 'Temporary directory to checkout svn repo'
        required: false
        default: 'temp-svn-repo'
      mode:
        description: 'Mode to run the action'
        required: false
        default: 'verify'

jobs:
  gh-action-pypi-publish:
    runs-on: ubuntu-latest
    environment: test
    steps:
     - name: Checkout Code
       uses: actions/checkout@v3

     - name: Setup Python
       uses: actions/setup-python@v4
       with:
         python-version: '3.11'

     - name: "Config parser"
       id: config-parser
       uses: ./read-config
       with:
        publish-config: ${{ inputs.publish-config }}

     - name: "Checkout svn ${{ fromJSON(steps.config-parser.outputs.publisher-url) }}"
       id: "svn-checkout"
       uses: github/gh-svn-pypi-publisher/init@main
       with:
         temp-dir: ${{ inputs.temp-dir }}
         repo-url: ${{ fromJSON(steps.config-parser.outputs.publisher-url) }}
         repo-path: ${{ fromJSON(steps.config-parser.outputs.publisher-path) }}

     - name: "Svn check"
       id: "svn-check"
       uses: github/gh-svn-pypi-publisher/svn@main
       with:
        svn-config: ${{ steps.config-parser.outputs.checks-svn }}
        repo-url: ${{ fromJSON(steps.config-parser.outputs.publisher-url) }}
        repo-path: ${{ fromJSON(steps.config-parser.outputs.publisher-path) }}

     - name: "Checksum check"
       id: "checksum-check"
       uses: github/gh-svn-pypi-publisher/checksum@main
       with:
        checksum-config: ${{ steps.config-parser.outputs.checks-checksum }}
        repo-url: ${{ fromJSON(steps.config-parser.outputs.publisher-url) }}
        repo-path: ${{ fromJSON(steps.config-parser.outputs.publisher-path) }}

     - name: "Signature check"
       id: "signature-check"
       uses: github/gh-svn-pypi-publisher/signature@main
       with:
        signature-config: ${{ steps.config-parser.outputs.checks-signature }}
        repo-url: ${{ fromJSON(steps.config-parser.outputs.publisher-url) }}
        repo-path: ${{ fromJSON(steps.config-parser.outputs.publisher-path) }}
```

Or should we abstract the above workflow as well, so that the final workflow file would look something like this below?

```yaml
name: GH pypi publish
description: 'Publish to PyPI'

on:
  workflow_dispatch:
    inputs:
      publish-config:
        description: 'Path to the publish config file'
        required: true
        default: 'publish-config.yml'
      temp-dir:
        description: 'Temporary directory to checkout svn repo'
        required: false
        default: 'temp-svn-repo'
      mode:
        description: 'Mode to run the action'
        required: false
        default: 'verify'

jobs:
  gh-action-pypi-publish:
    runs-on: ubuntu-latest
    environment: test
    steps:
     - name: Checkout Code
       uses: actions/checkout@v3

    - name: "Publish packages"
       id: "publish-packages"
       uses: github/gh-svn-pypi-publisher/@main
       with:
        checksum-config: ${{ inputs.publish-config }}
        temp-dir ${{ inputs.temp-dir }}
        mode: ${{ inputs.mode}}

This feels like too much abstraction to me? WDYT :)

gopidesupavan (Owner, Author)

This is my understanding of the publish flow:

  1. For rc release we download all the files from https://dist.apache.org/repos/dist/dev/airflow/providers/pypi/ and publish to pypi?
  2. For final release candidates, we can use this SOURCE_DIR="${ASF_DIST_PARENT}/asf-dist/dev/airflow/providers" and filter all the packages from https://dist.apache.org/repos/dist/release/airflow/providers/ and upload to pypi?
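Either way, applying the `exclude` patterns from the publish section of the config before handing files to the uploader could look roughly like this (a sketch; the directory name and the mention of twine as the uploader are assumptions):

```python
"""Filter out non-package files per the publish `exclude` patterns before uploading (sketch)."""
import re
from pathlib import Path

EXCLUDE_PATTERNS = [r".*\.asc", r".*\.sha512"]  # as in the proposed publish config above

dist_dir = Path("temp-svn-repo")
to_upload = [
    p
    for p in dist_dir.iterdir()
    if p.is_file() and not any(re.fullmatch(pattern, p.name) for pattern in EXCLUDE_PATTERNS)
]
print("Files to upload:", sorted(p.name for p in to_upload))
# the actual upload could then be done with e.g. twine or the pypa/gh-action-pypi-publish action
```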

gopidesupavan commented Dec 8, 2024

Alright, the publish action is also added now :)

gopidesupavan (Owner, Author)

Alright, this is ready for review @potiuk :) whenever you get a chance.

@gopidesupavan gopidesupavan requested a review from potiuk December 9, 2024 13:33
potiuk commented Dec 9, 2024

I will look at it tomorrow :)

gopidesupavan (Owner, Author)

> I will look at it tomorrow :)

Thank you :)

potiuk commented Dec 10, 2024

Also - I am going to do a deep dive into this including reading in detail this one pypa/pip#13048 (review) -> pip is working on adding trusted publishing as we speak and there are some real gems there. Once we get it in a shape we are happy with, we might even ask some of the people there to comment on our action + I will get a few people from the ASF to comment here.

potiuk commented Dec 10, 2024

So I recommend you also take a look there :)

gopidesupavan commented Dec 10, 2024

> Also - I am going to do a deep dive into this including reading in detail this one pypa/pip#13048 (review) -> pip is working on adding trusted publishing as we speak and there are some real gems there. Once we get it in a shape we are happy with, we might even ask some of the people there to comment on our action + I will get a few people from the ASF to comment here.

Yeah, that discussion is helpful. As webknjaz mentioned in the comments about environments and approval, that's really nice. I actually tried it out in another repo using these custom actions: it pauses the workflow until approval and then starts the further validations. In our case we are not building anything in this current workflow, we are just downloading from the svn repo, at least in the present state :), I feel. But I agree that when we start adding reproducibility checks and a further build process, webknjaz made a good security point there, and we will also have to make sure the publish runs in a separate workflow. I will also follow that conversation closely; thanks for sharing, otherwise I would have missed it :)

Sure, absolutely fine to pull people in to review; more feedback never hurts, it always improves things :)

gopidesupavan (Owner, Author)

(Screenshots: workflow run paused, waiting for environment approval.)

Like this, it will ask for approval when we use environments + reviewers; we can also add branches in the GitHub environment settings :)

potiuk commented Dec 12, 2024

BTW. Shall we already attempt to move it to the airflow repo and start configuring / dry running it? I think - since we do not really need the airflow sources - it sounds like a good idea to go to a separate repository, and we should already run and test it there together - and invite Elad and others to further iterate on it?

gopidesupavan (Owner, Author)

> BTW. Shall we already attempt to move it to the airflow repo and start configuring / dry running it? I think - since we do not really need the airflow sources - it sounds like a good idea to go to a separate repository, and we should already run and test it there together - and invite Elad and others to further iterate on it?

Yeah, correct, we don't need the airflow sources; we can configure our existing released versions in the config and play around :)

Should I reference this action from our airflow repo, or should I move all these actions to airflow and configure them there?

potiuk commented Dec 12, 2024

Maybe move. I created a repo for you https://github.com/apache/airflow-publish - in about 5-10 minutes you should be able to add things there - you will have write access as with airflow.

gopidesupavan (Owner, Author)

> Maybe move. I created a repo for you https://github.com/apache/airflow-publish - in about 5-10 minutes you should be able to add things there - you will have write access as with airflow.

Oh, that's awesome :)

gopidesupavan (Owner, Author)

We can do a test on Airflow 2.10.4 :) it's currently in process.

potiuk commented Dec 12, 2024

> We can do a test on Airflow 2.10.4 :) it's currently in process.

It would be awesome if we could try it there - a first "dry run" of course.

gopidesupavan commented Dec 13, 2024

Updated now: https://github.com/apache/airflow-publish/tree/airflow-artifacts-publish. It looks like the workflow dispatch has to be in the default branch (main); it can't be triggered from other branches.

gopidesupavan (Owner, Author)

Okay, closing this now, since a separate repository now exists.

Thanks @potiuk for all the suggestions 🙂 and I really appreciate your support and guidance.
