-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding svn validations basic flow #1
Conversation
f225097
to
f77a114
Compare
The core driver for action is publish-config file. based on the action rules and mode it operates. |
README.md
Outdated
The `publish-config.yml` file has composed of multiple rules to validate the artifacts and publish them to PyPI. The configuration file is structured as follows: | ||
|
||
```yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config file required to be configured in repos.
README.md
Outdated
publishers: | ||
name: providers | ||
url: https://dist.apache.org/repos/dist/dev/airflow/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plan is to checkout the project and use the path
filed to workout validations
cc: @potiuk |
Here some workflows that triggered using this branch: https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml Not sure that publish config structure make sense or not. if there is some better structure i can update :) |
action.yml
Outdated
NAME: ${{ fromJSON(steps.config-parser.outputs.pub_config).publishers.rules.signature-check.name }} | ||
SCRIPT: ${{ fromJSON(steps.config-parser.outputs.pub_config).publishers.rules.signature-check.script }} | ||
KEYS: ${{ fromJSON(steps.config-parser.outputs.pub_config).publishers.rules.signature-check.keys }} | ||
KEYS_FILE_LOCATION: ${{ github.workspace }}/${{ inputs.temp-dir }}/KEYS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can download KEYS from SVN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ... I see we do :) .
action.yml
Outdated
run: | | ||
echo "Verifying $NAME" | ||
chmod +x $SCRIPT | ||
$SCRIPT "$type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I like this approach whith $SCRIPT. I think much better (and more composable) approach would be to have separate - very small actions for each of those scripts and compose this action out of smaller composable ones. This way we will not have to configure "scripts" in the configuration, but we would configure particular "actions" to be enabled. While it's less flexible, it would be a better abstraction IMHO. I somehow feel being able to provide a "script" to run is almost the same as writing your own action, where what we want, is to provide a user a set of predefined "actions" they can use. We do not want them to write new scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean. That makes sense—using small, composable actions. So, I can split these steps into multiple actions within the same repository. What do you think?
As for the scripts I initially had in mind: the default script we provide would validate the rule. If someone wants to override it and use their custom script, they can specify the script location as a parameter in the config file. For example, in this workflow, the script is sourced from the workflow repository itself https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml#L26.
On the other hand, for svn-check, no script is explicitly defined in the workflow https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml#L18, so it falls back to the default script available in the action repository: https://github.com/gopidesupavan/gh-svn-pypi-publisher/blob/basic-feat/src/scripts/svn_checker.py.
Does this behavior okay to have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean. That makes sense—using small, composable actions. So, I can split these steps into multiple actions within the same repository. What do you think?
Yep.
As for the scripts I initially had in mind: the default script we provide would validate the rule. If someone wants to override it and use their custom script, they can specify the script location as a parameter in the config file. For example, in this workflow, the script is sourced from the workflow repository itself https://github.com/gopidesupavan/example-workspace/blob/main/publish-config.yml#L26.
I see, but if we we have small composable sub-actions, then it's even less effort to replace such "sub-action" from "our" repo, with your own "sub-action" you write - following the same pattern and writing your own hard-coded script. So in this case the users will get the "customizability" at the "sub-action" level, not the "script" level. Which I think is much better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say (conceptually):
Our "composed" action:
Action "Publish to PyPI"
Sub action read configuration
Sub action verify SVN
sub action verify Cheksum
Sub action verify Signature
Sub action Push To PyPI
And someone who would like to customize it, would copy our action to their repo and do this
Action "Publish to PyPI (my own)"
Sub action read configuration
Sub action verify SVN (my own)
sub action verify Cheksum
Sub action verify Signature
Sub action Push To PyPI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the "composed" action will be very simple and easy to copy and replace parts of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes small composable actions will definitely help here better way to handle these subsections and lot more easier for me to add this aswell :) great suggestion small composable action 🚀
publish-config.yml
Outdated
- .tar.gz | ||
- .tar.gz.asc | ||
- .tar.gz.sha512 | ||
- -py3-none-any.whl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We Likely need to make it a bit more flexible and allow regexp here. While Airlow only publishes one wheel, many of other projects that have some binary components will have multiple wheel files for many architectures/ ABIs - https://peps.python.org/pep-0491/#file-name-convention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not hardcoded one, any file extension can be provided from the config file. Sure, agree can update with regex support. :)
src/scripts/checksum_check.sh
Outdated
|
||
for i in *.asc | ||
do | ||
echo -e "Checking $i\n"; gpg --verify $i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have... add colors.
src/scripts/checksum_check.sh
Outdated
@@ -0,0 +1,15 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also... One experience from the past... While it is seemingly easier to write BASH scripts in simple cases, it's VERY quickly when the scripts become too complex/difficult to understand and maintain. I'd rather rewrite all those scripts in Python and use uv
to run them in github actions with the nice support for standalone scripts that uv
already has:
https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies
This is PEP723 - https://peps.python.org/pep-0723/ that uv
already supports and sooner or later it will be natively supported in Python environment (because that PEP is already approved). I think it's the right time to start using it in our automation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lack of this "inline dependency declarations" was for me the one reason in the past I prefered to choose bash
over Python, but since it is already there, and we have uv
support for running such scripts and automatically creating and using the venv that is needed to do it, has removed the last obstacle for me where python
scripts were not as good as bash
scripts. With this one, I have absolutely no reason to use bash
for scripting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(one thing it allows for example to add colors by declaring rich
as dependency in such inline declaration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure, I like the python scripts to do these tasks. as this is initial draft and tried existing process in this. :). agree bash is bit complex to understand when it grows to bigger 😄
src/scripts/config_parser.py
Outdated
@@ -0,0 +1,44 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about declaring the config structure via json schema (and adding inline requirements to the script to do so).
I think in this case, rather than parsing the file with dedicated parser, we could add validaton in the schema and convert the .yml file into outputs in a generic way:
x:
y:
z
=> outputs.x.y.z
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thats possible we can do that, and thats easy to extract output aswell :) in other steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was scratching my head to set some config structure. with the subsection approach we dont need bother about script handling which is really good thing and it simplifies :) how about this config structure?. anyone uses these actions they have to send input config like this to action . Any suggestion :) ?
project:
Name: some-project
Description: Example project for publishing to PyPI
publisher:
name: providers
url: https://dist.apache.org/repos/dist/release/airflow/
path: "airflow/providers/"
checks:
svn-check:
- id: "extension-check"
description: "Extension check"
identifiers:
- type: "regex"
pattern: '.*'
- id: "package-name-check"
description: "Package name check"
identifiers:
- type: "regex"
pattern: 'apache-airflow-providers-(.*?)-python3-none-any.whl'
checksum-check:
- id: "checksum-check"
description: "SHA512 Check"
algorithm: "512"
signature-check:
- id: "signature-check"
description: "Signature Check"
method: "gpg"
keys: "https://dist.apache.org/repos/dist/release/airflow/KEYS"
publish:
- id: "publish"
description: "Publish to PyPI"
exclude:
- ".*\\.asc"
- ".*\\.sha512"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Believe one more check in svn to validate the number file count per package?
eg: our repo has
The following files should be present (6 files):
.tar.gz + .asc + .sha512 (one set of files per provider)
-py3-none-any.whl + .asc + .sha512 (one set of files per provider)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Config looks good!
- I'd say we should just check if each file has .asc and .sha512 - we do not need to fully verify the number of files, just making sure that all of them are signed and checksumed and that they are right - should be enough, so likely what we have in "checksum" and "signature" check will be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool :) will use same config then and use Json to validate schema.
Good. Looks very close to what I thought functionally :) ! I added few suggestions on how to make use of some newer features in Python ecosystem that makes Python scripting far more appealing than bash scripting, also some suggestion on making more generic config validation, which might help with both - validating the config automatically/avoiding typos as well as generic conversion of such configuration to GitHub outputs. Also I think splitting it to more composable actions used from the "main" action rather than passing SCRIPTs is - I think - better (and more easily reusable) abstraction. |
Yeah, those are really great suggestions and will workout those. 😄 |
There are few yet to be added, tests and final publish action(this is pretty simple) Does the changes structure fine? so now if anyone wants to use this, here is the sample workflow they can configure.
or should we abstract above workflow aswell and then final workflow file something look like this below.
this feels like to much abstraction for me ? WDYT :) |
This is my understand for publish.
|
Alright publish action also added now :) |
Alright this ready for review @potiuk :) when you get a chance.. |
I will look at it tomorrow :) |
Thank you :) |
Also - I am going to do a deep dive into this including reading in detail this one pypa/pip#13048 (review) -> |
So I recommend you to also take a look there :). |
Yeah that discussion is helpful, as mentioned webknjaz in comments about environment and approval, thats really nice. i actually tried out in another repo using these custom actions it pauses workflow until approval and then it starts further validations. In our case we are not building any builds in this current workflow, we are downloading from svn repo this is at present state :) , i feel. but agree when start adding reproducibility checks and further build process, webknjaz made good security point there, we will also to make sure the publish runs in separate workflow. i will also follow that conversion closely thanks for sharing otherwise i would have missed that :) Sure absolutely fine to pull people and review, more feedback never hurts , it always improve things better better :) |
BTW. Shall we already attempt to move it to airlfow repo and start configuring / dry running it ? I think - since we do not realy need airflow sources - that one sounds like a good idea to go to a separate reposiory and we should already run and test it there together - and invite Elad and others to further iterate on it ? |
Yeah , correct we dont need airflow sources we can configure our existing released versions in config and play around :). Should i refer this action in our airflow? or should move all these actions to airflow? and configure. |
Maybe move. I created a repo for you https://github.com/apache/airflow-publish - in about 5-10 minutes you should be able to add things there - you will have write access as with airflow. |
oh thats awesome :) |
We can do test on airflow 2.10.4 :) its currently in process |
That would be awesome if we could try it there - first "dry run" of course. |
Update now, https://github.com/apache/airflow-publish/tree/airflow-artifacts-publish. it looks like workflow dispatch should be in default branch main. cant trigger from other branches |
Okay closing this now..now that we have separate repository exists. Thanks @potiuk for all the suggestions 🙂 and really appreciate your support and guidance. |
No description provided.