
Migrate tasks in make-release.sh to CI actions #216

Closed
1 of 3 tasks
cschwan opened this issue Mar 12, 2023 · 9 comments

@cschwan
Contributor

cschwan commented Mar 12, 2023

The script make-release.sh has become quite complicated and we should simplify it by breaking it up into smaller tasks, some of which can be CI actions:

  • The cargo publish section can be a CI action similar to our Python wheels action. For this
@cschwan cschwan self-assigned this Mar 12, 2023
@alecandido
Member

alecandido commented Mar 12, 2023

Instead of making a complex Bash script (or many Bash scripts to run together), we can use Python: it's still a scripting language, but it has the benefit of scaling better (we can always break a script into functions, then modules, and eventually a tiny package, while Bash stops at functions).

Moreover, we have more expertise in Python than in Bash, and definitely more than in Perl:

prerelease=$(echo ${version} | perl -pe 's/^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/\4/')

(just to show that Bash alone is not sufficient; and for semver there are existing packages, e.g. https://github.com/python-semver/python-semver, to simplify the code).
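For illustration only (this is not code from the repository), the same extraction could look like the following in Python; the regex is the semver.org one already used in the Perl one-liner above, and the python-semver call at the end assumes that package's Version.parse API:

import re

# Official semver.org regex, identical to the one in the Perl one-liner above.
SEMVER = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<build>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

def prerelease(version: str) -> str:
    """Return the prerelease part of a version string, or an empty string."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semver version: {version}")
    return match.group("prerelease") or ""

# With the python-semver package this reduces to (API assumed from its documentation):
#   import semver
#   semver.Version.parse("0.6.0-rc.1").prerelease  # "rc.1"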

I'd also suggest collecting the utility scripts we're using in a single folder: not only make-release.sh, but also generate-coverage.sh, update-wheels.sh, and test-ploughshare-conversion.sh (the last one could also fit in examples/).

@cschwan
Contributor Author

cschwan commented Mar 12, 2023

I don't particularly like this idea, because if we do that we suddenly end up with a huge dependency chain that breaks on some computers because pip is too old, and that takes very long to run. I'm also not convinced that the Python scripts will be shorter.

My point is: Python isn't a panacea. It has its uses when Bash scripts become more complicated, but I don't think we've reached that point yet.

@alecandido
Member

So, you can decide that you don't want to add semver or any other package, and just depend on Python and its standard library. This is a fair point, and you'd avoid dependency management. I perfectly agree with this.

And I also agree that Python is not a panacea. It is just Bash that is bad.
If you want to use some other scripting language, like JS (with Node), Ruby, or something else, there are pros and cons for each of them, but eventually the best choice is just the one you're most familiar with (though I wouldn't suggest something like R, since it has a completely different scope).

Bash is optimal for working with files, as long as your scripts are about 5 lines long and not much more. After a certain threshold, you see all its limits, since it is designed to be good as a shell (i.e. in interactive mode) rather than as a programming language.
Most likely the difference is that I put the threshold much lower than you do, since I really prefer short, pure functions in general (even though I myself often fail at being short; purity is just a matter of trade-offs).

One example: it definitely encourages old-style imperative programming, with near-zero modularity and string manipulation everywhere instead of function calls (it is no coincidence that Python and JS are transitioning to types, since they are useful for maintaining scripts as well).

Moreover, Bash is a further "language" people should know in order to contribute. It is very well known at a basic level, i.e. using commands and pipes, but when you come to if conditions, its double brackets and so on, it is much more obscure to most people, and not so standard (you're naming scripts <script>.sh, but they would not run with sh <script>.sh, so .bash would be a better extension, or drop the extension altogether so that people know they should run them as executables, leaving the interpreter to the shebang).

@cschwan
Contributor Author

cschwan commented Mar 13, 2023

So, you can decide that you don't want to add semver or any other package, and just depend on Python and its standard library. This is a fair point, and you'd avoid dependency management. I perfectly agree with this.

And I also agree that Python is not a panacea. It is just Bash that is bad. If you want to use some other scripting language, like JS (with Node), Ruby, or something else, there are pros and cons for each of them, but eventually the best choice is just the one you're most familiar with (though I wouldn't suggest something like R, since it has a completely different scope).

I disagree with the premise that Bash is bad; I think it's often a very good choice because the scripts are very succinct. But when it comes to the choice of scripting languages, I admit that I'm partial to Bash. There have been many times when I solved a problem with it extremely quickly and efficiently, for which other languages would probably have been a worse choice. I say that because I think Bash isn't really much of a language on its own, but rather a way to combine many languages together: awk, grep, sed, ... anything you know, really.

Bash is optimal for working with files, as long as your scripts are about 5 lines long and not much more. After a certain threshold, you see all its limits, since it is designed to be good as a shell (i.e. in interactive mode) rather than as a programming language. Most likely the difference is that I put the threshold much lower than you do, since I really prefer short, pure functions in general (even though I myself often fail at being short; purity is just a matter of trade-offs).

I agree that I definitely have a much larger threshold 😄, which I admit isn't entirely good.

One example: it definitely encourages old-style imperative programming, with near-zero modularity and string manipulation everywhere instead of function calls (it is no coincidence that Python and JS are transitioning to types, since they are useful for maintaining scripts as well).

I really don't care about that when writing Bash scripts; they're supposed to solve a problem, so anything that works is fine. However, you could argue that using pipes is very similar to functional programming, which is considered 'modern' these days.

Moreover, Bash is a further "language" people should know in order to contribute. It is very well known at a basic level, i.e. using commands and pipes, but when you come to if conditions, its double brackets and so on, it is much more obscure to most people, and not so standard (you're naming scripts <script>.sh, but they would not run with sh <script>.sh, so .bash would be a better extension, or drop the extension altogether so that people know they should run them as executables, leaving the interpreter to the shebang).

Bash is one further language, that's true, but so is any other language that would replace it. We're using many programming languages in PineAPPL: Bash, C, C++, Fortran, Python and Rust, and they're all unavoidable, including Bash, which is used inside the CI. Take, for instance, the ./generate-coverage.sh script, which I used as a playground to generate doctest coverage. If you have a look into https://github.com/NNPDF/pineappl/blob/master/.github/workflows/rust.yml you'll see that towards the end it's almost identical to ./generate-coverage.sh (which doesn't generate .lcov files but rather a much more detailed HTML report). I doubt that using any other language would significantly improve this script. The same holds true, in my opinion, for all other scripts (except maybe make-release.sh), which all perform file-heavy tasks.

I'd argue that one should treat Bash like any other language, and https://tldp.org/LDP/abs/html/ is a good place to learn it properly. But many things can be learned on the fly, like the difference between single and double brackets. However, I often find this difference to be completely irrelevant.

As for the extension, Stack Overflow seems to agree that .sh is a common extension, but people argue that no extension would be better.

But all that being said: look at my original comment, where I argue basically the same point as you: make-release.sh should be improved.

@alecandido
Member

I disagree with the premise that Bash is bad; I think it's often a very good choice because the scripts are very succinct. But when it comes to the choice of scripting languages, I admit that I'm partial to Bash. There have been many times when I solved a problem with it extremely quickly and efficiently, for which other languages would probably have been a worse choice. I say that because I think Bash isn't really much of a language on its own, but rather a way to combine many languages together: awk, grep, sed, ... anything you know, really.

So, the statement "Bash is bad" without any context is for sure false, as it would be for any other language or tool (I'm sure even COBOL has its own perfect applications...).
The problem is that Bash is not a language, exactly as you're saying, so it is good if you can limit yourself to a few command invocations, but it is not good if you have to manipulate data, because at that point you have much poorer support than in most other languages, even for math and other basic operations. It is simply not made for that purpose.

I really don't care about that when writing Bash scripts; they're supposed to solve a problem, so anything that works is fine. However, you could argue that using pipes is very similar to functional programming, which is considered 'modern' these days.

I care about every single line of code that is going to survive, since it will have to be understood and maintained.
Workflows and utility scripts in particular are delicate, since they tend to be updated whenever something external to them changes (so the usual practice of "it works, don't touch it" does not apply very often).

E.g. the following:

wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/ATLASWPT11-Wplus_tot.appl'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/E906nlo_bin_00.pineappl.lz4'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/E906nlo_bin_00.tar'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/FK_ATLASTTBARTOT13TEV.dat'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/FK_POSXDQ.dat'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/LHCBWZMU7TEV_PI_part1.appl'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/LHCB_DY_8TEV.pineappl.lz4'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/LHCB_DY_8TEV.tar'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.pineappl.lz4'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/LHCB_WP_7TEV.tar'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/NJetEvents_0-0-2.tab.gz'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/NUTEV_CC_NU_FE_SIGMARED.pineappl.lz4'
wget --no-verbose --no-clobber -P test-data 'https://data.nnpdf.science/pineappl/test-data/NUTEV_CC_NU_FE_SIGMARED.tar'
wget --no-verbose --no-clobber -P test-data 'https://ploughshare.web.cern.ch/ploughshare/db/applfast/applfast-atlas-dijets-fnlo-arxiv-1312.3524/grids/applfast-atlas-dijets-fnlo-arxiv-1312.3524-xsec000.tab.gz'
wget --no-verbose --no-clobber -P test-data 'https://ploughshare.web.cern.ch/ploughshare/db/applfast/applfast-h1-dijets-appl-arxiv-0010054/grids/applfast-h1-dijets-appl-arxiv-0010054-xsec000.appl'
wget --no-verbose --no-clobber -P test-data 'https://ploughshare.web.cern.ch/ploughshare/db/applfast/applfast-h1-incjets-fnlo-arxiv-0706.3722/grids/applfast-h1-incjets-fnlo-arxiv-0706.3722-xsec000.tab.gz'
wget --no-verbose --no-clobber -P test-data 'https://ploughshare.web.cern.ch/ploughshare/db/atlas/atlas-atlas-wpm-arxiv-1109.5141/grids/atlas-atlas-wpm-arxiv-1109.5141-xsec001.appl'

it is clearly much better as a for loop, and this one:
--ignore-filename-regex='/.cargo/registry' \
--ignore-filename-regex='rustc' \
--ignore-filename-regex='pineappl/tests' \
--ignore-filename-regex='pineappl_capi' \
--ignore-filename-regex='pineappl_cli/tests' \

would be much clearer with the list of folders split out on its own and given a telling name.
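As a rough sketch of what such a refactoring could look like (this is not code from the repository, and it is written in Python since that is the language proposed above; the names TEST_DATA_URLS, IGNORED_FILENAME_REGEXES and download_test_data are made up for illustration):

from pathlib import Path
from urllib.request import urlretrieve

TEST_DATA_URLS = [
    "https://data.nnpdf.science/pineappl/test-data/ATLASWPT11-Wplus_tot.appl",
    "https://data.nnpdf.science/pineappl/test-data/E906nlo_bin_00.pineappl.lz4",
    # ... and so on for the remaining URLs listed above
]

IGNORED_FILENAME_REGEXES = [
    "/.cargo/registry",
    "rustc",
    "pineappl/tests",
    "pineappl_capi",
    "pineappl_cli/tests",
]

def download_test_data(target: Path = Path("test-data")) -> None:
    """Download every test-data file that is not already present (like wget --no-clobber)."""
    target.mkdir(exist_ok=True)
    for url in TEST_DATA_URLS:
        destination = target / url.rsplit("/", 1)[1]
        if not destination.exists():
            urlretrieve(url, destination)

# The second list can be turned back into the command-line flags shown above:
ignore_flags = [f"--ignore-filename-regex={regex}" for regex in IGNORED_FILENAME_REGEXES]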

Pipes are not necessarily functional-like, but they usually are, and there is a good reason why people now care about functional programming: especially in UIs there were a lot of people happily manipulating things in a single global scope (that is exactly what JS was about), but then it was a mess to debug, since the state was incredibly hard to reproduce in complex applications.
It doesn't always have to be functional, and strict purism always has its drawbacks (in this sense I'm a great fan of https://peps.python.org/pep-0020/, in this case the "Although practicality beats purity." - still Python, sorry), but functional style certainly has some advantages (especially in long-living programs, which is not trivial when manipulating large amounts of data).

Bash is one further language, that's true, but so is any other language that would replace it.

It's true that PineAPPL is mainly Rust, and I wouldn't script in Rust...
My point was more about the people collaborating with this project (i.e. NNPDF), who decided to be Python-centric. Moreover, this is not an isolated choice, since machine learning is mainly supported in Python, and the scientific community is migrating there (even if C/C++ and Fortran are definitely still relevant, and Julia is also becoming more common).

including Bash, which is used inside the CI.

This can be limited arbitrarily by invoking external scripts (that is usually what you do in Bash anyway, and you do want a simple CI workflow; it is not the ideal place to define complex tasks).

which all perform file-heavy tasks.

Bash is good at manipulating files, but it is not the only good option. Python has been used for system scripting for a long time, and it also keeps improving.

@alecandido
Member

However, getting more to the point:

Python pros:

  • better structure, simpler code
  • more familiar to other collaborators

All the sed and similar operations can be done in Python as well, just with the standard library.
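For instance, a sed-style in-place substitution needs nothing beyond re and pathlib; the file name and pattern below are purely hypothetical, this is only a sketch of the idea:

import re
from pathlib import Path

def replace_in_file(path: Path, pattern: str, replacement: str) -> None:
    """Rough equivalent of `sed -i 's/pattern/replacement/g' path`."""
    text = path.read_text()
    path.write_text(re.sub(pattern, replacement, text, flags=re.MULTILINE))

# hypothetical usage, e.g. bumping a version field:
# replace_in_file(Path("Cargo.toml"), r'^version = ".*"$', 'version = "0.6.1"')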

Bash pros:

  • native interface to cargo and git

I'd add one more candidate:

JS (Node) pros:

The JS proposal is only interesting if you are positive about automating the task and delegating it to GitHub (and if we solve the problem of the non-uniform environment and very old computers). If you prefer to do it manually, it's better to avoid it (but I doubt that, since the script even pushes and releases automatically).

@cschwan
Contributor Author

cschwan commented Apr 11, 2023

The publish commands have been moved into the newly-added workflow crates.yml in commit fa0fc65. This still needs to be tested.

@cschwan
Contributor Author

cschwan commented Apr 19, 2023

I've fixed some bugs in commits 00b247f and 51f84ce, but this still isn't enough, because the container needs to have APPLgrid and fastNLO installed.

@cschwan
Contributor Author

cschwan commented Apr 25, 2023

The container has been changed in commit 8cf40df.
