Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate use of parallel easyconfig parameter and fix updating the template value #4580

Open
wants to merge 7 commits into
base: 5.0.x
Choose a base branch
from

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Jul 17, 2024

Rebase of #3842 to 5.0x

ec['parallel'] currently doubles as an EC option and as the storage for the calculated parallelism set by the EasyBlock.
This makes it hard to reason about especially as maxparallel has pretty much the same effect. Also changes to ec['parallel'] done by e.g. easyblocks (or the set_parallel method) are not reflected by the template %(parallel)s

Solution: Introduce a property which on write updates the template and some magic to mirror the effect of the now deprecated ec['parallel']

Additional change I'd like to do: Treat parallel = False (legacy) as maxparallel =1 so cfg.parallel is always a number

See #3811 (comment) for the motivation

Requires easybuilders/easybuild-easyconfigs#19375 to avoid deprecation warnings

Compared to #3842 I had to move the deprecation to 5.1 as easyblocks need updating but that requires this PR. So either we keep 5.1 as the target or merge this, update the easyblocks, then remove the deprecation (and fail hard) for 5.0

@Flamefire Flamefire changed the title Deprecate parallel 5 Deprecate use of ec['parallel'] and fix updating the template value - 5.0x Jul 17, 2024
@Flamefire Flamefire force-pushed the deprecate_parallel-5 branch from f714fb3 to 8a76544 Compare July 17, 2024 14:53
@boegel boegel added change EasyBuild-5.0 EasyBuild 5.0 labels Jul 31, 2024
@boegel boegel added this to the 5.0 milestone Jul 31, 2024
@boegel boegel changed the title Deprecate use of ec['parallel'] and fix updating the template value - 5.0x Deprecate use of ec['parallel'] and fix updating the template value Jul 31, 2024
@boegel boegel changed the title Deprecate use of ec['parallel'] and fix updating the template value Deprecate use of ec['parallel'] and fix updating the template value Jul 31, 2024
@Flamefire Flamefire force-pushed the deprecate_parallel-5 branch from 8a76544 to 81a2649 Compare August 8, 2024 07:36
@Flamefire Flamefire force-pushed the deprecate_parallel-5 branch 2 times, most recently from 9e11b40 to 1554bd4 Compare September 17, 2024 10:27
@boegel boegel changed the title Deprecate use of ec['parallel'] and fix updating the template value Deprecate use of parallel easyconfig parameter and fix updating the template value Oct 8, 2024
ec['parallel'] currently doubles as an EC option and as the storage for
the calculated parallelism set by the EasyBlock.
This makes it hard to reason about especially as maxparallel has pretty
much the same effect. Also changes to ec['parallel'] done by e.g.
easyblocks (or the set_parallel method) are not reflected by the
template `%(parallel)s`

Solution: Introduce a property which on write updates the template and
some magic to mirror the effect of the now deprecated ec['parallel']
Migrate from `self.cfg['parallel']` to `self.cfg.parallel`
We need to fix our own easyblocks first but that requires this change.
So allow until 5.1 or remove after updating the easyblocks.
The EC parameter is deprecated so the tests fail.
@Flamefire Flamefire force-pushed the deprecate_parallel-5 branch from 748a503 to 162621f Compare December 4, 2024 08:52
@Flamefire
Copy link
Contributor Author

After this is merged some easyblocks need to be updated accordingly

$ grep -rF "cfg['parallel']" --files-with-matches | wc -l
42

@boegel boegel requested a review from bartoldeman December 11, 2024 15:22
@boegel boegel requested review from Micket and removed request for bartoldeman January 8, 2025 15:51
Micket
Micket previously approved these changes Jan 8, 2025
Copy link
Contributor

@Micket Micket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. I would like to test this with updated easyblocks ready all at once, then I would merge it.

@Flamefire
Copy link
Contributor Author

Flamefire commented Jan 9, 2025

I looked into updating the easyblocks. I see one instance of:

        if self.cfg['parallel']:
            install_options.append(
                'NUM_THREADS={!s}'.format(self.cfg['parallel']))
        else:
            install_options.append('NUM_THREADS=1')

And many like

        paracmd = ''
        if self.cfg['parallel']:
            paracmd = "-j %s" % self.cfg['parallel']

So we do have a distinction between parallel and maxparallel.
This is actually a regression now I introduced in easybuilders/easybuild-easyconfigs#19375 as the replacement parallel = False to maxparallel = 1 is not correct. See e.g. https://github.com/easybuilders/easybuild-easyconfigs/pull/19375/files#diff-c36fe78c5cb7d5e567d76df45ad25ede06741233b3d9f5efc8ae8d0d97feaee2L27 where a comment states

custom coconut script does not recognize -j

So some changes would need to be adjusted and others tested.

And I think I got things backward. What I now think was the evolution of those 2 parameters:

  1. Only parallel = True/False using the automatically determined value if True
  2. Adding maxparallel so that for parallel = True it can be limited
  3. Realizing parallel = True; maxparallel = x can be shortened to parallel = x with no change to parallel = False

This is currently still supported by setting maxparallel = False although this does look odd.

How do we want to proceed?

  1. Revert back to parallel = [max. parallelism | False to not specify] and deprecate maxparallel instead using the value to set parallel.
  2. Continue with the current approach and accept maxparallel = False for the rare case where we really do not want to add any parallel flag. I've seen only one comment to this extent the others seem to be used as a synonym for 1

In both cases I'd deprecate setting cfg['parallel'] in favor of cfg.parallel which can update the template value. Or we can update the template value in __setitem__ when parallel is changed.
Additionally I'd set the template value to 1 when parallel gets set to False, currently it would result in either False or 0 when
used which certainly isn't right.

Of course combining parallel and maxparallel into a single parameter is a functional change that could be observed. However only the tensorflow easyblock uses that due to our abuse of changing cfg['parallel'] to the determined parallelism.

My suggestion is hence:

  • do deprecate the parallel EC parameter in favor of maxparallel and document using maxparallel = False even though it looks odd
  • maxparallel = False results in cfg.parallel = False and %(parallel)s = 1
  • Access to cfg['parallel'] is intercepted. Reads go to _parallelLegacy, deprecated, to be removed in favor of cfg.parallel or cfg['maxparallel' ] as it isn't clear what is actually intended here.
  • Writes will also be deprecated and go to cfg.parallel which also updates the template and _parallelLegacy but not cfg['parallel' ] or cfg['maxparallel']
  • We need to update _parallelLegacy in the place where we did before, this is already done through cfg.parallel

With this we should not have any behavior change in almost all cases. One is of course the "fix" of the template.
The only thing missing in this PR is redirecting writes to cfg['parallel'] which for backwards compatibility needs to NOT change self._parallel until it has been set. This is the inconsistent behavior we have right now: Prior to prepare_step cfg['parallel'] acts as maxparallel while after it is an exact value not changed again by the framework.

@Micket
Copy link
Contributor

Micket commented Jan 10, 2025

So the corsika example I think we should just solve some other way; it's just abusing ConfigureMake (it's not a "make" at all), while it really just wants to do it's own custom thing.
I wouldn't bend over backwards in framework or ConfigureMake easyblock just to deal with that odd case, I would rather just have that one use it's own easyblock.

I did a (admittedly quickly) skim the easyblocks during our meeting, and I didn't think it looked to be a problem with the new approach? apart from the trivial cases, e.g.

            install_options.append(f'NUM_THREADS={self.cfg.parallel}')

others, if they truly do need to skip the flag when it's not parallel (I suspect many of them actually just work with -j 1), then they can just use self.cfg.parallel >1 as i see no other reasonable way to interpret 1.

        paracmd = ''
        if self.cfg.parallel > 1:
            paracmd = "-j %s" % self.cfg.parallel

@Flamefire
Copy link
Contributor Author

It does need a definition of how to handle False though and currently I kept parallel = False resulting in self.parallel = 0 which IMO we should not as it is inconsistent.

I'm not so sure about dismissing the corsika example as it is actually close enough to regular configure && make that users might want that. To me it looks like ./configure --options && ./coconut
Although the install step is a copy, not ./coconut install so CmdCp looks like a better fit indeed.

Question would be how many ECs previously using False rely on this. I guess we can just update the easyblocks and framework (locally) and run with the ECs I modified that use False and see what breaks

@Micket
Copy link
Contributor

Micket commented Jan 13, 2025

we should not as it is inconsistent.

Yes, we should probably make sure those cases don't turn into self.parallel == 0


We discussed this in the EB 5.0 sync meeting today, evaluating a few different options, pros and cons and their affect on things that "abuse" ConfigureMake with custom build_cmd's (which are no longer make).
Basically anything setting build_cmd is affected here (fftlib, SBCL, Rust, and more).

The path of least pain was deemed to be that ConfigureMake should only add -j when self.parallel > 1.
make guarantees to run sequentially and we have some easyconfigs that rely on not adding -j (like CORSIKA) and some that rely on adding -j XXX for parallel builds (Rust).

In the future I hope to move some of those easyconfigs over to using the generic easyblock (#4531) so that *Make* can maybe be made a bit cleaner, but for now conditionally adding -j should be a good workaround with no downside (as far as any of us could think of)

@Flamefire
Copy link
Contributor Author

Flamefire commented Jan 14, 2025

I added commits to change False to 1.
I was also thinking about possible breaking changes by this whole approach (besides no more False values) as deriving easyblocks using ec['parallel'] from updated easyblocks using ec.parallel isn't trivial. Most importantly: After the ready-step an easyblock might set ec['parallel'] in a configure step and the parent easyblock might then use ec.parallel to issue the build command. The latter should use the value the former intended. Only way I could imagine is returning the _parallelLegacy value instead of ec._parallel as done after the deprecation phase. Writing to ec.parallel is reflected by both the property and ec['parallel'] so it should also work the other way round.
Not perfect but I guess as good as it gets. I enhanced the comments to describe it in more detail as I thought about using DEPRECATED_EASYCONFIG_PARAMETERS to redirect parallel to maxparallel but that would change the behavior for easyblocks not yet updated.

Edit: I'd keep the behavior as I had implemented it:

  • Writes to ec['parallel'] go to ec['_parallelLegacy'] as do reads
  • set_parallel during the preparation-phase take this and maxparallel into account as upper bounds and sets the property
  • The property returns _parallelLegacy
  • Setting the property updates the template and _parallelLegacy
  • None is converted to 1 when getting or setting the property, this is a breaking change for consistency

This way should be fully backwards compatible except that everything going through the property updates the template value which it did not before.

Logically the new self._parallel backing field is not really used even though it is set: If it isn't set yet (by set_parallel) an error will be shown. If it gets set _parallelLegacy also gets set which will be returned by the property. So the line return self._parallel is currently logically dead code.
However this makes it easy to remove that transitional key after the deprecation phase and add parallel -> max_parallel to the renamed fields list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Breaking changes
Development

Successfully merging this pull request may close these issues.

3 participants