For NERSC machines (including pm-cpu, pm-gpu) remove exclusive slurm batch directive #6402

Merged

@ndkeen ndkeen commented May 7, 2024

Remove the exclusive directive from the batch files for all NERSC machines.
The reasons for this change:
a) The directive is no longer needed, since exclusive access is already the default.
b) Removing it allows a user to use --qos shared when desired (there is no flag to turn exclusive back off once the directive is set).

Note that using -q shared or --qos shared allows a job to share a node's resources with other jobs on that node.
The advantage is that jobs using shared are charged less than they would be for a full node.

There will be no performance impact.

[bfb]
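
To check that a generated batch script no longer requests exclusive access, something like the following might help. This is only a sketch: `has_exclusive_directive` and `write_example_header` are hypothetical helper names, not part of E3SM or CIME, and the header values are placeholders rather than the settings this PR produces.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and header values are illustrative assumptions.

# Return 0 if the given batch script contains an "#SBATCH --exclusive" line.
has_exclusive_directive() {
    local script="$1"
    # Accept "--exclusive" on its own or followed by "=value" or whitespace.
    grep -Eq '^#SBATCH[[:space:]]+--exclusive([=[:space:]]|$)' "$script"
}

# Write a minimal example header that asks for the shared QOS
# and does not request exclusive access.
write_example_header() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --qos=shared
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
EOF
}
```

Calling `has_exclusive_directive path/to/script` returns success only when the directive is present, so it can be used in an `if` to flag scripts that would still conflict with `--qos shared`.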

@ndkeen ndkeen added the labels Machine Files, BFB (PR leaves answers BFB), pm-gpu (Perlmutter machine at NERSC, GPU nodes), and pm-cpu (Perlmutter at NERSC, CPU-only nodes) on May 7, 2024
@ndkeen ndkeen self-assigned this May 7, 2024
@ndkeen ndkeen requested a review from rljacob May 7, 2024 23:38

github-actions bot commented May 7, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6402/
on branch gh-pages at 2024-05-07 23:54 UTC

ndkeen added a commit that referenced this pull request May 9, 2024
…into next (PR #6402)


ndkeen commented May 9, 2024

merged to next

ndkeen commented May 11, 2024

I do not expect anything to be different with this change, but I still wanted to see that the tests look OK. Merging to master.

@ndkeen ndkeen merged commit 58238ce into master May 11, 2024
21 checks passed
@ndkeen ndkeen deleted the ndk/machinefiles/pm-remove-exclusive-sbatch-directive branch May 11, 2024 03:20
ndkeen added a commit that referenced this pull request May 20, 2024
After #6402 removed the exclusive directive, which allows jobs to use the shared qos, I realized we still needed an option to select the shared qos.

This PR should allow:

./xmlchange JOB_QUEUE=shared

I also adjusted the total number of nodes available for each machine.

[bfb]
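
As a rough illustration of the workflow this enables, the sketch below switches an existing case to the shared queue and prints the resulting value. `use_shared_queue` is a hypothetical wrapper, not something provided by E3SM. It assumes it is given a case directory that contains the usual `xmlchange` and `xmlquery` scripts.

```shell
#!/usr/bin/env bash
# Sketch only: use_shared_queue is an illustrative helper, not part of E3SM.

# Set JOB_QUEUE=shared in the given case directory and show the new value.
use_shared_queue() {
    local casedir="$1"
    if [ ! -x "$casedir/xmlchange" ]; then
        echo "no xmlchange found in $casedir" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$casedir" || exit 1
        # Query the value after changing it so it can be checked before submitting.
        ./xmlchange JOB_QUEUE=shared && ./xmlquery JOB_QUEUE
    )
}
```

After confirming the value, the case would be submitted as usual with `./case.submit`.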