Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a problem on Gaea and Jenkins, which is why this
activate->deactivate->activate
route is used specifically for Gaea. Please take a look at this discussion:#404 (comment)
Maybe something has changed since moving to Lua, but pretty sure it was working with Jenkins as it was a couple of days ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Daniel. The deactivate worked for me when I tried it without first activating an environment, but I am looking into this further now. I think that we may not need the deactivate at all, actually. The problem I was trying to fix on my end was because I had an old conda init stanza in my .bash_profile that was confusing the new conda. I will see if something similar is messing up the Jenkins job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mark-a-potts Maybe there is a way to make gaea consistent with other systems (one conda activate) once EPIC installs are used in PR #419 ? I also don't like the hack either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, the Jenkins CI pipeline for Gaea successfully ran through to completion for this PR. If it was possible to remove the
conda deactivate
entirely, that would be great, but it might need to wait until after the updated miniconda PRs come in.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichaelLueken Indeed. I was under the impression that the
conda deactivate
will be a no-op beforeconda activate $SRW
is executed, but it looks like it does deactivate the conda from the parent shell. I guess this is one less command for gaea but would be desirable to get rid of theconda deactivate
as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested a little more this afternoon, and I can't quite figure out what is going on. When I load wflow_gaea by hand, the python path is right and conda is known to be a function, but when I run a step with rocoto, the load_modules_and_run.sh script does not know that conda is a function and so the conda activate line does not work without a conda deactivate first.
You are probably right @MichaelLueken, that we should test again once the miniconda PRs come through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some more testing today and think I have found the problem, though I am not entirely sure why it is a problem. It seems that the extra module load for miniconda3 that happens in the modulefiles/tasks/gaea/*.lua files adds an extra level of conda stuff that messes things up. I removed all but the vx modulefile under gaea and everything seems to be working correctly without a need to deactivate the conda environment. I don't quite understand why the extra conda modules seem to be okay on other platforms, but not gaea, but I think this is a bit better of a solution than the deactivate workaround. I will re-open this and push the latest updates I have made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is happening on all systems now after transition to EPIC modules with @natalie-perlin 's PR?
#431 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is currently not activating conda in
load_modules_run_task.sh
becauseSRW_ENV
is not set for it. So gaea is still an outlier compared to other Tier-1 systems. I wish there was a better solution that switches the python paths to*/envs/regional_workflow/bin
instead of*/bin
which is the root cause of the problem.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it appears as though @natalie-perlin's PR #431 is having the same issue for all machines, merging this PR would likely break what Natalie has done. Would it be best to open a PR in Natalie's fork to include these changes in PR #431 and then ask her to make similar modifications to the local modulefiles for machines that aren't Gaea (since Mark has already done this for Gaea)?