Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MPAS-Framework] Add a new name_in_output registry attribute to variables #5120

Merged

Conversation

xylar
Copy link
Contributor

@xylar xylar commented Aug 10, 2022

This can be used to set a different name in output streams than the name in code or the unique name required for input.

This merge also adds corresponding outputFieldName and outputConstituentNames to each field. These new attributes will be used to allow different names in output than in the code.

The name_in_output or the outputFieldName attribute must only be used for pure output variables. Since it doesn't change the name read from input files, renaming variables in an output file that also gets used as an input files (such as a restart file or initial condition) would cause unexpected results and should be avoided.

Prior to this merge, the output name must be unique, but this causes trouble in time mean output, particularly for coordinate variables. Coordinate variables are usually variables with the exact same name as their dimension. For example, we plan to add a CF-compliant Time coordinate to supplement xtime. The Time would become timeMonthly_avg_Time in the current version of the time-series-stats monthly average analysis member. But this is no longer a proper (or convenient) coordinate variable. It still needs to be called Time in the monthly output files. That is, it needs to have the same name as another variable in the code on purpose.

Similarly, temperature becomes timeMonthly_avg_activeTracers_temperature in time-series-stats monthly average. But with these changes, this one could be renamed to just temperature in the output, which would be a lot easier to work with. That kind of simplification is not yet planned but could be a nice medium-term project.

[BFB]

xylar added 10 commits August 6, 2022 17:03
These new attributes will be used to allow different names in
ouput than in the code.  Currently, the output name must be
unique, but this causes trouble in time mean output, particularly
for coordinate variables.
This can be used to set a different name in output streams than
the name in code or the unique name required for input.
@xylar
Copy link
Contributor Author

xylar commented Aug 10, 2022

See E3SM-Ocean-Discussion#23 for additional discussion.

@xylar
Copy link
Contributor Author

xylar commented Aug 10, 2022

Follow-up work based on this PR will add Time and Time_bnds coordinate variables to MPAS-Ocean, MPAS-Seaice and MALI output for better CF compliance.

@xylar
Copy link
Contributor Author

xylar commented Aug 13, 2022

@akturner, @cbegeman, and @mark-petersen, you are free to test further if you want but you may also approve based on your review of E3SM-Ocean-Discussion#23

@matthewhoffman, I'm mostly wanting to make sure this does no harm to MALI. If there are tests you would like me to run, let me know.

Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on my testing and comments in E3SM-Ocean-Discussion#23.

Copy link
Contributor

@akturner akturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on testing in E3SM-Ocean-Discussion#23.

Copy link
Contributor

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar , I looked this over and the changes make sense based on the PR description. I don't feel the need for MALI-specific testing beyond the testing already done for Ocean and Sea Ice, which certainly include fields that don't use the new functionality.

I have, however, a couple questions that you may want to answer in the PR description:

  1. Presumably name_in_output only functions as expected for pure output variables. If you tried to read in any such field (either as input or restart), I'm guessing it would be unrecognized. Is there any way to check/avoid this issues? Or at the least, add an explanation of that in the PR description and possibly somewhere in the code if there is an appropriate place (I'm not sure where that would be, maybe where the new field attributes are added in mpas_field_types.inc?).
  2. The PR description says "Prior to this merge, the output name must be unique, but this causes trouble in time mean output, particularly for coordinate variables." I'm confused by this statement. Presumably prior to this merge the output name is just the name in the code write? And can you explain why the old behavior caused trouble in time mean output?
  3. What happens if two different fields are given identical name_in_output values?

@xylar
Copy link
Contributor Author

xylar commented Aug 16, 2022

Thank you @matthewhoffman. I really appreciate your review and your helpful comments.

  1. Presumably name_in_output only functions as expected for pure output variables. If you tried to read in any such field (either as input or restart), I'm guessing it would be unrecognized. Is there any way to check/avoid this issues?

If you have a good way to detect this, I would be happy to add some code. I don't feel familiar enough with the different types of streams to know if there is an easy way to differentiate pure output streams from i/o streams like restart files. I guess I think a check is largely overkill because time-series-stats seems like the only place this functionality is likely to be used, and that is always output-only.

Or at the least, add an explanation of that in the PR description and possibly somewhere in the code if there is an appropriate place (I'm not sure where that would be, maybe where the new field attributes are added in mpas_field_types.inc?).

I will add something to the PR description for sure. I agree, there's probably not a better place in the code than mpas_field_types.inc so I can add a comment there, too.

  1. The PR description says "Prior to this merge, the output name must be unique, but this causes trouble in time mean output, particularly for coordinate variables." I'm confused by this statement. Presumably prior to this merge the output name is just the name in the code write? And can you explain why the old behavior caused trouble in time mean output?

Coordinate variables are usually variables with the exact same name as their dimension. For example, we plan to add a Time coordinate to supplement xtime. The Time becomes timeMonthly_avg_Time in the time-series-stats monthly average analysis member. But this is no longer a proper (or convenient) coordinate variable. It still needs to be called Time in the monthly output files. That is, it needs to have the same name as another variable in the code on purpose.

Similarly, temperature becomes timeMonthly_avg_activeTracers_temperature in time-series-stats monthly average. But with these changes, this one could be renamed to just temperature in the output, which would be a lot easier to work with. That kind of simplification will have to wait for another time but the Time and associated Time_bnds coordinate work is waiting in the wings for this PR to go in.

  1. What happens if two different fields are given identical name_in_output values?

I don't know what would happen. My guess is that the second variable would overwrite the first. But this almost certainly would not be something someone should try to do on purpose. The idea in giving different variables the same output name is that they would be written out in different contexts. Again, I'm not sure it's worth putting too much effort into protecting users from misusing this functionality. But I can if it's important to you.

Comment on lines +21 to +23
! outputFieldName and outputConstituentNames are alternative names for the field or its constituents for use in
! output files. These should only be given names different from fieldName or constituentNames for pure output
! variables such as diagnostic output and never for variables present in initial conditions or restart files.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewhoffman, are you good with this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @xylar . This is helpful. I would prefer to also see more info from the discussion above, specifically a reference to the specific use case this was implemented for (I think a specific example helps communicate the functionality), as well as the warning about possible collisions if multiple variables had the same name_in_output in the same output stream. I agree this situation seems too difficult to try to protect against misuse, but I think it's important to document clearly what constitutes misuse.

One other clarification for me - this new attribute is accessible through Registry, right? But in this PR you are accessing it programmatically, right? I was confused by that initially.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a 3 line comment here is already pushing the standard practice. I hesitate to put a lengthy discussion in a file folks are unlikely to look at in detail. I don't want to set the president that this is the "right" place to document things in detail just because we lack proper documentation. If we expand this practice, this file will get out of hand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't directly make use of the new functionality either in code or in the register. It just makes sure neither breaks anything. Future, component specific PRs will actually use the functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If helpful, I might be able to find a little time later this week to have a video chat about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar , you're right that we don't have a good mechanism for documenting this sort of information. I've always thought framework should have been better documented, and I'm not opposed to breaking standard practice going forward. But I'm also ok with this PR serving as the documentation if you prefer to leave things as you have them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matthewhoffman. I fully agree that better framework documentation is needed in general and that this PR isn't a great, easy to find alternative to proper documentation. Maybe we need a future E3SM hackathon on documenting the framework?

I looked over the comment here and your request again. I still feel like a longer description here would just get lost. Until there is a better place for it, I think this PR will have to suffice for documenting the purpose of this new functionality. Hopefully, the follow-up PRs to add Time and Time_bnds coordinates will help to demonstrate the capability by example more clearly.

@xylar xylar requested a review from matthewhoffman August 23, 2022 14:51
@xylar
Copy link
Contributor Author

xylar commented Aug 24, 2022

Thank you @mark-petersen, @cbegeman, @akturner and @matthewhoffman for taking the time to review this.

@jonbob, I believe this is ready for you to integrate when it's a good time.

jonbob added a commit that referenced this pull request Aug 24, 2022
… next (PR #5120)

MPAS-Framework: Add a new name_in_output registry attribute to variables

This change to the MPAS-Framework can be used to set a different name in
output streams than the name in code or the unique name required for
input.

This merge also adds corresponding outputFieldName and
outputConstituentNames to each field. These new attributes will be used
to allow different names in output than in the code.

The name_in_output or the outputFieldName attribute must only be used
for pure output variables. Since it doesn't change the name read from
input files, renaming variables in an output file that also gets used as
an input files (such as a restart file or initial condition) would cause
unexpected results and should be avoided.

[BFB]
@jonbob
Copy link
Contributor

jonbob commented Aug 24, 2022

Test merge passes:

  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
  • SMS_Ld1.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod
  • ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel

merged to next

@jonbob jonbob merged commit c5f8b37 into E3SM-Project:master Aug 25, 2022
@jonbob
Copy link
Contributor

jonbob commented Aug 25, 2022

merged to master

@xylar xylar deleted the mpas-framework/add-name-in-output-attribute branch August 25, 2022 19:38
jonbob added a commit that referenced this pull request Aug 30, 2022
Add outputFieldNames to sea-ice pointwiseStats

This merge fixes a bug in MPAS-Seaice's pointwiseStats analysis member
that was introduced in #5120. The outputFieldName is now given a value,
whereas it was left with a garbage value after #5120.

Fixes #5157
[BFB]
jonbob added a commit that referenced this pull request Aug 31, 2022
Add outputFieldNames to sea-ice pointwiseStats

This merge fixes a bug in MPAS-Seaice's pointwiseStats analysis member
that was introduced in #5120. The outputFieldName is now given a value,
whereas it was left with a garbage value after #5120.

Fixes #5157
[BFB]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants