Add variables that can replace compset regex matches in config_component and similar files #3816

billsacks · 2020-12-24T14:04:25Z

This idea has been a constant refrain for me over the last few years, but I have brought it up in the context of other issues – see especially #119 and #1522. It's finally time that it gets its own issue.

I find the current system for matching various component options via regex matches on the compset long name in config_component.xml and similar files to be fragile and error-prone. We've had a number of bugs over the years due to too-specific or too-general regex matches in some component's xml file, and even though I consider myself decent with regexes, I still get nervous any time I need to write a new match in config_component.xml. I would like cime to do some initial parsing of the compset long name, creating a set of variables that could then be used for matches in config_component.xml, etc., without resorting to regex matching the whole compset long name. This way, the regex parsing would be done in a single place, which could be made robust and well-tested. For each component (I'm using lnd as an example), we'd have variables like:

lnd_name: Any number of alphabetic characters - e.g., "CLM"
lnd_version: Any number of numeric characters following the lnd_name - e.g., "50" (may be empty for some components)
lnd_opts: Any number of options following the '%'. This is a key one. It would allow us to replace xml that currently looks like this:

<value compset="_CLM45%[^_]*SP">-bgc sp</value>
<value compset="_CLM45%[^_]*BGC">-bgc bgc</value>
<value compset="_CLM50%[^_]*SP">-bgc sp</value>
<value compset="_CLM50%[^_]*BGC">-bgc bgc</value>

with simply

<value lnd_opts="SP">-bgc sp</value>
<value lnd_opts="BGC">-bgc bgc</value>

lnd_type: One of 'active', 'data', 'stub' or 'coupler_test'. It's important to be able to match on lnd_type='active'. The other 3 are less important, because we're unlikely to rename DLND, SLND, XLND, but I'm including them for consistency.

For some more examples and thinking around this issue, see #1522 (comment) and #1522 (comment) and some follow-on comments in that issue.

ekluzek · 2020-12-24T21:50:53Z

I think this makes a ton of sense and I can see where often it's the way we should have been doing things all along. Separating out the "options" for each component is something that we've already setup because we use the "%" character to delineate the options. So this is a straightforward change that would make so many things much clearer.

I thought I had an exception to @billsacks case for things where there's a dependence on other components. But, it looks like that case works as well.

An example of this is LND_TUNING_MODE where you currently have this...

value compset="DATM%GSWP3.+_CLM45" >clm4_5_GSWP3v1</value>

In @billsacks formula this would become...

value atm_name="DATM" lnd_version="45" atm_opts="GSWP3" >clm4_5_GSWP3v1</value>

There's also this example of DATM_PRESAERO where a multiple component match is done..

<value compset="_DATM.*_DICE.*_DOCN.*_DROF">none</value>

which would become...

<value atm_name="DATM" ice_name="DICE" ocn_name="DOCN" rof_name="DROF" >none</value>

It could also be...

<value atm_type="data" ice_type="data" ocn_type="data" rof_type="data" >none</value>

taking advantage of the component type category. That might often be the better and more clearer thing to do.

So I think this covers everything we'd want it to do, and it makes it clearer. The xxx_opts would still often have to be a reg-ex because of multiple possible options. So the rules about exact match verses a regular expression would need to be clear everywhere.

There is also this case for CLM_CO2_TYPE that I suppose would still use compset...

  `<value compset="HIST.*_DATM" >diagnostic</value>`

I suppose it would become...

  `<value compset="HIST" atm_name="DATM">diagnostic</value>`

The things you'd still need the full compset for are for the starting period and any possible ending on the compset. If these become variables they could be queried as well in the same way.

jedwards4b · 2020-12-24T22:28:50Z

A word of caution - every new attribute in the xml requires new code and logic in the python. This change simply shifts the complicated logic from the xml file in the form of regex to the python file in the form of multiple attributes. In the opening message @billsacks states that the regex is error prone, but the original issue has been out there for 5 years and I don't think that there have been that many problems with it.

billsacks · 2020-12-24T22:49:31Z

every new attribute in the xml requires new code and logic in the python. This change simply shifts the complicated logic from the xml file in the form of regex to the python file in the form of multiple attributes.

The difference is that, rather than having hundreds of places where we do regex matches on the compset for this kind of logic, we would do it in one place, covered by tests.

I don't think that there have been that many problems with it.

It's true that we haven't had that many KNOWN problems. But we have a lot of regexes on compset that aren't quite right. And as we support more and more components, with more and more options, it's inevitable that we're going to have situations where current matches work incorrectly, due to being too general or too specific. I would turn this question around and ask: How confident are we that all of our regexes on compset are working right all of the time? My confidence is low on that point.

billsacks · 2021-11-12T21:17:30Z

Probably a prerequisite for #4125 given that the plan for that may involve allowing order independence in the compset long name.

billsacks · 2021-11-12T21:33:32Z

@jedwards4b was wondering in a discussion today whether it's important to split out each component's piece into individual pieces like lnd_name, lnd_version and lnd_opts, as opposed to just having an lnd string that contains all of it. It's true that the further separation isn't important if the only goal is to allow order independence of components in the compset long name. However, part of the goal of this issue (in my mind) is also to allow us to replace fragile regexes with something more fool-proof - for example, avoiding issues like the one I needed to fix in ESCOMP/CESM_CICE5@ac908ac . See also the example in my initial comment in this issue for motivation.

github-actions · 2023-05-16T02:16:58Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

billsacks · 2023-05-16T15:17:50Z

I still think this is worth doing at some point.

github-actions · 2023-06-16T02:27:58Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

billsacks added ty: enhancement tp: CIMElib labels Dec 24, 2020

billsacks added this to CESM: infrastructure / cross-component SE priorities Nov 12, 2021

billsacks moved this to Needs Prioritization in CESM: infrastructure / cross-component SE priorities Nov 12, 2021

billsacks moved this from Needs Prioritization to Todo ~ weeks in CESM: infrastructure / cross-component SE priorities Nov 12, 2021

billsacks moved this from Todo ~ weeks to Todo ~ months in CESM: infrastructure / cross-component SE priorities Feb 4, 2022

github-actions bot added the Stale label May 16, 2023

billsacks removed the Stale label May 16, 2023

github-actions bot added the Stale label Jun 16, 2023

billsacks added Low Priority and removed Stale labels Jun 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add variables that can replace compset regex matches in config_component and similar files #3816

Add variables that can replace compset regex matches in config_component and similar files #3816

billsacks commented Dec 24, 2020

ekluzek commented Dec 24, 2020

jedwards4b commented Dec 24, 2020

billsacks commented Dec 24, 2020

billsacks commented Nov 12, 2021 •

edited

Loading

billsacks commented Nov 12, 2021

github-actions bot commented May 16, 2023

billsacks commented May 16, 2023

github-actions bot commented Jun 16, 2023

Add variables that can replace compset regex matches in config_component and similar files #3816

Add variables that can replace compset regex matches in config_component and similar files #3816

Comments

billsacks commented Dec 24, 2020

ekluzek commented Dec 24, 2020

jedwards4b commented Dec 24, 2020

billsacks commented Dec 24, 2020

billsacks commented Nov 12, 2021 • edited Loading

billsacks commented Nov 12, 2021

github-actions bot commented May 16, 2023

billsacks commented May 16, 2023

github-actions bot commented Jun 16, 2023

billsacks commented Nov 12, 2021 •

edited

Loading