-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add variables that can replace compset regex matches in config_component and similar files #3816
Comments
I think this makes a ton of sense and I can see where often it's the way we should have been doing things all along. Separating out the "options" for each component is something that we've already setup because we use the "%" character to delineate the options. So this is a straightforward change that would make so many things much clearer. I thought I had an exception to @billsacks case for things where there's a dependence on other components. But, it looks like that case works as well. An example of this is LND_TUNING_MODE where you currently have this...
In @billsacks formula this would become...
There's also this example of DATM_PRESAERO where a multiple component match is done..
which would become...
It could also be...
taking advantage of the component type category. That might often be the better and more clearer thing to do. So I think this covers everything we'd want it to do, and it makes it clearer. The xxx_opts would still often have to be a reg-ex because of multiple possible options. So the rules about exact match verses a regular expression would need to be clear everywhere. There is also this case for CLM_CO2_TYPE that I suppose would still use compset...
I suppose it would become...
The things you'd still need the full compset for are for the starting period and any possible ending on the compset. If these become variables they could be queried as well in the same way. |
A word of caution - every new attribute in the xml requires new code and logic in the python. This change simply shifts the complicated logic from the xml file in the form of regex to the python file in the form of multiple attributes. In the opening message @billsacks states that the regex is error prone, but the original issue has been out there for 5 years and I don't think that there have been that many problems with it. |
The difference is that, rather than having hundreds of places where we do regex matches on the compset for this kind of logic, we would do it in one place, covered by tests.
It's true that we haven't had that many KNOWN problems. But we have a lot of regexes on compset that aren't quite right. And as we support more and more components, with more and more options, it's inevitable that we're going to have situations where current matches work incorrectly, due to being too general or too specific. I would turn this question around and ask: How confident are we that all of our regexes on compset are working right all of the time? My confidence is low on that point. |
Probably a prerequisite for #4125 given that the plan for that may involve allowing order independence in the compset long name. |
@jedwards4b was wondering in a discussion today whether it's important to split out each component's piece into individual pieces like lnd_name, lnd_version and lnd_opts, as opposed to just having an lnd string that contains all of it. It's true that the further separation isn't important if the only goal is to allow order independence of components in the compset long name. However, part of the goal of this issue (in my mind) is also to allow us to replace fragile regexes with something more fool-proof - for example, avoiding issues like the one I needed to fix in ESCOMP/CESM_CICE5@ac908ac . See also the example in my initial comment in this issue for motivation. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
I still think this is worth doing at some point. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
This idea has been a constant refrain for me over the last few years, but I have brought it up in the context of other issues – see especially #119 and #1522. It's finally time that it gets its own issue.
I find the current system for matching various component options via regex matches on the compset long name in config_component.xml and similar files to be fragile and error-prone. We've had a number of bugs over the years due to too-specific or too-general regex matches in some component's xml file, and even though I consider myself decent with regexes, I still get nervous any time I need to write a new match in config_component.xml. I would like cime to do some initial parsing of the compset long name, creating a set of variables that could then be used for matches in config_component.xml, etc., without resorting to regex matching the whole compset long name. This way, the regex parsing would be done in a single place, which could be made robust and well-tested. For each component (I'm using lnd as an example), we'd have variables like:
lnd_name: Any number of alphabetic characters - e.g., "CLM"
lnd_version: Any number of numeric characters following the lnd_name - e.g., "50" (may be empty for some components)
lnd_opts: Any number of options following the '%'. This is a key one. It would allow us to replace xml that currently looks like this:
with simply
For some more examples and thinking around this issue, see #1522 (comment) and #1522 (comment) and some follow-on comments in that issue.
The text was updated successfully, but these errors were encountered: