Refactor process_wildcards and add support for TFM_MIG #166
Conversation
Nice! 🚀 The Demo 4 benchmarks are all at 100% now, except for 4-all, which apparently misses 1 record. Is there a duplicate record in the ground truth that is causing this (since the missing record is not reported and GDX diff sees no difference)?
```
    "PastYears": model.past_years,
    "ModelYears": model.model_years,
}.items():
for k, v in [
```
Just curious as to why it is better to use a list of tuples here instead of a dictionary. Should the dictionary below be converted as well?
Ah, this is because iterating through a set results in a non-deterministic order (see also #50 and #67). This makes it very difficult to debug regressions, because I rely on the `--verbose` flag and a diff tool to find out which transformation caused the regression. But with nondeterministic behaviour, there are too many changes in the diff.
I'm considering adding a check to CI that runs the tool on a small benchmark ~5-10 times and ensures that all intermediate tables are identical across runs.
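For illustration, the nondeterminism comes from Python's per-process hash randomization of strings, so any output derived from set iteration order can differ between runs; a list of tuples or a sorted iteration pins it down (the table names here are made up, not from the codebase):

```python
# str hashes are randomized per process (see PYTHONHASHSEED), so the
# iteration order of a set of strings can differ between invocations.
tables = {"FI_T", "FI_Process", "FI_Comm"}

# Deterministic alternative: iterate in sorted order.
ordered = sorted(tables)
assert ordered == ["FI_Comm", "FI_Process", "FI_T"]
```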
I left the dictionary below unchanged because the loop only assigns to a dictionary, so the order doesn't matter.
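A minimal illustration of why the iteration order is irrelevant when the loop only assigns into a dict (the keys and values here are hypothetical):

```python
# Two loops assigning the same key/value pairs in different orders
# produce equal dicts: dict equality ignores insertion order.
a = {}
for k, v in [("PastYears", [2010]), ("ModelYears", [2020, 2030])]:
    a[k] = v

b = {}
for k, v in [("ModelYears", [2020, 2030]), ("PastYears", [2010])]:
    b[k] = v

assert a == b
```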
The benchmark idea sounds good!
Actually, should we rename …?
For the …
Nice catch, yes, …
Is the solution to fix the ground truth, or to modify …?
Regarding …: if the new row added by the *0.5 update does not retain the …

Unless this is the behaviour of Veda and we really want to emulate it, I prefer the simpler, easier-to-understand behaviour of …

The trouble with adding two rows with 0.5 and 2 to the output is: how should the user determine which row overwrites the other? Is it clear that the update table in the file that is specified last in the command-line arguments results in the last row?
How about just dropping duplicates when calculating the number of rows used in calculating the match? I guess there is a similar issue when reporting the number of rows per parameter (generated and GT).
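A sketch of that counting rule with pandas, assuming duplicates are identified on all columns except the value column (the column names below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "process": ["P1", "P1", "P2"],
        "attribute": ["ACT_COST", "ACT_COST", "ACT_COST"],
        "value": [0.5, 2.0, 3.0],
    }
)

# Count rows for the match, ignoring duplicates on the key columns.
key_cols = [c for c in df.columns if c != "value"]
n_for_match = len(df.drop_duplicates(subset=key_cols))
assert n_for_match == 2
```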
Well, one would normally use …

However, once we support …

Since you are updating the transform, I believe it makes sense to change the behaviour from updating the row in place to adding a new row already (without doing anything about …).
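A sketch of the append-rather-than-update-in-place behaviour being proposed, using pandas with made-up column names; the `keep="last"` rule below is one possible answer to "which row wins" (the last row in the table), not the confirmed design:

```python
import pandas as pd

fi_t = pd.DataFrame(
    [{"process": "P1", "attribute": "ACT_COST", "year": 2020, "value": 1.0}]
)

# The TFM_UPD row (*0.5) is appended as a new row rather than applied in place.
upd = fi_t.copy()
upd["value"] *= 0.5
combined = pd.concat([fi_t, upd], ignore_index=True)
assert len(combined) == 2

# If duplicates must later collapse to a single row, the last one wins,
# which makes row order (and hence file/scenario order) significant.
deduped = combined.drop_duplicates(
    subset=["process", "attribute", "year"], keep="last"
)
assert deduped["value"].tolist() == [0.5]
```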
Comment @olejandro:

Remember that there are two important order considerations here. TFM_UPD only works when the TFM_UPD table is present in an alphabetically later scenario than the source data to be used for the UPD, and the final value seen by GAMS will follow the scenario order as defined by the user for the GAMS run. The final value could thus, in fact, be any one of {1, 0.5, 2}.

Comment @siddharth-krishna:
The idea of TFM_UPD is not to update (change) the source data. The idea is that the scenarios U(i) where the TFM_UPDs are specified will define parameter values that depend on an (alphabetically preceding) source scenario (even when using Sourcescen, the source scenario must be alphabetically preceding). The values in the source scenario should not be changed at all. In other words, the resulting value (e.g. 0.5*1) must be inserted into the U(i) scenarios where the TFM_UPD tables are specified, and not into the scenario(s) where the source data is specified. The final values seen by GAMS will depend on the scenario order, as defined by the user for the GAMS run. I suspect that it wouldn't even be possible to implement an approach using "in-place" updating such that it would work fully consistently with the *.DD files VEDA produces. In my view, the new row added by the *0.5 update would thus need to be identified with the same scenario name as the TFM_UPD table. In that way "stacked updates" would work as they do when using DD files that VEDA produces.
The user specifies the order of the scenarios for each GAMS run, and therefore you must have a mechanism for defining that order. I am not sure the command-line argument is a good way to do that if it includes Excel file names, which do not have a one-to-one correspondence with scenarios. For example, the Base scenario may consist of many Excel files, and each Subres scenario consists of two files.
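The order dependence described above can be sketched like this (the scenario names, values, and run orders are all hypothetical):

```python
# Value rows tagged with the scenario that produced them.
rows = [
    ("base", 1.0),   # source scenario
    ("upd_a", 0.5),  # TFM_UPD result: 0.5 * 1.0
    ("upd_b", 2.0),  # another TFM_UPD result
]

# The user-specified scenario order for the GAMS run decides which
# assignment is seen last; any of {1, 0.5, 2} can end up final.
def final_value(rows, run_order):
    ordered = sorted(rows, key=lambda r: run_order.index(r[0]))
    return ordered[-1][1]

assert final_value(rows, ["base", "upd_a", "upd_b"]) == 2.0
assert final_value(rows, ["base", "upd_b", "upd_a"]) == 0.5
assert final_value(rows, ["upd_a", "upd_b", "base"]) == 1.0
```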
@Antti-L thanks for the details on how Veda handles TFM_UPD. @olejandro I tried to change the code so that TFM_UPD adds rows to the FI_T table instead of updating it in place. But regression tests fail as additional rows are generated. For example, in Demo 1, the ACT_COST output is:
Whereas the ground truth only has rows 4, 5, and 7 (header is row 0). Furthermore, GAMS errors with …

I guess if we do this we also need to remove duplicate rows for parameters (counting all columns except values)? If so, I can make this change here and remove the duplicates in a separate PR.
If you are not going to write out the DD file for each scenario (like VEDA does), but only a single DD file for all data, then yes, I think you would need to do that. But I think ideally the tool would write out the data by scenario, because it is more transparent, and each scenario may also have some specific GAMS statements (like $ONEPS for zero handling). Is that not planned? If not, how would you handle e.g. $ONEPS / $OFFEPS, which the user may request for individual scenarios?

However, of course you could also write all scenarios into a single DD file, but successively (all data for one scenario, then all data for the next, etc.). You just need to also use $ONMULTI for that to work (like with separate files).

Ahh, and yes, I see that the …
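A sketch of writing the data successively per scenario with $ONMULTI, enabling $ONEPS only for the scenarios that request it; the scenario names, parameter lines, and helper function are all invented for illustration:

```python
def write_dd(scenarios, oneps=frozenset()):
    """Emit one DD text block per scenario, in order. $ONMULTI lets
    later blocks re-declare symbols; $ONEPS/$OFFEPS bracket only the
    scenarios that requested zero handling."""
    blocks = []
    for name, body in scenarios:
        lines = ["$ONMULTI"]
        if name in oneps:
            lines.append("$ONEPS")
        lines.append(body)
        if name in oneps:
            lines.append("$OFFEPS")
        blocks.append("\n".join(lines))
    return "\n".join(blocks)

dd = write_dd(
    [("base", "PARAMETER ACT_COST / P1.2020 1.0 /;"),
     ("upd_a", "PARAMETER ACT_COST / P1.2020 0.5 /;")],
    oneps={"upd_a"},
)
assert dd.count("$ONMULTI") == 2
assert "$ONEPS" in dd and "$OFFEPS" in dd
```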
@Antti-L thanks for your feedback. We should open an issue on …

@siddharth-krishna, ok, I will merge this then. Nice to see that the change in handling …
```python
known_columns = config.known_columns[datatypes.Tag.tfm_ins] | query_columns
if table.tag == datatypes.Tag.tfm_mig:
    # Also allow attribute2, year2 etc for TFM_MIG tables
    known_columns.update(
        (c + "2" for c in config.known_columns[datatypes.Tag.tfm_ins])
    )
```
Suggested change:

```diff
-known_columns = config.known_columns[datatypes.Tag.tfm_ins] | query_columns
-if table.tag == datatypes.Tag.tfm_mig:
-    # Also allow attribute2, year2 etc for TFM_MIG tables
-    known_columns.update(
-        (c + "2" for c in config.known_columns[datatypes.Tag.tfm_ins])
-    )
+known_columns = config.known_columns[datatypes.Tag(table.tag)] | query_columns
```
Should this not be table-specific?
Btw, we could also expand the tags file with info on whether a specific column is a query column and generate a list of query columns from there.
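That could look something like this (a sketch only: the `query_column` flag is an assumed schema extension, not the tags file's current format):

```python
import json

# Hypothetical tags-file fragment with a per-column query_column flag.
tags_json = """
{
  "tfm_ins": {
    "columns": [
      {"name": "attribute", "query_column": false},
      {"name": "pset_pn", "query_column": true},
      {"name": "cset_cn", "query_column": true}
    ]
  }
}
"""

tags = json.loads(tags_json)
query_columns = {
    col["name"]
    for tag in tags.values()
    for col in tag["columns"]
    if col["query_column"]
}
assert query_columns == {"pset_pn", "cset_cn"}
```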
```
    "PastYears": model.past_years,
    "ModelYears": model.model_years,
}.items():
for k, v in [
```
> The benchmark idea sounds good!
Oh, I was going to revert the last commit before merging! But no problem, the next PR should bring the additional rows down again.
Oops... Sorry, my fault! It fits well here though. 🤷
Fixes #83
Hold off on the merge until I clean up some remaining TODOs. But I thought I'd get your feedback on the main direction of the PR.