-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix an error with reprojected diffs in non-text output #1035
Conversation
Non-text diff writers need to lookup CRS when doing reprojection, and they do so by checking whether the CRS has changed in the diff. When --no-sort-keys is given, it's possible for the meta deltas to be lazy, and if so then the diff writer will fail to find the CRS in the diff. This change fixes the issue by making meta deltas always non-lazy, and adds a regression test
# Invalidate this DeltaDiff; it's not safe to consume it again after this. | ||
# `data` is the underlying contents of UserDict, which we inherit from. | ||
# So overriding it to a non-dict will cause all dict methods to raise exceptions. | ||
# > TypeError: argument of type 'InvalidatedDeltaDiff' is not iterable | ||
self.data = InvalidatedDeltaDiff( | ||
"DeltaDiff can't be used after iter_items() has been called" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So previously .iter_items()
could only ever be called once, even if the DeltaDiff was initialised with a static dict
rather than an iterator. Now in that case we can call .iter_items()
repeatedly.
I can conceptually see how that might fix the bug (it'd be good to explain what the various calls to DeltaDiff are), but seems like we're making the API unintuitive? "Sometimes you can call .iter_items()
repeatedly, other times it'll throw an InvalidatedDeltaDiff
error".
I guess #1034 will clean this up, since you'll end up with a different class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm, looked into this further. .keys()
is called during figuring out the crs transform.
kart/diff.py:249: in diff
diff_writer.write_diff(diff_format=diff_format)
kart/base_diff_writer.py:340: in write_diff
self.has_changes |= self.write_ds_diff_for_path(
kart/base_diff_writer.py:355: in write_ds_diff_for_path
self.write_ds_diff(ds_path, ds_diff, diff_format=diff_format)
kart/json_diff_writers.py:336: in write_ds_diff
self.write_filtered_dataset_deltas(ds_path, ds_diff)
kart/json_diff_writers.py:361: in write_filtered_dataset_deltas
old_transform, new_transform = self.get_geometry_transforms(ds_path, ds_diff)
kart/base_diff_writer.py:667: in get_geometry_transforms
old_crs, new_crs = self.get_old_and_new_crs(
kart/base_diff_writer.py:569: in get_old_and_new_crs
return self._get_old_and_new_table_crs(ds_path, ds_diff, context=context)
kart/base_diff_writer.py:584: in _get_old_and_new_table_crs
for k, v in meta_diff.items()
Is this on a meta datasetdiff, or on the full delta?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep- the issue shows up when loading a CRS from the meta
diff while consuming the feature
diff – where the meta
diff has already been consumed/output.
Both meta
and feature
are DeltaDiffs, but only feature
is lazy-loaded from a generator. So that's why this indentation change fixes the bug – it allows the meta
diff to be consumed more than once because it's not lazy-loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think the API is unintuitive? The contract is, if you make a DeltaDiff from an iterator, you can't use it more than once. If you make it from a tuple/dict you can use it as many times as you like.
You are right that #1034 avoids any ambiguity; getting that working would be an improvement on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, ok.
I also noticed via pytest that .__repr__()
calls .__str__()
(c/- RichDict
) which then calls .keys()
and consumes the lazy iterator too, we should fix that for lazy diffs.
#1034 will make this all so much cleaner... 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think the API is unintuitive? The contract is, if you make a DeltaDiff from an iterator, you can't use it more than once. If you make it from a tuple/dict you can use it as many times as you like.
The concept is fine. What's unintuitive is that it's unclear what consumes it and when...
cf:
- I have a
LazyDeltaDiff
, it only has.iteritems()
and.add()
and.resolve()
. After I've called.iteritems()
once I can't use the object further. - I have a
DeltaDiff
, I can call all the dict methods or anything else on it as much as I want.
(made up class names)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll get this merged then proceed with #1034; perhaps I can make that one work
Description
Non-text diff writers need to lookup CRS when doing reprojection, and they do so by checking whether the CRS has changed in the diff.
When --no-sort-keys is given, the meta deltas DeltaDiff is treated as lazy. Then the diff writer will fail to find the CRS in the diff.
This change fixes the issue by making the DeltaDiff.iter_items() method only invalidate the diff if the diff was actually lazily populated, and adds a regression test
no changelog needed since this is a new bug introduced in #1032
Related links:
Checklist: