Fix an error with reprojected diffs in non-text output #1035

craigds · 2025-01-09T03:48:51Z

Description

Non-text diff writers need to look up the CRS when doing reprojection, and they do so by checking whether the CRS has changed in the diff.

When --no-sort-keys is given, the meta deltas DeltaDiff is treated as lazy. Then the diff writer will fail to find the CRS in the diff.

This change fixes the issue by making the DeltaDiff.iter_items() method invalidate the diff only if the diff was actually lazily populated, and adds a regression test.
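As a rough illustration of the behaviour described above, here is a minimal sketch of a UserDict subclass that invalidates itself only when it was built from an iterator. The names SketchDeltaDiff, InvalidatedSketchDiff and _lazy_source are hypothetical and are not taken from kart/diff_structs.py; the real DeltaDiff may differ in detail.

```python
from collections import UserDict


class InvalidatedSketchDiff:
    """Hypothetical placeholder stored in `data` once a lazy diff is consumed."""

    def __init__(self, message):
        self.message = message


class SketchDeltaDiff(UserDict):
    """Hypothetical stand-in for DeltaDiff; not Kart's actual class."""

    def __init__(self, initial=()):
        super().__init__()
        if isinstance(initial, (dict, list, tuple)):
            # Static input: copy it in, so it can be read any number of times.
            self._lazy_source = None
            self.data.update(initial)
        else:
            # Any other iterable is treated as a one-shot lazy source.
            self._lazy_source = iter(initial)

    def iter_items(self):
        if isinstance(self.data, InvalidatedSketchDiff):
            raise RuntimeError(self.data.message)
        if self._lazy_source is None:
            # Not lazily populated: nothing to invalidate.
            yield from self.data.items()
            return
        yield from self._lazy_source
        # Only the lazy path replaces `data`, which makes dict methods fail.
        self.data = InvalidatedSketchDiff(
            "diff can't be used after iter_items() has been called"
        )


static = SketchDeltaDiff({"crs": "EPSG:4326"})
first_pass = list(static.iter_items())
second_pass = list(static.iter_items())  # still allowed for static input

lazy = SketchDeltaDiff((k, v) for k, v in [("a", 1)])
lazy_pass = list(lazy.iter_items())  # a second pass would raise RuntimeError
```

In this sketch a static diff keeps working with ordinary dict methods after iteration, while a consumed lazy diff raises TypeError on operations such as `in`, because `data` is no longer a dict.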

No changelog entry is needed, since this is a new bug introduced in #1032.

Checklist:

Have you reviewed your own change?
Have you included test(s)?
Have you updated the changelog?

Non-text diff writers need to lookup CRS when doing reprojection, and they do so by checking whether the CRS has changed in the diff. When --no-sort-keys is given, it's possible for the meta deltas to be lazy, and if so then the diff writer will fail to find the CRS in the diff. This change fixes the issue by making meta deltas always non-lazy, and adds a regression test

rcoup · 2025-01-09T11:37:13Z

kart/diff_structs.py

+            # Invalidate this DeltaDiff; it's not safe to consume it again after this.
+            # `data` is the underlying contents of UserDict, which we inherit from.
+            # So overriding it to a non-dict will cause all dict methods to raise exceptions.
+            #    > TypeError: argument of type 'InvalidatedDeltaDiff' is not iterable
+            self.data = InvalidatedDeltaDiff(
+                "DeltaDiff can't be used after iter_items() has been called"
+            )


So previously .iter_items() could only ever be called once, even if the DeltaDiff was initialised with a static dict rather than an iterator. Now in that case we can call .iter_items() repeatedly.

I can conceptually see how that might fix the bug (it'd be good to explain what the various calls to DeltaDiff are), but seems like we're making the API unintuitive? "Sometimes you can call .iter_items() repeatedly, other times it'll throw an InvalidatedDeltaDiff error".

I guess #1034 will clean this up, since you'll end up with a different class.

mmm, looked into this further. .keys() is called during figuring out the crs transform.

kart/diff.py:249: in diff
    diff_writer.write_diff(diff_format=diff_format)
kart/base_diff_writer.py:340: in write_diff
    self.has_changes |= self.write_ds_diff_for_path(
kart/base_diff_writer.py:355: in write_ds_diff_for_path
    self.write_ds_diff(ds_path, ds_diff, diff_format=diff_format)
kart/json_diff_writers.py:336: in write_ds_diff
    self.write_filtered_dataset_deltas(ds_path, ds_diff)
kart/json_diff_writers.py:361: in write_filtered_dataset_deltas
    old_transform, new_transform = self.get_geometry_transforms(ds_path, ds_diff)
kart/base_diff_writer.py:667: in get_geometry_transforms
    old_crs, new_crs = self.get_old_and_new_crs(
kart/base_diff_writer.py:569: in get_old_and_new_crs
    return self._get_old_and_new_table_crs(ds_path, ds_diff, context=context)
kart/base_diff_writer.py:584: in _get_old_and_new_table_crs
    for k, v in meta_diff.items()

Is this on a meta datasetdiff, or on the full delta?

Yep, the issue shows up when loading a CRS from the meta diff while consuming the feature diff, at which point the meta diff has already been consumed/output.

Both meta and feature are DeltaDiffs, but only feature is lazy-loaded from a generator. So that's why this indentation change fixes the bug – it allows the meta diff to be consumed more than once because it's not lazy-loaded.
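The ordering problem described here (write the meta diff first, then look the CRS up in it again while writing features) can be shown with plain Python objects. This is only a sketch of the underlying one-shot behaviour: the helper crs_entries and the key names are illustrative assumptions, and in Kart an invalidated DeltaDiff raises an error rather than quietly returning nothing.

```python
def crs_entries(meta_items):
    """Pick out CRS-looking entries from an iterable of (key, value) pairs."""
    return {k: v for k, v in meta_items if k.startswith("crs/")}


# Illustrative meta items; the keys and values are assumptions for this sketch.
meta = {"title": "Roads", "crs/EPSG:2193.wkt": "<wkt text>"}

# Static source: the meta diff can be written out and then read again.
written_static = list(meta.items())
found_static = crs_entries(meta.items())

# One-shot source: writing the meta diff exhausts it, so the later
# CRS lookup has nothing left to read.
one_shot = iter(meta.items())
written_lazy = list(one_shot)
found_lazy = crs_entries(one_shot)
```

Here found_static contains the CRS entry while found_lazy is empty, which is why a meta diff built from a static dict needs to stay readable after it has been output.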

I don't really think the API is unintuitive? The contract is, if you make a DeltaDiff from an iterator, you can't use it more than once. If you make it from a tuple/dict you can use it as many times as you like.

You are right that #1034 avoids any ambiguity; getting that working would be an improvement on this.

yeah, ok.

I also noticed via pytest that .__repr__() calls .__str__() (c/- RichDict) which then calls .keys() and consumes the lazy iterator too, we should fix that for lazy diffs.
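One possible way to keep repr() from consuming a lazy diff is to return a placeholder while the iterator is still pending. The sketch below assumes a hypothetical class LazyAwareDiff with a _lazy_source attribute; it does not reflect how RichDict or DeltaDiff are actually implemented.

```python
from collections import UserDict


class LazyAwareDiff(UserDict):
    """Hypothetical diff whose repr() never touches a pending lazy source."""

    def __init__(self, initial=()):
        super().__init__()
        if isinstance(initial, dict):
            self._lazy_source = None
            self.data.update(initial)
        else:
            self._lazy_source = iter(initial)

    def __repr__(self):
        if self._lazy_source is not None:
            # Describe the object without iterating over it.
            return f"<{type(self).__name__} (lazy, contents not shown)>"
        return f"{type(self).__name__}({self.data!r})"


source = iter([("a", 1)])
lazy_text = repr(LazyAwareDiff(source))
leftover = next(source)  # the source is untouched, so this still yields ("a", 1)
static_text = repr(LazyAwareDiff({"a": 1}))
```

With this approach a debugger or a pytest assertion message can display the object without side effects, at the cost of not showing the contents of lazy diffs.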

#1034 will make this all so much cleaner... 😉

> I don't really think the API is unintuitive? The contract is, if you make a DeltaDiff from an iterator, you can't use it more than once. If you make it from a tuple/dict you can use it as many times as you like.

The concept is fine. What's unintuitive is that it's unclear what consumes it and when...

cf:

I have a LazyDeltaDiff, it only has .iteritems() and .add() and .resolve(). After I've called .iteritems() once I can't use the object further.

I have a DeltaDiff, I can call all the dict methods or anything else on it as much as I want.

(made up class names)
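A sketch of the two-class split suggested above, using the comment's made-up names. The semantics given to .add() and .resolve() here are assumptions (appending one more item to the pending stream, and materialising into a reusable DeltaDiff), and none of this is taken from #1034.

```python
import itertools


class ConsumedError(RuntimeError):
    """Raised when a LazyDeltaDiff is used after it has been consumed."""


class DeltaDiff(dict):
    """Fully materialised: every dict method works, as often as needed."""


class LazyDeltaDiff:
    """One-shot wrapper; it deliberately exposes no dict methods."""

    def __init__(self, source):
        self._source = iter(source)
        self._consumed = False

    def _take(self):
        if self._consumed:
            raise ConsumedError("LazyDeltaDiff has already been consumed")
        self._consumed = True
        return self._source

    def add(self, key, value):
        # Assumed meaning: queue one more item behind the pending ones.
        if self._consumed:
            raise ConsumedError("LazyDeltaDiff has already been consumed")
        self._source = itertools.chain(self._source, [(key, value)])

    def iteritems(self):
        # Hands out the underlying iterator exactly once.
        return self._take()

    def resolve(self):
        # Assumed meaning: read everything into a reusable DeltaDiff.
        return DeltaDiff(self._take())


lazy = LazyDeltaDiff(iter([("a", 1)]))
lazy.add("b", 2)
streamed = list(lazy.iteritems())

resolved = LazyDeltaDiff([("x", 1)]).resolve()
reread = [list(resolved.items()), list(resolved.items())]
```

Because the lazy class has no .keys(), .items() or __contains__, nothing can consume it by accident; callers either stream it once or call .resolve() to get an object that can be read repeatedly.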

I'll get this merged then proceed with #1034; perhaps I can make that one work

craigds requested a review from olsen232 January 9, 2025 03:48

rcoup reviewed Jan 9, 2025

craigds requested a review from rcoup January 9, 2025 22:08

olsen232 approved these changes Jan 9, 2025

rcoup approved these changes Jan 9, 2025

craigds merged commit 6d1b2d8 into master Jan 9, 2025
35 checks passed

craigds deleted the fix-reprojected-lazy-diff-errors branch January 9, 2025 22:38

rcoup mentioned this pull request Jan 9, 2025

LazyDeltaDiff #1034

Open

3 tasks
