Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an error with reprojected diffs in non-text output #1035

Merged
merged 1 commit into from
Jan 9, 2025

Conversation

craigds
Copy link
Member

@craigds craigds commented Jan 9, 2025

Description

Non-text diff writers need to lookup CRS when doing reprojection, and they do so by checking whether the CRS has changed in the diff.

When --no-sort-keys is given, the meta deltas DeltaDiff is treated as lazy. Then the diff writer will fail to find the CRS in the diff.

This change fixes the issue by making the DeltaDiff.iter_items() method only invalidate the diff if the diff was actually lazily populated, and adds a regression test

no changelog needed since this is a new bug introduced in #1032

Related links:

Checklist:

  • Have you reviewed your own change?
  • Have you included test(s)?
  • Have you updated the changelog?

Non-text diff writers need to lookup CRS when doing reprojection,
and they do so by checking whether the CRS has changed in the diff.

When --no-sort-keys is given, it's possible for the meta deltas to be
lazy, and if so then the diff writer will fail to find the CRS in the
diff.

This change fixes the issue by making meta deltas always non-lazy, and
adds a regression test
@craigds craigds requested a review from olsen232 January 9, 2025 03:48
Comment on lines +525 to +531
# Invalidate this DeltaDiff; it's not safe to consume it again after this.
# `data` is the underlying contents of UserDict, which we inherit from.
# So overriding it to a non-dict will cause all dict methods to raise exceptions.
# > TypeError: argument of type 'InvalidatedDeltaDiff' is not iterable
self.data = InvalidatedDeltaDiff(
"DeltaDiff can't be used after iter_items() has been called"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So previously .iter_items() could only ever be called once, even if the DeltaDiff was initialised with a static dict rather than an iterator. Now in that case we can call .iter_items() repeatedly.

I can conceptually see how that might fix the bug (it'd be good to explain what the various calls to DeltaDiff are), but seems like we're making the API unintuitive? "Sometimes you can call .iter_items() repeatedly, other times it'll throw an InvalidatedDeltaDiff error".

I guess #1034 will clean this up, since you'll end up with a different class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, looked into this further. .keys() is called during figuring out the crs transform.

kart/diff.py:249: in diff
    diff_writer.write_diff(diff_format=diff_format)
kart/base_diff_writer.py:340: in write_diff
    self.has_changes |= self.write_ds_diff_for_path(
kart/base_diff_writer.py:355: in write_ds_diff_for_path
    self.write_ds_diff(ds_path, ds_diff, diff_format=diff_format)
kart/json_diff_writers.py:336: in write_ds_diff
    self.write_filtered_dataset_deltas(ds_path, ds_diff)
kart/json_diff_writers.py:361: in write_filtered_dataset_deltas
    old_transform, new_transform = self.get_geometry_transforms(ds_path, ds_diff)
kart/base_diff_writer.py:667: in get_geometry_transforms
    old_crs, new_crs = self.get_old_and_new_crs(
kart/base_diff_writer.py:569: in get_old_and_new_crs
    return self._get_old_and_new_table_crs(ds_path, ds_diff, context=context)
kart/base_diff_writer.py:584: in _get_old_and_new_table_crs
    for k, v in meta_diff.items()

Is this on a meta datasetdiff, or on the full delta?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep- the issue shows up when loading a CRS from the meta diff while consuming the feature diff – where the meta diff has already been consumed/output.

Both meta and feature are DeltaDiffs, but only feature is lazy-loaded from a generator. So that's why this indentation change fixes the bug – it allows the meta diff to be consumed more than once because it's not lazy-loaded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think the API is unintuitive? The contract is, if you make a DeltaDiff from an iterator, you can't use it more than once. If you make it from a tuple/dict you can use it as many times as you like.

You are right that #1034 avoids any ambiguity; getting that working would be an improvement on this.

Copy link
Member

@rcoup rcoup Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, ok.

I also noticed via pytest that .__repr__() calls .__str__() (c/- RichDict) which then calls .keys() and consumes the lazy iterator too, we should fix that for lazy diffs.

#1034 will make this all so much cleaner... 😉

Copy link
Member

@rcoup rcoup Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think the API is unintuitive? The contract is, if you make a DeltaDiff from an iterator, you can't use it more than once. If you make it from a tuple/dict you can use it as many times as you like.

The concept is fine. What's unintuitive is that it's unclear what consumes it and when...

cf:

  • I have a LazyDeltaDiff, it only has .iteritems() and .add() and .resolve(). After I've called .iteritems() once I can't use the object further.
  • I have a DeltaDiff, I can call all the dict methods or anything else on it as much as I want.

(made up class names)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get this merged then proceed with #1034; perhaps I can make that one work

@craigds craigds requested a review from rcoup January 9, 2025 22:08
@craigds craigds merged commit 6d1b2d8 into master Jan 9, 2025
35 checks passed
@craigds craigds deleted the fix-reprojected-lazy-diff-errors branch January 9, 2025 22:38
@rcoup rcoup mentioned this pull request Jan 9, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants